WorldWideScience

Sample records for automated genome mining

  1. Automated genome mining for natural products

    Directory of Open Access Journals (Sweden)

    Zajkowski James

    2009-06-01

    Full Text Available Abstract Background Discovery of new medicinal agents from natural sources has largely been an adventitious process based on screening of plant and microbial extracts combined with bioassay-guided identification and natural product structure elucidation. Increasingly rapid and more cost-effective genome sequencing technologies coupled with advanced computational power have converged to transform this trend toward a more rational and predictive pursuit. Results We have developed a rapid method of scanning genome sequences for multiple polyketide, nonribosomal peptide, and mixed combination natural products with output in a text format that can be readily converted to two- and three-dimensional structures using conventional software. Our open-source and web-based program can assemble various small molecules composed of twenty standard amino acids and twenty-two other chain-elongation intermediates used in nonribosomal peptide systems, and four acyl-CoA extender units incorporated into polyketides by reading a hidden Markov model of DNA. This process evaluates and selects the substrate specificities along the assembly line of nonribosomal synthetases and modular polyketide synthases. Conclusion Using this approach we have predicted the structures of natural products from a diverse range of bacteria based on a limited number of signature sequences. In accelerating direct DNA to metabolomic analysis, this method bridges the interface between chemists and biologists and enables rapid scanning for compounds with potential therapeutic value.
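    The abstract says assembled products are emitted as text that conventional software can turn into structures. A minimal sketch of that last step, assuming SMILES as the text format, a hypothetical four-entry residue table, and a purely linear peptide chain with no tailoring, cyclization or polyketide units (none of which the abstract specifies):

```python
# Hypothetical residue table: each fragment is written N-terminus first
# (amine N, alpha-carbon with L-stereochemistry, carbonyl) so fragments can be concatenated.
RESIDUE_SMILES = {
    "Gly": "NCC(=O)",
    "Ala": "N[C@@H](C)C(=O)",
    "Ser": "N[C@@H](CO)C(=O)",
    "Val": "N[C@@H](C(C)C)C(=O)",
}


def linear_peptide_smiles(residues):
    """Join predicted monomers N->C and close the C-terminus as a free acid."""
    if not residues:
        raise ValueError("need at least one residue")
    return "".join(RESIDUE_SMILES[r] for r in residues) + "O"


# e.g. a hypothetical two-module NRPS predicted to load Gly then Ala
print(linear_peptide_smiles(["Gly", "Ala"]))  # NCC(=O)N[C@@H](C)C(=O)O
```

    Because each fragment ends in a carbonyl and the next begins with an amine, plain concatenation forms the peptide bonds; the trailing "O" closes the C-terminus as a free acid.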

  2. Automated genome mining of ribosomal peptide natural products

    Energy Technology Data Exchange (ETDEWEB)

    Mohimani, Hosein; Kersten, Roland; Liu, Wei; Wang, Mingxun; Purvine, Samuel O.; Wu, Si; Brewer, Heather M.; Pasa-Tolic, Ljiljana; Bandeira, Nuno; Moore, Bradley S.; Pevzner, Pavel A.; Dorrestein, Pieter C.

    2014-07-31

    Ribosomally synthesized and posttranslationally modified peptides (RiPPs), especially from microbial sources, are a large group of bioactive natural products that are a promising source of new (bio)chemistry and bioactivity (1). In light of exponentially increasing microbial genome databases and improved mass spectrometry (MS)-based metabolomic platforms, there is a need for computational tools that connect natural product genotypes predicted from microbial genome sequences with their corresponding chemotypes from metabolomic datasets. Here, we introduce RiPPquest, a tandem mass spectrometry database search tool for identification of microbial RiPPs, and apply it to lanthipeptide discovery. RiPPquest uses genomics to limit search space to the vicinity of RiPP biosynthetic genes and proteomics to analyze extensive peptide modifications and compute p-values of peptide-spectrum matches (PSMs). We highlight RiPPquest by connection of multiple RiPPs from extracts of Streptomyces to their gene clusters and by the discovery of a new class III lanthipeptide, informatipeptin, from Streptomyces viridochromogenes DSM 40736 as the first natural product to be identified in an automated fashion by genome mining. The presented tool is available at cyclo.ucsd.edu.
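    RiPPquest restricts its search to the vicinity of RiPP biosynthetic genes. The sketch below illustrates only that genomic filtering idea; the `Gene` record, the coordinates and the 10 kb window are hypothetical choices, not parameters taken from RiPPquest:

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 0-based start coordinate
    end: int    # exclusive end coordinate


def candidates_near(anchors, orfs, window=10_000):
    """Keep ORFs lying within `window` bp of any anchor gene (e.g. a lanthipeptide synthetase)."""
    kept = []
    for orf in orfs:
        for anchor in anchors:
            # distance between the two intervals; 0 when they overlap
            gap = max(anchor.start - orf.end, orf.start - anchor.end, 0)
            if gap <= window:
                kept.append(orf)
                break
    return kept


anchors = [Gene("lanM", 50_000, 53_000)]
orfs = [Gene("pre1", 48_000, 48_150), Gene("pre2", 90_000, 90_200)]
print([o.name for o in candidates_near(anchors, orfs)])  # ['pre1']
```

    Only the surviving candidate precursor peptides would then be searched against the spectra, which is what keeps the search space small.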

  3. Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.

    Directory of Open Access Journals (Sweden)

    Marnix H Medema

    2014-09-01

    Full Text Available Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strategy, peptidogenomics, has been pioneered to effectively mine microbial strains for novel peptidic metabolites. Even though mass-spectrometric peptide detection can be performed quite fast, true high-throughput natural product discovery approaches have still been limited by the inability to rapidly match the identified tandem mass spectra to the gene clusters responsible for the biosynthesis of the corresponding compounds. With Pep2Path, we introduce a software package to fully automate the peptidogenomics approach through the rapid Bayesian probabilistic matching of mass spectra to their corresponding biosynthetic gene clusters. Detailed benchmarking of the method shows that the approach is powerful enough to correctly identify gene clusters even in data sets that consist of hundreds of genomes, which also makes it possible to match compounds from unsequenced organisms to closely related biosynthetic gene clusters in other genomes. Applying Pep2Path to a data set of compounds without known biosynthesis routes, we were able to identify candidate gene clusters for the biosynthesis of five important compounds. Notably, one of these clusters was detected in a genome from a different subphylum of Proteobacteria than that in which the molecule had first been identified. All in all, our approach paves the way towards high-throughput discovery of novel peptidic natural products. Pep2Path is freely available from http://pep2path.sourceforge.net/, implemented in Python, licensed under the GNU General Public License v3 and supported on MS Windows, Linux and Mac OS X.
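    Pep2Path matches mass-spectrometry sequence tags to gene clusters probabilistically. The following is a simplified stand-in for that idea, not Pep2Path's actual scoring model: it assumes each predicted adenylation-domain specificity is correct with a hypothetical probability `p_correct`, that errors are spread evenly over the other amino acids, and a uniform prior over candidate clusters:

```python
import math


def log_likelihood(tag, predicted, p_correct=0.8, alphabet_size=20):
    """Log-probability of an amino-acid tag at its best placement along a cluster's
    predicted module specificities (one letter per module). Simplification: the best
    placement is kept instead of summing over all placements."""
    if len(tag) > len(predicted):
        return float("-inf")
    p_wrong = (1 - p_correct) / (alphabet_size - 1)
    best = float("-inf")
    for off in range(len(predicted) - len(tag) + 1):
        ll = sum(math.log(p_correct if aa == predicted[off + i] else p_wrong)
                 for i, aa in enumerate(tag))
        best = max(best, ll)
    return best


def posteriors(tag, clusters):
    """P(cluster | tag) under a uniform prior over the candidate clusters."""
    lls = {name: log_likelihood(tag, seq) for name, seq in clusters.items()}
    top = max(lls.values())
    if top == float("-inf"):
        return {name: 0.0 for name in lls}
    weights = {name: math.exp(ll - top) for name, ll in lls.items()}  # shift for numerical stability
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


clusters = {"c1": "GSVAL", "c2": "TTPKE"}  # hypothetical predicted specificities
post = posteriors("SVA", clusters)
print(max(post, key=post.get))  # c1
```

    With a uniform prior the posterior is just the normalised likelihood; a real tool would also need to handle modified and non-proteinogenic residues.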

  4. Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.

    Science.gov (United States)

    Medema, Marnix H; Paalvast, Yared; Nguyen, Don D; Melnik, Alexey; Dorrestein, Pieter C; Takano, Eriko; Breitling, Rainer

    2014-09-01

    PMID:25188327

  5. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

    Science.gov (United States)

    Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-01

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation and viewing of high-throughput sequence data sets, and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

  6. Mine hoist automation and control systems

    Energy Technology Data Exchange (ETDEWEB)

    Cock, M.J.L. [CEGELEC Projects Ltd., Rugby (United Kingdom). Mining Marine and Industrial Drives Division]

    1995-06-01

    In the past, control systems for mine hoists have used many technologies, including analogue control, relays and static logic. The dramatic advances in technology in recent years now mean that all control functions can be performed using a distributed microprocessor system, which minimises training, gives superior diagnostic information and provides very high reliability. The modern distributed microprocessor system covers all the needs of a mine hoist, from advanced control through automation sequencing to safety systems and electronic speed distance protection. The safety core remains a proven dual-line relay system, but is enhanced by comprehensive first-up and status monitoring. The advantages of a distributed microprocessor control system are outlined. Details are presented of the proven MWS2000 system as applied to a cycloconverter winder, and of the range of options available, which include the elimination of all drum-driven auxiliary shafts, cam gear units and mechanical speed distance protection. Special control techniques for deep-level hoisting are incorporated in the system, including 'S'-shaped speed control of emergency mechanical brakes to minimise rope stress. Finally, a review is given of the latest developments in control technology and the implications for future developments in mine hoisting. 10 figs.
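    The abstract mentions S-shaped speed control of the emergency brakes to limit rope stress. One common way to get an S-shaped stop is a raised-cosine speed profile, whose deceleration rises and falls smoothly instead of stepping. The sketch assumes that profile and hypothetical values for the initial speed and stopping time; it is not the MWS2000 control law:

```python
import math


def s_curve_speed(v0, t, stop_time):
    """Speed (m/s) at time t for a raised-cosine stop from v0 to rest over stop_time seconds."""
    if t <= 0:
        return v0
    if t >= stop_time:
        return 0.0
    return v0 * 0.5 * (1 + math.cos(math.pi * t / stop_time))


def s_curve_decel(v0, t, stop_time):
    """Deceleration (m/s^2): minus the time derivative of s_curve_speed; zero at both ends."""
    if t <= 0 or t >= stop_time:
        return 0.0
    return v0 * 0.5 * (math.pi / stop_time) * math.sin(math.pi * t / stop_time)


# hypothetical 12 m/s conveyance stopped over 6 s
for t in (0.0, 1.5, 3.0, 4.5, 6.0):
    print(t, round(s_curve_speed(12.0, t, 6.0), 2), round(s_curve_decel(12.0, t, 6.0), 2))
```

    Compared with a linear ramp that stops in the same time, the peak deceleration is higher by a factor of π/2, so the trade-off is a gentler onset and release of braking for a larger mid-stop value.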

  7. Automated control of mine dewatering pumps / Tinus Smith

    OpenAIRE

    Smith, Tinus

    2014-01-01

    Deep gold mines use a vast amount of water for various purposes. After use, the water is pumped back to the surface. This process is energy intensive. The control is traditionally done with manual interventions. The purpose of this study is to investigate the effects of automated control on mine dewatering pumps. Automating mine dewatering pumps may hold a great number of benefits for the client. The benefits include electricity cost savings through load shifting, as well as preventative m...
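    Load shifting means running the pumps when electricity is cheapest. A minimal sketch, assuming a known hourly tariff and ignoring the dam-level, pump-capacity and continuity constraints that a real mine controller would have to respect:

```python
def schedule_pumping(prices, hours_needed):
    """Choose the cheapest hours (by index) in which to run a dewatering pump.
    prices: tariff for each hour of the planning horizon; hours_needed: run-hours required."""
    if hours_needed > len(prices):
        raise ValueError("horizon too short for the required pumping hours")
    cheapest = sorted(range(len(prices)), key=lambda h: prices[h])[:hours_needed]
    return sorted(cheapest)


def pumping_cost(prices, hours, kw):
    """Energy cost of running a pump of `kw` kilowatts during the given hours."""
    return sum(prices[h] * kw for h in hours)


tariff = [0.05, 0.05, 0.05, 0.12, 0.20, 0.20, 0.12, 0.05]  # hypothetical price per kWh
hours = schedule_pumping(tariff, 4)
print(hours, pumping_cost(tariff, hours, kw=500))
```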

  8. Genomics Portals: integrative web-platform for mining genomics data

    Directory of Open Access Journals (Sweden)

    Ghosh Krishnendu

    2010-01-01

    Full Text Available Abstract Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  9. Text mining from ontology learning to automated text processing applications

    CERN Document Server

    Biemann, Chris

    2014-01-01

    This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects...

  10. Automated design of genomic Southern blot probes

    Directory of Open Access Journals (Sweden)

    Komiyama Noboru H

    2010-01-01

    experimentally validate a number of these automated designs by Southern blotting. The majority of probes we tested performed well confirming our in silico prediction methodology and the general usefulness of the software for automated genomic Southern probe design. Conclusions Software and supplementary information are freely available at: http://www.genes2cognition.org/software/southern_blot

  11. Comparative genomics using data mining tools

    Indian Academy of Sciences (India)

    Tannistha Nandi; Chandrika B-Rao; Srinivasan Ramachandran

    2002-02-01

    We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and Saccharomyces cerevisiae. We have identified the common and different features between the three genomes in the protein evolution patterns. M. jannaschii has been seen to have a greater number of proteins with more charged amino acids whereas S. cerevisiae has been observed to have a greater number of hydrophilic proteins. Despite the differences in intrinsic compositional characteristics between the proteins from the different genomes we have also identified certain common characteristics. We have carried out exploratory Principal Component Analysis of the multivariate data on the proteins of each organism in an effort to classify the proteins into clusters. Interestingly, we found that most of the proteins in each organism cluster closely together, but there are a few ‘outliers’. We focus on the outliers for the functional investigations, which may aid in revealing any unique features of the biology of the respective organisms.
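    The exploratory PCA step can be illustrated as follows. This sketch assumes just two hypothetical compositional features (fraction of charged residues and fraction of hydrophobic residues), computes the first principal component by power iteration in plain Python, and flags proteins whose score lies more than a hypothetical 1.5 standard deviations from the mean; the paper's actual features and outlier criterion are not given in the abstract:

```python
import math

CHARGED = set("DEKR")
HYDROPHOBIC = set("AVILMFW")


def features(seq):
    """Hypothetical compositional features: fraction charged, fraction hydrophobic."""
    n = len(seq)
    return [sum(c in CHARGED for c in seq) / n, sum(c in HYDROPHOBIC for c in seq) / n]


def first_pc_scores(rows, iters=200):
    """Scores of mean-centred rows on the first principal component (power iteration)."""
    d, n = len(rows[0]), len(rows)
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along v
        v = [x / norm for x in w]
    return [sum(x[j] * v[j] for j in range(d)) for x in X]


def pc1_outliers(seqs, k=1.5):
    """Indices of proteins whose PC1 score is more than k standard deviations from 0 (the mean)."""
    scores = first_pc_scores([features(s) for s in seqs])
    sd = math.sqrt(sum(s * s for s in scores) / (len(scores) - 1))
    return [i for i, s in enumerate(scores) if sd > 0 and abs(s) > k * sd]


proteins = ["AVILMAVILM", "AVILMAVILG", "AVILMAVIGG", "AVILMAVILS", "GAVILMAVIL", "DEKRDEKRDE"]
print(pc1_outliers(proteins))  # [5]
```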

  12. Automated correction of genome sequence errors

    OpenAIRE

    Gajer, Pawel; Schatz, Michael; Salzberg, Steven L

    2004-01-01

    By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of ov...
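    AutoEditor's algorithm is not described in the abstract. To illustrate the general idea of using an assembly to fix base calls, here is a deliberately simplified majority-vote corrector over a multiple alignment of reads; the 80% agreement threshold is a hypothetical parameter, and a production tool would likely also consult the underlying sequencing traces:

```python
from collections import Counter


def correct_reads(aligned_reads, min_agree=0.8):
    """Majority-vote correction over an alignment of equal-length reads ('-' = gap/no coverage).
    A base is replaced by the column consensus when at least min_agree of covering reads agree."""
    corrected = [list(r) for r in aligned_reads]
    for col in range(len(aligned_reads[0])):
        bases = [r[col] for r in aligned_reads if r[col] != "-"]
        if not bases:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agree:
            for row in corrected:
                if row[col] != "-":
                    row[col] = base
    return ["".join(row) for row in corrected]


reads = ["ACGT", "ACGT", "ACTT", "ACGT", "-CGT"]
print(correct_reads(reads))  # ['ACGT', 'ACGT', 'ACGT', 'ACGT', '-CGT']
```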

  13. Genome Mining for antibiotics biosynthesis pathways with antiSMASH 3

    DEFF Research Database (Denmark)

    Weber, Tilmann; Kim, Hyun Uk; Blin, Kai;

    2014-01-01

    Microorganisms are the most important source of natural products with antimicrobial or antitumor activity. These natural products are the main source for anti-infectives; 80% of antibiotics currently in medical use are derived from this class of compounds. In the past, functional screenings......://antismash.secondarymetabolites.org). antiSMASH3 is currently the most comprehensive automated genome mining platform for natural product biosynthetic pathways. It automatically screens genomic data of bacteria and fungi for the presence of 24 different types of secondary metabolite biosynthetic pathways. For different classes of secondary...
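    Rule-based detection of pathway types can be sketched as grouping neighbouring genes and checking which required domain combinations are present. The rule table, domain labels and 20 kb gap below are hypothetical and far simpler than antiSMASH's real detection rules:

```python
# Hypothetical rules: cluster type -> protein domains that must all occur in a gene group
RULES = {
    "t1pks": {"PKS_KS", "PKS_AT"},
    "nrps": {"Condensation", "AMP-binding"},
}


def detect_clusters(genes, max_gap=20_000):
    """genes: (start, set of domain names) tuples sorted by start.
    Genes closer than max_gap are grouped; each group is labelled with every rule it satisfies."""
    groups, current = [], []
    for start, domains in genes:
        if current and start - current[-1][0] > max_gap:
            groups.append(current)
            current = []
        current.append((start, domains))
    if current:
        groups.append(current)
    hits = []
    for group in groups:
        present = set().union(*(domains for _, domains in group))
        types = sorted(t for t, required in RULES.items() if required <= present)
        if types:
            hits.append((group[0][0], types))
    return hits


genome = [(1_000, {"PKS_KS"}), (3_000, {"PKS_AT"}), (100_000, {"AMP-binding"}),
          (105_000, {"Condensation"}), (300_000, {"Other"})]
print(detect_clusters(genome))  # [(1000, ['t1pks']), (100000, ['nrps'])]
```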

  14. Optimizing wireless LAN for longwall coal mine automation

    Energy Technology Data Exchange (ETDEWEB)

    Hargrave, C.O.; Ralston, J.C.; Hainsworth, D.W. [Exploration & Mining Commonwealth Science & Industrial Research Organisation, Pullenvale, Qld. (Australia)]

    2007-01-15

    A significant development in underground longwall coal mining automation has been achieved with the successful implementation of wireless LAN (WLAN) technology for communication on a longwall shearer. WIreless-FIdelity (Wi-Fi) was selected to meet the bandwidth requirements of the underground data network, and several configurations were installed on operating longwalls to evaluate their performance. Although these efforts demonstrated the feasibility of using WLAN technology in longwall operation, it was clear that new research and development was required in order to establish optimal full-face coverage. By undertaking an accurate characterization of the target environment, it has been possible to achieve great improvements in WLAN performance over a nominal Wi-Fi installation. This paper discusses the impact of Fresnel zone obstructions and multipath effects on radio frequency propagation and reports an optimal antenna and system configuration. Many of the lessons learned in the longwall case are immediately applicable to other underground mining operations, particularly wherever there is a high degree of obstruction from mining equipment.
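    The Fresnel-zone discussion rests on the standard radius formula r_n = sqrt(n λ d1 d2 / (d1 + d2)), where λ is the wavelength and d1, d2 are the distances from each antenna to the point of interest. As a worked example, assuming a 2.4 GHz link 100 m long (a hypothetical geometry, not one from the paper):

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fresnel_radius(freq_hz, d1_m, d2_m, n=1):
    """Radius (m) of the n-th Fresnel zone at a point d1_m from one antenna and d2_m from the other."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return math.sqrt(n * wavelength * d1_m * d2_m / (d1_m + d2_m))


# hypothetical 2.4 GHz link, 100 m long, evaluated at its midpoint
print(round(fresnel_radius(2.4e9, 50.0, 50.0), 2))  # 1.77
```

    At the midpoint of such a link the first Fresnel zone is roughly 1.77 m in radius, which shows how easily mining equipment near the line of sight can intrude on it.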

  15. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal Matoq Saeed

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and the development of new ones create a need to (a) compare different annotations for a single genome, and (b) generate an annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. To illustrate BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions by up to 27%, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
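    Combining annotations into an extended annotation can be sketched as a per-gene union of function calls. The data layout and the union rule are assumptions made for illustration; the abstract does not say how BEACON reconciles conflicting calls:

```python
def extend_annotation(annotations):
    """Union per-gene function calls from several annotation methods (AMs).
    annotations: {method: {gene_id: function or None}} -> {gene_id: sorted distinct functions}."""
    merged = {}
    for calls in annotations.values():
        for gene, func in calls.items():
            merged.setdefault(gene, set())
            if func:
                merged[gene].add(func)
    return {gene: sorted(funcs) for gene, funcs in merged.items()}


ams = {  # hypothetical calls from two annotation methods
    "AM1": {"g1": "kinase", "g2": None, "g3": None},
    "AM2": {"g1": "kinase", "g2": "transporter", "g3": None},
}
extended = extend_annotation(ams)
annotated = sum(1 for funcs in extended.values() if funcs)
print(extended, annotated)  # annotated == 2, versus 1 gene for AM1 alone
```

    Genes left with an empty list remain without any function assignment; genes with more than one distinct function would be the ones needing a comparison report.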

  16. The evolution of genome mining in microbes - a review.

    Science.gov (United States)

    Ziemert, Nadine; Alanjary, Mohammad; Weber, Tilmann

    2016-08-27

    Covering: 2006 to 2016. The computational mining of genomes has become an important part of the discovery of novel natural products as drug leads. Thousands of bacterial genome sequences are publicly available these days, containing an even larger number and diversity of secondary metabolite gene clusters that await linkage to their encoded natural products. With the development of high-throughput sequencing methods and the wealth of DNA data available, a variety of genome mining methods and tools have been developed to guide discovery and characterisation of these compounds. This article reviews the development of these computational approaches during the last decade and shows how the revolution of next generation sequencing methods has led to an evolution of various genome mining approaches, techniques and tools. After a short introduction and brief overview of important milestones, this article will focus on the different approaches of mining genomes for secondary metabolites, from detecting biosynthetic genes to resistance based methods and "evo-mining" strategies including a short evaluation of the impact of the development of genome mining methods and tools on the field of natural products and microbial ecology. PMID:27272205

  17. Chapter 13: Mining Electronic Health Records in the Genomics Era

    OpenAIRE

    Denny, Joshua C.

    2012-01-01

    Abstract: The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DN...

  18. The MineTool Software Suite: A Novel Data Mining Palette of Tools for Automated Modeling of Space Physics Data

    Science.gov (United States)

    Sipes, T.; Karimabadi, H.; Roberts, A.

    2009-12-01

    We present a new data mining software tool called MineTool for analysis and modeling of space physics data. MineTool is a graphical user interface implementation that merges two data mining algorithms into an easy-to-use software tool: an algorithm for analysis and modeling of static data [Karimabadi et al, 2007] and MineTool-TS, an algorithm for data mining of time series data [Karimabadi et al, 2009]. By virtue of automating the modeling process and model evaluations, MineTool makes data mining and predictive modeling more accessible to non-experts. The software is entirely in Java and freeware. By ranking all inputs as predictors of the outcome before constructing a model, MineTool enables inclusion of only relevant variables as well. The technique aggregates the various stages of model building into a four-step process consisting of (i) data segmentation and sampling, (ii) variable pre-selection and transform generation, (iii) predictive model estimation and validation, and (iv) final model selection. Optimal strategies are chosen for each modeling step. A notable feature of the technique is that the final model is always in closed analytical form rather than “black box” form characteristic of some other techniques. Having the analytical model enables deciphering the importance of various variables to affecting the outcome. MineTool suite also provides capabilities for data preparation for data mining as well as visualization of the datasets. MineTool has successfully been used to develop models for automated detection of flux transfer events (FTEs) at Earth’s magnetopause in the Cluster spacecraft time series data and 3D magnetopause modeling. In this presentation, we demonstrate the ease of use of the software through examples including how it was used in the FTE problem.

  19. Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

    OpenAIRE

    Jordan, Rick; Visweswaran, Shyam; Gopalakrishnan, Vanathi

    2014-01-01

    Background Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids. Methodology A positive set of abstracts was defined by the terms ‘breast cancer’ and ‘lung cance...

  20. Highlights of recent articles on data mining in genomics & proteomics

    Science.gov (United States)

    This editorial elaborates on investigations consisting of different “OMICS” technologies and their application to biological sciences. In addition, advantages and recent development of the proteomic, genomic and data mining technologies are discussed. This information will be useful to scientists ...

  1. WormBase: methods for data mining and comparative genomics.

    Science.gov (United States)

    Harris, Todd W; Stein, Lincoln D

    2006-01-01

    WormBase is a comprehensive repository for information on Caenorhabditis elegans and related nematodes. Although the primary web-based interface of WormBase (http://www.wormbase.org/) is familiar to most C. elegans researchers, WormBase also offers powerful data-mining features for addressing questions of comparative genomics, genome structure, and evolution. In this chapter, we focus on data mining at WormBase through the use of flexible web interfaces, custom queries, and scripts. The intended audience includes users wishing to query the database beyond the confines of the web interface or fetch data en masse. No knowledge of programming is necessary or assumed, although users with intermediate skills in the Perl scripting language will be able to utilize additional data-mining approaches. PMID:16988424

  2. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Giltae Song

    Full Text Available The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.
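    Grouping genes by their presence or absence across strains can be sketched as below. The core / accessory / strain-specific labels are the usual pan-genome terms and are assumed here; the abstract does not name AGAPE's exact categories:

```python
def classify_pan_genome(presence):
    """presence: {gene: set of strains carrying it} -> genes grouped as core / accessory / strain-specific."""
    all_strains = set().union(*presence.values())
    groups = {"core": [], "accessory": [], "strain-specific": []}
    for gene, strains in sorted(presence.items()):
        if strains == all_strains:
            groups["core"].append(gene)
        elif len(strains) == 1:
            groups["strain-specific"].append(gene)
        else:
            groups["accessory"].append(gene)
    return groups


presence = {"geneA": {"S1", "S2", "S3"}, "geneB": {"S1", "S2"}, "geneC": {"S3"}}  # hypothetical
print(classify_pan_genome(presence))
```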

  3. Chapter 13: Mining electronic health records in the genomics era.

    Science.gov (United States)

    Denny, Joshua C

    2012-01-01

    The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. Since EHR data includes a broad sampling of clinically-relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This...
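    Combining structured and unstructured EHR data into cases and controls is often done with simple rules. A minimal sketch, assuming hypothetical code labels, a count of NLP-detected note mentions per patient, and a hypothetical two-code threshold; ambiguous patients are excluded rather than guessed, in the spirit of the manual chart review the chapter mentions:

```python
def classify_patient(billing_codes, note_mentions, case_codes, min_codes=2):
    """Hypothetical phenotype rule combining structured billing codes with NLP note mentions.
    Returns 'case', 'control', or 'exclude' (ambiguous; e.g. left for manual chart review)."""
    n_codes = sum(1 for code in billing_codes if code in case_codes)
    if n_codes >= min_codes and note_mentions > 0:
        return "case"
    if n_codes == 0 and note_mentions == 0:
        return "control"
    return "exclude"


CASE_CODES = {"DX-001"}  # placeholder label, not an entry from a real code system
print(classify_patient(["DX-001", "DX-001", "DX-999"], 3, CASE_CODES))  # case
print(classify_patient([], 0, CASE_CODES))                              # control
print(classify_patient(["DX-001"], 0, CASE_CODES))                      # exclude
```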

  4. Chapter 13: Mining electronic health records in the genomics era.

    Directory of Open Access Journals (Sweden)

    Joshua C Denny

    Full Text Available The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. Since EHR data includes a broad sampling of clinically relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped...
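    The chapter describes combining structured billing codes with evidence mined from narrative text to select cases and controls, but it gives no concrete rule. The following is a minimal sketch under hypothetical assumptions: each patient carries a list of billing codes and a count of affirmed condition mentions produced by some upstream NLP step, a case needs at least two matching codes plus one text mention, and anything ambiguous is routed to manual chart review. The thresholds and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    billing_codes: list[str]  # codes recorded at encounters (illustrative)
    note_mentions: int        # affirmed mentions from a hypothetical NLP step


def classify(p: Patient, target_codes: set[str], min_codes: int = 2) -> str:
    """Return 'case', 'control', or 'exclude' using hypothetical phenotype rules."""
    n_codes = sum(1 for c in p.billing_codes if c in target_codes)
    if n_codes >= min_codes and p.note_mentions > 0:
        return "case"      # structured and narrative evidence agree
    if n_codes == 0 and p.note_mentions == 0:
        return "control"   # no evidence of the condition in either source
    return "exclude"       # ambiguous evidence: send to manual chart review
```

    Requiring agreement between both data types trades cohort size for validity, which mirrors the chapter's point that combined structured and unstructured data yield high-validity case and control sets.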

  5. Digital Coal Mine Integrated Automation System Based on ControlNet

    Institute of Scientific and Technical Information of China (English)

    CHEN Jin-yun; ZHANG Shen; ZUO Wei-ran

    2007-01-01

    A three-layer model for digital communication in a mine is proposed. Two basic platforms are discussed: a uniform transmission network and a uniform data warehouse. An actual ControlNet-based transmission network platform suitable for the Jining No.3 coal mine is presented. This network is an information superhighway intended to integrate all existing and new automation subsystems, and its standard interface can be used with future subsystems. The network, data structure, and management decision-making all employ this uniform hardware and software, which avoids the system and information islands seen in traditional mine-automation systems. The construction of the network provides a stable foundation for digital communication in the Jining No.3 coal mine.

  6. Building predictive models for feature selection in genomic mining

    OpenAIRE

    Figini, Silvia; Giudici, Paolo

    2006-01-01

    Building predictive models for genomic mining requires feature selection as an essential preliminary step to reduce the large number of variables available. Feature selection is the process of selecting the subset of features that is most essential for the intended task, such as classification, clustering, or regression analysis. In gene expression microarray data, being able to select a few genes not only makes data analysis efficient but also aids their biological interpretation. Microarray d...
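    The abstract is truncated before the authors' method is described, so the example below is not their approach. It is a minimal sketch of a common filter-style feature selection step for two-class microarray data: rank genes by the absolute Welch t statistic between classes and keep the top k. The gene names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def select_top_genes(expr, labels, k):
    """Rank genes by |t| between class 1 and class 0 and keep the top k.

    expr: dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        g0 = [v for v, y in zip(values, labels) if y == 0]
        g1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(g1, g0))
    return sorted(scores, key=scores.get, reverse=True)[:k]


labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # strong class difference
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "geneC": [0.5, 0.7, 0.6, 1.5, 1.4, 1.8],  # moderate class difference
}
print(select_top_genes(expr, labels, 2))
```

    A filter like this is cheap and model-independent; wrapper or embedded methods that evaluate subsets with the predictive model itself are the usual alternatives when interactions between genes matter.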

  7. Automating Knowledge Discovery for Toxicity Prediction Using Jumping Emerging Pattern Mining

    OpenAIRE

    Sherhod, R.; Gillet, V.J.; Judson, P.N.; Vessey, J.D.

    2012-01-01

    The design of new alerts, that is, collections of structural features observed to result in toxicological activity, can be a slow process and may require significant input from toxicology and chemistry experts. A method has therefore been developed to help automate alert identification by mining descriptions of activating structural features directly from toxicity data sets. The method is based on jumping emerging pattern mining, which is applied to a set of toxic and nontoxic compo...
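    A jumping emerging pattern (JEP) is a set of features that occurs in at least one example of one class and in no example of the other. The brute-force sketch below illustrates only that definition on tiny data; it is not the mining algorithm used in the paper, and the structural-feature labels are hypothetical.

```python
from itertools import combinations


def jumping_emerging_patterns(toxic, nontoxic, max_size=2):
    """Return minimal feature sets present in some toxic compounds but in no nontoxic ones.

    toxic, nontoxic: lists of sets of structural-feature labels, one set per compound.
    A candidate is skipped when a smaller JEP is already a subset of it.
    """
    features = sorted(set().union(*toxic))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            pattern = frozenset(combo)
            if any(jep <= pattern for jep in found):
                continue  # a subset already jumps, so this pattern is not minimal
            support_toxic = sum(pattern <= c for c in toxic)
            support_nontoxic = sum(pattern <= c for c in nontoxic)
            if support_toxic > 0 and support_nontoxic == 0:
                found.append(pattern)
    return found


toxic = [{"nitro", "aromatic"}, {"epoxide"}, {"nitro", "aromatic", "halogen"}]
nontoxic = [{"aromatic"}, {"nitro"}, {"halogen"}]
print(jumping_emerging_patterns(toxic, nontoxic))
```

    In this toy data no single feature except the epoxide separates the classes, but the nitro and aromatic pair does, which is the kind of combined structural alert the method aims to surface.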

  8. Data mining and the human genome

    Energy Technology Data Exchange (ETDEWEB)

    Abarbanel, Henry; Callan, Curtis; Dally, William; Dyson, Freeman; Hwa, Terence; Koonin, Steven; Levine, Herbert; Rothaus, Oscar; Schwitters, Roy; Stubbs, Christopher; Weinberger, Peter [all: The MITRE Corporation, McLean, VA (US). JASON Program Office]

    2000-01-07

    As genomics research moves from an era of data acquisition to one of both acquisition and interpretation, new methods are required for organizing and prioritizing the data. These methods would allow an initial level of data analysis to be carried out before committing resources to a particular genetic locus. This JASON study sought to delineate the main problems that must be faced in bioinformatics and to identify information technologies that can help to overcome those problems. While the current influx of data greatly exceeds what biologists have experienced in the past, other scientific disciplines and the commercial sector have been handling much larger datasets for many years. Powerful data mining techniques have been developed in other fields that, with appropriate modification, could be applied to the biological sciences.

  9. Automated training for algorithms that learn from genomic data.

    Science.gov (United States)

    Cilingir, Gokcen; Broschat, Shira L

    2015-01-01

    Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, the updates are generally not incorporated into published machine learning algorithms, which can therefore become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable. PMID:25695053

  10. Data mining approaches for information retrieval from genomic databases

    Science.gov (United States)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases, so an efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms for sequence comparison, such as suffix-array-based database search, divergence measurement, methods that rely on the existence of a local similarity between the query sequence and sequences in the database, and common mutual information between the query and database sequences. In this paper we describe a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industry to improve marketing, sales, and customer support operations. We have applied descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. A relevance feedback score based on common patterns is developed and used to compute the distance between sequences. Contigs of human chromosomes are used to test retrieval accuracy, and the experimental results are presented.
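    The abstract does not define its relevance feedback score. As a stand-in for the idea of comparing sequences by the patterns they share, the sketch below uses a swapped-in technique: represent each sequence by its set of length-k substrings (k-mers), measure distance as one minus the Jaccard similarity of those sets, and rank database entries by that distance. The sequence names and contents are hypothetical.

```python
def kmers(seq, k=3):
    """Set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """1 - Jaccard similarity of k-mer sets (0 = same sets, 1 = nothing shared)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0  # two sequences shorter than k are treated as identical
    return 1.0 - len(ka & kb) / len(ka | kb)


def retrieve(query, database, k=3, top=1):
    """Return the names of the `top` database sequences closest to the query."""
    ranked = sorted(database, key=lambda name: kmer_distance(query, database[name], k))
    return ranked[:top]


db = {"s1": "ACGTACGTAC", "s2": "TTTTGGGGCC", "s3": "ACGTACGAAA"}
print(retrieve("ACGTACGT", db, top=2))
```

    Set-based k-mer comparison ignores order and multiplicity, which makes it fast enough for prefiltering a database before a more sensitive alignment step.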

  11. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genomic DNA sequences.

  12. DESIGN AND IMPLEMENTATION FOR AUTOMATED NETWORK TROUBLESHOOTING USING DATA MINING

    Directory of Open Access Journals (Sweden)

    Eleni Rozaki

    2015-05-01

    Full Text Available The efficient and effective monitoring of mobile networks is vital given the number of users who rely on such networks and the importance of those networks. The purpose of this paper is to present a monitoring scheme for mobile networks based on the use of rules and decision tree data mining classifiers to improve fault detection and handling. The goal is to obtain optimisation rules that improve anomaly detection. In addition, a monitoring scheme based on Bayesian classifiers was implemented for fault isolation and localisation. The data mining techniques described in this paper are intended to allow a system to be trained to learn network fault rules. The results of the tests conducted showed that the rules were highly effective at improving network troubleshooting.
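    The paper mentions Bayesian classifiers for fault isolation without giving their form. The sketch below is a generic categorical naive Bayes with Laplace smoothing, offered as an illustration rather than the paper's implementation; the alarm and load features and the fault labels are hypothetical, and every training sample is assumed to carry the same feature keys.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (features_dict, fault_label). Returns a model tuple."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, feature) -> counts of values
    values_seen = defaultdict(set)       # feature -> distinct values observed
    for feats, label in samples:
        for f, v in feats.items():
            value_counts[(label, f)][v] += 1
            values_seen[f].add(v)
    return class_counts, value_counts, values_seen


def predict_nb(model, feats):
    """Return the label with the highest log posterior (Laplace-smoothed)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for f, v in feats.items():
            k = len(values_seen[f]) or 1  # number of distinct values for smoothing
            score += math.log((value_counts[(label, f)][v] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


samples = [
    ({"alarm": "high_drop", "load": "high"}, "congestion"),
    ({"alarm": "high_drop", "load": "high"}, "congestion"),
    ({"alarm": "link_down", "load": "low"}, "hardware"),
    ({"alarm": "link_down", "load": "high"}, "hardware"),
]
model = train_nb(samples)
print(predict_nb(model, {"alarm": "link_down", "load": "low"}))
```

    Working in log space avoids underflow when many features are multiplied, and smoothing keeps an unseen value from zeroing out a class.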

  13. DESIGN AND IMPLEMENTATION FOR AUTOMATED NETWORK TROUBLESHOOTING USING DATA MINING

    OpenAIRE

    Eleni Rozaki

    2015-01-01

    The efficient and effective monitoring of mobile networks is vital given the number of users who rely on such networks and the importance of those networks. The purpose of this paper is to present a monitoring scheme for mobile networks based on the use of rules and decision tree data mining classifiers to upgrade fault detection and handling. The goal is to have optimisation rules that improve anomaly detection. In addition, a monitoring scheme that relies on Bayesian classifiers...

  14. Design and implementation for automated network troubleshooting using data mining

    OpenAIRE

    Rozaki, Eleni

    2015-01-01

    The efficient and effective monitoring of mobile networks is vital given the number of users who rely on such networks and the importance of those networks. The purpose of this paper is to present a monitoring scheme for mobile networks based on the use of rules and decision tree data mining classifiers to upgrade fault detection and handling. The goal is to have optimisation rules that improve anomaly detection. In addition, a monitoring scheme that relies on Bayesian classifiers was also im...

  15. Automated Comparative Auditing of NCIT Genomic Roles Using NCBI

    Science.gov (United States)

    Cohen, Barry; Oren, Marc; Min, Hua; Perl, Yehoshua; Halper, Michael

    2008-01-01

    Biomedical research has identified many human genes and accumulated diverse knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Due to the rapid advances in this field, it is to be expected that the NCIT’s Gene hierarchy will contain role errors. A comparative methodology to audit the Gene hierarchy with the use of the National Center for Biotechnology Information’s (NCBI’s) Entrez Gene database is presented. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene-roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a common structural origin. Regarding biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors and corrections for them in a terminological genomic knowledge repository, thus facilitating its overall maintenance. PMID:18486558
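    The core comparison step for the chromosomal-location role can be pictured as a keyed diff between the two sources. The sketch below assumes the crawled data have already been reduced to plain gene-to-location dictionaries and treats the Entrez value as the suggested correction; both are simplifying assumptions, and the published algorithms are more elaborate. The BRCA1 value in the NCIT dictionary is deliberately wrong for illustration.

```python
def audit_locations(ncit, entrez):
    """Compare gene -> chromosomal location maps from two sources.

    Returns (gene, ncit_value, entrez_value) tuples for every disagreement,
    with the Entrez value proposed as the candidate correction. Genes absent
    from one source are reported with None for that source.
    """
    issues = []
    for gene in sorted(set(ncit) | set(entrez)):
        a, b = ncit.get(gene), entrez.get(gene)
        if a != b:
            issues.append((gene, a, b))
    return issues


ncit = {"TP53": "17p13.1", "BRCA1": "13q12.3"}  # BRCA1 value deliberately wrong
entrez = {"TP53": "17p13.1", "BRCA1": "17q21.31", "EGFR": "7p11.2"}
for gene, found, suggested in audit_locations(ncit, entrez):
    print(f"{gene}: NCIT={found!r} suggested={suggested!r}")
```

    Exact string equality is the weakest point of such a check; the abstract's note that processes have synonymous designations suggests that the biological-process role would need normalization before comparison.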

  16. Amalgamation of Automated Testing and Data Mining: A Novel Approach in Software Testing

    Directory of Open Access Journals (Sweden)

    Sarita Sharma

    2011-09-01

    Full Text Available Software engineering comprises several disciplines devoted to preventing and remedying malfunctions and to ensuring adequate behavior. Testing is a widespread validation approach in industry, but it is still largely ad hoc, expensive, and of unpredictable effectiveness. In today's industry, the design of software tests is mostly based on the testers' expertise, while test automation tools are limited to executing pre-planned tests only. Evaluation of test outputs also requires considerable effort by human testers, who often have inadequate knowledge of the requirements specification. This manual approach to software testing results in heavy losses to the world's economy. This paper proposes the potential use of data mining algorithms for automated induction of functional requirements from execution data. The induced data mining models of tested software can be utilized for recovering missing and incomplete specifications, designing a minimal set of regression tests, and evaluating the correctness of software outputs when testing new, potentially inconsistent releases of the system.

  17. Automated construction of generalized additive neural networks for predictive data mining / Jan Valentine du Toit

    OpenAIRE

    Du Toit, Jan Valentine

    2006-01-01

    In this thesis Generalized Additive Neural Networks (GANNs) are studied in the context of predictive Data Mining. A GANN is a novel neural network implementation of a Generalized Additive Model. Originally GANNs were constructed interactively by considering partial residual plots. This methodology involves subjective human judgment, is time-consuming, and can lead to suboptimal results. The newly developed automated construction algorithm solves these difficulties by performing mod...

  18. Clinic-Genomic Association Mining for Colorectal Cancer Using Publicly Available Datasets

    OpenAIRE

    Fang Liu; Yaning Feng; Zhenye Li; Chao Pan; Yuncong Su; Rui Yang; Liying Song; Huilong Duan; Ning Deng

    2014-01-01

    In recent years, a growing number of researchers have begun to focus on how to establish associations between clinical and genomic data. However, there is still a lack of research that mines clinic-genomic associations by comprehensively analysing the available gene expression data for a single disease. Colorectal cancer is a malignant tumour, and a number of genetic syndromes have been proven to be associated with it. This paper presents our research on mining clinic-genomic assoc...

  19. Bioactivity-guided genome mining reveals the lomaiviticin biosynthetic gene cluster in Salinispora tropica

    OpenAIRE

    Kersten, Roland D.; Lane, Amy L.; Nett, Markus; Richter, Taylor K. S.; Duggan, Brendan M.; Dorrestein, Pieter C.; Moore, Bradley S.

    2013-01-01

    The use of genome sequences has become routine in guiding the discovery and identification of microbial natural products and their biosynthetic pathways. In silico prediction of molecular features, such as metabolic building blocks, physico-chemical properties or biological functions, from orphan gene clusters has opened up the characterization of many new chemo- and genotypes in genome mining approaches. Here, we guided our genome mining of two predicted enediyne pathways in Salinispora trop...

  20. Joint Genome Institute's Automation Approach and History

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Simon

    2006-07-05

    The Department of Energy Joint Genome Institute (DOE/JGI) collaborates with DOE national laboratories and community users to advance genome science in support of the DOE missions of clean bio-energy, carbon cycling, and bioremediation.

  1. Automated Data Mining of A Proprietary Database System for Physician Quality Improvement

    International Nuclear Information System (INIS)

    Purpose: Physician practice quality improvement is a subject of intense national debate. This report describes using a software data acquisition program to mine an existing, commonly used proprietary radiation oncology database to assess physician performance. Methods and Materials: Between 2003 and 2004, a manual analysis was performed of electronic portal image (EPI) review records. Custom software was recently developed to mine the record-and-verify database and the review process of EPI at our institution. In late 2006, a report was developed that allowed for immediate review of physician completeness and speed of EPI review for any prescribed period. Results: The software extracted >46,000 EPIs between 2003 and 2007, providing EPI review status and time to review by each physician. Between 2003 and 2007, the department's EPI review completion improved from 77% to 97% (range, 85.4-100%), with a decrease in the mean time to review from 4.2 days to 2.4 days. The initial intervention in 2003 to 2004 was moderately successful in changing the EPI review patterns; it was not repeated because of the time required to perform it. However, the implementation in 2006 of the automated review tool yielded a profound change in practice. Using the software, the automated chart review required ∼1.5 h for mining and extracting the data for the 4-year period. Conclusion: This study quantified the EPI review process as it evolved during a 4-year period at our institution and found that automation of data retrieval and review simplified and facilitated physician quality improvement.
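    The two quantities the report tracks, percentage of images reviewed and mean time to review, are simple aggregates once the records are extracted. The sketch below assumes each record is a hypothetical (physician, acquisition date, review date or None) tuple; the actual record-and-verify database schema is not described in the abstract.

```python
from datetime import date
from statistics import mean


def review_metrics(records):
    """Summarize portal-image review records per physician.

    records: iterable of (physician, acquired_date, reviewed_date or None).
    Returns {physician: (percent_reviewed, mean_days_to_review)}; the mean is
    None when a physician has no completed reviews.
    """
    by_doc = {}
    for doc, acquired, reviewed in records:
        by_doc.setdefault(doc, []).append((acquired, reviewed))
    summary = {}
    for doc, rows in by_doc.items():
        done = [(r - a).days for a, r in rows if r is not None]
        pct = 100.0 * len(done) / len(rows)
        summary[doc] = (pct, mean(done) if done else None)
    return summary


records = [
    ("A", date(2006, 1, 2), date(2006, 1, 4)),
    ("A", date(2006, 1, 3), date(2006, 1, 7)),
    ("A", date(2006, 1, 5), None),               # not yet reviewed
    ("A", date(2006, 1, 6), date(2006, 1, 6)),   # reviewed the same day
    ("B", date(2006, 2, 1), None),
]
print(review_metrics(records))
```

    Computing both metrics from the same extraction is what makes the report repeatable for "any prescribed period": filtering the records by acquisition date before calling the function is enough.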

  2. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    Science.gov (United States)

    Cherry, J Michael

    2015-12-01

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. Not all of the collected information can be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides advanced search capability through an interactive interface. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided. PMID:26631124

  3. Lunar surface mining for automated acquisition of helium-3: Methods, processes, and equipment

    Science.gov (United States)

    Li, Y. T.; Wittenberg, L. J.

    1992-01-01

    In this paper, several techniques considered for mining and processing the regolith on the lunar surface are presented. These techniques have been proposed and evaluated based primarily on the following criteria: (1) mining operations should be relatively simple; (2) procedures of mineral processing should be few and relatively easy; (3) transferring tonnages of regolith on the Moon should be minimized; (4) operations outside the lunar base should be readily automated; (5) all equipment should be maintainable; and (6) economic benefit should be sufficient for commercial exploitation. The economic benefits are not addressed in this paper; however, the energy benefits have been estimated to be between 250 and 350 times the mining energy. A mobile mining scheme is proposed that meets most of the mining objectives. This concept uses a bucket-wheel excavator for excavating the regolith, several mechanical electrostatic separators for beneficiation of the regolith, a fast-moving fluidized bed reactor to heat the particles, and a palladium diffuser to separate H2 from the other solar wind gases. At the final stage of the miner, the regolith 'tailings' are deposited directly into the ditch behind the miner, and cylinders of the valuable solar wind gases are transported to a central gas processing facility. During the production of He-3, large quantities of valuable H2, H2O, CO, CO2, and N2 are produced for utilization at the lunar base. For larger-scale production of He-3, using multiple miners is recommended rather than increasing miner size. Multiple miners permit operations at more sites and provide redundancy in case of equipment failure.

  4. Automated Data Preparation and Physics Mining Tools for Space Weather Studies (Invited)

    Science.gov (United States)

    Karimabadi, H.; Sipes, T.

    2009-12-01

    Heliophysics is a data-centric field that relies heavily on spacecraft data for further advances. The prevalent approach to analyzing spacecraft data is based on visual inspection. As a result, the vast majority of the data collected by various missions has gone unexplored. Computer-aided algorithmic approaches to data analysis, as facilitated by data mining techniques, are essential for analyzing large data sets and enable the discovery of hidden information and patterns in the data. Many data analysis problems in space weather stand to benefit from the application of data mining techniques. Examples include identifying spacecraft charging signatures in plasma detectors, identifying plasma frequency lines in wave spectrograms (and hence density), and detecting and classifying substorm injection features, among others (R. Friedel, private communication). Thus, while the need for advanced algorithmic approaches to data exploration and knowledge discovery is generally recognized by experimentalists, the adoption of such techniques (“data mining”) has been slow. This has been partly due to the steep learning curve of some of the techniques and/or the requirement for a working knowledge of statistics. Another factor is the existence of a plethora of data mining approaches, which often makes it a daunting task for a scientist to determine the appropriate technique. Our goal has been to make such tools accessible to non-experts and to move them from the gee-whiz domain to practical tools that will become part of the standard arsenal of data analysis. To this end, we have developed an automated data mining technique called MineTool. Its first deployment, to the analysis of Cluster data, has been very successful (Karimabadi et al., JGR, 114, A06216, 2009), and the tool is gaining adoption among experimentalists. In this talk, we will provide an overview of this tool, illustrate its use through examples, and discuss future directions of research.

  5. Hal: an Automated Pipeline for Phylogenetic Analyses of Genomic Data

    OpenAIRE

    Robbertse, Barbara; Yoder, Ryan J.; Boyd, Alex; Reeves, John; Spatafora, Joseph W.

    2011-01-01

    The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into ind...

  6. Investigating the Control of Chlorophyll Degradation by Genomic Correlation Mining.

    Science.gov (United States)

    Ghandchi, Frederick P; Caetano-Anolles, Gustavo; Clough, Steven J; Ort, Donald R

    2016-01-01

    Chlorophyll degradation is an intricate process that is critical in a variety of plant tissues at different times during the plant life cycle. Many of the photoactive chlorophyll degradation intermediates are exceptionally cytotoxic, necessitating that the pathway be carefully coordinated and regulated. The primary regulatory step in the chlorophyll degradation pathway involves the enzyme pheophorbide a oxygenase (PAO), which oxidizes the chlorophyll intermediate pheophorbide a, which is eventually converted to non-fluorescent chlorophyll catabolites. There is evidence that PAO is differentially regulated across environmental and developmental conditions by both transcriptional and post-transcriptional components, but the regulatory elements involved are uncertain or unknown. We hypothesized that transcription factors modulate PAO expression across different environmental conditions, such as cold and drought, as well as during developmental transitions to leaf senescence and the maturation of green seeds. To test these hypotheses, several sets of Arabidopsis genomic and bioinformatic experiments were investigated and re-analyzed using computational approaches. PAO expression was compared across varied environmental conditions in the three separate datasets using regression modeling and correlation mining to identify gene elements co-expressed with PAO. Their functions were investigated as candidate upstream transcription factors or other regulatory elements that may regulate PAO expression. PAO transcript expression was found to be significantly up-regulated in warm conditions, during leaf senescence, and under drought, and in all three conditions it was significantly positively correlated with expression of the transcription factor Arabidopsis thaliana activating factor 1 (ATAF1), suggesting that ATAF1 is triggered in the plant response to these processes or abiotic stresses and, as a result, up-regulates PAO expression. The proposed regulatory network includes the...
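    The correlation-mining step can be illustrated as ranking candidate genes by the Pearson correlation of their expression profiles with the PAO profile and keeping those above a threshold. This is a minimal sketch of that single step only, not the study's full regression-plus-correlation pipeline; the profiles, gene names, and the 0.8 threshold are hypothetical, and each profile is assumed to have nonzero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def co_expressed(target, profiles, threshold=0.8):
    """Genes whose profile correlates with the target at or above threshold, strongest first."""
    scored = [(g, pearson(target, p)) for g, p in profiles.items()]
    hits = [(g, r) for g, r in scored if r >= threshold]
    return sorted(hits, key=lambda gr: gr[1], reverse=True)


pao = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical PAO expression across conditions
profiles = {
    "TF_up": [2.0, 4.1, 6.0, 8.2, 9.9],
    "TF_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "TF_exact": [10.0, 20.0, 30.0, 40.0, 50.0],
}
print(co_expressed(pao, profiles))
```

    Correlation only nominates candidates; as the abstract's wording ("suggesting") implies, a positive correlation with a transcription factor is evidence for, not proof of, a regulatory link.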

  7. Automating the Analysis of Spatial Grids: A Practical Guide to Data Mining Geospatial Images for Human & Environmental Applications

    CERN Document Server

    Lakshmanan, Valliappa

    2012-01-01

    The ability to create automated algorithms to process gridded spatial data is increasingly important as remotely sensed datasets increase in volume and frequency. Whether in business, social science, ecology, meteorology or urban planning, the ability to create automated applications to analyze and detect patterns in geospatial data is increasingly important. This book provides students with a foundation in topics of digital image processing and data mining as applied to geospatial datasets. The aim is for readers to be able to devise and implement automated techniques to extract information from spatial grids such as radar, satellite or high-resolution survey imagery.

  8. Mining association rule bases from integrated genomic data and annotations

    OpenAIRE

    Martinez, Ricardo; Pasquier, Nicolas; Pasquier, Claude

    2008-01-01

    During the last decade, several clustering and association rule mining techniques have been applied to identify groups of co-regulated genes in gene expression data. Nowadays, integrating biological knowledge and gene expression data into a single framework has become a major challenge for improving the relevance of mined patterns and simplifying their interpretation by biologists. The GenMiner approach was developed for mining association rules showing gene groups tha...
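    Association rules are scored by support (the fraction of transactions containing all items of the rule) and confidence (support of the whole rule divided by support of its antecedent). The brute-force sketch below shows only those two definitions on small data; it is not GenMiner's algorithm, and the items, which mix expressed genes with an annotation term, are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Brute-force association rules X -> y with a single-item consequent.

    transactions: list of sets; each set holds the items observed in one
    condition (e.g. over-expressed genes plus annotation terms).
    Returns (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in (2, 3):
        for combo in combinations(items, size):
            full = frozenset(combo)
            s = support(full)
            if s == 0 or s < min_support:
                continue  # infrequent itemsets cannot yield frequent rules
            for consequent in combo:
                antecedent = full - {consequent}
                conf = s / support(antecedent)
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, s, conf))
    return rules


transactions = [
    {"geneA", "geneB", "GO:stress"},
    {"geneA", "geneB"},
    {"geneA", "GO:stress"},
    {"geneB"},
]
for ante, cons, s, c in mine_rules(transactions):
    print(sorted(ante), "->", cons, f"support={s:.2f} confidence={c:.2f}")
```

    Mixing annotation terms into the transactions is one simple way to let rules link gene groups to biological knowledge, which is the kind of integration the abstract describes.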

  9. Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis

    DEFF Research Database (Denmark)

    Bakke, Peter; Carney, Nick; DeLoache, Will;

    2009-01-01

    Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems, but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of the halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species...

  10. PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results

    Directory of Open Access Journals (Sweden)

    Zhao Xuechun

    2007-02-01

    Full Text Available Abstract Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results.
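    Online result filtering of the kind PLAN offers can be approximated offline on standard BLAST+ tabular output (`-outfmt 6`), whose default twelve columns are listed below. The sketch assumes that format rather than PLAN's internal storage, which the abstract does not describe; the e-value and identity thresholds are arbitrary examples. It keeps the best-scoring passing hit per query.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5, min_identity=90.0):
    """Keep the highest-bitscore hit per query that passes e-value and identity filters."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        ident = float(hit["pident"])
        score = float(hit["bitscore"])
        if evalue > max_evalue or ident < min_identity:
            continue
        q = hit["qseqid"]
        if q not in best or score > best[q][1]:
            best[q] = (hit["sseqid"], score)
    return best


text = (
    "q1\ts1\t99.5\t200\t1\t0\t1\t200\t1\t200\t1e-80\t370\n"
    "q1\ts2\t95.0\t200\t10\t0\t1\t200\t5\t204\t1e-60\t300\n"
    "q2\ts3\t80.0\t150\t30\t2\t1\t150\t1\t150\t1e-20\t150\n"
)
print(best_hits(text))
```

    Filtering on identity as well as e-value matters here: the q2 hit is highly significant but falls below the identity cut-off, so it is dropped.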

  11. First Science Results from Solar Data Mining Using Automated Feature Detection

    Science.gov (United States)

    Martens, P. C.

    2014-12-01

    The SDO Feature Finding Team (FFT) has produced 16 automated feature tracking modules for data from SDO, LASCO, and ground-based H-alpha observatories. The metadata produced by those modules and others are available from the Heliophysics Events Knowledgebase (HEK) and the Virtual Solar Observatory (VSO). Having metadata available for large numbers of events and phenomena, obtained with consistent detection criteria (unlike catalogs produced by human observers), allows researchers to search solar data effectively for patterns. I will show a number of science results obtained recently. Not surprisingly, several of the patterns are well known (e.g. flares occur mostly in active regions), but some really surprising new trends have been discovered as well, in at least one case upending scientific consensus. These results show the power and promise that systematic feature recognition and data mining hold for solar physics.

  12. A graph-based algorithm for mining multi-level patterns in genomic data.

    Science.gov (United States)

    Lam, Winnie W M; Chan, Keith C C; Chiu, David K Y; Wong, Andrew K C

    2010-10-01

    Comparative genomics is concerned with the study of genome structure and function of different species. It can provide useful information for the derivation of evolutionary and functional relationships between genomes. Previous work on genome comparison focuses mainly on comparing the entire genomes for visualization without further analysis. As many interesting patterns may exist between genomes and may lead to the discovery of functional gene segments (groups of genes), we propose an algorithm called Multi-Level Genome Comparison Algorithm (MGC) that can be used to facilitate the analysis of genomes at multiple levels during the comparison process to discover sequential and regional consistency in gene segments. Different genomes may have common sub-sequences that differ from each other due to mutations, lateral gene transfers, gene rearrangements, etc., and these sub-sequences are usually not easily identified. Not all genes have a perfect one-to-one match with each other; one-to-many or many-to-many ambiguous relationships may well exist between them. To perform the tasks effectively, MGC takes such ambiguity into consideration during genome comparison by representing genomes in a graph and then making use of a graph mining algorithm called the Multi-Level Attributed Graph Mining Algorithm (MAGMA) to build a hierarchical multi-level graph structure to facilitate genome comparison. To determine the effectiveness of these proposed algorithms, experiments were performed using intra- and inter-species comparisons of microbial genomes. The results show that the proposed algorithms are able to discover multiple-level matching patterns that show the similarities and dissimilarities among different genomes, in addition to confirming the specific role of the genes in the genomes. PMID:20981888

  13. Automated quality control for genome wide association studies

    Science.gov (United States)

    Ellingson, Sally R.; Fardo, David W.

    2016-01-01

    This paper provides details on the necessary steps to assess and control data in genome wide association studies (GWAS) using genotype information on a large number of genetic markers for a large number of individuals. Due to varied study designs and genotyping platforms between multiple sites/projects, as well as potential genotyping errors, it is important to ensure high quality data. Scripts and directions are provided to help others carry out this process.
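    The paper's own scripts are not reproduced here. As a minimal sketch of two routine per-marker checks in GWAS quality control, the Python below computes call rate and minor allele frequency (MAF), assuming genotypes coded as 0, 1 or 2 copies of the alternate allele with `None` for a missing call, and first removes samples with a low call rate. The thresholds and toy data are illustrative assumptions, not values from the paper; other usual steps (Hardy-Weinberg tests, sex checks, relatedness) are omitted.

```python
def snp_call_rate(column):
    """Fraction of non-missing genotype calls at one SNP (None = missing)."""
    return sum(g is not None for g in column) / len(column)


def minor_allele_frequency(column):
    """MAF from genotypes coded as 0/1/2 copies of the alternate allele."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def qc_filter(matrix, min_snp_call=0.95, min_maf=0.01, min_sample_call=0.95):
    """matrix[i][j] is the genotype of sample i at SNP j.

    Drops low-call-rate samples first, then SNPs with a low call rate or MAF
    among the remaining samples. Returns (kept sample indices, kept SNP indices).
    """
    n_snps = len(matrix[0])
    keep_samples = [i for i, row in enumerate(matrix)
                    if sum(g is not None for g in row) / n_snps >= min_sample_call]
    if not keep_samples:
        return [], []
    rows = [matrix[i] for i in keep_samples]
    keep_snps = []
    for j in range(n_snps):
        column = [row[j] for row in rows]
        if (snp_call_rate(column) >= min_snp_call
                and minor_allele_frequency(column) >= min_maf):
            keep_snps.append(j)
    return keep_samples, keep_snps


# Hypothetical toy data: 4 samples x 3 SNPs; the last sample failed genotyping.
GENOTYPES = [
    [0, 1, 0],
    [1, 2, 0],
    [2, None, 0],
    [None, None, None],
]
```

    For example, `qc_filter(GENOTYPES, min_snp_call=0.9, min_maf=0.05, min_sample_call=0.5)` keeps samples 0 to 2 and only SNP 0: SNP 1 fails the call-rate check and SNP 2 is monomorphic. Filtering samples before markers is one common ordering, chosen here so that a single failed sample does not drag down every marker's call rate.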

  14. Mining and characterization of two amidase signature family amidases from Brevibacterium epidermidis ZJB-07021 by an efficient genome mining approach.

    Science.gov (United States)

    Ruan, Li-Tao; Zheng, Ren-Chao; Zheng, Yu-Guo

    2016-10-01

    Amidases have received increasing attention for their significant potential in the production of valuable carboxylic acids. In this study, two amidases belonging to amidase signature family (BeAmi2 and BeAmi4) were identified and mined from genomic DNA of Brevibacterium epidermidis ZJB-07021 by an efficient strategy combining comparative analysis of genomes and identification of unknown region by high-efficiency thermal asymmetric interlaced PCR (HiTAIL-PCR). The deduced amino acid sequences of BeAmi2 and BeAmi4 showed low identity (derivatives. PMID:27180252

  15. Application of the Deformation Information System for automated analysis and mapping of mining terrain deformations - case study from SW Poland

    Science.gov (United States)

    Blachowski, Jan; Grzempowski, Piotr; Milczarek, Wojciech; Nowacka, Anna

    2015-04-01

    Monitoring, mapping and modelling of mining induced terrain deformations are important tasks for quantifying and minimising threats that arise from underground extraction of useful minerals and affect surface infrastructure, human safety, the environment and security of the mining operation itself. The range of methods and techniques used for monitoring and analysis of mining terrain deformations is wide and expanding with the progress in geographical information technologies. These include, for example: terrestrial geodetic measurements, Global Navigation Satellite Systems, remote sensing, GIS based modelling and spatial statistics, finite element method modelling, geological modelling, empirical modelling using e.g. the Knothe theory, artificial neural networks, fuzzy logic calculations and others. The presentation shows the results of numerical modelling and mapping of mining terrain deformations for two underground mining sites in SW Poland, one hard coal (abandoned) and one copper ore (active), using the functionalities of the Deformation Information System (DIS) (Blachowski et al, 2014 @ http://meetingorganizer.copernicus.org/EGU2014/EGU2014-7949.pdf). The functionalities of the spatial data modelling module of DIS have been presented, and its applications in modelling, mapping and visualising mining terrain deformations based on processing of measurement data (geodetic and GNSS) for these two cases have been characterised and compared. These include automation procedures, self-developed and implemented in DIS, for calculating mining terrain subsidence with different interpolation techniques, calculation of other mining deformation parameters (i.e. tilt, horizontal displacement, horizontal strain and curvature), as well as mapping mining terrain categories based on classification of the values of these parameters as used in Poland. Acknowledgments. This work has been financed from the National Science Centre Project "Development of a numerical method of

  16. An Automated Approach for the Determination of the Seismic Moment Tensor in Mining Environments

    Science.gov (United States)

    Wamboldt, Lawrence R.

    A study was undertaken to evaluate an automated process to invert for seismic moment tensors from seismic data recorded in mining environments. The data for this study were recorded at Nickel Rim South mine, Sudbury, Ontario. The mine has a seismic monitoring system manufactured by ESG Solutions that performs continuous monitoring of seismicity. On average, approximately 400 seismic events are recorded each day. Currently, data are automatically processed by ESG Solutions' software suite during acquisition. The automatic processors pick the P- and/or S-wave arrivals, locate the events and solve for certain source parameters, excluding the seismic moment tensor. In order to solve for the moment tensor, data must be manually processed, which is laborious and therefore seldom performed. This research evaluates an automatic seismic moment tensor inversion method and demonstrates some of the difficulties (through inversions of real and synthetic seismic data) of the inversion process. Results using the method are also compared to the inversion method currently available from ESG Solutions, which requires the manual picking of first-motion polarities for every event. As a result of the extensive synthetic testing of the automatic inversion program, as well as the inversion of real seismic data, it is apparent that there are key parameters requiring greater accuracy in order to increase the reliability of the automation. These parameters include the source time function definition, source location (in turn requiring more accurate and precise knowledge of the earth media), arrival time picks and an attenuation model to account for ray-path dependent filtering of the source time function. In order to improve the automatic method three key pieces of research are needed: (1) studying various location algorithms (and the effects of increasing earth model intricacy) and automatic time picking to improve source location methods, (2) studying how the source time pulse can be

  17. Discovery of Defense- and Neuropeptides in Social Ants by Genome-Mining

    OpenAIRE

    Gruber, Christian W.; Muttenthaler, Markus

    2012-01-01

    Natural peptides of great number and diversity occur in all organisms, but analyzing their peptidome is often difficult. With natural product drug discovery in mind, we devised a genome-mining approach to identify defense- and neuropeptides in the genomes of social ants from Atta cephalotes (leaf-cutter ant), Camponotus floridanus (carpenter ant) and Harpegnathos saltator (basal genus). Numerous peptide-encoding genes of defense peptides, in particular defensins, and neuropeptides or regulato...

  18. SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny.

    Science.gov (United States)

    Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian

    2009-01-01

    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site, users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site. PMID:19036790

  19. Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies

    Directory of Open Access Journals (Sweden)

    Jia-Fu Chang

    2013-01-01

    Full Text Available Background: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP. Materials and Methods: Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifest by the tumor population under study. Characteristically, percentages are represented straightforwardly with the % symbol or as the number of positive findings of the total population. Such text is readily recognized using regular expressions and templates permitting extraction of sentences containing these forms for further analysis using grammatical structures and rule-based algorithms. Results: Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval as reflected in scores of 69.91% precision and 57.25% recall with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings. Conclusions: The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP techniques support practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems.
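    The abstract names the two surface forms it targets (a value with the % symbol, and positives out of a total) but does not publish its regular expressions or templates, nor the grammatical and rule-based stages that follow. The Python below is a hypothetical, much-simplified sketch of only the first stage, turning both forms into percentages; the patterns and example sentences are assumptions. It also includes the standard F-score (harmonic mean of precision and recall) as a consistency check on the reported figures.

```python
import re

# Percentages written with the % symbol, e.g. "45%" or "45.5 %".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s?%")
# Counts written as positives of a total, e.g. "12 of 30" or "12/30".
FRACTION = re.compile(r"\b(\d+)\s*(?:/|of)\s*(\d+)\b")


def extract_percentages(sentence):
    """Return expression levels (in %) in the order they appear in the sentence."""
    found = [(m.start(), float(m.group(1))) for m in PERCENT.finditer(sentence)]
    for m in FRACTION.finditer(sentence):
        k, n = int(m.group(1)), int(m.group(2))
        if 0 < n and k <= n:  # ignore impossible fractions such as "40 of 30"
            found.append((m.start(), 100.0 * k / n))
    return [value for _, value in sorted(found)]


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1), in the same units as the inputs."""
    return 2 * precision * recall / (precision + recall)
```

    For example, `extract_percentages("BCL6 was positive in 45% of cases and CD10 in 12 of 30 cases.")` returns `[45.0, 40.0]`, and `f_score(69.91, 57.25)` evaluates to about 62.95, matching the F-score reported above.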

  20. Ask and Ye Shall Receive? Automated Text Mining of Michigan Capital Facility Finance Bond Election Proposals to Identify Which Topics Are Associated with Bond Passage and Voter Turnout

    Science.gov (United States)

    Bowers, Alex J.; Chen, Jingjing

    2015-01-01

    The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…

  1. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  2. GenMiner: mining informative association rules from genomic data

    OpenAIRE

    Martinez, Ricardo; Pasquier, Nicolas; Pasquier, Claude

    2007-01-01

    GENMINER is a smart adaptation of closed itemsets based association rules extraction to genomic data. It takes advantage of the novel NORDI discretization method and of the JCLOSE algorithm to efficiently generate minimal non-redundant association rules. GENMINER facilitates the integration of numerous sources of biological information such as gene expressions and annotations, and can tacitly integrate qualitative information on biological conditions (age, sex, etc.)....
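    To make the notion of an association rule over discretized genomic data concrete, the Python below is a deliberately naive sketch: it enumerates every itemset up to a small size and reports rules `lhs -> rhs` with their support and confidence. It is not JCLOSE and performs none of GenMiner's closed-itemset or non-redundancy pruning, and it does not implement NORDI discretization; the item names (such as `geneA_up`) and thresholds are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support=0.5, min_confidence=0.8, max_size=3):
    """Brute-force rules lhs -> rhs as (lhs, rhs, support, confidence) tuples."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            # Split the frequent itemset into every non-empty lhs / rhs pair.
            for k in range(1, size):
                for lhs_items in combinations(combo, k):
                    lhs = frozenset(lhs_items)
                    confidence = s / support(lhs, transactions)
                    if confidence >= min_confidence:
                        rules.append((lhs, itemset - lhs, s, confidence))
    return rules


# Hypothetical discretized profiles: each row is one biological condition.
PROFILES = [
    {"geneA_up", "geneB_up", "kinase"},
    {"geneA_up", "geneB_up"},
    {"geneA_up", "kinase"},
    {"geneC_up"},
]
```

    On these profiles the sketch reports `geneB_up -> geneA_up` and `kinase -> geneA_up`, each with support 0.5 and confidence 1.0; the reverse rules fail because `geneA_up` also occurs without the other item. Exhaustive enumeration grows combinatorially with the number of items, which is the cost that closed-itemset methods are designed to avoid.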

  3. GroopM: an automated tool for the recovery of population genomes from related metagenomes

    Directory of Open Access Journals (Sweden)

    Michael Imelfort

    2014-09-01

    Full Text Available Metagenomic binning methods that leverage differential population abundances in microbial communities (differential coverage) are emerging as a complementary approach to conventional composition-based binning. Here we introduce GroopM, an automated binning tool that primarily uses differential coverage to obtain high fidelity population genomes from related metagenomes. We demonstrate the effectiveness of GroopM using synthetic and real-world metagenomes, and show that GroopM produces results comparable with more time-consuming, labor-intensive methods.
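    The abstract does not describe GroopM's clustering algorithm. To illustrate only the underlying idea of differential coverage (contigs from the same population rise and fall together in abundance across related metagenomes), the Python below is a hypothetical, greedy one-pass binning of log-transformed coverage profiles. It is not GroopM's method; the distance threshold and coverage values are assumptions, and real binners also use sequence composition and far more robust clustering.

```python
import math


def log_profile(coverages):
    """log10(coverage + 1) per sample, so distances reflect fold changes, not absolute depth."""
    return [math.log10(c + 1.0) for c in coverages]


def distance(a, b):
    """Euclidean distance between two coverage profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bin_by_coverage(contigs, max_distance=0.3):
    """Greedy one-pass binning of contigs by their coverage profile across samples.

    contigs maps contig name -> list of mean coverages, one per metagenome.
    Each contig joins the first bin whose centroid lies within max_distance,
    otherwise it starts a new bin. Returns lists of contig names.
    """
    bins = []
    for name, coverages in contigs.items():
        profile = log_profile(coverages)
        for b in bins:
            if distance(profile, b["centroid"]) <= max_distance:
                b["members"].append(name)
                n = len(b["members"])
                # Running mean keeps the centroid equal to the average member profile.
                b["centroid"] = [c + (p - c) / n for c, p in zip(b["centroid"], profile)]
                break
        else:
            bins.append({"members": [name], "centroid": profile})
    return [b["members"] for b in bins]


# Hypothetical coverages of four contigs in three related metagenomes.
CONTIGS = {
    "c1": [100, 10, 50],
    "c2": [110, 12, 45],
    "c3": [5, 80, 20],
    "c4": [6, 75, 22],
}
```

    Here `bin_by_coverage(CONTIGS)` groups `c1` with `c2` and `c3` with `c4`, because each pair shares the same abundance pattern across the three samples. A single greedy pass depends on input order, which is one reason practical tools use more elaborate clustering.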

  4. Recent advances in genome mining of secondary metabolites in Aspergillus terreus

    Directory of Open Access Journals (Sweden)

    Clay Chia Chun Wang

    2014-12-01

    Full Text Available Filamentous fungi are rich resources of secondary metabolites (SMs) with a variety of interesting biological activities. Recent advances in genome sequencing and techniques in genetic manipulation have enabled researchers to study the biosynthetic genes of these SMs. Aspergillus terreus is a well-known producer of lovastatin, a cholesterol-lowering drug. This fungus also produces other SMs, including acetylaranotin, butyrolactones and territrem, with interesting bioactivities. This review will cover recent progress in genome mining of SMs identified in this fungus. The identification and characterization of the gene cluster for these SMs, as well as the proposed biosynthetic pathways, will be discussed in depth.

  5. SNP-RFLPing: restriction enzyme mining for SNPs in genomes

    OpenAIRE

    Cheng Yu-Huei; Chang Phei-Lang; Yang Cheng-Hong; Chang Hsueh-Wei; Chuang Li-Yeh

    2006-01-01

    Abstract Background The restriction fragment length polymorphism (RFLP) is a common laboratory method for the genotyping of single nucleotide polymorphisms (SNPs). Here, we describe a web-based software tool, named SNP-RFLPing, which provides restriction enzymes for RFLP assays on a batch of SNPs and genes from the human, rat, and mouse genomes. Results Three user-friendly inputs are included: 1) NCBI dbSNP "rs" or "ss" IDs; 2) NCBI Entrez gene ID and HUGO gene name; 3) any formats of SNP-in-se...
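    The principle behind choosing an enzyme for SNP-RFLP genotyping is that its recognition site is present in one allele's sequence and absent in the other, so only one allele is cut. SNP-RFLPing's enzyme database and search procedure are not described here; the Python below is a hypothetical sketch using a small hand-picked table of well-known palindromic recognition sites (a palindromic site reads the same on both strands, so scanning one strand suffices). Cut positions, star activity and fragment-size separability are ignored.

```python
# A few standard palindromic recognition sequences (illustrative subset only).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
    "AluI": "AGCT",
    "MspI": "CCGG",
    "HaeIII": "GGCC",
}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def discriminating_enzymes(left, allele_a, allele_b, right, enzymes=ENZYMES):
    """Enzymes whose site count differs between the two alleles of a SNP.

    Returns {enzyme name: (sites in allele A sequence, sites in allele B sequence)}.
    """
    seq_a = (left + allele_a + right).upper()
    seq_b = (left + allele_b + right).upper()
    hits = {}
    for name, site in enzymes.items():
        n_a, n_b = count_sites(seq_a, site), count_sites(seq_b, site)
        if n_a != n_b:
            hits[name] = (n_a, n_b)
    return hits
```

    For a hypothetical T/C SNP with flanks `TTGAA` and `TCAGG`, `discriminating_enzymes("TTGAA", "T", "C", "TCAGG")` returns `{"EcoRI": (1, 0)}`: the T allele creates a GAATTC site that the C allele destroys.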

  6. A Novel CalB-Type Lipase Discovered by Fungal Genomes Mining

    OpenAIRE

    Vaquero, Maria E.; de Eugenio, Laura I.; Martínez, Maria J.; Jorge Barriuso

    2015-01-01

    The fungus Pseudozyma antarctica produces a lipase (CalB) with broad substrate specificity, stability, high regio- and enantio-selectivity. It is active in non-aqueous organic solvents and at elevated temperatures. Hence, CalB is a robust biocatalyst for chemical conversions on an industrial scale. Here we report the in silico mining of public metagenomes and fungal genomes to discover novel lipases with high homology to CalB. The candidates were selected taking into account homology and cons...

  7. Genome mining offers a new starting point for parasitology research.

    Science.gov (United States)

    Lv, Zhiyue; Wu, Zhongdao; Zhang, Limei; Ji, Pengyu; Cai, Yifeng; Luo, Shiqi; Wang, Hongxi; Li, Hao

    2015-02-01

    Parasites including helminths, protozoa, and medical arthropod vectors are a major cause of global infectious diseases, affecting one-sixth of the world's population; they are responsible for enormous levels of morbidity and mortality and remain important impediments to economic development, especially in tropical countries. Prevalent drug resistance and the lack of highly effective and practical vaccines, as well as of specific and sensitive diagnostic markers, are proving to be challenging problems in parasitic disease control in most parts of the world. The impressive progress recently made in genome-wide analysis of parasites of medical importance, including trematodes of Clonorchis sinensis, Opisthorchis viverrini, Schistosoma haematobium, S. japonicum, and S. mansoni; nematodes of Brugia malayi, Loa loa, Necator americanus, Trichinella spiralis, and Trichuris suis; cestodes of Echinococcus granulosus, E. multilocularis, and Taenia solium; protozoa of Babesia bovis, B. microti, Cryptosporidium hominis, Eimeria falciformis, E. histolytica, Giardia intestinalis, Leishmania braziliensis, L. donovani, L. major, Plasmodium falciparum, P. vivax, Trichomonas vaginalis, Trypanosoma brucei and T. cruzi; and medical arthropod vectors of Aedes aegypti, Anopheles darlingi, A. sinensis, and Culex quinquefasciatus, has been systematically covered in this review, for a comprehensive understanding of the genetic information contained in nuclear, mitochondrial, kinetoplast, plastid, or endosymbiotic bacterial genomes of parasites, for further valuable insight into parasite-host interactions, and for the development of promising novel drug and vaccine candidates and preferable diagnostic tools, thereby underpinning the prevention and control of parasitic diseases. PMID:25563615

  8. Chapter 10: Mining genome-wide genetic markers.

    Directory of Open Access Journals (Sweden)

    Xiang Zhang

    Full Text Available Genome-wide association study (GWAS) aims to discover genetic factors underlying phenotypic traits. The large number of genetic factors poses both computational and statistical challenges. Various computational approaches have been developed for large scale GWAS. In this chapter, we will discuss several widely used computational approaches in GWAS. The following topics will be covered: (1) An introduction to the background of GWAS. (2) The existing computational approaches that are widely used in GWAS. This will cover single-locus, epistasis detection, and machine learning methods that have been recently developed in the biology, statistics, and computer science communities. This part will be the main focus of this chapter. (3) The limitations of current approaches and future directions.

  9. metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research.

    Science.gov (United States)

    Lyne, Mike; Smith, Richard N; Lyne, Rachel; Aleksic, Jelena; Hu, Fengyuan; Kalderimis, Alex; Stepan, Radek; Micklem, Gos

    2013-01-01

    Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. The web interface makes it accessible to first

  10. Genome Mining in Sorangium cellulosum So ce56

    Science.gov (United States)

    Ewen, Kerstin Maria; Hannemann, Frank; Khatri, Yogan; Perlova, Olena; Kappl, Reinhard; Krug, Daniel; Hüttermann, Jürgen; Müller, Rolf; Bernhardt, Rita

    2009-01-01

    Myxobacteria, especially members of the genus Sorangium, are known for their biotechnological potential as producers of pharmaceutically valuable secondary metabolites. The biosynthesis of several of those myxobacterial compounds includes cytochrome P450 activity. Although class I cytochrome P450 enzymes are widespread in bacteria and rely on ferredoxins and ferredoxin reductases as essential electron mediators, the study of these proteins is often neglected. Therefore, we decided to search in the Sorangium cellulosum So ce56 genome for putative interaction partners of cytochromes P450. In this work we report the investigation of eight myxobacterial ferredoxins and two ferredoxin reductases with respect to their activity in cytochrome P450 systems. Intriguingly, we found not only one, but two ferredoxins whose ability to sustain an endogenous So ce56 cytochrome P450 was demonstrated by CYP260A1-dependent conversion of nootkatone. Moreover, we could demonstrate that the two ferredoxins were able to receive electrons from both ferredoxin reductases. These findings indicate that S. cellulosum can alternate between different electron transport pathways to sustain cytochrome P450 activity. PMID:19696019

  11. Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data.

    Science.gov (United States)

    Bolser, Dan; Staines, Daniel M; Pritchard, Emily; Kersey, Paul

    2016-01-01

    Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for a growing number of sequenced plant species (currently 33). Data provided include genome sequence, gene models, functional annotation, and polymorphic loci. Additional information is provided for variation data, including population structure, individual genotypes, linkage, and phenotype data. In each release, comparative analyses are performed on whole genome and protein sequences, and genome alignments and gene trees are made available that show the implied evolutionary history of each gene family. Access to the data is provided through a genome browser incorporating many specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These access routes are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests, and pollinators. Ensembl Plants is updated 4-5 times a year and is developed in collaboration with our international partners in the Gramene ( http://www.gramene.org ) and transPLANT projects ( http://www.transplantdb.org ). PMID:26519403

  12. A Framework: Cluster Detection and Multidimensional Visualization of Automated Data Mining Using Intelligent Agents

    Directory of Open Access Journals (Sweden)

    R. Jayabrabu

    2012-01-01

    Full Text Available Data mining techniques play a vital role in extracting required knowledge and finding unsuspected information, supporting strategic decisions in a novel way that is understandable by domain experts. A generalized framework is proposed that takes non-domain experts into account during the mining process, for better understanding, better decision making and better discovery of new patterns, by selecting suitable data mining techniques based on the user profile by means of intelligent agents.

  13. Automated realtime detection of mining induced seismicity in the Ruhr coal mining district, Germany, using master waveforms

    Science.gov (United States)

    Fischer, Kasper D.; Wlecklik, Dennis; Friederich, Wolfgang; Wehling-Benatelli, Sebastian

    2016-04-01

    The exploitation of the subsurface by mining, geothermal or petroleum production causes seismic events in the surrounding areas. Shallow focal depths can lead to perceptible ground motions in densely populated areas and in rare cases to damage even for small events (magnitude smaller than 3.5). Thus, the monitoring of this kind of activities is necessary and increasingly requested by governmental agencies. A reliable detection and localisation of small events generally requires a dense and therefore expensive local seismic station network. At the end of 2014 and beginning of 2015, a dense seismic network of 12 stations was set up as a test case in the area of the black coal mine Prosper-Haniel in the Ruhr district, Germany. This network was capable of detecting almost 400 events within 4 weeks. A cluster analysis identified 135 events of magnitude -0.7 or higher, which could be located. This cluster analysis was also used to construct master events for running a real-time single-station cross-correlation detector in the Seiscomp3 software. The results of the real-time cross-correlation detector are compared to the results of the cluster analysis with respect to the number, magnitudes and locations of the events. This two-step monitoring of the source area provides a cost efficient way for long term monitoring of the mining activity.

  14. Automated Data Integration, Cleaning and Analysis Using Data Mining and SPSS Tool For Technical School in Malaysia

    Directory of Open Access Journals (Sweden)

    Tajul Rosli Razak

    2012-08-01

    Full Text Available Students’ performance plays a major role in determining the quality of our education system. Sijil Pelajaran Malaysia (SPM) is a compulsory public examination taken by Form 5 students in Malaysia. The performance gap is not only a school and classroom issue but also a national issue that must be addressed properly. This study aims to integrate, clean and analyse data through automated data mining techniques. Using data mining techniques is one of the processes of transforming raw data from the current educational system into meaningful information that can help the school community make the right decisions to achieve much better results. This shows that data mining (DM) provides a means to assist both educators and students and to improve the quality of education. The results and findings of the study show that the automated system gives the same results as the manual system of integration and analysis, and could also be used by the management to make faster and more efficient decisions in order to map or plan efficient teaching approaches for students in the future.

  15. Perspective: NanoMine: A material genome approach for polymer nanocomposites analysis and design

    Science.gov (United States)

    Zhao, He; Li, Xiaolin; Zhang, Yichi; Schadler, Linda S.; Chen, Wei; Brinson, L. Catherine

    2016-05-01

    Polymer nanocomposites are a designer class of materials where nanoscale particles, functional chemistry, and polymer resin combine to provide materials with unprecedented combinations of physical properties. In this paper, we introduce NanoMine, a data-driven web-based platform for analysis and design of polymer nanocomposite systems under the material genome concept. This open data resource strives to curate experimental and computational data on nanocomposite processing, structure, and properties, as well as to provide analysis and modeling tools that leverage curated data for material property prediction and design. With a continuously expanding dataset and toolkit, NanoMine encourages community feedback and input to construct a sustainable infrastructure that benefits nanocomposite material research and development.

  16. Genome mining reveals unlocked bioactive potential of marine Gram-negative bacteria

    DEFF Research Database (Denmark)

    Machado, Henrique; Sonnenschein, Eva; Melchiorsen, Jette;

    2015-01-01

    - and Gammaproteobacteria collected during the Galathea 3 expedition were sequenced and mined for natural product encoding gene clusters. Results: Independently of genome size, bacteria of all tested genera carried a large number of clusters encoding different potential bioactivities, especially within the Vibrionaceae...... and Pseudoalteromonas species that commonly live in close association with eukaryotic organisms in the environment. Chitin regulation by the ChiS histidine-kinase seems to be a general trait of the Vibrionaceae family, however it is absent in the Pseudomonadaceae. Hence, the degree to which chitin influences secondary...

  17. An Integrated Metabolomic and Genomic Mining Workflow to Uncover the Biosynthetic Potential of Bacteria

    DEFF Research Database (Denmark)

    Månsson, Maria; Vynne, Nikolaj Grønnegaard; Klitgaard, Andreas;

    2016-01-01

    Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in...... bacteria and mine the associated chemical diversity. Thirteen strains closely related to Pseudoalteromonas luteoviolacea isolated from all over the Earth were analyzed using an untargeted metabolomics strategy, and metabolomic profiles were correlated with whole-genome sequences of the strains. We found......, integrative strategy for elucidating the chemical richness of a given set of bacteria and link the chemistry to biosynthetic genes....

  18. Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining To Discover Web-Based Scholarly Research Works.

    Science.gov (United States)

    Nicholson, Scott

    2003-01-01

    Discusses quality issues regarding Web sites and describes research that created an intelligent agent for automated collection development in a digital academic library setting, which uses a predictive model based on facets of each Web page to select scholarly works. Describes the use of bibliomining, or data mining for libraries. (Author/LRW)

  19. Discovery of pentangular polyphenols hexaricins A-C from marine Streptosporangium sp. CGMCC 4.7309 by genome mining.

    Science.gov (United States)

    Tian, Jun; Chen, Haiyan; Guo, Zhengyan; Liu, Ning; Li, Jine; Huang, Ying; Xiang, Wensheng; Chen, Yihua

    2016-05-01

    Many novel microbial natural products have been discovered from Actinobacteria by genome mining methods. However, only a small number of genome mining studies have been carried out in rare actinomycetes. An important reason precluding genome mining efforts in rare actinomycetes is that most of them are recalcitrant to genetic manipulation. Herein, we chose the rare marine actinomycete Streptosporangium sp. CGMCC 4.7309 to explore its secondary metabolite diversity by genome mining. A genetic manipulation method had never been established for Streptosporangium strains, so we first established a genetic system for Streptosporangium sp. CGMCC 4.7309, the first such system for this genus. Draft genome sequencing of Streptosporangium sp. CGMCC 4.7309 revealed that it contains more than 20 cryptic secondary metabolite biosynthetic clusters. A cluster containing type II polyketide synthase genes (the hex cluster) was predicted by in silico analysis to encode compounds with a pentangular polyphenol scaffold. The products of the hex cluster were uncovered by comparing the metabolic profile of Streptosporangium sp. CGMCC 4.7309 with that of the hex30-inactivated mutant, in which a key ketoreductase gene was disrupted. Three pentangular polyphenols were then isolated and named hexaricins A (1), B (2), and C (3). The inconsistent stereochemistry at C-15 among hexaricins A, B, and C indicates a branch point in their biosynthesis. Finally, the biosynthetic pathway of the hexaricins was proposed based on bioinformatics analysis. PMID:26754814

  20. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

    Science.gov (United States)

    2014-01-01

    Background Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. Results We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 Mb versus 13 to 14 Mb) but had a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. Conclusions The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of
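    The abstract names expectation-maximization as the core of MaxBin but gives no model details. As a minimal sketch only, the Python code below runs EM on a Poisson mixture over per-contig read coverage; it is not MaxBin's implementation, whose model is not described here. The function names, the deterministic quantile initialisation, and the coverage-only model are assumptions made for illustration.

```python
import math


def poisson_log_pmf(k, lam):
    # log P(k | lam) for a Poisson distribution; lgamma(k + 1) equals log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_bin_by_coverage(coverages, n_bins, n_iter=100):
    """Hypothetical EM binning of contigs on integer read coverage only."""
    # Deterministic initialisation: bin means taken from evenly spaced quantiles
    ordered = sorted(coverages)
    n = len(coverages)
    means = [max(ordered[(2 * j + 1) * n // (2 * n_bins)], 0.5) for j in range(n_bins)]
    weights = [1.0 / n_bins] * n_bins
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each bin for each contig (log-sum-exp for stability)
        resp = []
        for c in coverages:
            logs = [math.log(w) + poisson_log_pmf(c, m) for w, m in zip(weights, means)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights and Poisson means from the responsibilities
        for j in range(n_bins):
            r_j = sum(r[j] for r in resp)
            weights[j] = max(r_j / n, 1e-12)
            means[j] = max(sum(r[j] * c for r, c in zip(resp, coverages)) / max(r_j, 1e-12), 0.5)
    # Hard assignment: most probable bin for each contig
    labels = [max(range(n_bins), key=lambda j: r[j]) for r in resp]
    return labels, means
```

    For example, em_bin_by_coverage([10, 11, 9, 10, 12, 50, 52, 49, 51, 48], n_bins=2) places the five low-coverage and five high-coverage contigs in separate bins. Real binners combine coverage with sequence composition, since unrelated genomes can share similar coverage.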

  1. Automated analysis for large amount gaseous fission product gamma-scanning spectra from nuclear power plant and its data mining

    International Nuclear Information System (INIS)

    Based on the Linssi database and UniSampo/Shaman software, an automated analysis platform has been set up for the analysis of large numbers of gamma spectra from the primary coolant monitoring systems of a CANDU reactor. With it, a database inventory of gaseous and volatile fission products in the primary coolant of a CANDU reactor has been established. This database comprises 15,000 spectra of radioisotope analysis records. Records from the database inventory were retrieved by a specifically designed data-mining module and subjected to further analysis. Results from the analysis were subsequently used to identify the reactor coolant half-lives of 135Xe and 133Xe, as well as the correlations between 135Xe and 88Kr activities. (author)
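    The abstract does not say how the half-lives or correlations were computed. One common approach, assumed here purely for illustration, is a least-squares fit of ln(activity) against time, giving an apparent half-life of ln 2 / k, together with a Pearson coefficient between two activity series. The sketch below uses only the Python standard library; the function names and the single-exponential model are hypothetical.

```python
import math


def fit_half_life(times_h, activities):
    """Least-squares fit of ln(A) = ln(A0) - k*t; returns the apparent half-life ln(2)/k."""
    logs = [math.log(a) for a in activities]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    slope = sxy / sxx  # slope = -k when the activity decays exponentially
    return math.log(2) / -slope


def pearson(x, y):
    """Pearson correlation coefficient between two equally long activity series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

    A single-exponential fit is only meaningful over a window in which production of the nuclide is negligible; with ongoing production or removal, the fitted value is an effective rather than a physical half-life.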

  2. Mining clinical attributes of genomic variants through assisted literature curation in Egas.

    Science.gov (United States)

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M; Mort, Matthew; Cooper, David N; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/. PMID:27278817

  3. Data Mining on Survival Prediction after Chemotherapy for Diffuse Large-B-Cell Lymphoma and Genomics of Metastasis Cancer

    Directory of Open Access Journals (Sweden)

    Shen Lu

    2014-12-01

    Full Text Available This research pertains to the application of data mining to microarray databases for diffuse large-B-cell lymphoma and metastatic cancer; for the latter, little is known about the genomic events that regulate the transformation of a tumor into a metastatic phenotype.

  4. Draft Genome Sequence of Plant Growth-Promoting Rhizobium Mesorhizobium amorphae, Isolated from Zinc-Lead Mine Tailings

    OpenAIRE

    Hao, Xiuli; Lin, Yanbing; Johnstone, Laurel; Baltrus, David A; Miller, Susan J.; Wei, Gehong; Rensing, Christopher

    2012-01-01

    Here, we describe the draft genome sequence of Mesorhizobium amorphae strain CCNWGS0123, isolated from nodules of Robinia pseudoacacia growing on zinc-lead mine tailings. A large number of metal(loid) resistance genes, as well as genes reported to promote plant growth, were identified, suggesting considerable potential for aiding phytoremediation of metal(loid)-contaminated soil.

  5. Draft Genome Sequence of Plant Growth-Promoting Rhizobium Mesorhizobium amorphae, Isolated from Zinc-Lead Mine Tailings

    Science.gov (United States)

    Hao, Xiuli; Lin, Yanbing; Johnstone, Laurel; Baltrus, David A.; Miller, Susan J.

    2012-01-01

    Here, we describe the draft genome sequence of Mesorhizobium amorphae strain CCNWGS0123, isolated from nodules of Robinia pseudoacacia growing on zinc-lead mine tailings. A large number of metal(loid) resistance genes, as well as genes reported to promote plant growth, were identified, suggesting considerable potential for aiding phytoremediation of metal(loid)-contaminated soil. PMID:22247533

  6. Mining human Behaviors: automated behavioral Analysis from small to big Data

    OpenAIRE

    Staiano, Jacopo

    2014-01-01

    This research thesis aims to address complex problems in Human Behavior Understanding from a computational standpoint: to develop novel methods for enabling machines to capture not only what their sensors are perceiving but also how and why the situation they are presented with is evolving in a certain manner. Touching several fields, from Computer Vision to Social Psychology through Natural Language Processing and Data Mining, we will move from more to less constrained scenarios, descr...

  7. Metro Maps of Plant Disease Dynamics—Automated Mining of Differences Using Hyperspectral Images

    OpenAIRE

    Wahabzada, Mirwaes; Mahlein, Anne-Katrin; Bauckhage, Christian; Steiner, Ulrike; Oerke, Erich-Christian; Kersting, Kristian

    2015-01-01

    Understanding the response dynamics of plants to biotic stress is essential to improve management practices and breeding strategies of crops and thus to proceed towards a more sustainable agriculture in the coming decades. In this context, hyperspectral imaging offers a particularly promising approach since it provides non-destructive measurements of plants correlated with internal structure and biochemical compounds. In this paper, we present a cascade of data mining techniques for fast and ...

  8. Improved processing-string fusion-approach investigation for automated sea-mine classification in shallow water

    Science.gov (United States)

    Aridgides, Tom; Fernandez, Manuel F.; Dobeck, Gerald J.

    2004-09-01

    An improved sea mine computer-aided-detection/computer-aided-classification (CAD/CAC) processing string has been developed. This robust automated processing string involves the fusion of the outputs of unique mine classification algorithms. The overall CAD/CAC processing string consists of pre-processing, adaptive clutter filtering (ACF), normalization, detection, feature extraction, optimal subset feature selection, feature orthogonalization, classification and fusion processing blocks. The range-dimension ACF is matched both to average highlight and shadow information, while also adaptively suppressing background clutter. For each detected object, features are extracted and processed through an orthogonalization transformation, enabling an efficient application of the optimal log-likelihood-ratio-test (LLRT) classification rule, in the orthogonal feature space domain. The classified objects of 4 distinct processing strings are fused using the classification confidence values as features and "M-out-of-N", or LLRT-based fusion rules. The utility of the overall processing strings and their fusion was demonstrated with new shallow water high-resolution sonar imagery data. The processing string detection and classification parameters were tuned and the string classification performance was optimized, by appropriately selecting a subset of the original feature set. Two significant improvements were made to the CAD/CAC processing string by employing sub-image adaptive clutter filtering (SACF) and utilizing a repeated application of the subset feature selection/feature orthogonalization/LLRT classification blocks. It was shown that LLRT-based fusion of the CAD/CAC processing strings outperforms the "M-out-of-N" algorithms and results in up to a seven-fold false alarm rate reduction, compared to the best single CAD/CAC processing string results, while maintaining a high correct mine classification probability. Alternately, the fusion of the processing strings enabled
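    The two fusion rules named above can be contrasted in a few lines. The sketch below is hypothetical: it models each processing string's confidence value with one-dimensional Gaussian class densities and, standing in for the orthogonalization step, assumes the confidences are independent so that the fused log-likelihood ratio is a sum. The class parameters, threshold, and function names are illustrative, not the authors'.

```python
import math


def m_out_of_n(decisions, m):
    """Declare a mine if at least m of the N per-string binary decisions are positive."""
    return sum(bool(d) for d in decisions) >= m


def gaussian_log_pdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def llrt_fusion(confidences, models, threshold=0.0):
    """Sum per-string log-likelihood ratios, assuming independent (orthogonalised) inputs.

    models[i] = (mu_mine, var_mine, mu_clutter, var_clutter) for string i (hypothetical).
    """
    llr = sum(
        gaussian_log_pdf(c, mu1, v1) - gaussian_log_pdf(c, mu0, v0)
        for c, (mu1, v1, mu0, v0) in zip(confidences, models)
    )
    return llr > threshold, llr
```

    With four strings modelled as (0.8, 0.01, 0.3, 0.01) and confidences [0.9, 0.85, 0.2, 0.75], the summed LLR is 25.0, so the LLRT calls a mine; a 3-out-of-4 vote on confidences above 0.5 agrees, but a 4-out-of-4 vote does not. The LLRT weighs how strongly each string votes, which is one reason it can trade off false alarms more finely than a count of votes.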

  9. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.

    Science.gov (United States)

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2016-01-01

    Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/. PMID:27067514
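    Genomic signatures used in binning are commonly k-mer frequency profiles. The Python sketch below computes one such signature, tetranucleotide frequencies with each 4-mer merged with its reverse complement (136 canonical classes), plus a Euclidean distance between profiles. The choice of k = 4, the distance, and all names are assumptions; the abstract does not specify which signature MyCC uses.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Each 4-mer is merged with its reverse complement: 136 canonical classes
CANONICAL = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product("ACGT", repeat=4))})
INDEX = {kmer: i for i, kmer in enumerate(CANONICAL)}


def tetra_signature(contig):
    """Normalised canonical tetranucleotide frequency vector of a contig."""
    seq = contig.upper()
    counts = [0] * len(CANONICAL)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
            counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(CANONICAL)


def signature_distance(a, b):
    """Euclidean distance between two signatures (smaller = more similar composition)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

    Merging reverse complements makes the signature independent of which strand a contig was assembled on, so a contig and its reverse complement receive identical profiles.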

  10. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  11. Discovery of defense- and neuropeptides in social ants by genome-mining.

    Directory of Open Access Journals (Sweden)

    Christian W Gruber

    Full Text Available Natural peptides of great number and diversity occur in all organisms, but analyzing their peptidome is often difficult. With natural product drug discovery in mind, we devised a genome-mining approach to identify defense- and neuropeptides in the genomes of social ants from Atta cephalotes (leaf-cutter ant), Camponotus floridanus (carpenter ant) and Harpegnathos saltator (basal genus). Numerous peptide-encoding genes of defense peptides, in particular defensins, and neuropeptides or regulatory peptide hormones, such as allatostatins and tachykinins, were identified and analyzed. Most interestingly we annotated genes that encode oxytocin/vasopressin-related peptides (inotocins) and their putative receptors. This is the first piece of evidence for the existence of this nonapeptide hormone system in ants (Formicidae) and supports recent findings in Tribolium castaneum (red flour beetle) and Nasonia vitripennis (parasitoid wasp), and therefore its confinement to some basal holometabolous insects. By contrast, the absence of the inotocin hormone system in Apis mellifera (honeybee), another closely-related member of the eusocial Hymenoptera clade, establishes the basis for future studies on the molecular evolution and physiological function of oxytocin/vasopressin-related peptides (vasotocin nonapeptide family) and their receptors in social insects. Particularly the identification of ant inotocin and defensin peptide sequences will provide a basis for future pharmacological characterization in the quest for potent and selective lead compounds of therapeutic value.

  12. Discovery of defense- and neuropeptides in social ants by genome-mining.

    Science.gov (United States)

    Gruber, Christian W; Muttenthaler, Markus

    2012-01-01

    Natural peptides of great number and diversity occur in all organisms, but analyzing their peptidome is often difficult. With natural product drug discovery in mind, we devised a genome-mining approach to identify defense- and neuropeptides in the genomes of social ants from Atta cephalotes (leaf-cutter ant), Camponotus floridanus (carpenter ant) and Harpegnathos saltator (basal genus). Numerous peptide-encoding genes of defense peptides, in particular defensins, and neuropeptides or regulatory peptide hormones, such as allatostatins and tachykinins, were identified and analyzed. Most interestingly we annotated genes that encode oxytocin/vasopressin-related peptides (inotocins) and their putative receptors. This is the first piece of evidence for the existence of this nonapeptide hormone system in ants (Formicidae) and supports recent findings in Tribolium castaneum (red flour beetle) and Nasonia vitripennis (parasitoid wasp), and therefore its confinement to some basal holometabolous insects. By contrast, the absence of the inotocin hormone system in Apis mellifera (honeybee), another closely-related member of the eusocial Hymenoptera clade, establishes the basis for future studies on the molecular evolution and physiological function of oxytocin/vasopressin-related peptides (vasotocin nonapeptide family) and their receptors in social insects. Particularly the identification of ant inotocin and defensin peptide sequences will provide a basis for future pharmacological characterization in the quest for potent and selective lead compounds of therapeutic value. PMID:22448224

  13. Computational Mining and Genome Wide Distribution of Microsatellite in Fusarium oxysporum f. sp. lycopersici

    Directory of Open Access Journals (Sweden)

    Sudheer KUMAR

    2012-11-01

    Full Text Available Simple sequence repeats (SSRs) are currently the most preferred molecular marker system owing to their highly desirable properties, viz., abundance, hypervariability, and suitability for high-throughput analysis. Hence, in the present study an attempt was made to mine and analyze microsatellite dynamics in the whole genome of Fusarium oxysporum f. sp. lycopersici. The distribution pattern of different SSR motifs provides evidence of greater accumulation of tetranucleotide (3837) repeats, followed by trinucleotide (3367) repeats. The maximum frequency in coding regions was shown by mononucleotide SSR motifs (34.8%), whereas the minimum frequency was observed for pentanucleotide SSRs (0.87%). The highest relative abundance (1023 SSR/Mb) and density of SSRs (114.46 bp/Mb) were observed on chromosome 1, while the lowest density of SSR motifs was recorded on chromosomes 11 (7.40 bp/Mb) and 12 (7.41 bp/Mb). The largest share of trinucleotide motifs (34.24%) code for glutamic acid (GAA), while GT/CT were the most frequent dinucleotide repeats. The most common and highly repeated SSR motifs were identified as (A)64, (T)48, (GT)24, (GAA)31, (TTTC)24, (TTTCT)28 and (AACCAG)27. Overall, the generated information may serve as baseline information for developing SSR markers that could find applications in genomic analysis of F. oxysporum f. sp. lycopersici for better understanding of evolution, diversity analysis, population genetics, race identification and acquisition of new virulence.
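    The mining step described above amounts to scanning a sequence for tandem repeats of short motifs and summarising them per megabase. The sketch below is an illustration under stated assumptions: the minimum copy numbers in MIN_REPEATS are hypothetical (the abstract does not give the thresholds), repeats are reported in the phase in which they are first encountered, and relative abundance and density are taken as SSR count per Mb and SSR base pairs per Mb.

```python
import re

# Hypothetical minimum tandem copy numbers per motif length (not taken from the study)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-mer followed immediately by at least (min_copies - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def ssr_summary(seq_length_bp, hits):
    """Relative abundance (SSRs per Mb) and density (SSR bp per Mb)."""
    mb = seq_length_bp / 1e6
    ssr_bp = sum(len(motif) * copies for _, motif, copies in hits)
    return len(hits) / mb, ssr_bp / mb
```

    On "CG" + "A" * 12 + "CG" + "GT" * 7 + "C" + "GAA" * 6 + "TC", find_ssrs reports (A)12, (GT)7 and (GAA)6; the non-primitive (AA)6 match is discarded so the poly-A run is counted once.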

  14. Automated integration of genomic physical mapping data via parallel simulated annealing

    Energy Technology Data Exchange (ETDEWEB)

    Slezak, T.

    1994-06-01

    The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques, including Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, AC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We observed that map builders need a much rougher presentation of data than map readers do; the former wish to see raw data, since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones, which "intersect" M larger objects, by defining "intersection" to mean either contig/map membership or hybridization results. The goal of making an integrated map is then to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed in parallel on a network of 40+ Unix machines, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel network versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
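    The ordering problem described above (permute N clone columns so that each of the M object rows has as few gaps as possible, without violating FISH-derived order constraints) can be sketched as a serial simulated annealing loop. The swap move, linear cooling schedule, gap definition (runs of 1s in a row minus one), and function names are assumptions for illustration; the original system ran in parallel on a network of machines.

```python
import math
import random


def count_gaps(rows, order):
    """Sum over rows of (number of runs of 1s - 1), with columns visited in the given order."""
    gaps = 0
    for row in rows:
        runs, prev = 0, 0
        for col in order:
            bit = row[col]
            if bit and not prev:
                runs += 1
            prev = bit
        gaps += max(runs - 1, 0)
    return gaps


def respects(order, constraints):
    """True if every (a, b) constraint has column a placed before column b."""
    pos = {c: i for i, c in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in constraints)


def anneal(rows, n_cols, constraints=(), steps=20000, t0=2.0, seed=1):
    rng = random.Random(seed)
    order = list(range(n_cols))  # assumes the identity order satisfies the constraints
    cost = count_gaps(rows, order)
    best, best_cost = order[:], cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3  # linear cooling schedule
        i, j = rng.sample(range(n_cols), 2)
        cand = order[:]
        cand[i], cand[j] = cand[j], cand[i]  # move: swap two columns
        if not respects(cand, constraints):
            continue  # ordering constraints may never be violated
        new_cost = count_gaps(rows, cand)
        # Metropolis rule: always accept non-worsening moves, sometimes accept worse ones
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            order, cost = cand, new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
    return best, best_cost
```

    Rejecting constraint-violating moves outright, rather than penalising them in the cost, mirrors the statement that the FISH constraints "cannot be violated"; a penalty term would be an alternative design.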

  15. O-miner: an integrative platform for automated analysis and mining of -omics data

    OpenAIRE

    Cutts, Rosalind J.; Dayem Ullah, Abu Z; Sangaralingam, Ajanthah; Gadaleta, Emanuela; Lemoine, Nicholas R.; Chelala, Claude

    2012-01-01

    High-throughput profiling has generated massive amounts of data across basic, clinical and translational research fields. However, open source comprehensive web tools for analysing data obtained from different platforms and technologies are still lacking. To fill this gap and the unmet computational needs of ongoing research projects, we developed O-miner, a rapid, comprehensive, efficient web tool that covers all the steps required for the analysis of both transcriptomic and genomic data sta...

  16. RiceGeneThresher: a web-based application for mining genes underlying QTL in rice genome.

    Science.gov (United States)

    Thongjuea, Supat; Ruanjaichon, Vinitchan; Bruskiewich, Richard; Vanavichit, Apichart

    2009-01-01

    RiceGeneThresher is a public online resource for mining genes underlying genome regions of interest or quantitative trait loci (QTL) in the rice genome. It is a compendium of rice genomic resources consisting of genetic markers, genome annotation, expressed sequence tags (ESTs), protein domains, gene ontology, plant stress-responsive genes, metabolic pathways and predicted protein-protein interactions. The RiceGeneThresher system integrates these diverse data sources and provides web-based applications and flexible tools for delivering customized sets of biological data on rice. The system supports whole-genome gene mining for QTL by querying with DNA marker intervals or genomic loci. RiceGeneThresher provides the biological supporting evidence that is essential for targeting groups or networks of genes involved in controlling traits underlying QTL. Users can use it to discover and identify the most promising candidate genes in preparation for further gene function validation. The web-based application is freely available at http://rice.kps.ku.ac.th. PMID:18820292
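    Querying by a DNA marker interval reduces to an overlap test between gene coordinates and the interval spanned by two flanking markers. The sketch below assumes simple in-memory tables; the marker names, coordinates, and function name are hypothetical and do not reflect RiceGeneThresher's actual schema or interface.

```python
def genes_in_qtl(genes, markers, left_marker, right_marker):
    """Return genes overlapping the interval between two flanking markers.

    genes: list of (chromosome, start, end, gene_id); markers: name -> (chromosome, position).
    """
    chrom_l, pos_l = markers[left_marker]
    chrom_r, pos_r = markers[right_marker]
    if chrom_l != chrom_r:
        raise ValueError("flanking markers must lie on the same chromosome")
    lo, hi = sorted((pos_l, pos_r))
    # A gene overlaps [lo, hi] if it starts at or before hi and ends at or after lo
    return [g for g in genes if g[0] == chrom_l and g[1] <= hi and g[2] >= lo]
```

    A linear scan is enough for a sketch; a production service would typically index genes by chromosome and position so that each interval query touches only nearby records.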

  17. Automated physician order recommendations and outcome predictions by data-mining electronic medical records.

    Science.gov (United States)

    Chen, Jonathan H; Altman, Russ B

    2014-01-01

    The meaningful use of electronic medical records (EMR) will come from effective clinical decision support (CDS) applied to physician orders, the concrete manifestation of clinical decision making. CDS development is currently limited by a top-down approach, requiring manual production and limited end-user awareness. A statistical data-mining alternative automatically extracts expertise as association statistics from structured EMR data (>5.4M data elements from >19K inpatient encounters). This powers an order recommendation system analogous to commercial systems (e.g., Amazon.com's "Customers who bought this…"). Compared to a standard benchmark, the association method improves order prediction precision from 26% to 37% (p …). The system also predicts clinical outcomes, such as 30-day mortality and 1-week ICU intervention, with ROC AUCs of 0.88 and 0.78, respectively, comparable to state-of-the-art prognosis scores. PMID:25717414
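    Association statistics of the kind described can be illustrated with co-occurrence counts: for an order a already placed, score each other order b by the conditional frequency count(a, b) / count(a) across past encounters. The sketch below is a simplified, hypothetical version (set-valued encounters, max-aggregation over current orders, alphabetical tie-breaking); it is not the authors' scoring method, which the abstract does not specify beyond "association statistics".

```python
from collections import Counter
from itertools import permutations


def build_stats(encounters):
    """Count how often each order item, and each ordered pair of items, occurs per encounter."""
    item_counts = Counter()
    pair_counts = Counter()
    for items in encounters:
        unique = set(items)
        item_counts.update(unique)
        pair_counts.update(permutations(unique, 2))  # ordered pairs (a, b) with a != b
    return item_counts, pair_counts


def recommend(current, item_counts, pair_counts, top_n=3):
    """Rank items not yet ordered by max over current orders a of count(a, b) / count(a)."""
    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if a in current and b not in current:
            scores[b] = max(scores.get(b, 0.0), n_ab / item_counts[a])
    ranked = sorted(scores, key=lambda item: (-scores[item], item))  # ties broken alphabetically
    return ranked[:top_n]
```

    With hypothetical encounters {cbc, bmp, troponin}, {cbc, bmp}, {cbc, ecg, troponin} and {bmp, lactate}, recommend({"troponin"}, ...) returns ["cbc", "bmp", "ecg"], because cbc co-occurred with troponin in both encounters that contained it.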

  18. Recent processing string and fusion algorithm improvements for automated sea mine classification in shallow water

    Science.gov (United States)

    Aridgides, Tom; Fernandez, Manuel F.; Dobeck, Gerald J.

    2003-09-01

    A novel sea mine computer-aided-detection / computer-aided-classification (CAD/CAC) processing string has been developed. The overall CAD/CAC processing string consists of pre-processing, adaptive clutter filtering (ACF), normalization, detection, feature extraction, feature orthogonalization, optimal subset feature selection, classification and fusion processing blocks. The range-dimension ACF is matched both to average highlight and shadow information, while also adaptively suppressing background clutter. For each detected object, features are extracted and processed through an orthogonalization transformation, enabling an efficient application of the optimal log-likelihood-ratio-test (LLRT) classification rule, in the orthogonal feature space domain. The classified objects of 4 distinct processing strings are fused using the classification confidence values as features and logic-based, "M-out-of-N", or LLRT-based fusion rules. The utility of the overall processing strings and their fusion was demonstrated with new shallow water high-resolution sonar imagery data. The processing string detection and classification parameters were tuned and the string classification performance was optimized, by appropriately selecting a subset of the original feature set. A significant improvement was made to the CAD/CAC processing string by utilizing a repeated application of the subset feature selection / LLRT classification blocks. It was shown that LLRT-based fusion algorithms outperform the logic based and the "M-out-of-N" ones. The LLRT-based fusion of the CAD/CAC processing strings resulted in up to a nine-fold false alarm rate reduction, compared to the best single CAD/CAC processing string results, while maintaining a constant correct mine classification probability.

  19. Volterra fusion of processing strings for automated sea mine classification in shallow water

    Science.gov (United States)

    Aridgides, Tom; Fernandez, Manuel; Dobeck, Gerald J.

    2005-06-01

    An improved sea mine computer-aided-detection / computer-aided-classification (CAD/CAC) processing string has been developed. The overall CAD/CAC processing string consists of pre-processing, adaptive clutter filtering (ACF), normalization, detection, feature extraction, optimal subset feature selection, feature orthogonalization, classification and fusion processing blocks. The range-dimension ACF is matched both to average highlight and shadow information, while also adaptively suppressing background clutter. For each detected object, features are extracted and processed through an orthogonalization transformation, enabling an efficient application of the optimal log-likelihood-ratio-test (LLRT) classification rule, in the orthogonal feature space domain. The classified objects of 4 distinct processing strings are fused using the classification confidence values as features and either "M-out-of-N" or LLRT-based fusion rules. The utility of the overall processing strings and their fusion was demonstrated with new shallow water high-resolution sonar imagery data. The processing string detection and classification parameters were tuned and the string classification performance was optimized, by appropriately selecting a subset of the original feature set. Two significant improvements were made to the CAD/CAC processing string by employing sub-image adaptive clutter filtering (SACF) and utilizing a repeated application of the subset feature selection / feature orthogonalization / LLRT classification blocks. A new nonlinear (Volterra) feature LLRT fusion algorithm was developed. It was shown that this Volterra feature LLRT fusion of the CAD/CAC processing strings outperforms the "M-out-of-N" and baseline LLRT algorithms, yielding significant improvements over the best single CAD/CAC processing string results, and providing the capability to correctly call all mine targets while maintaining a very low false alarm rate.

  20. Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products

    NARCIS (Netherlands)

    Medema, Marnix; Paalvast, Yared; Nguyen, D.D.; Melnik, A.; Dorrestein, P.C.; Takano, Eriko; Breitling, Rainer

    2014-01-01

    Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strateg

  1. EST Pipeline System: Detailed and Automated EST Data Processing and Mining

    Institute of Scientific and Technical Information of China (English)

    Hao Xu; Liang Zhang; Hong Yu; Yan Zhou; Ling He; Yuanzhong Zhu; Wei Huang; Lijun Fang; Lin Tao; Yuedong Zhu; Lin Cai; Huayong Xu

    2003-01-01

    Expressed sequence tags (ESTs) have been widely used in gene survey research in recent years. The EST Pipeline System, software developed by the Hangzhou Genomics Institute (HGI), can automatically analyze EST datasets of different scales using suitable methods. All the analysis reports, including those of vector masking, sequence assembly, gene annotation, Gene Ontology classification, and some other analyses, can be browsed and searched, as well as downloaded in Excel format, from the web interface, relieving researchers of routine data processing so that they can focus on the biological rules embedded in the data.

  2. Automated gamma knife radiosurgery treatment planning with image registration, data-mining, and Nelder-Mead simplex optimization

    International Nuclear Information System (INIS)

    Gamma knife treatments are usually planned manually, requiring much expertise and time. We describe a new, fully automatic method of treatment planning. The treatment volume to be planned is first compared with a database of past treatments to find volumes closely matching in size and shape. The treatment parameters of the closest matches are used as starting points for the new treatment plan. Further optimization is performed with the Nelder-Mead simplex method: the coordinates and weights of the isocenters are allowed to vary until a maximally conformal plan specific to the new treatment volume is found. The method was tested on a randomly selected set of 10 acoustic neuromas and 10 meningiomas. Typically, matching a new volume took under 30 seconds. The time for simplex optimization, on a 3 GHz Xeon processor, ranged from under a minute for small volumes ... (30 000 cubic mm, >20 isocenters). In 8/10 acoustic neuromas and 8/10 meningiomas, the automatic method found plans with a conformation number equal to or better than that of the manual plan. In 4/10 acoustic neuromas and 5/10 meningiomas, both overtreatment and undertreatment ratios were equal to or better in the automated plans. In conclusion, data mining of past treatments can be used to derive starting parameters for treatment planning. These parameters can then be computer-optimized to give good plans automatically.
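    The two stages described above can be sketched in Python. The first stage retrieves starting parameters from the closest past case, and the second refines them with the Nelder-Mead simplex method. This is a hedged illustration only. The shape-feature vectors, the toy quadratic objective, and the function names are hypothetical stand-ins for the real conformity score, which the abstract does not specify. The simplex update uses the textbook reflection, expansion, inside-contraction, and shrink steps.

```python
import math


def closest_past_plan(query_features, past_cases):
    """Return the plan parameters of the past case whose features are nearest (Euclidean)."""
    _, params = min(past_cases, key=lambda case: math.dist(query_features, case[0]))
    return list(params)


def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=5000,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f starting from x0 with a basic Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest objective) to worst.
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[0]:
            expanded = [c + gamma * (r - c) for c, r in zip(centroid, reflected)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + sigma * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


# Hypothetical database: (shape features, stored plan parameters).
past = [((10.0, 1.0), (0.0, 0.0)), ((50.0, 2.0), (5.0, 5.0))]
start = closest_past_plan((12.0, 1.1), past)
# Toy objective standing in for a conformity score (lower is better).
best_x, best_f = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, start)
print(start, best_x, best_f)
```

    Starting from a well-matched past case mainly shortens the search. Nelder-Mead needs no gradients, which suits objectives evaluated by dose computation.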

  3. The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes

    OpenAIRE

    Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N.; Perna, Nicole T; Tisserat, Ned; Leach, Jan E.; Lévesque, C. André; Buell, C. Robin

    2011-01-01

    The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from th...

  4. High-frequency, long-duration water sampling in acid mine drainage studies: a short review of current methods and recent advances in automated water samplers

    Science.gov (United States)

    Chapin, Thomas

    2015-01-01

    Hand-collected grab samples are the most common water sampling method, but using grab sampling to monitor temporally variable aquatic processes such as diel metal cycling or episodic events is rarely feasible or cost-effective. Currently available automated samplers are a proven, widely used technology and typically collect up to 24 samples during a deployment. However, these automated samplers are not well suited for long-term sampling in remote areas or in freezing conditions. There is a critical need for low-cost, long-duration, high-frequency water sampling technology to improve our understanding of the geochemical response to temporally variable processes. This review article will examine recent developments in automated water sampler technology and utilize selected field data from acid mine drainage studies to illustrate the utility of high-frequency, long-duration water sampling.

  5. Metro maps of plant disease dynamics--automated mining of differences using hyperspectral images.

    Directory of Open Access Journals (Sweden)

    Mirwaes Wahabzada

    Full Text Available Understanding the response dynamics of plants to biotic stress is essential to improve management practices and breeding strategies of crops and thus to proceed towards a more sustainable agriculture in the coming decades. In this context, hyperspectral imaging offers a particularly promising approach since it provides non-destructive measurements of plants correlated with internal structure and biochemical compounds. In this paper, we present a cascade of data mining techniques for fast and reliable data-driven sketching of complex hyperspectral dynamics in plant science and plant phenotyping. To achieve this, we build on top of a recent linear time matrix factorization technique, called Simplex Volume Maximization, in order to automatically discover archetypal hyperspectral signatures that are characteristic for particular diseases. The methods were applied to a data set of barley leaves (Hordeum vulgare) diseased with foliar plant pathogens Pyrenophora teres, Puccinia hordei and Blumeria graminis hordei. Towards more intuitive visualizations of plant disease dynamics, we use the archetypal signatures to create structured summaries that are inspired by metro maps, i.e. schematic diagrams of public transport networks. Metro maps of plant disease dynamics produced on several real-world data sets conform to plant physiological knowledge and explicitly illustrate the interaction between diseases and plants. Most importantly, they provide an abstract and interpretable view of plant disease progression.

  6. Metro maps of plant disease dynamics--automated mining of differences using hyperspectral images.

    Science.gov (United States)

    Wahabzada, Mirwaes; Mahlein, Anne-Katrin; Bauckhage, Christian; Steiner, Ulrike; Oerke, Erich-Christian; Kersting, Kristian

    2015-01-01

    Understanding the response dynamics of plants to biotic stress is essential to improve management practices and breeding strategies of crops and thus to proceed towards a more sustainable agriculture in the coming decades. In this context, hyperspectral imaging offers a particularly promising approach since it provides non-destructive measurements of plants correlated with internal structure and biochemical compounds. In this paper, we present a cascade of data mining techniques for fast and reliable data-driven sketching of complex hyperspectral dynamics in plant science and plant phenotyping. To achieve this, we build on top of a recent linear time matrix factorization technique, called Simplex Volume Maximization, in order to automatically discover archetypal hyperspectral signatures that are characteristic for particular diseases. The methods were applied to a data set of barley leaves (Hordeum vulgare) diseased with foliar plant pathogens Pyrenophora teres, Puccinia hordei and Blumeria graminis hordei. Towards more intuitive visualizations of plant disease dynamics, we use the archetypal signatures to create structured summaries that are inspired by metro maps, i.e. schematic diagrams of public transport networks. Metro maps of plant disease dynamics produced on several real-world data sets conform to plant physiological knowledge and explicitly illustrate the interaction between diseases and plants. Most importantly, they provide an abstract and interpretable view of plant disease progression. PMID:25621489

  7. Integration of Automated Decision Support Systems with Data Mining Abstract: A Client Perspective

    Directory of Open Access Journals (Sweden)

    Abdullah Saad AL-Malaise

    2013-03-01

    Full Text Available Customer behavior and satisfaction always play an important role in increasing an organization’s growth and market value. Customers are the top priority for growing organizations seeking to build up their businesses. This paper presents the architecture of a Decision Support System (DSS) for dealing with customers’ enquiries and requests. The main purpose of the proposed model is to enhance customer satisfaction and behavior using a DSS. We build the model by extending traditional DSS concepts through the integration of Data Mining (DM) abstract. The model presented in this paper shows a comprehensive architecture for handling customer requests using DSS and knowledge management (KM) to improve customer behavior and satisfaction. Furthermore, DM abstract provides additional methods and techniques to understand the data of customers who make contact, to classify the replied answers into a number of classes, to generate associations between queries of the same type, and finally to maintain the KM for future correspondence.

  8. Future planning and evaluation for automated adaptive minehunting: a roadmap for mine countermeasures theory modernization

    Science.gov (United States)

    Garcia, Gregory A.; Wettergren, Thomas A.

    2012-06-01

    This paper presents a discussion of U.S. naval mine countermeasures (MCM) theory modernization in light of advances in the areas of autonomy, tactics, and sensor processing. The unifying theme spanning these research areas concerns the capability for in situ adaptation of processing algorithms, plans, and vehicle behaviors enabled through run-time situation assessment and performance estimation. Independently, each of these technology developments impacts the MCM Measures of Effectiveness [MOE(s)] of time and risk by improving one or more associated Measures of Performance [MOP(s)]. The contribution of this paper is to outline an integrated strategy for realizing the cumulative benefits of these technology enablers to the United States Navy's minehunting capability. An introduction to the MCM problem is provided to frame the importance of the foundational research and the ramifications of the proposed strategy on the MIW community. We then include an overview of current and future adaptive capability research in the aforementioned areas, highlighting a departure from the existing rigid assumption-based approaches while identifying anticipated technology acceptance issues. Consequently, the paper describes an incremental strategy for transitioning from the current minehunting paradigm to a future vision. In the current paradigm, tactical decision aids rely on a priori intelligence and there is little to no in situ adaptation or feedback. In the future vision, unmanned systems equipped with a representation of the commander's intent are afforded the authority and ability to adapt to environmental perturbations with minimal human-in-the-loop supervision. The discussion concludes with an articulation of the science and technology issues which the MCM research community must continue to address.

  9. Mining metagenomic whole genome sequences revealed subdominant but constant Lactobacillus population in the human gut microbiota.

    Science.gov (United States)

    Rossi, Maddalena; Martínez-Martínez, Daniel; Amaretti, Alberto; Ulrici, Alessandro; Raimondi, Stefano; Moya, Andrés

    2016-06-01

    The genus Lactobacillus includes over 215 species that colonize plants, foods, sewage and the gastrointestinal tract (GIT) of humans and animals. In the GIT, the Lactobacillus population can be made up of true inhabitants or of bacteria occasionally ingested with fermented or spoiled foods, or with probiotics. This study longitudinally surveyed Lactobacillus species and strains in the feces of a healthy subject through whole genome sequencing (WGS) data mining, in order to identify members of the permanent or transient populations. At three time points (0, 670 and 700 d), 58 different species were identified, 16 of them retrieved for the first time in human feces. L. rhamnosus, L. ruminis, L. delbrueckii, L. plantarum, L. casei and L. acidophilus were the most represented, with estimated amounts ranging between 6 and 8 Log (cells g(-1)), while the others were detected at 4 or 5 Log (cells g(-1)). In total, 86 Lactobacillus strains belonging to 52 species were identified. Of these, 43 seemingly occupied the GIT as true residents, since they were detected over a time span of almost 2 years, either in all three samples or in 2 samples separated by 670 or 700 d. As a whole, a stable community of lactobacilli was disclosed, with wide and understudied biodiversity. PMID:27043715

  10. Mining the genome of Rhodococcus fascians, a plant growth-promoting bacterium gone astray.

    Science.gov (United States)

    Francis, Isolde M; Stes, Elisabeth; Zhang, Yucheng; Rangel, Diana; Audenaert, Kris; Vereecke, Danny

    2016-09-25

    Rhodococcus fascians is a phytopathogenic Gram-positive Actinomycete with a very broad host range encompassing especially dicotyledonous herbaceous perennials, but also some monocots, such as the Liliaceae and, recently, the woody crop pistachio. The pathogenicity of R. fascians strain D188 is known to be encoded by the linear plasmid pFiD188 and to be dictated by its capacity to produce a mixture of cytokinins. Here, we show that D188-5, the nonpathogenic plasmid-free derivative of the wild-type strain D188, actually has a plant growth-promoting effect. With the availability of the genome sequence of R. fascians, the chromosome of strain D188 was mined for putative plant growth-promoting functions and the functionality of some of these activities was tested. This analysis, together with previous results, suggests that the plant growth-promoting activity of R. fascians is due to production of plant growth modulators, such as auxin and cytokinin, combined with degradation of ethylene through 1-amino-cyclopropane-1-carboxylic acid deaminase. Moreover, R. fascians has several functions that could contribute to efficient colonization and competitiveness, but there is little evidence for a strong impact on plant nutrition. Possibly, the plant growth promotion encoded by the D188 chromosome is imperative for the epiphytic phase of the life cycle of R. fascians and prepares the plant to host the bacteria, thus ensuring proper continuation into the pathogenic phase. PMID:26877150

  11. Draft Genome Sequence of Sinorhizobium meliloti CCNWSX0020, a Nitrogen-Fixing Symbiont with Copper Tolerance Capability Isolated from Lead-Zinc Mine Tailings

    Science.gov (United States)

    Li, Zhefei; Ma, Zhanqiang; Hao, Xiuli

    2012-01-01

    Sinorhizobium meliloti CCNWSX0020, which can establish a symbiotic relationship with Medicago species, was isolated from Medicago lupulina plants growing in lead-zinc mine tailings. The genome of this bacterium also contains a number of protein-coding sequences related to metal tolerance. We anticipate that the genome sequence will provide valuable information for exploring environmental bioremediation. PMID:22328762

  12. Genome-wide mining, characterization, and development of microsatellite markers in Marsupenaeus japonicus by genome survey sequencing

    Science.gov (United States)

    Lu, Xia; Luan, Sheng; Kong, Jie; Hu, Longyang; Mao, Yong; Zhong, Shengping

    2015-12-01

    The kuruma prawn, Marsupenaeus japonicus, is one of the most cultivated and consumed species of shrimp. However, very few molecular genetic/genomic resources are publicly available for it. Thus, the characterization and distribution of simple sequence repeats (SSRs) remain ambiguous, and the use of SSR markers in genomic studies and marker-assisted selection is limited. The goal of this study is to characterize and develop genome-wide SSR markers in M. japonicus by genome survey sequencing for application in comparative genomics and breeding. A total of 326 945 perfect SSRs were identified, among which dinucleotide repeats were the most frequent class (44.08%), followed by mononucleotides (29.67%), trinucleotides (18.96%), tetranucleotides (5.66%), hexanucleotides (1.07%), and pentanucleotides (0.56%). In total, primers for 151 541 SSR loci were successfully designed. A subset of 30 SSR primer pairs was synthesized and tested in 42 individuals from a wild population. Of these, 27 loci (90.0%) were successfully amplified with specific products and 24 (80.0%) were polymorphic. For the amplified polymorphic loci, the number of alleles ranged from 5 to 17 (with an average of 9.63), and the average PIC value was 0.796. A total of 58 256 SSR-containing sequences had significant Gene Ontology annotation; these are good functional molecular marker candidates for association studies and comparative genomic analysis. The newly identified SSRs significantly contribute to the M. japonicus genomic resources and will facilitate a number of genetic and genomic studies, including high density linkage mapping, genome-wide association analysis, marker-aided selection, comparative genomics analysis, population genetics, and evolution.
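    The sketch below shows one way perfect SSRs might be located and tallied by motif class, as in the percentages above. The minimum repeat counts (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are assumed thresholds of the kind used by common SSR finders, not values reported in the study. The function names are hypothetical. At each position the shortest qualifying motif wins, and motifs that are themselves repeats (such as ATAT) are skipped so that each tract is counted once.

```python
from collections import Counter

# Assumed minimum repeat counts per motif length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_ssrs(seq: str):
    """Scan left to right and return (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    ssrs = []
    i = 0
    while i < len(seq):
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= MIN_REPEATS[k]:
                ssrs.append((i, motif, count))
                i += count * k  # jump past this repeat tract
                break
        else:
            i += 1  # no SSR starts here; advance one base
    return ssrs


def class_percentages(ssrs):
    """Percentage of SSRs in each motif-length class (1 = mono, 2 = di, ...)."""
    counts = Counter(len(motif) for _, motif, _ in ssrs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: 100.0 * v / total for k, v in sorted(counts.items())}


seq = "GC" + "AT" * 7 + "GG" + "A" * 12 + "C" + "CAG" * 5 + "T"
hits = find_perfect_ssrs(seq)
print(hits)
print(class_percentages(hits))
```

    Real genome-survey pipelines also handle compound and imperfect repeats and then pass flanking sequence to a primer-design tool. This sketch covers only the perfect-repeat census.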

  13. Self-organizing Approach for Automated Gene Identification in Whole Genomes

    CERN Document Server

    Gorban, Alexander N.; Zinovyev, Andrey Yu.; Popova, Tatyana G.

    2001-01-01

    An approach to identifying protein-coding regions (exons) in whole genomes, based on explicitly using the idea of a distinguished coding phase, has been proposed. For several genomes, an optimal window length for averaging the GC-content function and calculating codon frequencies has been found. A self-training procedure based on clustering in the multidimensional space of triplet frequencies is proposed. For visualization of data in the space of triplet frequencies, the method of elastic maps was applied.
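    The two per-window quantities mentioned above are a window-averaged GC content and a vector of triplet (codon) frequencies for a given reading phase. They could be computed along the lines of this sketch. It is only an assumed illustration of these inputs. The authors' self-training clustering and elastic-map visualization are not reproduced, and the function names are hypothetical.

```python
from itertools import product

# All 64 DNA triplets in a fixed order, used as the axes of the frequency vector.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_profile(seq: str, window: int):
    """GC fraction in every window of `window` bases, sliding one base at a time."""
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    is_gc = [1 if base in "GC" else 0 for base in seq]
    total = sum(is_gc[:window])
    profile = [total / window]
    for i in range(window, len(seq)):
        # Add the base entering the window and drop the one leaving it.
        total += is_gc[i] - is_gc[i - window]
        profile.append(total / window)
    return profile


def triplet_frequencies(fragment: str, phase: int):
    """Relative frequencies of non-overlapping triplets read from offset `phase` (0, 1 or 2)."""
    fragment = fragment.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(phase, len(fragment) - 2, 3):
        codon = fragment[i:i + 3]
        if codon in counts:  # skip triplets containing N or other symbols
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


print(gc_profile("AAGGCC", 2))
freqs = triplet_frequencies("ATGATGAAA", 0)
print(freqs[CODONS.index("ATG")])
```

    Computing the 64-dimensional vector for each of the three phases of a fragment gives the points that a clustering step would operate on.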

  14. Self-organizing Approach for Automated Gene Identification in Whole Genomes

    OpenAIRE

    Gorban, Alexander N; Zinovyev, Andrey Yu.; Popova, Tatyana G.

    2001-01-01

    An approach to identifying protein-coding regions (exons) in whole genomes, based on explicitly using the idea of a distinguished coding phase, has been proposed. For several genomes, an optimal window length for averaging the GC-content function and calculating codon frequencies has been found. A self-training procedure based on clustering in the multidimensional space of triplet frequencies is proposed. For visualization of data in the space of triplet frequencies, the method of elastic maps was ap...

  15. The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation

    OpenAIRE

    Stevens Fred J; Johnson Seth; Desai Valmik; Zavaljevski Nela; Yu Chenggang; Reifman Jaques

    2008-01-01

    Abstract Background Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeli...

  16. Data mining of high density genomic variant data for prediction of Alzheimer's disease risk

    Directory of Open Access Journals (Sweden)

    Briones Natalia

    2012-01-01

    Full Text Available Abstract Background The discovery of genetic associations is an important factor in understanding human illness and deriving disease pathways. Identifying multiple interacting genetic mutations associated with disease remains challenging in studying the etiology of complex diseases. Genome-wide association studies (GWAS) have recently confirmed new single nucleotide polymorphisms (SNPs) associated with late-onset Alzheimer's disease (LOAD), at genes implicated in immune response, cholesterol/lipid metabolism, and cell membrane processes. Even so, a percentage of AD heritability remains unexplained. We try to find other genetic variants that may influence LOAD risk using data mining methods. Methods Two different approaches were devised to select SNPs associated with LOAD in a publicly available GWAS data set consisting of three cohorts. In both approaches, single-locus analysis (logistic regression) was conducted to filter the data with a less conservative p-value than the Bonferroni threshold. This resulted in a subset of SNPs that was then used in multi-locus analysis (random forest, RF). In the second approach, we took prior biological knowledge into account. We performed sample stratification and linkage disequilibrium (LD) analysis in addition to logistic regression analysis to preselect loci for input into the RF classifier construction step. Results The first approach gave 199 SNPs mostly associated with genes in calcium signaling, cell adhesion, endocytosis, immune response, and synaptic function. These SNPs, together with APOE and GAB2 SNPs, formed a predictive subset for LOAD status with an average error of 9.8% using 10-fold cross-validation (CV) in RF modeling. The second approach identified nineteen variants in LD with ST5, TRPC1, ATG10, ANO3, NDUFA12, and NISCH, respectively, genes linked directly or indirectly with neurobiology. These variants were part of a model that included APOE and GAB2 SNPs to predict LOAD...

  17. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    Science.gov (United States)

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was initially built to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently in their research. However, many Brassicaceae genomes have been sequenced and released since its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and the three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotations and sequences, are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. PMID:26589635

  18. In silico mining of putative microsatellite markers from whole genome sequence of water buffalo (Bubalus bubalis) and development of first BuffSatDB

    OpenAIRE

    Sarika; Arora Vasu; Iquebal Mir; Rai Anil; Kumar Dinesh

    2013-01-01

    Abstract Background Although India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0. Thus, de novo chromosome-wise assembly is a major pending issue for the global community. The existing radiation hybrid of buffalo and these reported STRs can be used further in the final gap plugging and “finishing” expected in de novo genome assembly. QTL and gene mapping require mining of putative STRs from the buffalo genome at equal intervals on each and every chromosome. Such mar...

  19. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Full Text Available The phylogenetic analysis of nucleotide sequences, and increasingly that of amino acid sequences, is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. After phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.
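    A rough sketch of the relabeling step described above follows. It assumes NCBI-style FASTA headers in which the accession is the first token and the species name appears in square brackets (for example, a hypothetical `>XP_1.1 kinase [Homo sapiens]`). The function names and the `mode` parameter are hypothetical and do not reflect the actual interface of these web utilities. Duplicate accessions are dropped and reported, mirroring the feature mentioned in the abstract.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def relabel(records, mode="species+accession"):
    """Build labels from species and/or accession; drop and report duplicate accessions."""
    seen, removed, relabeled = set(), [], []
    for header, seq in records:
        tokens = header.split()
        accession = tokens[0] if tokens else "unknown_accession"
        match = re.search(r"\[([^\]]+)\]", header)
        species = match.group(1) if match else "unknown_species"
        if accession in seen:
            removed.append(accession)
            continue
        seen.add(accession)
        fields = {"species": species, "accession": accession}
        label = "_".join(fields[part] for part in mode.split("+"))
        # Replace characters many phylogenetics programs reject (spaces, brackets, ...).
        label = re.sub(r"[^A-Za-z0-9_.]", "_", label)
        relabeled.append((label, seq))
    return relabeled, removed


def to_fasta(records):
    """Serialize (label, sequence) pairs back to FASTA text."""
    return "".join(f">{label}\n{seq}\n" for label, seq in records)


text = (">XP_1.1 kinase [Homo sapiens]\nMKV\nLL\n"
        ">NP_2.1 kinase [Mus musculus]\nMKI\n"
        ">XP_1.1 kinase [Homo sapiens]\nMKV\n")
recs, dropped = relabel(parse_fasta(text))
print(to_fasta(recs), dropped)
```

    Keeping the returned labels alongside the original headers gives a lookup table that could later be used to rename leaves in the resulting tree files.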

  20. A framework for automated enrichment of functionally significant inverted repeats in whole genomes

    Directory of Open Access Journals (Sweden)

    Frank Ronald L

    2010-10-01

    Full Text Available Abstract Background RNA transcripts from genomic sequences showing dyad symmetry typically adopt hairpin-like, cloverleaf, or similar structures that act as recognition sites for proteins. Such structures are often the precursors of non-coding RNA (ncRNA) sequences like microRNA (miRNA) and small-interfering RNA (siRNA), which have recently garnered more functional significance than in the past. Genomic DNA contains hundreds of thousands of such inverted repeats (IRs) with varying degrees of symmetry. However, by collecting statistically significant information from a known set of ncRNA, we can sort these IRs into those that are likely to be functional. Results A novel method was developed to scan genomic DNA for partially symmetric inverted repeats. The resulting set was further refined to match miRNA precursors (pre-miRNA) with respect to their density of symmetry, statistical probability of the symmetry, length of stems in the predicted hairpin secondary structure, and the GC content of the stems. This method was applied to the Arabidopsis thaliana genome and validated against the set of 190 known Arabidopsis pre-miRNA in the miRBase database. A preliminary scan for IRs identified 186 of the known pre-miRNA but yielded 714700 pre-miRNA candidates. This large number of IRs was further refined to 483908 candidates with 183 pre-miRNA identified, and further still to 165371 candidates with 171 pre-miRNA identified (i.e., with 90% of the known pre-miRNA retained). Conclusions 165371 candidates for potentially functional miRNA is still too large a set to warrant wet lab analyses, such as northern blotting, on all of them. Hence additional filters are needed to further refine the number of candidates while still retaining most of the known miRNA. These include detection of promoters and terminators, homology analyses, location of candidates relative to coding regions, and better secondary structure prediction algorithms. The software developed is designed to easily...
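    The sketch below shows one simple way to scan a sequence for hairpin-forming inverted repeats and report stem length and stem GC content, two of the features used for refinement in the method above. It is deliberately simpler than the method described. It finds only perfectly complementary stems (not partially symmetric ones), the stem and loop length limits are assumed values, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_inverted_repeats(seq, min_stem=4, min_loop=3, max_loop=8):
    """Return (start, end, stem_len, loop_len, stem_gc) for perfect hairpins; end is exclusive."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for loop_start in range(n):
        for loop_len in range(min_loop, max_loop + 1):
            loop_end = loop_start + loop_len  # first base after the loop
            if loop_end > n:
                break
            # If the loop's outermost bases pair, the same hairpin is found with a shorter loop.
            if loop_len - 2 >= min_loop and COMPLEMENT.get(seq[loop_start]) == seq[loop_end - 1]:
                continue
            # Extend the stem outward while the flanking bases are complementary.
            stem = 0
            left, right = loop_start - 1, loop_end
            while left >= 0 and right < n and COMPLEMENT.get(seq[left]) == seq[right]:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                start = loop_start - stem
                stem_gc = sum(b in "GC" for b in seq[start:loop_start]) / stem
                hits.append((start, loop_end + stem, stem, loop_len, stem_gc))
    return hits


# GCGCAT and its reverse complement ATGCGC flank a 4-base loop.
seq = "TT" + "GCGCAT" + "AAAA" + "ATGCGC" + "TT"
print(find_inverted_repeats(seq))
```

    Scoring partial symmetry would require tolerating mismatches or bulges in the stem, for example with a dynamic-programming alignment of each arm against the reverse complement of the other.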

  1. Chicken genome mapping - Constructing part of a road map for mining this bird's DNA

    NARCIS (Netherlands)

    Aerts, J.

    2005-01-01

    The aim of the research presented in this thesis was to aid in the international chicken genome mapping effort. To this purpose, a significant contribution was made to the construction of the chicken whole-genome BAC-based physical map (presented in Chapter A). An important aspect of this constructi...

  2. Discovery of phosphonic acid natural products by mining the genomes of 10,000 actinomycetes

    Science.gov (United States)

    Although natural products have been a particularly rich source of human medicines, the rate at which new molecules are being discovered is declining precipitously. Based on the large number of natural product biosynthetic genes in microbial genomes, many have suggested “genome mining” as an approach...

  3. CisMiner: genome-wide in-silico cis-regulatory module prediction by fuzzy itemset mining.

    Directory of Open Access Journals (Sweden)

    Carmen Navarro

    Full Text Available Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences, which may lie far from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools can detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools have at least one of the following limitations: (1) their scope is limited to promoter or conserved regions of the genome; (2) they cannot identify combinations involving more than two motifs; (3) they require prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner performs a blind search for CRMs without any prior information about target CRMs and without any limit on the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction using a natural representation of motif combinations as itemsets and applying the Top-Down Fuzzy Frequent-Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, proving that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accessible at: http://genome2.ugr.es/cisminer. CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding...
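    To make the itemset representation concrete, here is a hedged sketch that treats the set of motifs found in each genomic window as a transaction and counts co-occurring motif combinations by brute force. It uses crisp support counting in place of CisMiner's fuzzy Top-Down FP-Tree algorithm, which is not reproduced here. Motif names and parameters are hypothetical, and brute-force enumeration grows combinatorially with the number of motifs per window.

```python
from collections import Counter
from itertools import combinations


def frequent_motif_combinations(windows, min_support=2, max_size=3):
    """Count motif combinations (sizes 2..max_size) co-occurring in the same window.

    Each window is an iterable of motif names; returns combinations whose
    support (number of windows containing them) is at least min_support.
    """
    counts = Counter()
    for motifs in windows:
        unique = sorted(set(motifs))  # canonical order so ('A', 'B') == ('B', 'A')
        for size in range(2, max_size + 1):
            counts.update(combinations(unique, size))
    return {combo: c for combo, c in counts.items() if c >= min_support}


windows = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C", "D"}]
print(frequent_motif_combinations(windows))
```

    A fuzzy variant would replace the 0/1 membership of a motif in a window with a degree (for example, derived from binding-site score or distance), and sum those degrees instead of counting windows.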

  4. Draft Genome Sequence of Halomonas sp. Strain HAL1, a Moderately Halophilic Arsenite-Oxidizing Bacterium Isolated from Gold-Mine Soil

    OpenAIRE

    Lin, Yanbing; Fan, Haoxin; Hao, Xiuli; Johnstone, Laurel; Hu, Yao; Wei, Gehong; Alwathnani, Hend A.; Wang, Gejiao; Rensing, Christopher

    2012-01-01

    We report the draft genome sequence of arsenite-oxidizing Halomonas sp. strain HAL1, isolated from the soil of a gold mine. Genes encoding proteins involved in arsenic resistance and transformation, phosphate utilization and uptake, and betaine biosynthesis were identified. Their identification might help in understanding how arsenic and phosphate metabolism are intertwined.

  5. Evaluating the Strengths and Weaknesses of Mining Audit Data for Automated Models for Intrusion Detection in Tcpdump and Basic Security Module Data

    Directory of Open Access Journals (Sweden)

    A. Arul Lawrence Selvakumar

    2012-01-01

    Full Text Available Problem statement: Intrusion Detection Systems (IDSs) have become an important component of infrastructure protection mechanisms. They secure current and emerging networks, and their services and applications, by detecting, alerting on, and taking necessary actions against malicious activities. Network size, technology diversity and security policies make networks more challenging, and hence IDSs are required to be very accurate, adaptive, extensible and more reliable. A novel framework exists for this requirement, namely Mining Audit Data for Automated Models for Intrusion Detection (MADAM ID), but it has some performance shortfalls in processing the audit data. Approach: A few experiments were conducted on DARPA tcpdump data and BSM audit files. The algorithms and tools of MADAM ID were applied to process the audit data, mine patterns, construct features and build RIPPER classifiers. Putting it all together, four main categories of attacks, namely DOS, R2L, U2R and PROBING attacks, were simulated. Results: This study outlines the experimental results of MADAM ID in testing the DARPA and BSM data on a simulated network environment. Conclusion: The strengths and weaknesses of MADAM ID have been identified through the experiments conducted on tcpdump data and also on Pascal based audit files of the Basic Security Module (BSM). This study also gives some additional directions about the future applications of MADAM ID.

  6. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Xiaohan [ORNL]; Ye, Chuyu [ORNL]; Bisaria, Anjali [ORNL]; Tuskan, Gerald A [ORNL]; Kalluri, Udaya C [ORNL]

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes with experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database, and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and reconstruct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high likelihood candidates for functional genomics in relation to cell wall biosynthesis.
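    The bait-and-neighbor expansion described above (121 bait genes, then 548, then 694) amounts to repeated one-hop expansion on a co-expression graph. The sketch below assumes the network is held as an adjacency map. The gene names and edges are hypothetical, and the actual pipeline queried an Arabidopsis co-expression database that is not modeled here.

```python
from typing import Dict, Set

def expand_by_neighbors(seeds: Set[str], network: Dict[str, Set[str]]) -> Set[str]:
    """Return the seed genes plus every direct neighbor of any seed in the network."""
    expanded = set(seeds)
    for gene in seeds:
        # Genes absent from the network contribute no neighbors.
        expanded |= network.get(gene, set())
    return expanded

# Hypothetical toy co-expression network (undirected, stored as adjacency sets).
network = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

bait = {"A"}
round1 = expand_by_neighbors(bait, network)    # first query: bait genes plus neighbors
round2 = expand_by_neighbors(round1, network)  # re-query with the expanded set
print(sorted(round1), sorted(round2))          # ['A', 'B', 'C'] ['A', 'B', 'C', 'D']
```

    Each round can only grow the set. This mirrors the monotone growth reported above, but how much it grows depends entirely on the co-expression threshold used to define edges, which the abstract does not give.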

  7. Genomic analyses of metal resistance genes in three plant growth promoting bacteria of legume plants in Northwest mine tailings, China

    Institute of Scientific and Technical Information of China (English)

    Pin Xie; Xiuli Hao; Martin Herzberg; Yantao Luo; Dietrich H. Nies; Gehong Wei

    2015-01-01

    We sought to better understand the diversity of genetic determinants of metal resistance in microbes that survive in metal tailings in northwest China, a region with highly elevated levels of heavy metals. To do so, genomic analyses were conducted using the genome sequences of three native metal-resistant plant growth promoting bacteria (PGPB). The analyses show that Mesorhizobium amorphae CCNWGS0123 contains metal transporters from the P-type ATPase, CDF (Cation Diffusion Facilitator), HupE/UreJ and CHR (chromate ion transporter) families, which are involved in copper, zinc, nickel and chromate resistance and homeostasis. Meanwhile, the putative CopA/CueO system is expected to mediate copper resistance in Sinorhizobium meliloti CCNWSX0020. In Agrobacterium tumefaciens CCNWGS0286, the ZntA transporter, assisted by putative CzcD, determines zinc tolerance. The greenhouse experiment provides consistent evidence that these microbes promote the growth of their hosts through nitrogen fixation and/or indoleacetic acid (IAA) secretion. This indicates potential for in situ phytoremediation in the mine tailing regions of China.

  8. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

    Science.gov (United States)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

    2015-07-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579

  9. Process mining

    DEFF Research Database (Denmark)

    van der Aalst, W.M.P.; Rubin, V.; Verbeek, H.M.W.;

    2010-01-01

    Process mining includes the automated discovery of processes from event logs. Based on observed events (e.g., activities being executed or messages being exchanged) a process model is constructed. One of the essential problems in process mining is that one cannot assume to have seen all possible...... behavior. At best, one has seen a representative subset. Therefore, classical synthesis techniques are not suitable as they aim at finding a model that is able to exactly reproduce the log. Existing process mining techniques try to avoid such “overfitting” by generalizing the model to allow for more...
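    Many discovery techniques start from the "directly follows" relation observed in the log. The classical alpha algorithm is one example. The sketch below computes only that relation from a hypothetical event log; it is not a full discovery algorithm, and the activity names are invented for illustration.

```python
from collections import Counter
from typing import List

def directly_follows(log: List[List[str]]) -> Counter:
    """Count how often activity a is immediately followed by activity b across all traces."""
    counts: Counter = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

# Hypothetical event log: each trace is the ordered list of activities of one case.
log = [
    ["register", "check", "decide"],
    ["register", "check", "check", "decide"],
    ["register", "decide"],
]
dfg = directly_follows(log)
print(dfg[("register", "check")], dfg[("check", "check")], dfg[("register", "decide")])  # 2 1 1
```

    A model that allowed only the pairs seen here would forbid every unseen ordering. That is the overfitting problem raised above: the log is at best a representative subset, so discovery algorithms generalize beyond these observed relations.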

  10. Machine learning and data mining in complex genomic data-a review on the lessons learned in Genetic Analysis Workshop 19.

    Science.gov (United States)

    König, Inke R; Auerbach, Jonathan; Gola, Damian; Held, Elizabeth; Holzinger, Emily R; Legault, Marc-André; Sun, Rui; Tintle, Nathan; Yang, Hsin-Chou

    2016-01-01

    In the analysis of current genomic data, machine learning and data mining techniques have become more attractive given the rising complexity of the projects. As part of the Genetic Analysis Workshop 19, approaches from this domain were explored, mostly motivated by two starting points. First, if there is an underlying structure in the genomic data, data mining might identify it and thus improve downstream association analyses. Second, computational methods for machine learning need to be developed further to deal efficiently with the current wealth of data. In the course of discussing results and experiences from the machine learning and data mining approaches, six common messages were extracted. These depict the current state of these approaches in the application to complex genomic data. Although some challenges remain for future studies, important forward steps were taken in the integration of different data types and the evaluation of the evidence. Mining the data for underlying genetic or phenotypic structure and using this information in subsequent analyses proved to be extremely helpful and is likely to become of even greater use with more complex data sets. PMID:26866367

  11. Genomic insights into a new acidophilic, copper-resistant Desulfosporosinus isolate from the oxidized tailings area of an abandoned gold mine.

    Science.gov (United States)

    Mardanov, Andrey V; Panova, Inna A; Beletsky, Alexey V; Avakyan, Marat R; Kadnikov, Vitaly V; Antsiferov, Dmitry V; Banks, David; Frank, Yulia A; Pimenov, Nikolay V; Ravin, Nikolai V; Karnachuk, Olga V

    2016-08-01

    Microbial sulfate reduction in acid mine drainage is still considered to be confined to anoxic conditions, although several reports have shown that sulfate-reducing bacteria occur under microaerophilic or aerobic conditions. We have measured sulfate reduction rates of up to 60 nmol S cm⁻³ day⁻¹ in oxidized layers of gold mine tailings in Kuzbass (SW Siberia). A novel, acidophilic, copper-tolerant Desulfosporosinus sp. I2 was isolated from the same sample and its genome was sequenced. The genomic analysis and physiological data indicate the involvement of transporters and additional mechanisms to tolerate metals, such as sequestration by polyphosphates. Desulfosporosinus sp. I2 encodes systems for a metabolically versatile lifestyle. The genome possesses a complete Embden-Meyerhof pathway for glycolysis and gluconeogenesis. Complete oxidation of organic substrates could be enabled by the complete TCA cycle. Genomic analysis found all major components of the electron transfer chain necessary for energy generation via oxidative phosphorylation. Autotrophic CO2 fixation could be performed through the Wood-Ljungdahl pathway. Multiple oxygen detoxification systems were identified in the genome. Taking into account the metabolic activity and genomic analysis, the traits of the novel isolate broaden our understanding of active sulfate reduction and associated metabolism beyond strictly anaerobic niches. PMID:27222219

  12. EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

    Directory of Open Access Journals (Sweden)

    Nuez Fernando

    2008-01-01

    Full Text Available Abstract Background Expressed sequence tag (EST) collections are composed of a large number of single-pass, redundant, partial sequences. These need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. Performing the different steps of EST analysis requires flexible computation pipelines adapted to the local needs of specific EST projects. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces. These interfaces should allow efficient and complex data mining, so that the collections can be fully exploited. Results We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. Conclusion The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at http

  13. Banking biological collections: data warehousing, data mining, and data dilemmas in genomics and global health policy.

    Science.gov (United States)

    Blatt, R J R

    2000-01-01

    While DNA databases may offer the opportunity to (1) assess population-based prevalence of specific genes and variants, (2) simplify the search for molecular markers, (3) improve targeted drug discovery and development for disease management, (4) refine strategies for disease prevention, and (5) provide the data necessary for evidence-based decision-making, serious scientific and social questions remain. Whether samples are identified, coded, or anonymous, biological banking raises profound ethical and legal issues pertaining to access, informed consent, privacy and confidentiality of genomic information, civil liberties, patenting, and proprietary rights. This paper provides an overview of key policy issues and questions pertaining to biological banking, with a focus on developments in specimen collection, transnational distribution, and public health and academic-industry research alliances. It highlights the challenges posed by the commercialization of genomics, and proposes the need for harmonization of biological banking policies. PMID:11878344

  14. Data Mining Approaches for Genome-Wide Association of Mood Disorders

    OpenAIRE

    Pirooznia, Mehdi; Seifuddin, Fayaz; Judy, Jennifer; Mahon, Pamela B; Potash, James B; Zandi, Peter P.

    2012-01-01

    Mood disorders are highly heritable forms of major mental illness. A major breakthrough in elucidating the genetic architecture of mood disorders was anticipated with the advent of genome-wide association studies (GWAS). However, to date few susceptibility loci have been conclusively identified. The genetic etiology of mood disorders appears to be quite complex, and as a result, alternative approaches for analyzing GWAS data are needed. Recently, a polygenic scoring approach that captures the...
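    The abstract breaks off before describing the polygenic scoring approach. In its common generic form, a polygenic score sums each individual's risk-allele dosages weighted by effect sizes estimated in a separate discovery sample. The sketch below shows only that generic form under that assumption. The SNP identifiers and weights are hypothetical, and this is not the authors' specific method.

```python
from typing import Dict

def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of per-SNP effect weights times risk-allele dosage (0, 1 or 2 copies).

    SNPs missing from the individual's genotypes are skipped.
    """
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)

# Hypothetical weights (e.g. log odds ratios from a discovery GWAS) and one individual's genotypes.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}
print(round(polygenic_score(dosages, weights), 3))  # 0.15
```

    Because many weak signals are aggregated, such a score can carry information even when no single locus reaches genome-wide significance. This is the situation described above for mood disorders.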

  15. Recent advances in genome mining of secondary metabolites in Aspergillus terreus

    OpenAIRE

    Guo, Chun-Jun; Wang, Clay C. C.

    2014-01-01

    Filamentous fungi are rich resources of secondary metabolites (SMs) with a variety of interesting biological activities. Recent advances in genome sequencing and techniques in genetic manipulation have enabled researchers to study the biosynthetic genes of these SMs. Aspergillus terreus is a well-known producer of lovastatin, a cholesterol-lowering drug. This fungus also produces other SMs with interesting bioactivities, including acetylaranotin, butyrolactones, and territrem. This review ...

  16. Nuclear Species-Diagnostic SNP Markers Mined from 454 Amplicon Sequencing Reveal Admixture Genomic Structure of Modern Citrus Varieties

    Science.gov (United States)

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP
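    The abstract does not define what makes a SNP "highly diagnostic" for one basic taxon. The sketch below assumes one plausible criterion: the taxon is fixed for an allele that no genotype of any other taxon carries. The genotypes are hypothetical, and the authors' actual threshold may differ.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[str, str]  # the two alleles a diploid genotype carries at one SNP

def diagnostic_taxon(snp_genotypes: Dict[str, List[Genotype]]) -> Optional[Tuple[str, str]]:
    """Return (taxon, allele) if one taxon is fixed for an allele that no other taxon carries."""
    for taxon, genos in snp_genotypes.items():
        alleles_here = {a for g in genos for a in g}
        if len(alleles_here) != 1:
            continue  # not fixed in this taxon
        allele = next(iter(alleles_here))
        others = {a for t, gs in snp_genotypes.items() if t != taxon for g in gs for a in g}
        if allele not in others:
            return taxon, allele
    return None

# Hypothetical genotypes at one SNP for the four ancestral taxa.
snp = {
    "C. reticulata": [("A", "A"), ("A", "A")],
    "C. maxima": [("G", "G"), ("G", "G")],
    "C. medica": [("G", "G")],
    "C. micrantha": [("G", "G")],
}
print(diagnostic_taxon(snp))  # ('C. reticulata', 'A')
```

    Once such markers are chosen, the diagnostic allele in an admixed cultivar points to the ancestral contribution at that locus. Heterozygosity at the marker then reflects the interspecific heterozygosity described above.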

  18. PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results

    OpenAIRE

    Zhao Xuechun; Dai Xinbin; He Ji

    2007-01-01

    Abstract Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software...
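    Managing and mining BLAST results usually starts with parsing tabular output. The abstract does not describe PLAN's internals, so the sketch below is not PLAN's code. It assumes results in BLAST+ `-outfmt 6` layout, whose default twelve columns are query and subject identifiers, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. For each query it keeps the highest-scoring hit. The sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

def best_hits(handle: TextIO) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject per query from BLAST tabular (-outfmt 6) output."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])  # column 12 is bitscore
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

# Hypothetical two-query result in -outfmt 6 layout.
sample = (
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t190.0\n"
    "q1\ts2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t95.0\n"
    "q2\ts3\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-10\t60.0\n"
)
print(best_hits(io.StringIO(sample)))  # {'q1': ('s1', 190.0), 'q2': ('s3', 60.0)}
```

    Streaming the rows rather than loading the whole file keeps memory flat, which matters for the high-throughput result sets discussed above.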

  19. Databank based mining on the track of antimicrobial weapons in plant genomes.

    Science.gov (United States)

    Belarmino, Luis C; Benko-Iseppon, Ana M

    2010-05-01

    The large amount of nucleotide sequence data from diverse plant species in databanks enables the use of computational approaches to discover still unidentified genes and to infer their function, structure and role in some biological processes. Of special interest are the antimicrobial peptides (AMP), for several reasons. They play a very important role in defense against microbial infection in multicellular eukaryotes, they are considered less susceptible to bacterial resistance than traditional antibiotics, and they have the potential to become a new class of therapeutic agents. Recent computational developments have provided various algorithms and resources to profit from the overwhelming information in data banks for biomining such peptides. This review focuses on the computational and bioinformatic approaches so far used for the identification of antimicrobial peptides in plant systems. It highlights alternative means of mining the entire plant peptide space that has recently become available. PMID:20088774

  20. CGMIM: Automated text-mining of Online Mendelian Inheritance in Man (OMIM) to identify genetically-associated cancers and candidate genes

    Directory of Open Access Journals (Sweden)

    Jones Steven

    2005-03-01

    Full Text Available Abstract Background Online Mendelian Inheritance in Man (OMIM) is a computerized database of information about genes and heritable traits in human populations, based on information reported in the scientific literature. Our objective was to establish an automated text-mining system for OMIM that will identify genetically-related cancers and cancer-related genes. We developed the computer program CGMIM to search for entries in OMIM that are related to one or more cancer types. We performed manual searches of OMIM to verify the program results. Results In the OMIM database on September 30, 2004, CGMIM identified 1943 genes related to cancer. BRCA2 (OMIM *600185), BRAF (OMIM *164757) and CDKN2A (OMIM *600160) were each related to 14 types of cancer. There were 45 genes related to cancer of the esophagus, 121 genes related to cancer of the stomach, and 21 genes related to both. Analysis of CGMIM results indicates that, by chance alone, fewer than three gene entries in OMIM would be expected to mention both cancers. The observed excess of more than seven-fold suggests that cancers of the esophagus and stomach are more genetically related than the current literature suggests. Conclusion CGMIM identifies genetically-related cancers and cancer-related genes. Identifying cancers with shared genetic etiology is anticipated to lead, in several ways, to further etiologic hypotheses and advances regarding environmental agents. CGMIM results are posted monthly and the source code can be obtained free of charge from the BC Cancer Research Centre website http://www.bccrc.ca/ccr/CGMIM.
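    The "fewer than three" and "more than seven-fold" figures follow from a simple independence calculation. The abstract does not state which denominator was used. Taking the 1943 cancer-related genes as the reference population reproduces both quoted figures, so the sketch below assumes that value; treat the choice as an inference, not a statement of the authors' method.

```python
# Counts reported in the abstract.
total_cancer_genes = 1943  # assumed reference population for the independence model
esophagus = 45
stomach = 121
observed_both = 21

# Under independence, the expected overlap of two gene sets is (a * b) / N.
expected_both = esophagus * stomach / total_cancer_genes
fold_excess = observed_both / expected_both
print(f"expected {expected_both:.2f}, fold excess {fold_excess:.1f}")  # expected 2.80, fold excess 7.5
```

    A larger denominator, such as all OMIM gene entries, would lower the expected overlap and raise the fold excess. The qualitative conclusion therefore does not hinge on this choice.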

  1. The discovery of putative urine markers for the specific detection of prostate tumor by integrative mining of public genomic profiles.

    Directory of Open Access Journals (Sweden)

    Min Chen

    Full Text Available Urine has emerged as an attractive biofluid for the noninvasive detection of prostate cancer (PCa). There is a strong imperative to discover candidate urinary markers for the clinical diagnosis and prognosis of PCa. The rising flood of various omics profiles presents immense opportunities for the identification of prospective biomarkers. Here we present a simple and efficient strategy to derive candidate urine markers for prostate tumor by mining cancer genomic profiles from public databases. Prostate, bladder and kidney are three major tissues from which cellular material could be released into urine. To identify urinary markers specific for PCa, upregulated entities that might be shed in exosomes of bladder cancer and kidney cancer are first excluded. Through the ontology-based filtering and further assessment, a reduced list of 19 entities encoding urinary proteins was derived as putative PCa markers. Among them, we have found 10 entities closely associated with the process of tumor cell growth and development by pathway enrichment analysis. Further, using the 10 entities as seeds, we have constructed a protein-protein interaction (PPI) subnetwork and suggested a few urine markers as preferred prognostic markers to monitor the invasion and progression of PCa. Our approach can be used to discover and prioritize potential markers present in a variety of body fluids for a spectrum of human diseases.

  2. Analysis of regulatory protease sequences identified through bioinformatic data mining of the Schistosoma mansoni genome

    Directory of Open Access Journals (Sweden)

    Minchella Dennis J

    2009-10-01

    Full Text Available Abstract Background New chemotherapeutic agents against Schistosoma mansoni, an etiological agent of human schistosomiasis, are a priority due to the emerging drug resistance and the inability of current drug treatments to prevent reinfection. Proteases have been under scrutiny as targets of immunological or chemotherapeutic anti-Schistosoma agents because of their vital role in many stages of the parasitic life cycle. Function has been established for only a handful of identified S. mansoni proteases, and the vast majority of these are the digestive proteases. Very few of the conserved classes of regulatory proteases have been identified from Schistosoma species, despite their vital role in numerous cellular processes. To that end, we identified protease protein coding genes from the S. mansoni genome project and EST library. Results We identified 255 protease sequences from five catalytic classes using predicted proteins of the S. mansoni genome. The vast majority of these show significant similarity to proteins in KEGG and the Conserved Domain Database. Proteases include calpains, caspases, cytosolic and mitochondrial signal peptidases, proteases that interact with ubiquitin and ubiquitin-like molecules, and proteases that perform regulated intramembrane proteolysis. Comparative analysis of classes of important regulatory proteases finds conserved active site domains and, where appropriate, signal peptides and transmembrane helices. Phylogenetic analysis provides support for inferring functional divergence among regulatory aspartic, cysteine, and serine proteases. Conclusion Numerous proteases are identified for the first time in S. mansoni. We characterized important regulatory proteases and focused our analysis on them to complement the growing knowledge base of digestive proteases. This work provides a foundation for expanding knowledge of proteases in Schistosoma species and examining their diverse function and potential as targets

  3. Genomic mining for novel FADH₂-dependent halogenases in marine sponge-associated microbial consortia.

    Science.gov (United States)

    Bayer, Kristina; Scheuermayer, Matthias; Fieseler, Lars; Hentschel, Ute

    2013-02-01

    Many marine sponges (Porifera) are known to contain large amounts of phylogenetically diverse microorganisms. Sponges are also known for their large arsenal of natural products, many of which are halogenated. In this study, 36 different FADH₂-dependent halogenase gene fragments were amplified from various Caribbean and Mediterranean sponges using newly designed degenerate PCR primers. Four unique halogenase-positive fosmid clones, all containing the highly conserved amino acid motif "GxGxxG", were identified in the microbial metagenome of Aplysina aerophoba. Sequence analysis of one halogenase-bearing fosmid notably revealed two open reading frames with high homology to efflux and multidrug resistance proteins. Single cell genomic analysis allowed for a taxonomic assignment of the halogenase genes to specific symbiotic lineages. Specifically, halogenase cluster S1 is predicted to derive from a deltaproteobacterial symbiont and halogenase cluster S2 from a poribacterial sponge symbiont. An additional halogenase gene possibly derives from an actinobacterial symbiont of marine sponges. The identification of three novel, phylogenetically, and possibly also functionally distinct halogenase gene clusters indicates that the microbial consortia of sponges are a valuable resource for novel enzymes involved in halogenation reactions. PMID:22562484
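    Screening translated sequences for the conserved "GxGxxG" motif mentioned above is a simple pattern search, with x standing for any amino acid. The sketch below uses a regular-expression lookahead so that overlapping occurrences are all reported. The input is a hypothetical fragment, not a real halogenase sequence, and a motif hit alone does not establish halogenase function.

```python
import re
from typing import List

# "x" in the motif means any amino acid; [A-Z] matches one residue in a
# one-letter-code protein sequence. The lookahead lets matches overlap.
MOTIF = re.compile(r"(?=(G[A-Z]G[A-Z]{2}G))")

def find_gxgxxg(protein: str) -> List[int]:
    """Return 0-based start positions of every GxGxxG match, overlapping ones included."""
    return [m.start() for m in MOTIF.finditer(protein.upper())]

seq = "MKIAVIGGGIAGLSLA"  # hypothetical fragment
print(find_gxgxxg(seq))  # [6]
```

    Because the motif is short and common in many nucleotide-binding proteins, it works best as a filter combined with homology searches, much as the study combined motif presence with sequence analysis.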

  4. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

    Science.gov (United States)

    Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc. PMID:27322403

  5. Detailed investigation of cascaded Volterra fusion of processing strings for automated sea mine classification in very shallow water

    Science.gov (United States)

    Aridgides, Tom; Fernández, Manuel

    2006-05-01

    An improved sea mine computer-aided-detection/computer-aided-classification (CAD/CAC) processing string has been developed. The overall CAD/CAC processing string consists of pre-processing, subimage adaptive clutter filtering (SACF), normalization, detection, feature extraction, repeated application of optimal subset feature selection, feature orthogonalization and log-likelihood-ratio-test (LLRT) classification processing, and fusion processing blocks. The classified objects of 3 distinct processing strings are fused using the classification confidence values as features and either "M-out-of-N" or LLRT-based fusion rules. The utility of the overall processing strings and their fusion was demonstrated with new very shallow water high-resolution sonar imagery data. The processing string detection and classification parameters were tuned and the string classification performance was optimized, by appropriately selecting a subset of the original feature set. Two significant fusion algorithm improvements were made. First, a new nonlinear (Volterra) feature LLRT fusion algorithm was developed. Second, a repeated application of the subset Volterra feature selection/feature orthogonalization/LLRT fusion block was utilized. It was shown that this cascaded Volterra feature LLRT fusion of the CAD/CAC processing strings outperforms the "M-out-of-N," the baseline LLRT and single-stage Volterra feature LLRT fusion algorithms, and also yields an improvement over the best single CAD/CAC processing string, providing a significant reduction in the false alarm rate. Additionally, the robustness of cascade Volterra feature fusion was demonstrated, by showing that the algorithm yields similar performance with the training and test sets.
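    Of the fusion rules named above, "M-out-of-N" is the simplest. An object is declared mine-like when at least M of the N processing strings call it mine-like. The sketch below assumes binary per-string decisions. The abstract's LLRT and Volterra fusion instead use the confidence values, and they are not shown here. The per-string calls are hypothetical.

```python
from typing import Sequence

def m_out_of_n(decisions: Sequence[bool], m: int) -> bool:
    """Declare a mine-like object if at least m of the n processing strings call it mine-like."""
    if not 1 <= m <= len(decisions):
        raise ValueError("m must be between 1 and the number of strings")
    return sum(decisions) >= m

# Three processing strings, as in the abstract; the per-string calls are hypothetical.
calls = [True, False, True]
print(m_out_of_n(calls, 2), m_out_of_n(calls, 3))  # True False
```

    M = 1 behaves like a logical OR, maximizing detections and false alarms. M = N behaves like an AND. Discarding the confidence values is the weakness that likelihood-ratio-based fusion addresses.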

  6. A novel data mining method to identify assay-specific signatures in functional genomic studies

    Directory of Open Access Journals (Sweden)

    Guidarelli Jack W

    2006-08-01

    Full Text Available Abstract Background: The highly dimensional data produced by functional genomic (FG) studies makes it difficult to visualize relationships between gene products and experimental conditions (i.e., assays). Although dimensionality reduction methods such as principal component analysis (PCA) have been very useful, their application to identify assay-specific signatures has been limited by the lack of appropriate methodologies. This article proposes a new and powerful PCA-based method for the identification of assay-specific gene signatures in FG studies. Results: The proposed method (PM) is unique for several reasons. First, it is the only one, to our knowledge, that uses gene contribution, a product of the loading and expression level, to obtain assay signatures. The PM develops and exploits two types of assay-specific contribution plots, which are new to the application of PCA in the FG area. The first type plots the assay-specific gene contribution against the given order of the genes. It reveals variations in distribution between assay-specific gene signatures, as well as outliers within assay groups that indicate the degree of importance of the most dominant genes. The second type plots the contribution of each gene in ascending or descending order against a constantly increasing index. This type of plot reveals assay-specific gene signatures defined by the inflection points in the curve. In addition, sharp regions within the signature define the genes that contribute the most to the signature. We proposed and used the curvature as an appropriate metric to characterize these sharp regions, thus identifying the subset of genes contributing the most to the signature. Finally, the PM uses the full dataset to determine the final gene signature, thus eliminating the chance of gene exclusion by poor screening in earlier steps. The strengths of the PM are demonstrated using a simulation study, and two studies of real DNA microarray data – a study of
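    The "gene contribution" idea above can be made concrete with one piece of algebra. For column-centered data, an assay's score on a principal component is a sum over genes of loading times centered expression, so each term is that gene's contribution. The sketch below assumes this reading, with centering but no scaling. The abstract does not specify the PM's preprocessing, and the data are random placeholders. The curvature step is omitted.

```python
import numpy as np

def gene_contributions(X: np.ndarray, component: int = 0):
    """Per-gene contributions to each assay's score on one principal component.

    X has shape (n_assays, n_genes). After column-centering, the score of assay i
    is sum_j Xc[i, j] * P[j, k], so each term Xc[i, j] * P[j, k] is gene j's
    contribution (loading times centered expression level).
    """
    Xc = X - X.mean(axis=0)                      # center each gene across assays
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[component]                     # shape (n_genes,)
    contributions = Xc * loadings                # broadcasts the loadings over assays
    scores = Xc @ loadings                       # ordinary PCA scores on this component
    return contributions, scores

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))                     # hypothetical: 6 assays x 10 genes
C, scores = gene_contributions(X)
print(np.allclose(C.sum(axis=1), scores))        # True: contributions add up to the scores
# Second plot type from the abstract: one assay's contributions sorted in descending order.
print(np.sort(C[0])[::-1])
```

    Plotting the sorted vector against its index gives the second type of contribution plot described above. The genes before the sharp bend would form that assay's signature.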

  7. Burkholderia genome mining for nonribosomal peptide synthetases reveals a great potential for novel siderophores and lipopeptides synthesis.

    Science.gov (United States)

    Esmaeel, Qassim; Pupin, Maude; Kieu, Nam Phuong; Chataigné, Gabrielle; Béchet, Max; Deravel, Jovana; Krier, François; Höfte, Monica; Jacques, Philippe; Leclère, Valérie

    2016-06-01

    Burkholderia is an important genus encompassing a variety of species, including pathogenic strains as well as strains that promote plant growth. We carried out a global strategy that combined two complementary approaches. The first is genome guided, with deep analysis of genome sequences. The second is assay guided, with experiments to support the predictions obtained in silico. This efficient screening for new secondary metabolites, performed on 48 gapless genomes of Burkholderia species, revealed a total of 161 clusters containing nonribosomal peptide synthetases (NRPSs), with the potential to synthesize at least 11 novel products. Most of them are siderophores or lipopeptides, two classes of products with potential application in biocontrol. The strategy led to the first identification of the cluster for cepaciachelin biosynthesis, in the genome of Burkholderia ambifaria AMMD. It also identified a cluster corresponding to a new malleobactin-like siderophore, called phymabactin, in the Burkholderia phymatum STM815 genome. In both cases, the siderophore was produced when the strain was grown in iron-limited conditions. In addition, the cluster for the antifungal burkholdin was detected in the genome of B. ambifaria AMMD and also in Burkholderia sp. KJ006. Burkholderia pseudomallei strains harbor the genetic potential to produce a novel lipopeptide called burkhomycin, containing a peptidyl moiety of 12 monomers. A mixture of lipopeptides produced by Burkholderia rhizoxinica lowered the surface tension of the supernatant from 70 to 27 mN·m⁻¹. The production of nonribosomal secondary metabolites seems related to the three phylogenetic groups obtained from 16S rRNA sequences. Moreover, the genome-mining approach gave new insights into nonribosomal synthesis. One example is the identification of dual C/E domains in lipopeptide NRPSs, which until now had been found mainly in Pseudomonas strains. PMID:27060604

  8. Quantification of Operational Risk Using A Data Mining

    Science.gov (United States)

    Perera, J. Sebastian

    1999-01-01

    What is data mining? Data mining is the process of finding actionable information hidden in raw data. It helps find hidden patterns, trends, and important relationships often buried in a sea of data. Typically, software tools based on advanced statistical analysis and data modeling technology are used to automate the data mining process.

  9. Characterization of the alkaline laccase Ssl1 from Streptomyces sviceus with unusual properties discovered by genome mining.

    Directory of Open Access Journals (Sweden)

    Matthias Gunne

    Full Text Available Fungal laccases are well-investigated enzymes with high potential in diverse applications such as bleaching of waste waters and textiles, cellulose delignification, and organic synthesis. However, they are limited to acidic reaction conditions and require eukaryotic expression systems. This raises a demand for novel laccases without these constraints. We used the laccase engineering database LccED, derived from genome mining, to identify and clone the laccase Ssl1 from Streptomyces sviceus, which can circumvent the limitations of fungal laccases. Ssl1 belongs to the family of small laccases, which contains only a few characterized enzymes. After removal of the twin-arginine signal peptide, Ssl1 was readily expressed in E. coli. Ssl1 is a small laccase of 32.5 kDa. It consists of only two cupredoxin-like domains and forms trimers in solution. Ssl1 oxidizes 2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS) and phenolic substrates such as 2,6-dimethoxyphenol, guaiacol, and syringaldazine. The kcat value for ABTS oxidation was at least 20 times higher than for the other substrates. The optimal pH for oxidation reactions is substrate dependent. For phenolic substrates, the highest activities were detected under alkaline conditions (pH 9.0 for 2,6-dimethoxyphenol and guaiacol, and pH 8.0 for syringaldazine). The highest reaction rates with ABTS were observed at pH 4.0. Though it originates from a mesophilic organism, Ssl1 demonstrates remarkable stability at elevated temperatures (T1/2 at 60°C = 88 min) and over a wide pH range (pH 5.0 to 11.0). Notably, the enzyme retained 80% residual activity after 5 days of incubation at pH 11. Detergents and organic co-solvents do not affect Ssl1 stability. This robustness makes Ssl1 a potential candidate for industrial applications, preferably in processes that require alkaline reaction conditions.
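The reported half-life at 60°C can be converted into an inactivation rate constant. The abstract does not state a kinetic model, so the worked equation below assumes simple first-order thermal inactivation, under which residual activity decays exponentially:

$$
\frac{A(t)}{A_0} = e^{-k t}, \qquad k = \frac{\ln 2}{T_{1/2}} = \frac{0.693}{88\ \text{min}} \approx 7.9 \times 10^{-3}\ \text{min}^{-1}
$$

Under this assumption, about $e^{-0.0079 \times 60} \approx 0.62$ of the initial activity, roughly 62%, would remain after 60 min at 60°C.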

  10. Design and development of Java-based office automation system for coal mine enterprises

    Institute of Scientific and Technical Information of China (English)

    Liu Hongxia (刘红霞); Zhang Hui (张慧)

    2015-01-01

    Coal mine enterprises need to informatize their office work and gradually move from traditional office management to automated office management. To meet this need, an office automation system for coal mining enterprises was designed and developed on a B/S (browser/server) architecture using Java, JSP, SQL Server 2005, and other technologies. Application results show that the system is tailored to the current office practices of coal mining enterprises. It provides them with a scientific, open, and advanced informatized office platform, effectively reduces office costs, improves office efficiency, and promotes the enterprises' informatization.

  11. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies

    OpenAIRE

    Rollins Derrick K; Teh AiLing

    2010-01-01

    Abstract Background Microarray data sets provide relative expression levels for thousands of genes for a comparatively small number of different experimental conditions called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the develo...

  12. Software Testing and Documenting Automation

    OpenAIRE

    Tsybin, Anton; Lyadova, Lyudmila

    2008-01-01

    This article describes approaches to automating the testing and documentation of information systems with graphical user interfaces. Test automation combines data mining methods with the theory of finite state machines. Automated creation of software documentation is based on metadata in the documented system, and the metadata is built on a graph model. The described approaches improve the performance and quality of testing and documentation processes.
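The abstract does not say how the finite state machine is used. One common technique is shown here as an assumption, not as the authors' method. The GUI is modeled as a state machine, with screens as states and user actions as events. One test sequence is then derived per transition: the shortest event path to the transition's source state, followed by the event itself. The screen and event names are hypothetical, and the data mining component is not modeled.

```python
from collections import deque


def shortest_paths(fsm, start):
    """Breadth-first search: shortest event sequence reaching each state."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for event, nxt in fsm.get(state, {}).items():
            if nxt not in paths:
                paths[nxt] = paths[state] + [event]
                queue.append(nxt)
    return paths


def transition_cover(fsm, start):
    """One test case per transition: reach its source state, then fire its event."""
    paths = shortest_paths(fsm, start)
    tests = []
    for state, edges in fsm.items():
        if state not in paths:
            continue  # unreachable from start: no test can exercise it
        for event in edges:
            tests.append(paths[state] + [event])
    return tests


# Hypothetical GUI: screens are states, user actions are events.
gui = {
    "Login": {"ok": "Main", "cancel": "Closed"},
    "Main": {"open": "Editor", "logout": "Login"},
    "Editor": {"save": "Main"},
}
for case in transition_cover(gui, "Login"):
    print(" -> ".join(case))
# ok
# cancel
# ok -> open
# ok -> logout
# ok -> open -> save
```

The five transitions yield five test cases. Each one exercises a single transition with the shortest possible setup, so a failure points directly at that transition.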

  13. Coal Mine Integrated Automation System Based on Internet of Things Technology

    Institute of Scientific and Technical Information of China (English)

    Huang Chengyu (黄成玉); Li Xuezhe (李学哲); Zhang Quanzhu (张全柱)

    2012-01-01

    An integrated automation system for coal mines was designed based on the Internet of Things, together with a Web-based integrated automation software platform. The system provides real-time monitoring and management of personnel, materials, equipment, and infrastructure throughout the mine. Its functions include intelligent alarms, multimedia alarms, query, acknowledgment, data acquisition, and hard-disk video recording. It monitors SCADA system equipment, remotely monitors mine environmental parameters, and supports personnel positioning and management. This enables the surface monitoring center to perform telemetry, remote control, and remote adjustment of underground electrical equipment.

  14. Fish the ChIPs: a pipeline for automated genomic annotation of ChIP-Seq data

    Directory of Open Access Journals (Sweden)

    Minucci Saverio

    2011-10-01

    Full Text Available Abstract Background High-throughput sequencing is generating massive amounts of data at a pace that largely exceeds the throughput of data analysis routines. Here we introduce Fish the ChIPs (FC), a computational pipeline aimed at a broad public of users. It is designed to perform complete ChIP-Seq data analysis of an unlimited number of samples, thus increasing throughput and reproducibility and saving time. Results Starting from short read sequences, FC performs the following steps: 1) quality control, 2) alignment to a reference genome, 3) peak calling, 4) genomic annotation, and 5) generation of raw signal tracks for visualization on the UCSC and IGV genome browsers. FC exploits some of the fastest and most effective tools available today. Installation on a Mac platform requires only very basic computational skills, while configuration and usage are supported by a user-friendly graphical user interface. Alternatively, FC can be compiled from the source code on any Unix machine. Every single parameter can then be customized through a simple configuration text file, which can be generated using a dedicated user-friendly web form. In terms of execution time, FC can be run on a desktop machine, although a computer cluster is recommended for analyses of large batches of data. FC is perfectly suited to data from Illumina Solexa Genome Analyzers or ABI SOLiD, and its usage can potentially be extended to any sequencing platform. Conclusions Compared to existing tools, FC has two main advantages that make it suitable for a broad range of users. First, it can be installed and run by wet-lab biologists on a Mac machine. Second, it can handle an unlimited number of samples, which is convenient for large analyses. In this context, computational biologists can increase the reproducibility of their ChIP-Seq data analyses while saving time for downstream analyses. Reviewers This article was reviewed by Gavin Huttley, George

  15. Draft Genome of Streptomyces zinciresistens K42, a Novel Metal-Resistant Species Isolated from Copper-Zinc Mine Tailings

    Science.gov (United States)

    Lin, Yanbing; Hao, Xiuli; Johnstone, Laurel; Miller, Susan J.; Baltrus, David A.; Rensing, Christopher; Wei, Gehong

    2011-01-01

    A draft genome sequence of Streptomyces zinciresistens K42, a novel Streptomyces species displaying a high level of resistance to zinc and cadmium, is presented here. The genome contains a large number of genes encoding proteins predicted to be involved in conferring metal resistance. Many of these genes appear to have been acquired through horizontal gene transfer. PMID:22038968

  16. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases

    OpenAIRE

    Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

    2014-01-01

    Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount...

  17. Mining plant genome browsers as a means for efficient connection of physical, genetic and cytogenetic mapping: an example using soybean

    Directory of Open Access Journals (Sweden)

    Luis C. Belarmino

    2012-01-01

    Full Text Available Physical maps are important tools to uncover general chromosome structure as well as to compare different plant lineages and species, helping to elucidate genome structure, evolution, and possibilities regarding synteny and colinearity. The increasing production of sequence data has opened an opportunity to link information from mapping studies to the underlying sequences. Genome browsers are invaluable platforms that provide access to these sequences. They include tools for genome analysis and allow the integration of multivariate information, thus helping to explain the emergence of complex genomes. This work presents a tutorial on the use of genome browsers to develop targeted physical mapping. It also provides a general overview and examples of the possibilities offered by fluorescent in situ hybridization (FISH) using bacterial artificial chromosome (BAC), simple sequence repeat (SSR), and rDNA probes, highlighting the potential of such studies for map integration and comparative genetics. As a case study, the available soybean genome was accessed to show how the physical and in silico distribution of such sequences may be compared at different levels. Such evaluations may also be complemented by the identification of sequences beyond the detection level of cytological methods, here using members of the aquaporin gene family as an example. The proposed approach highlights the complementary power of combining molecular cytogenetics and computational approaches for anchoring coding or repetitive sequences in plant genomes using available genome browsers. It helps determine sequence location, arrangement, and number of repeats, and also fills gaps found in computational pseudochromosome assemblies.

  18. Draft genome sequence of extremely acidophilic bacterium Acidithiobacillus ferrooxidans DLC-5 isolated from acid mine drainage in Northeast China

    Directory of Open Access Journals (Sweden)

    Peng Chen

    2015-12-01

    Full Text Available Acidithiobacillus ferrooxidans type strain DLC-5 was isolated from Wudalianchi in Heihe, Heilongjiang Province, China. Here, we present the draft genome of strain DLC-5, which comprises 4,232,149 bp in 2745 contigs with 57.628% GC content and includes 32,719 protein-coding genes and 64 tRNA-encoding genes. The genome sequence can be accessed at DDBJ/EMBL/GenBank under accession no. JNNH00000000.1.
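GC content, reported above as 57.628%, is the percentage of G and C among the called bases of the assembly. Below is a minimal sketch over a list of contig sequences. It assumes that ambiguous bases such as N are excluded from the denominator; tools differ on this convention. The toy contigs are hypothetical.

```python
def gc_content(contigs):
    """Percent G+C over all unambiguous A/C/G/T bases in the given contigs."""
    gc = total = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / total


# Toy contigs (hypothetical): 6 G/C out of 12 called bases; the N is ignored.
print(gc_content(["GGCCAT", "atgcNaa"]))  # 50.0
```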

  19. Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining.

    Directory of Open Access Journals (Sweden)

    Etienne G J Danchin

    2013-10-01

    Full Text Available Root-knot nematodes are globally the most aggressive and damaging plant-parasitic nematodes. Chemical nematicides have so far constituted the most efficient control measures against these agricultural pests. Because of their toxicity to the environment and danger to human health, these nematicides have now been banned from use. Consequently, new and more specific means of control, safe for the environment and human health, are urgently needed to avoid worldwide proliferation of these devastating plant parasites. We mined the genomes of root-knot nematodes through an evolutionary and comparative genomics approach. We identified and analyzed 15,952 nematode genes that are conserved in the genomes of plant-damaging species but absent from the non-target genomes of chordates, plants, annelids, insect pollinators, and mollusks. Functional annotation of the corresponding proteins revealed a relative abundance of putative transcription factors in this parasite-specific set compared to the whole proteomes of root-knot nematodes. This may point to important and specific regulators of genes involved in parasitism. These nematodes are known to secrete effector proteins in planta that are essential for parasitism, so we searched for and identified 993 such effector-like proteins absent from non-target species. Aiming to identify novel targets for the development of future control methods, we biologically tested the effect of inactivating the corresponding genes through RNA interference. A total of 15 novel effector-like proteins and one putative transcription factor compatible with the design of siRNAs were present as non-redundant genes and had transcriptional support in the model root-knot nematode Meloidogyne incognita. Infestation assays with siRNA-treated M. incognita on tomato plants showed significant and reproducible reduction of infestation for 12 of the 16 tested genes compared to control nematodes.
These 12 novel genes, showing efficient reduction of parasitism when
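The core filtering step keeps genes conserved in plant-damaging species but absent from all non-target genomes. This amounts to set operations over gene families. Below is a minimal sketch under several assumptions. Genes are assumed to be clustered into families already, with each family mapped to the set of species that have at least one member. "Conserved" is read strictly as present in every target species. The family IDs and species codes are hypothetical.

```python
def parasite_specific(families, targets, non_targets):
    """Families present in every target species and in no non-target species.

    families maps a family id to the set of species that have a member.
    """
    targets, non_targets = set(targets), set(non_targets)
    return sorted(
        fam
        for fam, species in families.items()
        if targets <= species and not (species & non_targets)
    )


# Hypothetical species codes: Mi, Mh = root-knot nematodes (targets);
# At = a plant, Hs = a chordate, Am = an insect pollinator (non-targets).
families = {
    "F1": {"Mi", "Mh"},
    "F2": {"Mi", "Mh", "At"},
    "F3": {"Mi"},
    "F4": {"Mi", "Mh", "Hs", "Am"},
}
print(parasite_specific(families, ["Mi", "Mh"], ["At", "Hs", "Am"]))  # ['F1']
```

A looser reading of "conserved" (for example, present in most target species) would replace the subset test with a count threshold.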

  20. Community Genomic and Proteomic Analyses of Chemoautotrophic Iron-Oxidizing "Leptospirillum rubarum" (Group II) and "Leptospirillum ferrodiazotrophum" (Group III) Bacteria in Acid Mine Drainage Biofilms

    Energy Technology Data Exchange (ETDEWEB)

    Goltsman, Daniela [University of California, Berkeley; Denef, Vincent [University of California, Berkeley; Singer, Steven [Lawrence Livermore National Laboratory (LLNL); Verberkmoes, Nathan C [ORNL; Lefsrud, Mark G [ORNL; Mueller, Ryan [University of California, Berkeley; Dick, Gregory J. [University of California, Berkeley; Sun, Christine [University of California, Berkeley; Wheeler, Korin [Lawrence Livermore National Laboratory (LLNL); Zelma, Adam [Lawrence Livermore National Laboratory (LLNL); Baker, Brett J. [University of California, Berkeley; Hauser, Loren John [ORNL; Land, Miriam L [ORNL; Shah, Manesh B [ORNL; Thelen, Michael P. [University of California, Berkeley; Hettich, Robert {Bob} L [ORNL; Banfield, Jillian F. [University of California, Berkeley

    2009-01-01

    We analyzed near-complete population (composite) genomic sequences for coexisting acidophilic iron-oxidizing Leptospirillum group II and III bacteria (phylum Nitrospirae) and an extrachromosomal plasmid from an acid mine drainage biofilm in the Richmond Mine, Iron Mountain, CA. Community proteomic analysis of the genomically characterized sample and two other biofilms identified 64.6% and 44.9% of the predicted proteins of Leptospirillum groups II and III, respectively, and 20% of the predicted plasmid proteins. The bacteria share 92% 16S rRNA gene sequence identity and >60% of their genes, including integrated plasmid-like regions. The extrachromosomal plasmid carries conjugation genes with detectable sequence similarity to genes in the integrated conjugative plasmid, but only those on the extrachromosomal element were identified by proteomics. Both bacterial groups have genes for community-essential functions, including carbon fixation and biosynthesis of vitamins, fatty acids, and biopolymers (including cellulose); proteomic analyses reveal these activities. Both Leptospirillum types have multiple pathways for osmotic protection. Although both are motile, signal transduction and methyl-accepting chemotaxis proteins are more abundant in Leptospirillum group III, consistent with its distribution in gradients within biofilms. Interestingly, Leptospirillum group II uses a methyl-dependent and Leptospirillum group III a methyl-independent response pathway. Although only Leptospirillum group III can fix nitrogen, these proteins were not identified by proteomics. The abundances of core proteins are similar in all communities, but the abundance levels of unique and shared proteins of unknown function vary. Some proteins unique to one organism were highly expressed and may be key to the functional and ecological differentiation of Leptospirillum groups II and III.

  1. Community genomic and proteomic analysis of chemoautotrophic, iron-oxidizing "Leptospirillum rubarum" (Group II) and Leptospirillum ferrodiazotrophum (Group III) in acid mine drainage biofilms

    Energy Technology Data Exchange (ETDEWEB)

    Goltsman, Daniela [University of California, Berkeley; Denef, Vincent [University of California, Berkeley; Singer, Steven [Lawrence Livermore National Laboratory (LLNL); Verberkmoes, Nathan C [ORNL; Lefsrud, Mark G [McGill University, Montreal, Quebec; Mueller, Ryan [University of California, Berkeley; Dick, Gregory J. [University of California, Berkeley; Sun, Christine [University of California, Berkeley; Wheeler, Korin [Lawrence Livermore National Laboratory (LLNL); Zelma, Adam [Lawrence Livermore National Laboratory (LLNL); Baker, Brett J. [University of California, Berkeley; Hauser, Loren John [ORNL; Land, Miriam L [ORNL; Shah, Manesh B [ORNL; Thelen, Michael P. [University of California, Berkeley; Hettich, Robert {Bob} L [ORNL; Banfield, Jillian F. [University of California, Berkeley

    2009-01-01

    We analyzed near-complete population (composite) genomic sequences for coexisting acidophilic iron-oxidizing Leptospirillum Groups II and III bacteria (phylum Nitrospirae) and an extrachromosomal plasmid from an acid mine drainage (AMD) biofilm in the Richmond Mine, CA. Community proteomic analysis of the genomically characterized sample and two other biofilms identified 64.6% and 44.9% of the predicted proteins of Leptospirillum Groups II and III, respectively, and 20% of the predicted plasmid proteins. The bacteria share 92% 16S rRNA gene sequence identity and >60% of their genes, including integrated plasmid-like regions. The extrachromosomal plasmid encodes conjugation genes with detectable sequence similarity to genes in the integrated conjugative plasmid, but only those on the extrachromosomal element were identified by proteomics. Both bacteria have genes for community-essential functions, including carbon fixation and biosynthesis of vitamins, fatty acids, and biopolymers (including cellulose); proteomic analyses reveal these activities. Both Leptospirillum types have multiple pathways for osmotic protection. Although both are motile, signal transduction and methyl-accepting chemotaxis proteins are more abundant in Leptospirillum Group III, consistent with its distribution in gradients within biofilms. Interestingly, Leptospirillum Group II uses a methyl-dependent and Leptospirillum Group III a methyl-independent response pathway. Although only Leptospirillum Group III can fix nitrogen, these proteins were not identified by proteomics. Abundances of core proteins are similar in all communities, but abundance levels of unique and shared proteins of unknown function vary. Some proteins unique to one organism were highly expressed and may be key to the functional and ecological differentiation of Leptospirillum Groups II and III.

  2. Semantic text mining support for lignocellulose research

    Directory of Open Access Journals (Sweden)

    Meurs Marie-Jean

    2012-04-01

    Full Text Available Abstract Background Biofuels produced from biomass are considered to be promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuels production requires the use of enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the identification and characterization of the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly-expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties. Results Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brings measurable improvements to the users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge from information on fungal enzymes that is available in the literature and genome resources. Conclusions Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines integrated in a Web-based interface to assist them in two main tasks: mining the literature for relevant knowledge, and at the same time providing rich and semantically linked information.
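The abstract does not detail the components of the ontological NLP pipelines. As a much-simplified illustration of ontology-driven annotation, the sketch below tags text with concept identifiers by looking up ontology terms and their synonyms. The concept IDs, terms and example sentence are hypothetical. A real pipeline would add tokenization, normalization and disambiguation.

```python
import re


def build_tagger(ontology):
    """Return a function that tags ontology terms in free text.

    ontology maps a concept id to a list of surface terms (synonyms).
    """
    lookup = {}
    for concept, terms in ontology.items():
        for term in terms:
            lookup[term.lower()] = concept
    # Try longer terms first so a multi-word term wins over a shorter one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )

    def tag(text):
        # Each hit: (matched text, concept id, character offset).
        return [
            (m.group(0), lookup[m.group(0).lower()], m.start())
            for m in pattern.finditer(text)
        ]

    return tag


# Hypothetical concept ids; the terms are illustrative only.
ontology = {
    "ORG:0001": ["Trichoderma reesei", "Hypocrea jecorina"],
    "ENZ:0001": ["endoglucanase", "cellulase"],
    "ENZ:0002": ["beta-glucosidase", "cellobiase"],
}
tag = build_tagger(ontology)
for hit in tag("Trichoderma reesei secretes an endoglucanase and a beta-glucosidase."):
    print(hit)
```

Mapping each synonym to one concept ID is what lets mentions found in the literature be linked to the same entries in genome resources. This is a small-scale version of the "semantically linked information" the abstract describes.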

  3. High-recovery visual identification and single-cell retrieval of circulating tumor cells for genomic analysis using a dual-technology platform integrated with automated immunofluorescence staining

    International Nuclear Information System (INIS)

    Circulating tumor cells (CTCs) are malignant cells that have migrated from solid cancers into the blood, where they are typically present in rare numbers. There is great interest in using CTCs to monitor response to therapies, to identify clinically actionable biomarkers, and to provide a non-invasive window on the molecular state of a tumor. Here we characterize the performance of the AccuCyte® – CyteFinder® system, a comprehensive, reproducible and highly sensitive platform for collecting, identifying and retrieving individual CTCs from microscopic slides for molecular analysis after automated immunofluorescence staining for epithelial markers. All experiments employed a density-based cell separation apparatus (AccuCyte) to separate nucleated cells from the blood and transfer them to microscopic slides. After staining, the slides were imaged using a digital scanning microscope (CyteFinder). Precisely counted model CTCs (mCTCs) from four cancer cell lines were spiked into whole blood to determine recovery rates. Individual mCTCs were removed from slides using a single-cell retrieval device (CytePicker™) for whole genome amplification and subsequent analysis by PCR and Sanger sequencing, whole exome sequencing, or array-based comparative genomic hybridization. Clinical CTCs were evaluated in blood samples from patients with different cancers in comparison with the CellSearch® system. AccuCyte – CyteFinder presented high-resolution images that allowed identification of mCTCs by morphologic and phenotypic features. Spike-in mCTC recoveries were between 90 and 91%. More than 80% of single-digit spike-in mCTCs were identified and even a single cell in 7.5 mL could be found. Analysis of single SKBR3 mCTCs identified presence of a known TP53 mutation by both PCR and whole exome sequencing, and confirmed the reported karyotype of this cell line. Patient sample CTC counts matched or exceeded CellSearch CTC counts in a small feasibility cohort. The AccuCyte

  4. Single-cell PCR of genomic DNA enabled by automated single-cell printing for cell isolation.

    Science.gov (United States)

    Stumpf, F; Schoendube, J; Gross, A; Rath, C; Niekrawietz, S; Koltay, P; Roth, G

    2015-07-15

    Single-cell analysis has developed into a key topic in cell biology, with future applications in personalized medicine, tumor identification, and tumor discovery (Editorial, 2013). Here we employ inkjet-like printing to isolate individual living human B cells (Raji cell line) and load them directly into standard PCR tubes. Single cells are optically detected in the nozzle of the microfluidic piezoelectric dispenser chip to ensure that only droplets containing single cells are printed. The printing process was characterized using microbeads (10 µm diameter). This resulted in single-bead delivery in 27 out of 28 cases and a relative positional precision of ±350 µm at a printing distance of 6 mm between nozzle and tube lid. Process-integrated optical imaging made it possible to identify the printing failure as a void droplet and to exclude it from downstream processing. PCR of truly single-cell DNA was performed without pre-amplification directly from single Raji cells, with a 33% success rate (N=197) and Cq values of 36.3±2.5. Additionally, single-cell whole genome amplification (WGA) was employed to pre-amplify the single-cell DNA by a factor of >1000. This facilitated subsequent PCR for the same gene, yielding a success rate of 64% (N=33). This will allow more sophisticated downstream analyses such as sequencing, electrophoresis, or multiplexing. PMID:25771302

  5. Genome mining of the sordarin biosynthetic gene cluster from Sordaria araneosa Cain ATCC 36386: characterization of cycloaraneosene synthase and GDP-6-deoxyaltrose transferase.

    Science.gov (United States)

    Kudo, Fumitaka; Matsuura, Yasunori; Hayashi, Takaaki; Fukushima, Masayuki; Eguchi, Tadashi

    2016-07-01

    Sordarin is a glycoside antibiotic with a unique tetracyclic diterpene aglycone structure called sordaricin. Its biosynthetic pathway may include a Diels-Alder-type [4+2] cycloaddition. To understand this pathway, genome mining for the gene cluster was carried out in the draft genome sequence of the producer strain, Sordaria araneosa Cain ATCC 36386. A contiguous 67 kb gene cluster consisting of 20 open reading frames was identified. It encodes a putative diterpene cyclase, a glycosyltransferase, a type I polyketide synthase, and six cytochrome P450 monooxygenases. In vitro enzymatic analysis of the putative diterpene cyclase SdnA showed that it catalyzes the transformation of geranylgeranyl diphosphate to cycloaraneosene, a known biosynthetic intermediate of sordarin. Furthermore, the putative glycosyltransferase SdnJ was found to catalyze the glycosylation of sordaricin in the presence of GDP-6-deoxy-d-altrose to give 4'-O-demethylsordarin. These results suggest that the identified sdn gene cluster is responsible for the biosynthesis of sordarin. Based on the isolated potential biosynthetic intermediates and bioinformatics analysis, a plausible biosynthetic pathway for sordarin is proposed. PMID:27072286

  6. Probe on Service Level Agreement of Integrated Mine Wide Automation System

    Institute of Scientific and Technical Information of China (English)

    Chen Jianwei (陈建伟); Zhu Jinguang (竺金光)

    2011-01-01

    Current integrated mine-wide automation systems suffer from disordered service levels and long service cycles. To address this, the paper proposes using service level agreements (SLAs) to classify and grade in-sale and after-sale services. It describes the categorized management and content of SLAs in detail across five areas: installation and commissioning services, system rollout services, customization and upgrade services, technical support services, and routine operation services. SLAs carry the principles of IT service management into the contract signing and implementation of integrated mine-wide automation systems. They can play a significant role in the contract implementation cycle, project acceptance, and collection of contract payments.

  7. Bioinformatics and genomic medicine.

    Science.gov (United States)

    Kim, Ju Han

    2002-01-01

    Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational science. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolution in both bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever, in much the same way that biochemistry did a generation ago. This paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization and filtering, primary pattern analysis, and machine-learning algorithms are discussed. The paper also discusses the use of integrative biochip informatics technologies. These include multivariate data projection, gene-metabolic pathway mapping, automated biomolecular annotation, text mining of factual and literature databases, and the integrated management of biomolecular databases. PMID:12544491

  8. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning...

  9. Automation of plasma-process fulltext bibliography databases. An on-line data-collection, data-mining and data-input system

    International Nuclear Information System (INIS)

    Searching for relevant data, information retrieval, data extraction and data input are time- and resource-consuming activities in most data centers. Here we develop a Linux system automating the process in case of bibliography, abstract and fulltext databases. The present system is an open-source free-software low-cost solution that connects the target and provider databases in cyberspace through various web publishing formats. The abstract/fulltext relevance assessment is interfaced to external software modules. (author)

  10. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies

    Directory of Open Access Journals (Sweden)

    Rollins Derrick K

    2010-12-01

    Full Text Available Abstract Background Microarray data sets provide relative expression levels for thousands of genes under a comparatively small number of different experimental conditions called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to rank the genes of microarray data sets that are most differentially expressed between two biologically different groupings of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), which is the ability to correctly identify important genes. Results This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. Conclusions PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.
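FDR and statistical power, the two evaluation criteria named in this abstract, can be computed directly in a simulation study where the truly important genes are known. The Python sketch below shows only that bookkeeping. It does not implement the paper's PCA-based test statistics, and the function name and gene labels are hypothetical.

```python
def fdr_and_power(selected, truly_important):
    """Empirical false discovery rate and power for one simulated run.

    selected: genes flagged by a method; truly_important: genes known to be
    differentially expressed (only available when the truth is simulated).
    """
    selected = set(selected)
    truth = set(truly_important)
    true_pos = len(selected & truth)
    false_pos = len(selected - truth)
    # FDR: fraction of flagged genes that are false discoveries.
    fdr = false_pos / len(selected) if selected else 0.0
    # Power: fraction of truly important genes that were flagged.
    power = true_pos / len(truth) if truth else 0.0
    return fdr, power


fdr, power = fdr_and_power(["a", "b", "c", "d"], ["a", "b", "e"])
```

Averaging these two quantities over many simulated data sets is one way to compare a proposed method against a current one, in the spirit of the PM vs. CM comparison described above.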

  11. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems.

    Directory of Open Access Journals (Sweden)

    Sophie S Abby

    Full Text Available Biologists often wish to use their knowledge of a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose. Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathways), including their components, evolutionary associations with other systems and genetic architecture. Modelled features also include functional analogs and the multiple uses of the same component by different systems. Models are used to search for molecular systems in complete genomes or in unstructured data such as metagenomes. The components of the systems are searched by sequence similarity using hidden Markov model (HMM) protein profiles. The assignment of hits to a given system is decided based on compliance with the content and organization of the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder, we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate "Cas-finder" using publicly available protein profiles. MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, Hmmer and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and Hmmer/makeblastdb. The "Cas-finder" (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.

  12. High-quality draft genome sequence of Kocuria marina SO9-6, an actinobacterium isolated from a copper mine

    Directory of Open Access Journals (Sweden)

    Daniel B.A. Castro

    2015-09-01

    Full Text Available An actinobacterial strain, designated SO9-6, was isolated from a copper iron sulfide mineral. The organism is Gram-positive, facultatively anaerobic, and coccoid. Chemotaxonomic and phylogenetic properties were consistent with its classification in the genus Kocuria. Here, we report the first draft genome sequence of Kocuria marina SO9-6 under accession JROM00000000 (http://www.ncbi.nlm.nih.gov/nuccore/725823918), which provides insights for heavy metal bioremediation and production of compounds of biotechnological interest.

  13. Data, text and web mining for business intelligence: a survey

    OpenAIRE

    Abdul-Aziz Rashid Al-Azmi

    2013-01-01

    The Information and Communication Technologies revolution brought a digital world with huge amounts of data available. Enterprises use mining technologies to search vast amounts of data for vital insight and knowledge. Mining tools such as data mining, text mining, and web mining are used to find hidden knowledge in large databases or the Internet. Mining tools are automated software tools used to achieve business intelligence by finding hidden relations, and predicting future eve...

  14. Mining the genome for susceptibility to diabetic nephropathy: the role of large-scale studies and consortia.

    Science.gov (United States)

    Iyengar, Sudha K; Freedman, Barry I; Sedor, John R

    2007-03-01

    Approximately 30% of individuals with type 1 and type 2 diabetes develop persistent albuminuria, lose renal function, and are at increased risk for cardiovascular and other microvascular complications. Diabetes and kidney diseases rank within the top 10 causes of death in Westernized countries and cause significant morbidity. Given these observations, genetic, genomic, and proteomic investigations have been initiated to better define basic mechanisms of disease initiation and progression, to identify individuals at risk for diabetic complications, and to develop more efficacious therapies. In this review we have focused on linkage analyses of candidate genes or chromosomal regions, or coarse genome-wide scans, which have mapped either categorical (chronic kidney disease or end-stage renal disease) or quantitative kidney traits (albuminuria/proteinuria or glomerular filtration rate). Most loci identified to date have not been replicated; however, several linked chromosomal regions are concordant between independent samples, suggesting the presence of a diabetic nephropathy gene. Two genes, carnosinase (CNDP1) on 18q and engulfment and cell motility 1 (ELMO1) on 7p14, have been identified as diabetic nephropathy susceptibility genes, but these results require confirmation. The availability of patient data sets with large sample sizes and improvements in informatics, genotyping technology, and statistical methodologies should accelerate the discovery of valid diabetic nephropathy susceptibility genes. PMID:17418689

  15. Distributed Framework for Data Mining As a Service on Private Cloud

    OpenAIRE

    Shraddha Masih; Sanjay Tanwani

    2014-01-01

    Data mining research faces two great challenges: (i) automated mining and (ii) mining of distributed data. Conventional mining techniques are centralized, and the data need to be accumulated at a central location. A mining tool needs to be installed on the computer before data mining can be performed. Thus, extra time is incurred in collecting the data. Mining is done by specialized analysts who have access to mining tools. This technique is not optimal when the data is distributed over the net...
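The contrast drawn here between centralized mining and mining distributed data can be illustrated with a minimal Python sketch. Each site computes a small summary (count, sum, sum of squares) locally, and only these summaries are merged to obtain a global mean and variance, so raw records never leave their site. This is a generic aggregation pattern chosen for illustration. The abstract is truncated and does not describe the paper's framework, and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    count: int
    total: float
    total_sq: float


def local_summary(values):
    """Computed at each data site; only this small summary is sent to the coordinator."""
    return Summary(len(values), float(sum(values)), float(sum(v * v for v in values)))


def merge(summaries):
    """Combine per-site summaries into a global mean and population variance."""
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # E[x^2] - (E[x])^2; simple but numerically fragile for large, tightly clustered values.
    variance = total_sq / n - mean * mean
    return mean, variance


site_a = local_summary([1, 2, 3])
site_b = local_summary([4, 5])
mean, variance = merge([site_a, site_b])
```

For production use, a numerically stable pairwise merge (for example, combining per-site means and sums of squared deviations) would be preferable to the raw sum-of-squares formula used here for brevity.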

  16. CSIR: Mining Technology annual review 1996/97

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    CSIR: Mining Technology works in close collaboration and strategic partnership with the mining industry, government institutions and employee organizations, acquiring, developing and transferring technologies to improve the safety and health of their employees and the profitability of the mining industry. The annual report describes achievements over the year in the areas of: rock engineering (including rockburst control, mine layout, stope and gully support, and coal mining); environmental safety and health, on topics such as occupational hygiene services, methane explosions and blasting techniques; and mining systems (orebody information, hydraulic transport, mine mechanization, engineering design and automation, and mine services). A list of Mining Technology's 1996/97 publications is given.

  17. Biosynthesis of Antibiotic Leucinostatins in Bio-control Fungus Purpureocillium lilacinum and Their Inhibition on Phytophthora Revealed by Genome Mining

    Science.gov (United States)

    Li, Erfeng; Mao, Zhenchuan; Ling, Jian; Yang, Yuhong; Yin, Wen-Bing; Xie, Bingyan

    2016-01-01

    Purpureocillium lilacinum of Ophiocordycipitaceae is one of the most promising and commercialized agents for controlling plant-parasitic nematodes, as well as other insects and plant pathogens. However, how the fungus functions at the molecular level remains unknown. Here, we sequenced two isolates (PLBJ-1 and PLFJ-1) of P. lilacinum from two different locations, Beijing and Fujian. Genomic analysis showed high synteny between the two isolates, and phylogenetic analysis indicated that they were most closely related to the insect pathogen Tolypocladium inflatum. A comparison with other species revealed that this fungus was enriched in carbohydrate-active enzymes (CAZymes), proteases and pathogenesis-related genes. A whole-genome search revealed a rich repertoire of genes encoding secondary metabolites (SMs). The non-ribosomal peptide synthetase LcsA, which comprises ten C-A-PCP modules and was confirmed by phylogenetic analysis to be specific to P. lilacinum and T. ophioglossoides, was identified as the core biosynthetic gene of the lipopeptide leucinostatins. Furthermore, gene expression levels were analyzed when PLBJ-1 was grown in leucinostatin-inducing and non-inducing media, and 20 genes involved in the biosynthesis of leucinostatins were identified. Disruption mutants allowed us to propose a putative biosynthetic pathway for leucinostatin A. Moreover, overexpression of the transcription factor lcsF increased the production of leucinostatins A and B 1.5-fold compared to the wild type. Bioassays revealed a new bioactivity of leucinostatins and P. lilacinum: inhibition of the growth of Phytophthora infestans and P. capsici. These results contribute to our understanding of the biosynthetic mechanism of leucinostatins and may allow us to better utilize P. lilacinum as a biocontrol agent. PMID:27416025
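The description of LcsA as "ten C-A-PCP modules" reflects the modular, assembly-line organization of NRPS enzymes. As a simplified illustration, the Python sketch below splits a linear list of domain labels into modules. It assumes each module begins at a condensation (C) domain, with any leading domains before the first C treated as a loading module and any other domain labels attached to the current module. Real domain annotation is more involved, and the function name is hypothetical.

```python
def split_modules(domains):
    """Group a linear list of NRPS domain labels into modules.

    Simplifying assumption: a new module starts at every "C" (condensation)
    domain; domains before the first C form a loading module, and any other
    labels (e.g. "TE") are appended to the current module.
    """
    modules = []
    for domain in domains:
        if domain == "C" or not modules:
            modules.append([domain])
        else:
            modules[-1].append(domain)
    return modules


# A synthetase with ten C-A-PCP modules, as reported for LcsA.
lcsa_like = "-".join(["C-A-PCP"] * 10).split("-")
modules = split_modules(lcsa_like)
```

Splitting at C domains mirrors how one adenylation (A) domain per module selects one building block along the assembly line. That correspondence is why module counts are used when predicting NRPS products.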

  18. iSubgraph: integrative genomics for subgroup discovery in hepatocellular carcinoma using graph mining and mixture models.

    Directory of Open Access Journals (Sweden)

    Bahadir Ozdemir

    Full Text Available High tumor heterogeneity makes it very challenging to identify key tumorigenic pathways as therapeutic targets. The integration of multiple omics data is a promising approach to identify driving regulatory networks in patient subgroups. Here, we propose a novel conceptual framework to discover patterns of miRNA-gene networks that are frequently observed to be up- or down-regulated in a group of patients, and to use such networks for patient stratification in hepatocellular carcinoma (HCC). We developed an integrative subgraph mining approach, called iSubgraph, and identified altered regulatory networks frequently observed in HCC patients. The miRNA and gene expression profiles were jointly analyzed in a graph structure. We defined a method to transform microarray data into a graph representation that encodes miRNA and gene expression levels as well as the interactions between them. The iSubgraph algorithm was able to detect cooperative regulation of miRNAs and genes even if it occurred only in some patients. Next, the miRNA-mRNA modules were used in an unsupervised class prediction model to discover HCC subgroups via patient clustering with mixture models. The robustness analysis of the mixture model showed that the class predictions are highly stable. Moreover, Kaplan-Meier survival analysis revealed that the HCC subgroups identified by the algorithm have different survival characteristics. Pathway analyses of the miRNA-mRNA co-modules identified by the algorithm demonstrate key roles of Myc, E2F1, let-7, TGFB1, TNF and EGFR in HCC subgroups. Thus, our method can integrate various omics data derived from different platforms and with different dynamic scales to better define molecular tumor subtypes. iSubgraph is available as MATLAB code at http://www.cs.umd.edu/~ozdemir/isubgraph/.
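Mining miRNA-gene interactions that are "altered" in many patients can be sketched in Python. This is a deliberately simplified stand-in for subgraph mining. It assumes per-patient standardized expression scores, marks an interaction edge as altered in a patient when both endpoints exceed a threshold in absolute value, and keeps edges whose support across patients meets a minimum. It counts single edges rather than full subgraphs and is not the iSubgraph algorithm (which is distributed as MATLAB code). The data are toy values, and the function names are hypothetical.

```python
from collections import Counter


def altered_edges(profile, interactions, threshold):
    """Edges (miRNA, gene) whose two endpoints are both strongly dysregulated in one patient."""
    return {
        (mirna, gene)
        for mirna, gene in interactions
        if abs(profile.get(mirna, 0.0)) >= threshold and abs(profile.get(gene, 0.0)) >= threshold
    }


def frequent_edges(profiles, interactions, threshold, min_support):
    """Count in how many patients each edge is altered; keep edges meeting min_support."""
    support = Counter()
    for profile in profiles:
        support.update(altered_edges(profile, interactions, threshold))
    return {edge: n for edge, n in support.items() if n >= min_support}


interactions = [("let-7", "MYC"), ("miR-21", "PTEN")]
profiles = [  # toy standardized expression scores, one dict per patient
    {"let-7": -2.1, "MYC": 2.5, "miR-21": 0.3, "PTEN": -0.2},
    {"let-7": -1.8, "MYC": 2.0, "miR-21": 2.2, "PTEN": -0.1},
    {"let-7": 0.1, "MYC": 0.2},
]
frequent = frequent_edges(profiles, interactions, threshold=1.5, min_support=2)
```

Frequent edges found this way could then be grown into larger connected patterns, or used as per-patient binary features for clustering, which is the general role the co-modules play in the stratification step described above.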

  19. Home Automation

    OpenAIRE

    Ahmed, Zeeshan

    2010-01-01

    In this paper I briefly discuss the importance of home automation systems. Going into the details, I present a real-time, software- and hardware-oriented house automation research project that I designed and implemented, capable of automating the house's electrical systems and providing a security system to detect the presence of unexpected behavior.

  20. Carotenoid metabolic profiling and transcriptome-genome mining reveal functional equivalence among blue-pigmented copepods and appendicularia

    KAUST Repository

    Mojib, Nazia

    2014-06-01

    The tropical oligotrophic oceanic areas are characterized by high water transparency and annual solar radiation. Under these conditions, a large number of phylogenetically diverse mesozooplankton species living in the surface waters (neuston) are found to be blue pigmented. In the present study, we focused on understanding the metabolic and genetic basis of the observed blue phenotype functional equivalence between the blue-pigmented organisms from the phylum Arthropoda, subclass Copepoda (Acartia fossae) and the phylum Chordata, class Appendicularia (Oikopleura dioica) in the Red Sea. Previous studies have shown that carotenoid–protein complexes are responsible for blue coloration in crustaceans. Therefore, we performed carotenoid metabolic profiling using both targeted and nontargeted (high-resolution mass spectrometry) approaches in four different blue-pigmented genera of copepods and one blue-pigmented species of appendicularia. Astaxanthin was found to be the principal carotenoid in all the species. The pathway analysis showed that all the species can synthesize astaxanthin from β-carotene, ingested from dietary sources, via 3-hydroxyechinenone, canthaxanthin, zeaxanthin, adonirubin or adonixanthin. Further, using a de novo assembled transcriptome of blue A. fossae (subclass Copepoda), we identified highly expressed homologous β-carotene hydroxylase enzymes and putative carotenoid-binding proteins responsible for astaxanthin formation and the blue phenotype. In blue O. dioica (class Appendicularia), corresponding putative genes were identified from the reference genome. Collectively, our data provide molecular evidence for the bioconversion and accumulation of blue astaxanthin–protein complexes underpinning the observed ecological functional equivalence and adaptive convergence among neustonic mesozooplankton.

  1. Genome mining of astaxanthin biosynthetic genes from Sphingomonas sp. ATCC 55669 for heterologous overproduction in Escherichia coli.

    Science.gov (United States)

    Ma, Tian; Zhou, Yuanjie; Li, Xiaowei; Zhu, Fayin; Cheng, Yongbo; Liu, Yi; Deng, Zixin; Liu, Tiangang

    2016-02-01

    As a highly valued keto-carotenoid, astaxanthin is widely used in nutritional supplements and pharmaceuticals. Therefore, the demand for biosynthetic astaxanthin and improved efficiency of astaxanthin biosynthesis has driven the investigation of metabolic engineering of native astaxanthin producers and heterologous hosts. However, microbial resources for astaxanthin are limited. In this study, we found that the α-Proteobacterium Sphingomonas sp. ATCC 55669 could produce astaxanthin naturally. We used whole-genome sequencing to identify the astaxanthin biosynthetic pathway using a combined PacBio-Illumina approach. The putative astaxanthin biosynthetic pathway in Sphingomonas sp. ATCC 55669 was predicted. For further confirmation, a high-efficiency targeted engineering carotenoid synthesis platform was constructed in E. coli for identifying the functional roles of candidate genes. All genes involved in astaxanthin biosynthesis showed discrete distributions on the chromosome. Moreover, the overexpression of exogenous E. coli idi in Sphingomonas sp. ATCC 55669 increased astaxanthin production by 5.4-fold. This study described a new astaxanthin producer and provided more biosynthesis components for bioengineering of astaxanthin in the future. PMID:26580858

  2. Mineralogical characterization of mine waste

    International Nuclear Information System (INIS)

    Highlights: • Mineral characterization in mine waste is critical for prediction of metal leaching. • Analytical methods and examples are reviewed, including advanced techniques. • Quantitative mineralogy is described as a promising new method. • Example shows that understanding secondary arsenic minerals improves risk assessment. - Abstract: The application of mineralogical characterization to mine waste has the potential to improve risk assessment, guide appropriate mine planning for planned and active mines and optimize remediation design at closed or abandoned mines. Characterization of minerals, especially sulphide and carbonate phases, is particularly important for predicting the potential for acidic drainage and metal(loid) leaching. Another valuable outcome from mineralogical studies of mine waste is an understanding of the stability of reactive and metal(loid)-bearing minerals under various redox conditions. This paper reviews analytical methods that have been used to study mine waste mineralogy, including conventional methods such as X-ray diffraction and scanning electron microscopy, and advanced methods such as synchrotron-based microanalysis and automated mineralogy. We recommend direct collaboration between researchers and mining companies to choose the optimal mineralogical techniques to solve complex problems, to co-publish the results, and to ensure that mineralogical knowledge is used to inform mine waste management at all stages of the mining life cycle. A case study of arsenic-bearing gold mine tailings from Nova Scotia is presented to demonstrate the application of mineralogical techniques to improve human health risk assessment and the long-term management of historical mine wastes

  3. Two non-synonymous markers in PTPN21, identified by genome-wide association study data-mining and replication, are associated with schizophrenia.

    LENUS (Irish Health Repository)

    Chen, Jingchun

    2011-09-01

    We conducted data-mining analyses of genome-wide association (GWA) studies of the CATIE and MGS-GAIN datasets and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r² > 0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P = 1.10 × 10⁻³ and rs2274736, P = 1.21 × 10⁻³). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR = 0.92, 95% CI: 0.86-0.97, P = 5.45 × 10⁻³ and rs2401751, OR = 0.92, 95% CI: 0.86-0.97, P = 5.29 × 10⁻³). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR = 1.08, 95% CI: 1.02-1.14, P = 6.43 × 10⁻³). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype-conditioned analyses indicated that the association signals observed for rs2274736/rs2401751 and rs7147796 are statistically independent. Given that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.
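A Bonferroni correction compares each P-value against α/m, where m is the number of tests. The Python sketch below applies it to the three meta-analysis P-values reported in this abstract. The abstract does not state which m the authors used, so the choice of m = 3 (one test per reported marker) is an assumption made only for illustration, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05, n_tests=None):
    """Flag each test as significant if p <= alpha / m (Bonferroni).

    n_tests (m) defaults to the number of P-values supplied.
    """
    m = n_tests if n_tests is not None else len(p_values)
    threshold = alpha / m
    return {name: p <= threshold for name, p in p_values.items()}


# Meta-analysis P-values from the abstract; m = 3 is an illustrative assumption.
reported = {"rs2274736": 5.45e-3, "rs2401751": 5.29e-3, "rs7147796": 6.43e-3}
flags = bonferroni_significant(reported, alpha=0.05, n_tests=3)
```

With m = 3 the threshold is about 0.0167, and all three markers pass. With a much larger m, for example one correction per replication dataset or per marker screened, the same P-values could fail, which is why the correction factor matters when reading such claims.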

  4. Herbarium genomics

    DEFF Research Database (Denmark)

    Bakker, Freek T.; Lei, Di; Yu, Jiaying;

    2016-01-01

    Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...

  5. Genome-wide analysis of the rice and arabidopsis non-specific lipid transfer protein (nsLtp) gene families and identification of wheat nsLtp genes by EST data mining

    OpenAIRE

    Chantret Nathalie; Boutrot Freddy; Gautier Marie-Françoise

    2008-01-01

    Abstract Background Plant non-specific lipid transfer proteins (nsLTPs) are encoded by multigene families and possess physiological functions that remain unclear. Our objective was to characterize the complete nsLtp gene family in rice and arabidopsis and to perform wheat EST database mining for nsLtp gene discovery. Results In this study, we carried out a genome-wide analysis of nsLtp gene families in Oryza sativa and Arabidopsis thaliana and identified 52 rice nsLtp genes and 49 arabidopsis...

  6. Genome-wide analysis of the rice and arabidopsis non-specific lipid transfer protein (nsLtp) gene families and identification of wheat nsLtp genes by EST data mining

    OpenAIRE

    Boutrot, Freddy; Chantret, Nathalie; Gautier, Marie Francoise

    2008-01-01

    Plant non-specific lipid transfer proteins (nsLTPs) are encoded by multigene families and possess physiological functions that remain unclear. Our objective was to characterize the complete nsLtp gene family in rice and arabidopsis and to perform wheat EST database mining for nsLtp gene discovery. Results In this study, we carried out a genome-wide analysis of nsLtp gene families in Oryza sativa and Arabidopsis thaliana and identified 52 rice nsLtp genes and 49 arabidopsis nsLtp genes. Here w...

  7. Library Automation

    OpenAIRE

    Dhakne, B. N.; Giri, V. V; Waghmode, S. S.

    2010-01-01

    New technologies provide libraries with several new materials, media and modes of storing and communicating information. Library automation reduces the drudgery of repeated manual effort in library routines. Library automation can be applied to collection, storage, administration, processing, preservation, communication, etc.

  8. Process automation

    International Nuclear Information System (INIS)

    Process automation technology has been pursued in the chemical processing industries and to a very limited extent in nuclear fuel reprocessing. Its effective use has been restricted in the past by the lack of diverse and reliable process instrumentation and the unavailability of sophisticated software designed for process control. The Integrated Equipment Test (IET) facility was developed by the Consolidated Fuel Reprocessing Program (CFRP) in part to demonstrate new concepts for control of advanced nuclear fuel reprocessing plants. A demonstration of fuel reprocessing equipment automation using advanced instrumentation and a modern, microprocessor-based control system is nearing completion in the facility. This facility provides for the synergistic testing of all chemical process features of a prototypical fuel reprocessing plant that can be attained with unirradiated uranium-bearing feed materials. The unique equipment and mission of the IET facility make it an ideal test bed for automation studies. This effort will provide for the demonstration of the plant automation concept and for the development of techniques for similar applications in a full-scale plant. A set of preliminary recommendations for implementing process automation has been compiled. Some of these concepts are not generally recognized or accepted. The automation work now under way in the IET facility should be useful to others in helping avoid costly mistakes because of the underutilization or misapplication of process automation. 6 figs

  9. Introduction to Space Resource Mining

    Science.gov (United States)

    Mueller, Robert P.

    2013-01-01

    There are vast amounts of resources in the solar system that will be useful to humans in space and possibly on Earth. None of these resources can be exploited without the first necessary step of extra-terrestrial mining. The necessary technologies for tele-robotic and autonomous mining have not matured sufficiently yet. The current state of technology was assessed for terrestrial and extraterrestrial mining and a taxonomy of robotic space mining mechanisms was presented which was based on current existing prototypes. Terrestrial and extra-terrestrial mining methods and technologies are on the cusp of massive changes towards automation and autonomy for economic and safety reasons. It is highly likely that these industries will benefit from mutual cooperation and technology transfer.

  10. Longwall mining

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1995-03-14

    As part of EIA's program to provide information on coal, this report, Longwall Mining, describes longwall mining and compares it with other underground mining methods. Using data from EIA and private sector surveys, the report describes major changes in the geologic, technological, and operating characteristics of longwall mining over the past decade. Most important, the report shows how these changes led to dramatic improvements in longwall mining productivity. For readers interested in the history of longwall mining and greater detail on recent developments affecting longwall mining, the report includes a bibliography.

  11. Data mining process automatization of air pollution data by the LISp-Miner system

    OpenAIRE

    Ochodnická, Zuzana

    2014-01-01

    This thesis is focused on the area of automated data mining. The aims of this thesis are to describe the area of automated data mining, to design a process for automatically creating data mining tasks that verify given domain knowledge and search for new knowledge, and to implement verification of given domain knowledge of the attribute-dependency type "influence", with adjustments to the search space. The implementation language is LMCL, which enables usage of the LISp-Mi...

  12. An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database

    International Nuclear Information System (INIS)

    Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including ¹H, ¹³C and ¹⁵N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001–2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.
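Comparing deposited chemical shifts with shifts predicted from PDB coordinates can be summarized by a root-mean-square deviation over the atoms present in both sets. The abstract does not say which metric was used, so RMSD here is an assumed, commonly used choice. The `(residue, atom)` keys, the toy values and the function name are hypothetical.

```python
import math


def shift_rmsd(observed, predicted):
    """RMSD (ppm) between observed and predicted shifts for atoms present in both.

    observed / predicted: dicts keyed by (residue_number, atom_name).
    Returns (rmsd, number_of_atoms_compared).
    """
    common = sorted(set(observed) & set(predicted))
    if not common:
        raise ValueError("no atoms in common between observed and predicted shifts")
    squared = [(observed[k] - predicted[k]) ** 2 for k in common]
    return math.sqrt(sum(squared) / len(squared)), len(common)


observed = {(1, "CA"): 58.0, (1, "N"): 120.0, (2, "CA"): 55.0}
predicted = {(1, "CA"): 57.0, (1, "N"): 121.0, (3, "CA"): 50.0}
rmsd, n_atoms = shift_rmsd(observed, predicted)
```

Because ¹H, ¹³C and ¹⁵N shifts span very different ranges and prediction accuracies, in practice such a comparison would be computed separately for each nucleus or atom type rather than pooled as in this toy example.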

  13. NVESD mine lane facility

    Science.gov (United States)

    Habersat, James D.; Marshall, Christopher; Maksymonko, George

    2003-09-01

    The NVESD Mine Lane Facility has recently undergone an extensive renovation. It now consists of an indoor, dry lane portion, a greenhouse portion with moisture-controlled lanes, a control room, and two outdoor lanes. The indoor structure contains six mine lanes, each approximately 2.5 m (width) × 1.2 m (depth) × 33 m (length). These lanes contain six different soil types: magnetite/sand, silt, crusher run gravel (bluestone gravel), bank run gravel (tan gravel), red clay, and white sand. An automated trolley system is used for mounting the various mine detection systems and sensors under test. Data acquisition and data logging are fully automated. The greenhouse structure was added to provide moisture-controlled lanes for measuring the effect of moisture on sensor effectiveness. A gantry-type crane was installed to permit remotely controlled positioning of a sensor package over any portion of the greenhouse lanes at elevations from ground level up to 5 m without shadowing the target area. The roof of the greenhouse is motorized and can be rolled back to allow full solar loading. A control room overlooking the lanes is equipped with recording and monitoring devices and contains controls to operate the trolleys. A facility overview and typical results from recent data collection exercises are presented.

  14. Data Mining.

    Science.gov (United States)

    Benoit, Gerald

    2002-01-01

    Discusses data mining (DM) and knowledge discovery in databases (KDD), taking the view that KDD is the larger view of the entire process, with DM emphasizing the cleaning, warehousing, mining, and visualization of knowledge discovery in databases. Highlights include algorithms; users; the Internet; text mining; and information extraction.…

  15. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  16. Process Mining Versus Intention Mining

    OpenAIRE

    Khodabandelou, Ghazaleh; Hug, Charlotte; Deneckere, Rebecca; Salinesi, Camille

    2013-01-01

    Process mining aims to discover, enhance or check the conformance of activity-oriented process models from event logs. A new field of research, called intention mining, has recently emerged. This new field has the same objectives as process mining but specifically addresses intentional process models. This paper aims to highlight the differences between these two fields of research and illustrates the use of mining techniques on a dataset of event logs, to discover an activity process model as ...
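    The abstract does not name the discovery algorithm used. Many activity-oriented discovery techniques, such as the alpha algorithm, start from the directly-follows relation between activities in an event log. A minimal sketch of that first step, assuming each trace is already a timestamp-ordered list of activity names for one case (the function name and toy log are hypothetical):

```python
from collections import Counter


def directly_follows(event_log):
    """Count how often activity a is immediately followed by activity b.

    event_log is an iterable of traces; each trace is a list of activity
    names for one case, already ordered by timestamp.
    """
    counts = Counter()
    for trace in event_log:
        # zip pairs each event with its successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


log = [
    ["register", "check", "pay"],
    ["register", "pay"],
    ["register", "check", "pay"],
]
for (a, b), n in sorted(directly_follows(log).items()):
    print(f"{a} -> {b}: {n}")
```

    An intention-mining approach would instead need to infer the goals behind these activities, which this sketch does not attempt.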

  17. Automation Security

    OpenAIRE

    Mirzoev, Dr. Timur

    2014-01-01

    Web-based Automated Process Control systems are a new type of application that uses the Internet to control industrial processes with access to real-time data. Supervisory control and data acquisition (SCADA) networks contain computers and applications that perform key functions in providing essential services and commodities (e.g., electricity, natural gas, gasoline, water, waste treatment, transportation) to all Americans. As such, they are part of the nation's critical infrastructu...

  18. Data, Text and Web Mining for Business Intelligence : A Survey

    Directory of Open Access Journals (Sweden)

    Abdul-Aziz Rashid Al-Azmi

    2013-04-01

    Full Text Available The Information and Communication Technologies revolution brought a digital world with huge amounts of data available. Enterprises use mining technologies to search vast amounts of data for vital insight and knowledge. Mining tools such as data mining, text mining, and web mining are used to find hidden knowledge in large databases or the Internet. Mining tools are automated software tools used to achieve business intelligence by finding hidden relations and predicting future events from vast amounts of data. This uncovered knowledge helps in gaining competitive advantages, better customer relationships, and even fraud detection. In this survey, we describe how these techniques work and how they are implemented. Furthermore, we discuss how business intelligence is achieved using these mining tools. We then look into some case studies of success stories using mining tools. Finally, we present some of the main challenges to the mining technologies that limit their potential.

  19. DATA, TEXT, AND WEB MINING FOR BUSINESS INTELLIGENCE: A SURVEY

    Directory of Open Access Journals (Sweden)

    Abdul-Aziz Rashid

    2013-03-01

    Full Text Available The Information and Communication Technologies revolution brought a digital world with huge amounts of data available. Enterprises use mining technologies to search vast amounts of data for vital insight and knowledge. Mining tools such as data mining, text mining, and web mining are used to find hidden knowledge in large databases or the Internet. Mining tools are automated software tools used to achieve business intelligence by finding hidden relations and predicting future events from vast amounts of data. This uncovered knowledge helps in gaining competitive advantages, better customer relationships, and even fraud detection. In this survey, we describe how these techniques work and how they are implemented. Furthermore, we discuss how business intelligence is achieved using these mining tools. We then look into some case studies of success stories using mining tools. Finally, we present some of the main challenges to the mining technologies that limit their potential.

  20. Maintainability Analysis of Underground Mining Equipment Using Genetic Algorithms: Case Studies with an LHD Vehicle

    OpenAIRE

    Sihong Peng; Nick Vayenas

    2014-01-01

    While increased mine mechanization and automation make considerable contributions to mine productivity, unexpected equipment failures and planned or routine maintenance prevent the maximum possible utilization of sophisticated mining equipment and require a significant amount of extra capital investment. This paper deals with aspects of maintainability prediction for mining machinery. A PC software package called GenRel was developed for this purpose. In GenRel, it is assumed that failures of mining...

  1. Graph mining

    OpenAIRE

    Ramon, Jan

    2013-01-01

    Graph mining is the study of how to perform data mining and machine learning on data represented with graphs. One can distinguish between, on the one hand, transactional graph mining, where a database of separate, independent graphs is considered (such as databases of molecules and databases of images), and, on the other hand, large network analysis, where a single large network is considered (such as chemical interaction networks and concept networks).

  2. Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    OpenAIRE

    Knaus William A; Siadaty Mir S

    2006-01-01

    Abstract Background Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acqui...

  3. Text mining for the biocuration workflow

    OpenAIRE

    Hirschman, L.; Burns, G. A. P. C.; Krallinger, M.; Arighi, C.; Cohen, K. B.; Valencia, A.; Wu, C H; Chatr-aryamontri, A; Dowell, K. G.; Huala, E; Lourenco, A.; Nash, R; Veuthey, A.-L.; Wiegers, T.; Winter, A. G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations too...

  4. Text Mining of Supreme Administrative Court Jurisdictions

    OpenAIRE

    Feinerer, Ingo; Hornik, Kurt

    2007-01-01

    Within the last decade text mining, i.e., extracting sensitive information from text corpora, has become a major factor in business intelligence. The automated textual analysis of law corpora is highly valuable because of its impact on a company's legal options and the raw amount of available jurisdiction. The study of supreme court jurisdiction and international law corpora is equally important due to its effects on business sectors. In this paper we use text mining methods to investigate Au...

  5. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W

    2014-04-07

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.

  6. Mining's global tomorrow

    Energy Technology Data Exchange (ETDEWEB)

    Hobbs, B.; Grimstone, L.; MacBeth, A.

    1995-05-01

    Consists of an edited extract of the chapter 'Mining Today, Australia's Tomorrow - Exploration and Mining Globally to 2020' from the CSIRO book 'Challenge to change: Australia in 2020'. Forecasts the state of the Australian mineral industry in 2020 covering aspects such as: exploration concepts and area selection; excavation design and engineering; developments in mining technology and equipment; environmental management; and minerals processing. Presents an optimistic view of the long-term future of the Australian mining industry. Predicts that by 2020 Australia will emerge as the leader of an Austral/Asian trading bloc incorporating India, SE Asia, Pacific islands, Australia and New Zealand.

  7. Uranium mining

    International Nuclear Information System (INIS)

    Full text: The economic and environmental sustainability of uranium mining has been analysed by Monash University researcher Dr Gavin Mudd in a paper that challenges the perception that uranium mining is an 'infinite quality source' that provides solutions to the world's demand for energy. Dr Mudd says information on the uranium industry touted by politicians and mining companies is not necessarily inaccurate, but it does not tell the whole story, being often just an average snapshot of the costs of uranium mining today without reflecting the escalating costs associated with the process in years to come. 'From a sustainability perspective, it is critical to evaluate accurately the true lifecycle costs of all forms of electricity production, especially with respect to greenhouse emissions, ' he says. 'For nuclear power, a significant proportion of greenhouse emissions are derived from the fuel supply, including uranium mining, milling, enrichment and fuel manufacture.' Dr Mudd found that financial and environmental costs escalate dramatically as the uranium ore is used. The deeper the mining process required to extract the ore, the higher the cost for mining companies, the greater the impact on the environment and the more resources needed to obtain the product. It is clear that there is a strong sensitivity of energy and water consumption and greenhouse emissions to ore grade, and that ore grades are likely to continue to decline gradually in the medium to long term. These issues are critical to the current debate over nuclear power and greenhouse emissions, especially with respect to ascribing sustainability to such activities as uranium mining and milling. For example, mining at Roxby Downs is responsible for the emission of over one million tonnes of greenhouse gases per year and this could increase to four million tonnes if the mine is expanded.'

  8. INTEGRATING DATA MINING INTO BUSINESS INTELLIGENCE

    Directory of Open Access Journals (Sweden)

    Maria Cristina ENACHE

    2006-01-01

    Full Text Available Data Mining is a broad term often used to describe the process of using database technology, modeling techniques, statistical analysis, and machine learning to analyze large amounts of data in an automated fashion to discover hidden patterns and predictive information in the data. By building highly complex and sophisticated statistical and mathematical models, organizations can gain new insight into their activities. The purpose of this document is to provide users with background on a few key data mining and business intelligence concepts and on the benefits of integrating business intelligence and data mining.

  9. Discussion of Minos Mine operating system

    Energy Technology Data Exchange (ETDEWEB)

    Pan, B.

    1991-10-01

    The MINOS (mine operating system), which is used in the majority of British collieries, provides central control at the surface for the machinery and environmental equipment distributed throughout the mine. Installed equipment, including face machinery, conveyors, pumps, fans and sensors are connected to local outstations which all communicate with the control system via a single run of signal cable. The article discusses the system particularly its use in the Automated Control System of Underground Mining Locomotives (ACSUML). The discussion includes the use of MINOS to improve wagon identification, the operating principle of ACSUML and the possibilities of a driverless locomotive. 2 figs.

  10. Data mining in healthcare: decision making and precision

    OpenAIRE

    Ionuţ ŢĂRANU

    2016-01-01

    The application of data mining in healthcare is increasing because the health sector is rich in information, and data mining has become a necessity. Healthcare organizations generate and collect large volumes of information on a daily basis. The use of information technology enables the automation of data mining and knowledge, which helps bring out interesting patterns, which means eliminating manual tasks and easy data extraction directly from electronic records, electronic transfer syst...

  11. Self-configuring data mining for ubiquitous computing

    OpenAIRE

    Çaycı, Ayşegül

    2013-01-01

    Ubiquitous computing software needs to be autonomous so that essential decisions such as how to configure its particular execution are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate ...

  12. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Full Text Available Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
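    The final reliability rule in the PPM abstract (a hit is kept only if at least two relevant descriptors are present) can be sketched in a few lines. This is a simplified illustration, not the published PPM scripts: it assumes each gene's annotation is available as a set of GO-term and PROSITE identifiers, and the function name and placeholder identifiers are hypothetical.

```python
def reliable_hits(gene_descriptors, relevant_go, relevant_prosite,
                  min_descriptors=2):
    """Keep genes annotated with at least min_descriptors relevant descriptors.

    gene_descriptors maps a gene ID to the set of GO terms and PROSITE IDs
    annotated on it. relevant_go plays the role of the profile filter
    (GO terms derived from the target E.C. numbers); relevant_prosite plays
    the role of the pattern filter.
    Returns {gene_id: sorted list of matching descriptors}.
    """
    relevant = set(relevant_go) | set(relevant_prosite)
    reliable = {}
    for gene_id, descriptors in gene_descriptors.items():
        matches = set(descriptors) & relevant
        if len(matches) >= min_descriptors:
            reliable[gene_id] = sorted(matches)
    return reliable


# Placeholder identifiers, not real GO or PROSITE accessions.
genes = {
    "gene_1": {"GO_TERM_A", "PROSITE_X"},
    "gene_2": {"GO_TERM_A"},
    "gene_3": {"GO_TERM_A", "GO_TERM_B", "PROSITE_UNRELATED"},
}
print(reliable_hits(genes, {"GO_TERM_A", "GO_TERM_B"}, {"PROSITE_X"}))
```

    Here gene_2 is dropped because only one relevant descriptor supports it, while gene_1 (one GO term plus one pattern) and gene_3 (two GO terms) pass.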

  13. Automated Budget System

    Data.gov (United States)

    Department of Transportation — The Automated Budget System (ABS) automates management and planning of the Mike Monroney Aeronautical Center (MMAC) budget by providing enhanced capability to plan,...

  14. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome.

    Science.gov (United States)

    Elsik, Christine G; Unni, Deepak R; Diesh, Colin M; Tayal, Aditi; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies. PMID:26481361

  15. High safety in the mining industry

    Energy Technology Data Exchange (ETDEWEB)

    1987-08-01

    Presents an interview in question and answer format with the deputy chairman of Gosgortekhnadzor (Committee for Supervision of Industrial Work Safety and Mining Supervision) in which he discusses two recent fatal accidents in the Yasinovskaya-Glubokaya and Chaikino coal mines and identifies areas where safety needs to be improved (more automation, protective devices, ventilation etc.). Discusses the particular problems involved with deep mining (20% of mines are now deeper than 700 m and 27 mines are deeper than 1000 m), such as fires, dust, methane, rock falls, insufficient maintenance and strata control and poor ventilation. Confirms that a large number of accidents are due to poor organization and stresses that the coal industry must be subjected to perestroika (restructuring) as much as other areas of society.

  16. FROM DATA MINING TO BEHAVIOR MINING

    OpenAIRE

    ZHENGXIN CHEN

    2006-01-01

    The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge because recent developments in data mining have shown an increasing interest in mining of complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships of the data along with the data itself (rathe...

  17. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  18. Beegle: from literature mining to disease-gene discovery.

    Science.gov (United States)

    ElShal, Sarah; Tranchevent, Léon-Charles; Sifrim, Alejandro; Ardeshirdavani, Amin; Davis, Jesse; Moreau, Yves

    2016-01-29

    Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/. PMID:26384564
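    The Beegle abstract reports performance as average recall within the top 100 returned genes but does not spell out the evaluation protocol. Assuming the usual definition (the fraction of a query's known disease genes that appear in the top k results, averaged over queries), the metric can be computed as follows; the function names and toy gene lists are hypothetical.

```python
def recall_at_k(ranked_genes, relevant_genes, k=100):
    """Fraction of relevant genes that appear among the first k ranked genes."""
    relevant = set(relevant_genes)
    if not relevant:
        raise ValueError("relevant_genes must not be empty")
    top_k = set(ranked_genes[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(queries, k=100):
    """Average recall@k over (ranked_genes, relevant_genes) pairs, one per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(scores) / len(scores)


# Toy example with k=2: the first query recovers 1 of 2 known genes,
# the second recovers both.
queries = [
    (["GENE_A", "GENE_B", "GENE_C"], {"GENE_A", "GENE_C"}),
    (["GENE_D", "GENE_E", "GENE_F"], {"GENE_D", "GENE_E"}),
]
print(mean_recall_at_k(queries, k=2))  # 0.75
```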

  19. Automated document analysis system

    Science.gov (United States)

    Black, Jeffrey D.; Dietzel, Robert; Hartnett, David

    2002-08-01

    A software application has been developed to aid law enforcement and government intelligence gathering organizations in the translation and analysis of foreign language documents with potential intelligence content. The Automated Document Analysis System (ADAS) provides the capability to search (data or text mine) documents in English and the most commonly encountered foreign languages, including Arabic. Hardcopy documents are scanned by a high-speed scanner and processed with optical character recognition (OCR). Documents obtained in an electronic format bypass the OCR and are copied directly to a working directory. For translation and analysis, the script and the language of the documents are first determined. If the document is not in English, the document is machine translated to English. The documents are searched for keywords and key features in either the native language or translated English. The user can quickly review the document to determine if it has any intelligence content and whether detailed, verbatim human translation is required. The documents and document content are cataloged for potential future analysis. The system allows non-linguists to evaluate foreign language documents and allows for the quick analysis of a large quantity of documents. All document processing can be performed manually or automatically on a single document or a batch of documents.

  20. Automated Event Service: Efficient and Flexible Searching for Earth Science Phenomena Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Develop an Automated Event Service system that: Methodically mines custom-defined events in the reanalysis data sets of global atmospheric models. Enables...

  1. Manufacturing and automation

    OpenAIRE

    Ernesto Córdoba Nieto

    2010-01-01

    The article presents concepts and definitions from different sources concerning automation. The work approaches automation by virtue of the author’s experience in manufacturing production; why and how automation projects are embarked upon is considered. Technological reflection regarding the progressive advances or stages of automation in the production area is stressed. Coriat and Freyssenet’s thoughts about and approaches to the problem of automation and its current state are taken and e...

  2. Genome mining in Sorangium cellulosum So ce56: identification and characterization of the homologous electron transfer proteins of a myxobacterial cytochrome P450.

    Science.gov (United States)

    Ewen, Kerstin Maria; Hannemann, Frank; Khatri, Yogan; Perlova, Olena; Kappl, Reinhard; Krug, Daniel; Hüttermann, Jürgen; Müller, Rolf; Bernhardt, Rita

    2009-10-16

    Myxobacteria, especially members of the genus Sorangium, are known for their biotechnological potential as producers of pharmaceutically valuable secondary metabolites. The biosynthesis of several of those myxobacterial compounds includes cytochrome P450 activity. Although class I cytochrome P450 enzymes are widespread in bacteria and rely on ferredoxins and ferredoxin reductases as essential electron mediators, the study of these proteins is often neglected. Therefore, we decided to search in the Sorangium cellulosum So ce56 genome for putative interaction partners of cytochromes P450. In this work we report the investigation of eight myxobacterial ferredoxins and two ferredoxin reductases with respect to their activity in cytochrome P450 systems. Intriguingly, we found not only one, but two ferredoxins whose ability to sustain an endogenous So ce56 cytochrome P450 was demonstrated by CYP260A1-dependent conversion of nootkatone. Moreover, we could demonstrate that the two ferredoxins were able to receive electrons from both ferredoxin reductases. These findings indicate that S. cellulosum can alternate between different electron transport pathways to sustain cytochrome P450 activity. PMID:19696019

  3. A genome-wide survey of maize lipid-related genes: candidate genes mining, digital gene expression profiling and colocation with QTL for maize kernel oil

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Lipids play an important role in plants due to their abundance and their extensive participation in many metabolic processes. Genes involved in lipid metabolism have been extensively studied in Arabidopsis and other plant species. In this study, a total of 1003 maize lipid-related genes were cloned and annotated, including 42 genes with experimental validation, 732 genes with full-length cDNA and protein sequences in public databases and 229 newly cloned genes. Ninety-seven maize lipid-related genes with tissue-preferential expression were discovered by in silico gene expression profiling based on 1,984,483 maize Expressed Sequence Tags collected from 182 cDNA libraries. Meanwhile, 70 QTL clusters for maize kernel oil were identified, covering 34.5% of the maize genome. Fifty-nine (84%) QTL clusters co-located with at least one lipid-related gene, and the total number of these genes amounted to 147. Interestingly, thirteen genes with kernel-preferential expression profiles fell within QTL clusters for maize kernel oil content. All the maize lipid-related genes identified here may provide good targets for maize kernel oil QTL cloning and thus help us to better understand the molecular mechanism of maize kernel oil accumulation.
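    The co-location step in the maize study (checking which lipid-related genes fall inside QTL clusters) is at heart an interval-overlap test. A minimal sketch, assuming gene and QTL coordinates are given as closed intervals on the same assembly; the function names and toy coordinates are hypothetical, and the study's actual mapping procedure may differ:

```python
def genes_in_qtl(genes, qtl_clusters):
    """Map each QTL cluster to the sorted IDs of genes overlapping it.

    genes and qtl_clusters both map an ID to (chromosome, start, end),
    with start <= end and both ends inclusive.
    """
    result = {}
    for qtl_id, (q_chrom, q_start, q_end) in qtl_clusters.items():
        result[qtl_id] = sorted(
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            # Two closed intervals overlap when each starts before the other ends.
            if g_chrom == q_chrom and g_start <= q_end and g_end >= q_start
        )
    return result


def fraction_colocated(overlaps):
    """Fraction of QTL clusters containing at least one gene."""
    return sum(1 for hits in overlaps.values() if hits) / len(overlaps)


genes = {"g1": ("1", 100, 200), "g2": ("1", 500, 600), "g3": ("2", 50, 80)}
qtl = {"q1": ("1", 150, 550), "q2": ("2", 100, 300), "q3": ("1", 1000, 2000)}
overlaps = genes_in_qtl(genes, qtl)
print(overlaps)  # {'q1': ['g1', 'g2'], 'q2': [], 'q3': []}
print(fraction_colocated(overlaps))
```

    The same ratio, applied to the study's data, corresponds to the reported 59 of 70 clusters (84%).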

  4. A Comparative Study on Serial and Parallel Web Content Mining

    Directory of Open Access Journals (Sweden)

    Binayak Panda

    2016-03-01

    Full Text Available The World Wide Web (WWW) is a repository that serves every individual's needs, from education to entertainment. From a user's point of view, however, getting relevant information with respect to one particular context is time-consuming and not easy, because of the volume of data, which is unstructured, distributed and dynamic in nature. Extracting relevant information with respect to one particular context can be automated; this is called Web Content Mining. The efficiency of such automation depends on the validity of the expected outcome as well as on the processing time. The acceptability of the outcome depends on the user or the user's policy, but the processing time depends on the Web Content Mining methodology. In this work, a study has been carried out comparing Serial Web Content Mining and Parallel Web Content Mining. This work also focuses on the framework for implementing parallelism in Web Content Mining.

  5. ONTOLOGY BASED DATA MINING METHODOLOGY FOR DISCRIMINATION PREVENTION

    Directory of Open Access Journals (Sweden)

    Nandana Nagabhushana

    2014-09-01

    Full Text Available Data Mining is increasingly used to automate decision-making processes, which involve the extraction and discovery of information hidden in large volumes of collected data. Nonetheless, negative perceptions such as privacy invasion and potential discrimination hinder the use of data mining methodologies in software systems employing automated decision making. Loan granting, employment, insurance premium calculation, admissions in educational institutions, etc., can use data mining to effectively prevent human biases pertaining to certain attributes such as gender, nationality and race in critical decision making. The proposed methodology prevents discriminatory rules arising from the presence of information about sensitive discriminatory attributes in the data itself. The proposal has two novel aspects: first, a rule mining technique based on ontologies, and second, the generalization and transformation of mined rules that are quantized as discriminatory into non-discriminatory ones.

  6. Coastal mining

    Science.gov (United States)

    Bell, Peter M.

    The Exclusive Economic Zone (EEZ) declared by President Reagan in March 1983 has met with a mixed response from those who would benefit from a guaranteed, 200-nautical-mile (370-km) protected underwater mining zone off the coasts of the United States and its possessions. On the one hand, the U.S. Department of the Interior is looking ahead and has been very successful in safeguarding important natural resources that will be needed in the coming decades. On the other hand, the mining industry is faced with a depressed metals and mining market. A report of the Exclusive Economic Zone Symposium held in November 1983 by the U.S. Geological Survey, the Mineral Management Service, and the Bureau of Mines described the mixed response as: “ … The Department of Interior … raring to go into promotion of deep-sea mining but industrial consortia being very pessimistic about the program, at least for the next 30 or so years.” (Chemical & Engineering News, February 5, 1983).

  7. Coal Mines, Active - Longwall Mining Panels

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — Coal mining has occurred in Pennsylvania for over a century. A method of coal mining known as Longwall Mining has become more prevalent in recent decades. Longwall...

  8. In silico genome wide mining of conserved and novel miRNAs in the brain and pineal gland of Danio rerio using small RNA sequencing data.

    Science.gov (United States)

    Agarwal, Suyash; Nagpure, Naresh Sahebrao; Srivastava, Prachi; Kushwaha, Basdeo; Kumar, Ravindra; Pandey, Manmohan; Srivastava, Shreya

    2016-03-01

    MicroRNAs (miRNAs) are small, non-coding RNA molecules that bind to the mRNA of their target genes and regulate gene expression at the post-transcriptional level. Zebrafish is an economically important freshwater fish species that is widely considered a good predictive model for studying human diseases and development. The present study focused on identifying known and novel miRNAs, predicting the targets of the novel miRNAs and analysing the differential expression of the known miRNAs, using small RNA sequencing data of the brain and pineal gland (dark and light treatments) obtained from NCBI SRA. A total of 165, 151 and 145 known zebrafish miRNAs were found in the brain, pineal gland (dark treatment) and pineal gland (light treatment), respectively. Chromosomes 4 and 5 of the zebrafish reference assembly GRCz10 were found to contain the largest number of miR genes. miR-181a and miR-182 were found to be highly expressed, in terms of read counts, in the brain and pineal gland, respectively. Other ncRNAs, such as tRNA, rRNA and snoRNA, were curated against Rfam. Using GRCz10 as reference, the subsequent bioinformatic analyses identified 25, 19 and 9 novel miRNAs from the brain, pineal gland (dark treatment) and pineal gland (light treatment), respectively. Targets of the novel miRNAs were identified, based on sequence complementarity between miRNAs and mRNA, by searching for antisense hits in the 3'-UTR of reference RNA sequences of the zebrafish. The discovery of novel miRNAs and their targets in the zebrafish genome can be a valuable scientific resource for further functional studies not only in zebrafish but also in other economically important fishes. PMID:26981358
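
    Target searches based on complementarity are often anchored on the miRNA seed (nucleotides 2-8). The minimal Python sketch below assumes that convention and scans a 3'-UTR for perfect antisense matches to the seed. The sequences are toy examples, and real target-prediction pipelines (possibly including the one used in this study) also score mismatches, G:U pairs and site context.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (A<->U, G<->C)."""
    return "".join(PAIRS[base] for base in reversed(rna))

def seed_sites(mirna, utr, start=1, end=8):
    """0-based positions in `utr` (5'->3') that pair perfectly with miRNA nucleotides start+1..end."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")          # also accept DNA-style UTR sequences
    site = reverse_complement(mirna[start:end])  # default: the 7-nt seed, positions 2-8
    hits, i = [], utr.find(site)
    while i != -1:
        hits.append(i)
        i = utr.find(site, i + 1)
    return hits

toy_mirna = "UAGCUUAUCAGACUGAUGUUGA"  # toy sequence; seed AGCUUAU, site AUAAGCU
toy_utr = "GGAUAAGCUCCAUAAGCUA"
print(seed_sites(toy_mirna, toy_utr))  # [2, 11]
```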

  9. Sentinel Mining

    DEFF Research Database (Denmark)

    Middelfart, Morten

    This thesis introduces the novel concept of sentinel rules (sentinels). Sentinels are intended to represent the relationships between the data originating from the external environment and the data representing the critical organizational performance. The intention with sentinels is to warn...... into geography dimension) combined with a decrease in the money invested in customer support for laptop computers (drilldown into product dimension) is observed. The work leading to this thesis progressed from algorithms for regular sentinel mining with only one source and one target measure, into algorithms...... for mining generalized and multidimensional sentinels with multiple source measures. Furthermore, the mining algorithms became capable of automatically fitting the best warning periods for a given sentinel. Aside from expanding the capabilities of the algorithms, the work demonstrates a significant...
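
    The idea of automatically fitting a warning period can be illustrated with a deliberately simplified sketch: treat a sentinel as "a decrease in a source measure is followed, w periods later, by a decrease in the target measure", and pick the w with the highest confidence. The change encoding, the confidence definition and the candidate range of w are assumptions for illustration; the thesis's algorithms also handle multiple and multidimensional source measures.

```python
def changes(series):
    """Direction of change between consecutive periods: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

def sentinel_confidence(source, target, w, direction=-1):
    """Share of source changes in `direction` that are followed w periods later by a
    target change in the same direction (a simplified, hypothetical definition)."""
    s, t = changes(source), changes(target)
    hits = total = 0
    for i, c in enumerate(s):
        if c == direction and i + w < len(t):
            total += 1
            hits += t[i + w] == direction
    return hits / total if total else 0.0

def best_warning_period(source, target, max_w=4):
    """Candidate warning period (1..max_w) with the highest sentinel confidence."""
    return max(range(1, max_w + 1), key=lambda w: sentinel_confidence(source, target, w))

support_spend = [10, 9, 9, 8, 8, 8, 7, 7, 7, 7]         # toy source measure
revenue = [100, 100, 100, 99, 99, 98, 98, 98, 97, 97]     # toy target measure
print(best_warning_period(support_spend, revenue))        # 2
```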

  10. Frontiers of biomedical text mining: current progress

    OpenAIRE

    Zweigenbaum, Pierre; Demner-Fushman, Dina; Hong YU; Cohen, Kevin B.

    2007-01-01

    It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a ...
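
    Abbreviation handling, which the abstract calls essentially solved, is often done with simple character-matching heuristics. Below is a minimal Python sketch in the style of the Schwartz and Hearst algorithm, which matches the short form's characters right to left inside the preceding words. The parenthesis pattern and the word window are simplifications, and the code is not taken from the reviewed work.

```python
import re

def best_long_form(short, candidate):
    """Match the short form's letters/digits right to left inside `candidate`;
    the first short-form character must start a word. Returns the long form or None."""
    s_idx, l_idx = len(short) - 1, len(candidate) - 1
    while s_idx >= 0:
        ch = short[s_idx].lower()
        if not ch.isalnum():
            s_idx -= 1
            continue
        while (l_idx >= 0 and candidate[l_idx].lower() != ch) or (
                s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum()):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    start = candidate.rfind(" ", 0, l_idx + 1) + 1  # extend back to the start of that word
    return candidate[start:]

def extract_pairs(sentence):
    """Find 'long form (SF)' patterns using a simplified window of preceding words."""
    pairs = []
    for m in re.finditer(r"\(([^()]{2,10})\)", sentence):
        short = m.group(1).strip()
        if not short or " " in short or not short[0].isalnum():
            continue
        words = sentence[: m.start()].split()
        window = min(len(short) + 5, 2 * len(short))
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form and len(long_form) > len(short):
            pairs.append((short, long_form))
    return pairs

print(extract_pairs("We used hidden Markov models (HMM) to align sequences."))
# [('HMM', 'hidden Markov models')]
```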

  11. Data mining

    CERN Document Server

    Gorunescu, Florin

    2011-01-01

    The knowledge discovery process is as old as Homo sapiens. Until some time ago, this process was solely based on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be addressed through data mining technology, aided by the huge computational power of 'artificial' computers. Digging intelligently in different large databases, data mining aims to extract implicit, previously unknown and potentially useful information from data, since 'knowledge is power'. The goal of this book is to provide, in a friendly way

  12. Data mining and education.

    Science.gov (United States)

    Koedinger, Kenneth R; D'Mello, Sidney; McLaughlin, Elizabeth A; Pardos, Zachary A; Rosé, Carolyn P

    2015-01-01

    An emerging field of educational data mining (EDM) is building on and contributing to a wide variety of disciplines through analysis of data coming from various educational technologies. EDM researchers are addressing questions of cognition, metacognition, motivation, affect, language, social discourse, etc. using data from intelligent tutoring systems, massive open online courses, educational games and simulations, and discussion forums. The data include detailed action and timing logs of student interactions in user interfaces such as graded responses to questions or essays, steps in rich problem solving environments, games or simulations, discussion forum posts, or chat dialogs. They might also include external sensors such as eye tracking, facial expression, body movement, etc. We review how EDM has addressed the research questions that surround the psychology of learning with an emphasis on assessment, transfer of learning and model discovery, the role of affect, motivation and metacognition on learning, and analysis of language data and collaborative learning. For example, we discuss (1) how different statistical assessment methods were used in a data mining competition to improve prediction of student responses to intelligent tutor tasks, (2) how better cognitive models can be discovered from data and used to improve instruction, (3) how data-driven models of student affect can be used to focus discussion in a dialog-based tutoring system, and (4) how machine learning techniques applied to discussion data can be used to produce automated agents that support student learning as they collaborate in a chat room or a discussion board. PMID:26263424

  13. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    OpenAIRE

    Hamilton John P; Campbell Matthew; Thibaud-Nissen Françoise; Zhu Wei; Buell C

    2007-01-01

    Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging k...

  14. Manufacturing and automation

    Directory of Open Access Journals (Sweden)

    Ernesto Córdoba Nieto

    2010-04-01

    Full Text Available The article presents concepts and definitions of automation from different sources. The work approaches automation from the author’s experience in manufacturing production and considers why and how automation projects are undertaken. Technological reflection on the progressive advances or stages of automation in production is stressed. Coriat and Freyssenet’s views on automation and its current state are examined, especially regarding how to reconcile the level of automation with the flexibility and productivity demanded by competitive, worldwide manufacturing.

  15. Tono mine

    International Nuclear Information System (INIS)

    This technical report provides a comprehensive presentation of the 'Geoscientific studies (GS)' performed since 1986 and of the new 'Earthquake frontier research for terrestrial subsurface (EFR)' plan performed since 1995 in and around the Tono mine in Gifu prefecture. The main objectives of GS in the Tono area are to provide sufficient information on the deep underground geological environment for its performance assessment and to develop methods for site characterization. At present, major studies are in progress in fields such as hydrology, hydrogeochemistry, isotope chemistry of groundwater, nuclide retardation, mine-by experiments and instrument development. EFR is divided into three categories, two of which have been performed at the Tono mine under the names 'Development of ACROSS (Accurately controlled routinely operated signal system) for detecting microscale crustal movements' and 'Studies of precursory and co-seismic changes in rock stress, water level and groundwater chemistry'. The report describes the geology and geoscientific studies of the Tono mine and the earthquake frontier research for terrestrial subsurface. (G.K.)

  16. Data mining in space physics: MineTool algorithm

    Science.gov (United States)

    Karimabadi, H.; Sipes, T. B.; White, H.; Marinucci, M.; Dmitriev, A.; Chao, J. K.; Driscoll, J.; Balac, N.

    2007-11-01

    A novel data mining method called MineTool is introduced which, by virtue of automating the modeling process and model evaluations, makes it more accessible to nonexperts. The technique aggregates the various stages of model building into a four-step process consisting of (1) data segmentation and sampling, (2) variable preselection and transform generation, (3) predictive model estimation and validation, and (4) final model testing. Optimal strategies are chosen for each modeling step. However, the modular design of the MineTool enables the substitution of alternative strategies in any of the four modeling steps. A notable feature of the technique is that the final model is always in closed analytical form rather than "black box" form of most other techniques. MineTool can be used for analysis of data (e.g., time series) as well as images. The utility of the technique is illustrated through several examples based on synthetic data. Application of the technique to analysis of spacecraft data will be presented in subsequent papers.
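
    The four steps can be sketched with NumPy as a toy pipeline that, like MineTool, ends in a closed-form formula rather than a black box. The concrete strategies here (a random 50/25/25 split, a fixed set of candidate transforms, correlation-based preselection, and least-squares fitting with the number of terms chosen on the validation split) are assumptions for illustration and are not the strategies MineTool actually uses.

```python
import numpy as np

def candidate_transforms(X, names):
    """Step 2a: a small, hypothetical set of transforms for each input variable."""
    cols, labels = [], []
    for j, name in enumerate(names):
        x = X[:, j]
        cols += [x, x ** 2, np.log1p(np.abs(x))]
        labels += [name, f"{name}^2", f"log(1+|{name}|)"]
    return np.column_stack(cols), labels

def fit_linear(F, y):
    """Least-squares coefficients for y ~ c0 + F @ c."""
    A = np.column_stack([np.ones(len(F)), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, F):
    return coef[0] + F @ coef[1:]

def rmse(pred, y):
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def build_model(X, y, names, max_terms=4, seed=0):
    # Step 1: segmentation and sampling into training, validation and test splits
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    tr, va, te = idx[: n // 2], idx[n // 2 : 3 * n // 4], idx[3 * n // 4 :]
    # Step 2: transform generation, then preselection by |correlation| with y (training data only)
    F, labels = candidate_transforms(X, names)
    corr = np.array([abs(np.corrcoef(F[tr, k], y[tr])[0, 1]) for k in range(F.shape[1])])
    order = np.argsort(corr)[::-1]
    # Step 3: estimate models with 1..max_terms preselected terms, keep the best on validation
    best = None
    for k in range(1, max_terms + 1):
        cols = order[:k]
        coef = fit_linear(F[tr][:, cols], y[tr])
        err = rmse(predict(coef, F[va][:, cols]), y[va])
        if best is None or err < best[0]:
            best = (err, cols, coef)
    _, cols, coef = best
    # Step 4: final test on data never used for fitting or selection
    test_err = rmse(predict(coef, F[te][:, cols]), y[te])
    formula = f"{coef[0]:.3g}" + "".join(f" + {c:.3g}*{labels[j]}" for c, j in zip(coef[1:], cols))
    return formula, test_err

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.01, 400)  # synthetic data
formula, test_err = build_model(X, y, ["x1", "x2"])
print(formula, test_err)
```

    Because each step is a separate function, any one of them can be swapped for another strategy, which mirrors the modularity the abstract describes.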

  17. Data mining concepts and techniques

    CERN Document Server

    Han, Jiawei

    2005-01-01

    Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge. Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and app...

  18. An automated swimming respirometer

    DEFF Research Database (Denmark)

    STEFFENSEN, JF; JOHANSEN, K; BUSHNELL, PG

    1984-01-01

    An automated respirometer is described that can be used for computerized respirometry of trout and sharks.

  19. Configuration Management Automation (CMA)

    Data.gov (United States)

    Department of Transportation — Configuration Management Automation (CMA) will provide an automated, integrated enterprise solution to support CM of FAA NAS and Non-NAS assets and investments. CMA...

  20. Workflow automation architecture standard

    Energy Technology Data Exchange (ETDEWEB)

    Moshofsky, R.P.; Rohen, W.T. [Boeing Computer Services Co., Richland, WA (United States)]

    1994-11-14

    This document presents an architectural standard for application of workflow automation technology. The standard includes a functional architecture, process for developing an automated workflow system for a work group, functional and collateral specifications for workflow automation, and results of a proof of concept prototype.

  1. Literature classification for semi-automated updating of biological knowledgebases

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Winther, Ole;

    2013-01-01

    abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and...
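
    The record does not say which classifier produced the 0.95 accuracy. As an illustration of sorting abstracts into relevant and irrelevant sets for a knowledgebase update, here is a minimal multinomial naive Bayes sketch in Python with Laplace smoothing; the classifier choice, the tokenizer and the toy abstracts are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesTriage:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        v = len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            for w in tokenize(doc) if w in self.vocab)  # words never seen in training are ignored

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))

train_docs = [
    "T cell epitope binding HLA peptide",
    "epitope mapping of antibody peptide",
    "crop yield under drought stress",
    "soil nitrogen and crop growth",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = NaiveBayesTriage().fit(train_docs, train_labels)
print(model.predict("novel HLA epitope peptide"))  # relevant
```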

  2. Application of Modern Tools and Techniques for Mine Safety & Disaster Management

    Science.gov (United States)

    Kumar, Dheeraj

    2016-04-01

    The implementation of novel systems and adoption of improved equipment in mines help mining companies in two important ways: enhanced mine productivity and improved worker safety. There is a substantial need to adopt state-of-the-art automation technologies in mines to ensure the safety and protect the health of mine workers. With the advent of new autonomous equipment in mines, inefficiencies are reduced by limiting human inconsistency and error. The desired increase in productivity at a mine can sometimes be achieved by changing only a few simple variables. Significant developments have been made in surface and underground communication, robotics, smart sensors, tracking systems, mine gas monitoring systems and ground movements, among other areas. Advances in information technology, such as the internet, GIS, remote sensing and satellite communication, have proved to be important tools for hazard reduction and disaster management. This paper focuses mainly on issues pertaining to mine safety and disaster management and on some recent innovations in mine automation that could be deployed for safe mining operations and to avoid unforeseen mine disasters.

  3. TOP-10 DATA MINING CASE STUDIES

    OpenAIRE

    GABOR MELLI; XINDONG WU; PAUL BEINAT; FRANCESCO BONCHI; LONGBING CAO; RONG DUAN; CHRISTOS FALOUTSOS; RAYID GHANI; BRENDAN KITTS; BART GOETHALS; GEOFF MCLACHLAN; JIAN PEI; ASHOK SRIVASTAVA; OSMAR ZAÏANE

    2012-01-01

    We report on the panel discussion held at the ICDM'10 conference on the top 10 data mining case studies in order to provide a snapshot of where and how data mining techniques have made significant real-world impact. The tasks covered by 10 case studies range from the detection of anomalies such as cancer, fraud, and system failures to the optimization of organizational operations, and include the automated extraction of information from unstructured sources. From the 10 cases we find that sup...

  4. Automated simultaneous analysis phylogenetics (ASAP): an enabling tool for phylogenomics

    Directory of Open Access Journals (Sweden)

    Lee Ernest K

    2008-02-01

    Full Text Available Abstract Background The availability of sequences from whole genomes to reconstruct the tree of life has the potential to enable the development of phylogenomic hypotheses in ways that have not previously been possible. A significant bottleneck in the analysis of genomic-scale views of the tree of life is the time required for manual curation of genomic data into multi-gene phylogenetic matrices. Results To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multigene/multispecies matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis. Conclusion Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data.
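
    Assembling a multigene matrix amounts to concatenating per-gene alignments and padding taxa that are missing from a gene. The minimal Python sketch below shows that step under assumed input conventions (a dict of gene name to {taxon: aligned sequence}, "?" for missing data, 1-based partition coordinates); it is not ASAP's code and does not cover its gene-significance evaluation.

```python
def build_supermatrix(alignments, missing="?"):
    """Concatenate per-gene alignments ({gene: {taxon: aligned sequence}}) into one
    multigene matrix, padding taxa absent from a gene with `missing` characters.
    Returns the matrix and 1-based inclusive (gene, start, end) partitions."""
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    partitions, pos = [], 1
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length {sorted(lengths)}")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, pos, pos + length - 1))
        pos += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions

matrix, parts = build_supermatrix({
    "rbcL": {"A": "ACGT", "B": "ACGA"},  # toy alignments
    "matK": {"A": "TTG", "C": "TTC"},
})
print(matrix)  # {'A': 'TTGACGT', 'B': '???ACGA', 'C': 'TTC????'}
print(parts)   # [('matK', 1, 3), ('rbcL', 4, 7)]
```

    Keeping the partition coordinates lets each gene be analysed or weighted separately afterwards.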

  5. Siemens' innovative role in mining technology

    Energy Technology Data Exchange (ETDEWEB)

    1990-07-01

    The growth of the mining industry in South Africa has played a decisive role in the industrial development of the country. As mining activities expanded, the need for energy production increased; more recently, mining has become more mechanised, and the need for both more energy and more automation is growing. The origins of Siemens' operations in South Africa date back to the humble beginnings of the mining era, when the company provided the first generator and floodlights to illuminate the famous 'Big Hole' of the diamond mine at Kimberley, as well as, in 1895, hydro-electric plants on the Crocodile River and the Blyde River to supply the newly established mines in the Lydenburg district with electric power. 7 figs.

  6. Shoe-String Automation

    Energy Technology Data Exchange (ETDEWEB)

    Duncan, M.L.

    2001-07-30

    Faced with a downsizing organization, serious budget reductions and retirement of key metrology personnel, maintaining capabilities to provide necessary services to our customers was becoming increasingly difficult. It appeared that the only solution was to automate some of our more personnel-intensive processes; however, it was crucial that the most personnel-intensive candidate process be automated, at the lowest price possible and with the lowest risk of failure. This discussion relates factors in the selection of the Standard Leak Calibration System for automation, the methods of automation used to provide the lowest-cost solution and the benefits realized as a result of the automation.

  7. Genome bioinformatics of tomato and potato

    NARCIS (Netherlands)

    Datema, E.

    2011-01-01

    In the past two decades genome sequencing has developed from a laborious and costly technology employed by large international consortia to a widely used, automated and affordable tool used worldwide by many individual research groups. Genome sequences of many food animals and crop plants have been

  8. Mission-Critical Mobile Broadband Communications in Open Pit Mines

    DEFF Research Database (Denmark)

    Uzeda Garcia, Luis Guilherme; Portela Lopes de Almeida, Erika; Barbosa, Viviane S. B.;

    2016-01-01

    that need to be met by the wireless network. This article introduces fundamental concepts behind open-pit mining and discusses why this ever-changing environment, coupled with strict industrial reliability requirements, poses unique challenges to traditional broadband network planning and optimization......The need for continuous safety improvements and increased operational efficiency is driving the mining industry through a transition towards automated operations. From a communications perspective, this transition introduces a new set of high-bandwidth business- and mission-critical applications...... techniques. On the other hand, unlike unpredictable disaster scenarios, mining is a carefully planned activity. Taking advantage of this predictability element, we propose a framework that integrates mine and network planning so that continuous and automated adaptation of the network becomes possible...

  9. Data Mining in Medical Application for better Detection of Disease

    Directory of Open Access Journals (Sweden)

    Fanil V. Gada

    2013-04-01

    Full Text Available Automated extraction of knowledge from voluminous documents is a vast research area. Text mining is a promising approach for extracting knowledge from unstructured textual documents. The objective of this paper is to mine documents containing medical information (in this case, about diseases) retrieved from a databank, and to find novel transitive associations among biological objects. This paper discusses the extraction of biological objects from the databank using an Automated Vocabulary Discovery (AVD) algorithm [7]. A text-mining process is described for finding transitive (novel) associations among the extracted biological objects. The text mining algorithm also assigns a numerical significance score to each association. The expectation is that associations with higher scores are more likely to be true than those with lower scores.
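
    A transitive association links objects A and C that never co-occur in a document but each co-occur with some intermediate B. The Python sketch below finds such pairs from per-document term lists and scores them by summing, over shared intermediates, the smaller of the two co-occurrence counts. That scoring rule is a hypothetical stand-in, not the paper's significance score, and the terms are toy data.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how many documents mention each (sorted) pair of terms together."""
    counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts

def transitive_associations(docs):
    """Score A-C pairs that never co-occur but share intermediate terms B.
    Score = sum over B of min(count(A,B), count(B,C)) (a hypothetical scoring rule)."""
    counts = cooccurrence(docs)
    neighbours = {}
    for (a, b), n in counts.items():
        neighbours.setdefault(a, {})[b] = n
        neighbours.setdefault(b, {})[a] = n
    scores = Counter()
    for nbrs in neighbours.values():
        for a, c in combinations(sorted(nbrs), 2):
            if (a, c) not in counts:  # only pairs never seen together count as novel
                scores[(a, c)] += min(nbrs[a], nbrs[c])
    return scores.most_common()

docs = [
    ["fish oil", "blood viscosity"],
    ["fish oil", "blood viscosity"],
    ["blood viscosity", "raynaud"],
    ["platelet", "raynaud"],
    ["fish oil", "platelet"],
]
print(dict(transitive_associations(docs))[("fish oil", "raynaud")])  # 2
```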

  10. Papers of the CIM Toronto 2005 mining industry conference and exhibition : Mining rocks. Online ed.

    International Nuclear Information System (INIS)

    This conference highlighted technical innovations and best business practices within Canada's mining industry. It provided an opportunity for geologists, engineers and mine operators to exchange the latest information concerning innovations, challenges and discoveries in the mining industry in Canada and internationally. A session on mine management focused on underground mining operations, maintenance engineering, open-pit operations and geotechnical engineering. A session on current projects focused on the activities involved with developing properties from the exploration phase through to production. Mine economics, geology, mine design and management practices were highlighted along with technology and advanced systems, underground technologies, open-pit technologies, metallurgy, and developments in mineral processing. The presentations also addressed the issue of how to ensure the development of mineral resources so they continue to be integrally important to Canada's economic prosperity. Some of the challenges facing the industry include environmental, community, human resource and automation issues. The trade show allowed leading equipment and service providers to exhibit the latest tools and equipment driving mine production. The exhibition included technology that has contributed to environmental, geotechnical, production, maintenance and processing performance and safety. More than 43 technical papers were presented at the conference, of which 5 have been indexed separately for inclusion in this database. refs., tabs., figs

  11. Automated Single Cell Data Decontamination Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Tennessen, Kristin [Lawrence Berkeley National Lab. (LBNL), Walnut Creek, CA (United States). Dept. of Energy Joint Genome Inst.]; Pati, Amrita [Lawrence Berkeley National Lab. (LBNL), Walnut Creek, CA (United States). Dept. of Energy Joint Genome Inst.]

    2014-03-21

    Recent technological advancements in single-cell genomics have encouraged the classification and functional assessment of microorganisms from a wide span of the biosphere's phylogeny [1,2]. Environmental processes of interest to the DOE, such as bioremediation and carbon cycling, can be elucidated through the genomic lens of these unculturable microbes. However, contamination can occur at various stages of the single-cell sequencing process. Contaminated data can lead to wasted time and effort on meaningless analyses, inaccurate or erroneous conclusions, and pollution of public databases. A fully automated decontamination tool is necessary to prevent these problems and to increase the throughput of the single-cell sequencing process.

  12. MouseMine: a new data warehouse for MGI.

    Science.gov (United States)

    Motenko, H; Neuhauser, S B; O'Keefe, M; Richardson, J E

    2015-08-01

    MouseMine (www.mousemine.org) is a new data warehouse for accessing mouse data from Mouse Genome Informatics (MGI). Based on the InterMine software framework, MouseMine supports powerful query, reporting, and analysis capabilities, the ability to save and combine results from different queries, easy integration into larger workflows, and a comprehensive Web Services layer. Through MouseMine, users can access a significant portion of MGI data in new and useful ways. Importantly, MouseMine is also a member of a growing community of online data resources based on InterMine, including those established by other model organism databases. Adopting common interfaces and collaborating on data representation standards are critical to fostering cross-species data analysis. This paper presents a general introduction to MouseMine, presents examples of its use, and discusses the potential for further integration into the MGI interface. PMID:26092688

  13. Uses of antimicrobial genes from microbial genome

    Science.gov (United States)

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having a broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  14. A genetic programming based business process mining approach

    OpenAIRE

    Turner, Christopher James

    2009-01-01

    As business processes become ever more complex there is a need for companies to understand the processes they already have in place. To undertake this manually would be time consuming. The practice of process mining attempts to automatically construct the correct representation of a process based on a set of process execution logs. The aim of this research is to develop a genetic programming based approach for business process mining. The focus of this research is on automated/semi automat...
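
    The genetic-programming approach itself is not described in the record. A building block shared by many process-mining techniques is the directly-follows relation extracted from execution logs, and any genetic search needs a fitness function that scores candidate models against those logs. The Python sketch below shows both in deliberately simple, hypothetical forms: a candidate model is just a set of allowed transitions plus start and end activities, and fitness is the fraction of traces it can replay.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b across all traces."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def replay_fitness(log, edges, starts, ends):
    """Fraction of traces a candidate model (allowed transitions, start and end activities) can replay."""
    if not log:
        return 0.0
    replayable = sum(
        1 for trace in log
        if trace and trace[0] in starts and trace[-1] in ends
        and all(pair in edges for pair in zip(trace, trace[1:])))
    return replayable / len(log)

event_log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],
]
print(directly_follows(event_log)[("register", "check")])  # 2
candidate = {("register", "check"), ("check", "approve"), ("check", "reject")}
print(replay_fitness(event_log, candidate, {"register"}, {"approve", "reject"}))  # 0.6666666666666666
```

    In a genetic search, a fitness like this would usually be balanced against model simplicity, so that a model allowing every transition does not win by default.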

  15. Data processing in management of Dolni Rozinka uranium mines

    International Nuclear Information System (INIS)

    In 1985, a qualitative innovation in data processing was introduced with the commissioning of the EC 1026 computer with a terminal network and a remote data communication system. The design tasks, which are being implemented gradually, are mainly oriented toward creating an automated information system for operative control of mining production, data preparation in mining plants, and personnel, wages, material consumption and similar areas. (J.B.)

  16. Data mining meets economic analysis: opportunities and challenges

    OpenAIRE

    Baicoianu, A.; Dumitrescu, S

    2010-01-01

    Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to the effective and efficient use of massive amounts of economic data, in order to support both companies’ and individuals’ strategic planning and investment decision-making. The goal of this paper is to illustrate the impact of data mining techniques on sales, ...

  17. Mining Attribute-Based Access Control Policies from Logs

    OpenAIRE

    Xu, Zhongyuan; Stoller, Scott

    2014-01-01

    Attribute-based access control (ABAC) provides a high level of flexibility that promotes security and information sharing. ABAC policy mining algorithms have potential to significantly reduce the cost of migration to ABAC, by partially automating the development of an ABAC policy from information about the existing access-control policy and attribute data. This paper presents an algorithm for mining ABAC policies from operation logs and attribute data. To the best of our knowledge, it is the ...
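
    As a rough illustration of turning operation logs and attribute data into candidate ABAC rules, the Python sketch below builds one rule per log entry over chosen user and resource attributes, then reports for each rule how many logged permissions it covers and how many unlogged permissions it would also grant. The rule format, the attribute choice and the ranking are assumptions for illustration; this is not the paper's algorithm.

```python
from itertools import product

def candidate_rule(user_attrs, res_attrs, op, user_keys, res_keys):
    """Turn one log entry into a rule over the chosen user and resource attributes."""
    return (frozenset((k, user_attrs[k]) for k in user_keys),
            frozenset((k, res_attrs[k]) for k in res_keys),
            op)

def covers(rule, user_attrs, res_attrs, op):
    """True if the rule grants `op` to a user/resource pair with these attributes."""
    user_cond, res_cond, rule_op = rule
    return (op == rule_op
            and all(user_attrs.get(k) == v for k, v in user_cond)
            and all(res_attrs.get(k) == v for k, v in res_cond))

def mine_policy(log, users, resources, user_keys, res_keys):
    """Return (rule, logged permissions covered, unlogged permissions granted), best first."""
    logged = set(log)
    ops = sorted({op for _, _, op in log})
    rules = {candidate_rule(users[u], resources[r], op, user_keys, res_keys) for u, r, op in log}
    report = []
    for rule in rules:
        granted = {(u, r, op) for u, r, op in product(users, resources, ops)
                   if covers(rule, users[u], resources[r], op)}
        report.append((rule, len(granted & logged), len(granted - logged)))
    return sorted(report, key=lambda item: (-item[1], item[2]))

users = {"alice": {"dept": "cardio", "role": "nurse"},
         "bob": {"dept": "onco", "role": "nurse"},
         "carol": {"dept": "cardio", "role": "doctor"}}
resources = {"rec1": {"dept": "cardio"}, "rec2": {"dept": "onco"}}
access_log = [("alice", "rec1", "read"), ("carol", "rec1", "read"), ("bob", "rec2", "read")]
report = mine_policy(access_log, users, resources, ["dept"], ["dept"])
```

    Practical policy miners generalize further, for example to constraints such as user.dept = resource.dept, and tolerate some over-assignment because logs record only part of the intended policy; the over-assignment count is reported here so that this trade-off stays visible.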

  18. Uranium mining in Australia

    International Nuclear Information System (INIS)

    The mining of uranium in Australia is criticised in relation to its environmental impact, economics and effects on mine workers and Aborigines. A brief report is given on each of the operating and proposed uranium mines in Australia.

  19. Exploration and Mining Roadmap

    Energy Technology Data Exchange (ETDEWEB)

    none,

    2002-09-01

    This Exploration and Mining Technology Roadmap represents the third roadmap for the Mining Industry of the Future. It is based upon the results of the Exploration and Mining Roadmap Workshop held May 10-11, 2001.

  20. Coal Mine Permit Boundaries

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — ESRI ArcView shapefile depicting New Mexico coal mines permitted under the Surface Mining Control and Reclamation Act of 1977 (SMCRA), by either the NM Mining...

  1. Mining Patient Journeys From Healthcare Narratives

    OpenAIRE

    Dehghan, Azad

    2015-01-01

    The aim of the thesis is to investigate the feasibility of using text mining methods to reconstruct patient journeys from unstructured clinical narratives. A novel method to extract and represent patient journeys is proposed and evaluated in this thesis. A set of methods was designed, developed and evaluated to this end, including health-related concept extraction, temporal information extraction, concept clustering and automated work-flow generation. A suite of methods to ext...

  2. Integrating Data Mining Into Business Intelligence

    OpenAIRE

    Maria Cristina ENACHE

    2006-01-01

    Data Mining is a broad term often used to describe the process of using database technology, modeling techniques, statistical analysis, and machine learning to analyze large amounts of data in an automated fashion to discover hidden patterns and predictive information in the data. By building highly complex and sophisticated statistical and mathematical models, organizations can gain new insight into their activities. The purpose of this document is to provide users with a background of a few...

  3. GDR surface mining technology - a programme for complicated geological and climatic conditions of surface mining

    Energy Technology Data Exchange (ETDEWEB)

    Rudolf, W.; Klose, W.

    1979-08-01

    This paper describes surface mining as an expanding technology with a work productivity 2.5 to 6.0 times higher than that of underground mining. Large excavation complexes can remove increasing amounts of overburden, from 100,000 m³ to 300,000 m³ per day. TAKRAF had exported 300 surface mining machines to various countries as of 1979. Surface mining technology is continually being improved through developments in equipment, such as longer service life, unit construction and interchangeability of parts, higher capacity, automation, climatic resistance up to 60 °C, etc. The TAKRAF equipment series are introduced together with their capacity ranges. TAKRAF bucket wheel and bucket chain excavators, conveyor belt systems, overburden conveyor bridges and swing chutes are described. Equipment for briquetting plants, brown coal enrichment and power plants is also produced by TAKRAF.

  4. Corrosion of friction rock stabilizers in selected uranium and copper mine waters. Report of Investigations/1984

    International Nuclear Information System (INIS)

    The Bureau of Mines evaluated corrosion resistance of Split Set friction rock stabilizer mine roof bolts to aid in better prediction of useful service life. Electrochemical corrosion testing was conducted utilizing an automated corrosion measurement system. Natural and/or synthetic mine waters from four uranium and two copper mines were the test media for the two types of high-strength, low-alloy (HSLA) steels from which Split Set stabilizers are manufactured, and for galvanized steel. Tests were conducted with waters of minimum and maximum dissolved oxygen content at in-mine water temperatures. Retrieved Split Set stabilizers were also evaluated for property changes
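
    The report does not give the conversion it used, but electrochemical measurements are commonly turned into a service-life figure by estimating the corrosion current density (for example with the Stern-Geary relation from a polarization-resistance reading) and converting it to a penetration rate with Faraday's law, as in ASTM G102. The Python sketch below assumes that route, Tafel slopes of 0.12 V/decade, and iron's equivalent weight (27.92 g) and density (7.87 g/cm³) as stand-ins for the HSLA and galvanized steels tested; none of these values come from the report.

```python
def stern_geary_icorr(rp_ohm_cm2, beta_a=0.12, beta_c=0.12):
    """Corrosion current density (A/cm^2) from polarization resistance (ohm cm^2).
    Tafel slopes beta_a and beta_c are in V/decade; the defaults are assumed values."""
    b = beta_a * beta_c / (2.303 * (beta_a + beta_c))  # Stern-Geary coefficient, V
    return b / rp_ohm_cm2

def corrosion_rate_mm_per_year(icorr_a_cm2, equivalent_weight=27.92, density=7.87):
    """Faraday's-law conversion CR = K1 * icorr * EW / rho, with K1 = 3.27e-3 mm g / (uA cm yr),
    icorr in uA/cm^2, EW in g and rho in g/cm^3 (defaults: pure iron, an assumption)."""
    icorr_ua_cm2 = icorr_a_cm2 * 1e6
    return 3.27e-3 * icorr_ua_cm2 * equivalent_weight / density

icorr = stern_geary_icorr(26053)  # hypothetical Rp reading of about 26 kohm cm^2
print(round(corrosion_rate_mm_per_year(icorr), 4))  # 0.0116
```

    Dividing an allowable section loss by such a rate gives a first estimate of useful service life, which is the kind of prediction the study aimed to support.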

  5. Genome Sequence of Mushroom Soft-Rot Pathogen Janthinobacterium agaricidamnosum.

    Science.gov (United States)

    Graupner, Katharina; Lackner, Gerald; Hertweck, Christian

    2015-01-01

    Janthinobacterium agaricidamnosum causes soft-rot disease of the cultured button mushroom Agaricus bisporus and is thus responsible for agricultural losses. Here, we present the genome sequence of J. agaricidamnosum DSM 9628. The 5.9-Mb genome harbors several secondary metabolite biosynthesis gene clusters, which renders this neglected bacterium a promising source for genome mining approaches. PMID:25883287

  6. Draft Genome Sequence of Fungus Clonostachys rosea Strain YKD0085.

    Science.gov (United States)

    Liu, Shuai; Chang, Yaowen; Hu, Xujia; Gong, Xuanyun; Di, Yingtong; Dong, Jinyan; Hao, Xiaojiang

    2016-01-01

    Here, we report the draft genome sequence of Clonostachys rosea (strain YKD0085). The functional annotation of C. rosea provides important information related to its ability to produce secondary metabolites. The genome sequence presented here builds the basis for further genome mining. PMID:27340057

  7. Genome Sequence of Mushroom Soft-Rot Pathogen Janthinobacterium agaricidamnosum

    OpenAIRE

    Graupner, Katharina; Lackner, Gerald; Hertweck, Christian

    2015-01-01

    Janthinobacterium agaricidamnosum causes soft-rot disease of the cultured button mushroom Agaricus bisporus and is thus responsible for agricultural losses. Here, we present the genome sequence of J. agaricidamnosum DSM 9628. The 5.9-Mb genome harbors several secondary metabolite biosynthesis gene clusters, which renders this neglected bacterium a promising source for genome mining approaches.

  8. Draft Genome Sequence of Streptomyces hygroscopicus subsp. hygroscopicus NBRC 16556.

    Science.gov (United States)

    Komaki, Hisayuki; Ichikawa, Natsuko; Oguchi, Akio; Hamada, Moriyuki; Tamura, Tomohiko; Suzuki, Ken-Ichiro; Fujita, Nobuyuki

    2016-01-01

    Here, we report the draft genome sequence of strain NBRC 16556, deposited as Streptomyces hygroscopicus subsp. hygroscopicus into the NBRC culture collection. An average nucleotide identity analysis confirmed that the taxonomic identification is correct. The genome sequence will serve as a valuable reference for genome mining to search new secondary metabolites. PMID:27198007

  9. Coal Mines, Abandoned - Digitized Mined Areas

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — Coal mining has occurred in Pennsylvania for over a century. The maps to these coal mines are stored at many various public and private locations (if they still...

  10. Development of On-Board Fluid Analysis for the Mining Industry - Final report

    Energy Technology Data Exchange (ETDEWEB)

    Pardini, Allan F.

    2005-08-16

    Pacific Northwest National Laboratory (PNNL: Operated by Battelle Memorial Institute for the Department of Energy) is working with the Department of Energy (DOE) to develop technology for the US mining industry. PNNL was awarded a three-year program to develop automated on-board/in-line or on-site oil analysis for the mining industry.

  11. Automated stopcock actuator

    OpenAIRE

    Vandehey, N. T.; O'Neil, J.P.

    2015-01-01

    Introduction We have developed a low-cost stopcock valve actuator for radiochemistry automation built using a stepper motor and an Arduino, an open-source single-board microcontroller. The controller hardware can be programmed to run by serial communication or via two 5–24 V digital lines for simple integration into any automation control system. This valve actuator allows for automated use of a single, disposable stopcock, providing a number of advantages over stopcock manifold systems ...

  12. The Adaptive Automation Design

    OpenAIRE

    Calefato, Caterina; Montanari, Roberto; TESAURI, Francesco

    2008-01-01

    After considering the positive effects of adaptive automation implementation, this chapter focuses on two partly overlapping phenomena: on the one hand, the role of trust in automation is considered, particularly as to the effects of overtrust and mistrust in automation's reliability; on the other hand, long-term lack of exercise on specific operation may lead users to skill deterioration. As a future work, it will be interesting and challenging to explore the conjunction of adaptive automati...

  13. Service functional test automation

    OpenAIRE

    Hillah, Lom Messan; Maesano, Ariele-Paolo; Rosa, Fabio; Maesano, Libero; Lettere, Marco; Fontanelli, Riccardo

    2015-01-01

    This paper presents the automation of the functional test of services (black-box testing) and services architectures (grey-box testing) that has been developed by the MIDAS project and is accessible on the MIDAS SaaS. In particular, the paper illustrates the solutions of tough functional test automation problems such as: (i) the configuration of the automated test execution system against large and complex services architectures, (ii) the constraint-based test input generation, (iii) the spec...

  14. Automated Weather Observing System

    Data.gov (United States)

    Department of Transportation — The Automated Weather Observing System (AWOS) is a suite of sensors, which measure, collect, and disseminate weather data to help meteorologists, pilots, and flight...

  15. Laboratory Automation and Middleware.

    Science.gov (United States)

    Riben, Michael

    2015-06-01

    The practice of surgical pathology is under constant pressure to deliver the highest quality of service, reduce errors, increase throughput, and decrease turnaround time while at the same time dealing with an aging workforce, increasing financial constraints, and economic uncertainty. Although total laboratory automation cannot yet be implemented, great progress continues to be made in workstation automation in all areas of the pathology laboratory. This report highlights the benefits and challenges of pathology automation, reviews middleware and its use to facilitate automation, and reviews the progress so far in the anatomic pathology laboratory. PMID:26065792

  16. Automated cloning methods.

    International Nuclear Information System (INIS)

    Argonne has developed a series of automated protocols to generate bacterial expression clones by using a robotic system designed to be used in procedures associated with molecular biology. The system provides plate storage, temperature control from 4 to 37 °C at various locations, and Biomek and Multimek pipetting stations. The automated system consists of a robot that transports sources from the active station on the automation system. Protocols for the automated generation of bacterial expression clones can be grouped into three categories (Figure 1). Fragment generation protocols are initiated on day one of the expression cloning procedure and encompass those protocols involved in generating purified coding region (PCR)

  17. Advances in Computer, Communication, Control and Automation

    CERN Document Server

    2011 International Conference on Computer, Communication, Control and Automation

    2012-01-01

    The volume includes a set of selected papers extended and revised from the 2011 International Conference on Computer, Communication, Control and Automation (3CA 2011), which was held in Zhuhai, China, on November 19-20, 2011. Topics covered in this volume include signal and image processing, speech and audio processing, video processing and analysis, artificial intelligence, computing and intelligent systems, machine learning, sensor and neural networks, knowledge discovery and data mining, fuzzy mathematics and applications, knowledge-based systems, hybrid systems modeling and design, risk analysis and management, and system modeling and simulation. We hope that researchers, graduate students and other interested readers benefit scientifically from the proceedings and also find it stimulating in the process.

  18. Proceedings: Fourth Workshop on Mining Scientific Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Kamath, C

    2001-07-24

    Commercial applications of data mining in areas such as e-commerce, market-basket analysis, text-mining, and web-mining have taken on a central focus in the KDD community. However, there is a significant amount of innovative data mining work taking place in the context of scientific and engineering applications that is not well represented in the mainstream KDD conferences. For example, scientific data mining techniques are being developed and applied to diverse fields such as remote sensing, physics, chemistry, biology, astronomy, structural mechanics, computational fluid dynamics etc. In these areas, data mining frequently complements and enhances existing analysis methods based on statistics, exploratory data analysis, and domain-specific approaches. On the surface, it may appear that data from one scientific field, say genomics, is very different from another field, such as physics. However, despite their diversity, there is much that is common across the mining of scientific and engineering data. For example, techniques used to identify objects in images are very similar, regardless of whether the images came from a remote sensing application, a physics experiment, an astronomy observation, or a medical study. Further, with data mining being applied to new types of data, such as mesh data from scientific simulations, there is the opportunity to apply and extend data mining to new scientific domains. This one-day workshop brings together data miners analyzing science data and scientists from diverse fields to share their experiences, learn how techniques developed in one field can be applied in another, and better understand some of the newer techniques being developed in the KDD community. This is the fourth workshop on the topic of Mining Scientific Data sets; for information on earlier workshops, see http://www.ahpcrc.org/conferences/.
This workshop continues the tradition of addressing challenging problems in a field where the diversity of applications is

  19. Automation of Shift Work in Operations

    International Nuclear Information System (INIS)

    NPP Krsko started automating Operations shift work in 2009. The main goal is to establish complete control of equipment and system status by electronic means within the next two years (by the 2012 outage). Other benefits that can be attained with such automation are: operations performance improvement (immediate response), better interaction (communication) among plant personnel, reduction of possible human errors, outage shift work optimization, reduced repetitive work, and availability of on-line information related to plant operation. To achieve plant status control by electronic means, various shift work processes in operations must be automated and integrated, such as: Operator rounds; Narrative Logs; Tagging process; Equipment deficiencies tracking; Plant Technical Specifications monitoring system; Temporary modifications; Personnel Qualifications and Scheduling (working hours control); Personnel authorization; Notification (Obligatory readings); Key control. Before shift work processes are automated, some prerequisites must be fulfilled. Technically, it is very important to have precisely defined processes, a good equipment database with defined components and subcomponents, and consistently labeled equipment. Domestic legal requirements on document authorization should be checked for conformity. All relevant bodies that participate in decision making within the plant, as well as outside organizations (the regulatory authority SNSA and the Energy and Mining Inspectorate), should be made aware of the plant's intention to replace the old way of administration with the new one. This paper gives details on the software features and implementation, and lists and discusses difficulties that can arise during such a change. (author).

  20. Image Mining: Review and New Challenges

    Directory of Open Access Journals (Sweden)

    Barbora Zahradnikova

    2015-07-01

    Full Text Available Besides new technology, a huge volume of data in various forms has become available to people. Image data represents a keystone of many research areas including medicine, forensic criminology, robotics and industrial automation, meteorology and geography as well as education. Therefore, obtaining specific information from image databases has become of great importance. Images as a special category of data differ from text data both in their nature and in how they are stored and retrieved. Image Mining as a research field is an interdisciplinary area combining methodologies and knowledge of many branches including data mining, computer vision, image processing, image retrieval, statistics, recognition, machine learning, artificial intelligence etc. This review focuses on researching the current image mining approaches and techniques, with the aim of widening the possibilities of facial image analysis. This paper aims at reviewing the current state of image mining (IM) as well as at describing challenges and identifying directions of future research in the field.

  1. Web Mining using Semantic Data Mining Techniques

    OpenAIRE

    K.Ganapathi Babu; A.Komali; V.Mythry; A.S.K.Ratnam

    2012-01-01

    The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web and for web-based systems that show adaptive performance. Web Mining integrates three parent areas: Data Mining, Internet technology and the World Wide Web, and, more recently, the Semantic Web. Semantic Web Mining is the outcome of two new and fast developing domains: Semantic Web and Data Mining. The Semantic Web is an extension of the current web in which informatio...

  2. The application and implementation of optimized mine ventilation on demand (OMVOD) at the Xstrata Nickel Rim South Mine, Sudbury, Ontario

    International Nuclear Information System (INIS)

    An Optimized Mine Ventilation on Demand (OMVOD) system has been installed at the Xstrata Nickel Rim South Mine in Sudbury. Developed by Simsmart Technologies, the OMVOD system monitors and controls air quality and quantity through real time dynamic automation. A ventilation on demand (VOD) system was needed to remove diesel particulate matter (DPM), carbon monoxide (CO) and nitrogen dioxide (NO2). This paper described the real-time tracking and monitoring of the OMVOD system and optimization of ventilation equipment. Simsmart's OMVOD system was shown to reduce energy costs while improving air quality in the underground mine. 7 refs., 3 tabs., 8 figs.

  3. Mining machine safari

    Energy Technology Data Exchange (ETDEWEB)

    Woof, M.

    1998-11-01

    New South African and other mining equipment on display at the Electra 98 exhibition is described. Products include: cutting machines; shovels; crushing machines; drilling equipment; control systems; moon buggy inspection vehicles; remote control underground mining machines; longwall shearers; mining software; scrapedozers; continuous miners; sprays; mine haulage equipment; milling machines; flotation plant; mud removal systems; chains; vehicle exhaust filters and continuous miner monitoring systems.

  4. Data mining for ontology development.

    Energy Technology Data Exchange (ETDEWEB)

    Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin,Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

    2010-06-01

    A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team, reports on their approaches, the tools they used, and results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

  5. Library Automation Style Guide.

    Science.gov (United States)

    Gaylord Bros., Liverpool, NY.

    This library automation style guide lists specific terms and names often used in the library automation industry. The terms and/or acronyms are listed alphabetically and each is followed by a brief definition. The guide refers to the "Chicago Manual of Style" for general rules, and a notes section is included for the convenience of individual…

  6. Automation in Warehouse Development

    NARCIS (Netherlands)

    Hamberg, R.; Verriet, J.

    2012-01-01

    The warehouses of the future will come in a variety of forms, but with a few common ingredients. Firstly, human operational handling of items in warehouses is increasingly being replaced by automated item handling. Extended warehouse automation counteracts the scarcity of human operators and support

  7. Automate functional testing

    Directory of Open Access Journals (Sweden)

    Ramesh Kalindri

    2014-06-01

    Full Text Available Currently, software engineers are increasingly turning to the option of automating functional tests, but they are not always successful in this endeavor. Reasons range from poor planning to cost overruns in the process. Some principles that can guide teams in automating these tests are described in this article.

  8. Mine mapping and layout

    Energy Technology Data Exchange (ETDEWEB)

    Williams, W.R.

    1983-01-01

    This book provides a study aid to mine engineering students combining graphical techniques of mine mapping, mine design, and mine layout, as well as supplemental techniques needed for adequate mine design. The book reviews cartographic techniques, the fundamentals of plane surveying, standard drafting room practices, guidelines for mine map symbolization and production, mine entry nomenclature, and details of topographic map production. Supplemental techniques include methods of achieving functional performance in the layout of a mine, statistical mapping of exploratory data, and techniques of descriptive geometry. Designs of ventilation, transportation, drainage, and electrical supply systems of mines are discussed in the final chapters. An appendix discusses mine mapping symbols. 105 references, 234 figures, 16 tables.

  9. Automation in Immunohematology

    Directory of Open Access Journals (Sweden)

    Meenu Bajpai

    2012-01-01

    Full Text Available There have been rapid technological advances in blood banking in the South Asian region over the past decade, with an increasing emphasis on the quality and safety of blood products. The conventional test tube technique has given way to newer techniques such as column agglutination technique, solid phase red cell adherence assay, and erythrocyte-magnetized technique. These new technologies are adaptable to automation, and major manufacturers in this field have come up with semi and fully automated equipment for immunohematology tests in the blood bank. Automation improves the objectivity and reproducibility of tests. It reduces human errors in patient identification and transcription errors. Documentation and traceability of tests, reagents and processes, and archiving of results, is another major advantage of automation. Shifting from manual methods to automation is a major undertaking for any transfusion service seeking to provide quality patient care with shorter turnaround time for its ever increasing workload. This article discusses the various issues involved in the process.

  10. Automated model building

    CERN Document Server

    Caferra, Ricardo; Peltier, Nicholas

    2004-01-01

    This is the first book on automated model building, a discipline of automated deduction that is of growing importance. Although models and their construction are important per se, automated model building has appeared as a natural enrichment of automated deduction, especially in the attempt to capture the human way of reasoning. The book provides an historical overview of the field of automated deduction, and presents the foundations of different existing approaches to model construction, in particular those developed by the authors. Finite and infinite model building techniques are presented. The main emphasis is on calculi-based methods, and relevant practical results are provided. The book is of interest to researchers and graduate students in computer science, computational logic and artificial intelligence. It can also be used as a textbook in advanced undergraduate courses.

  11. Automation in Warehouse Development

    CERN Document Server

    Verriet, Jacques

    2012-01-01

    The warehouses of the future will come in a variety of forms, but with a few common ingredients. Firstly, human operational handling of items in warehouses is increasingly being replaced by automated item handling. Extended warehouse automation counteracts the scarcity of human operators and supports the quality of picking processes. Secondly, the development of models to simulate and analyse warehouse designs and their components facilitates the challenging task of developing warehouses that take into account each customer’s individual requirements and logistic processes. Automation in Warehouse Development addresses both types of automation from the innovative perspective of applied science. In particular, it describes the outcomes of the Falcon project, a joint endeavour by a consortium of industrial and academic partners. The results include a model-based approach to automate warehouse control design, analysis models for warehouse design, concepts for robotic item handling and computer vision, and auton...

  12. Advances in inspection automation

    Science.gov (United States)

    Weber, Walter H.; Mair, H. Douglas; Jansen, Dion; Lombardi, Luciano

    2013-01-01

    This new session at QNDE reflects the growing interest in inspection automation. Our paper describes a newly developed platform that makes complex NDE automation possible without the need for software programmers. Inspection tasks that are tedious, error-prone or impossible for humans to perform can now be automated using a form of drag and drop visual scripting. Our work attempts to rectify the problem that NDE is not keeping pace with the rest of factory automation. Outside of NDE, robots routinely and autonomously machine parts, assemble components, weld structures and report progress to corporate databases. By contrast, components arriving in the NDT department typically require manual part handling, calibrations and analysis. The automation examples in this paper cover the development of robotic thickness gauging and the use of adaptive contour following on the NRU reactor inspection at Chalk River.

  13. Semantic web mining

    OpenAIRE

    Stumme, Gerd; Hotho, Andreas; Berendt, Bettina

    2006-01-01

    Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: an increasing number of researchers is working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The Semantic Web is t...

  14. Predictive Data mining and discovering hidden values of Data warehouse

    Directory of Open Access Journals (Sweden)

    Mehta Neel B

    2011-04-01

    Full Text Available Data Mining is an analytic process to explore data (usually large amounts of data, typically business or market related) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new sets of data. The main target of data mining applications is prediction. Predictive data mining is important and has the most direct business applications in the world. The paper briefly explains the process of data mining, which consists of three stages: (1) initial exploration, (2) pattern identification with validation, and (3) deployment (application of the model to new data in order to generate predictions). Data Mining is being done for pattern and relationship recognition in data analysis, with an emphasis on large observational databases. From a statistical perspective, Data Mining is viewed as a computer-automated exploratory data analysis system for large sets of data, and it poses huge research challenges in India and abroad. Machine learning methods, including decision tree learning, form the core of Data Mining. Data mining work is integrated within an existing user environment, including work that already makes use of data warehousing and Online Analytical Processing (OLAP). The paper describes how data mining tools predict future trends and behaviour, which allows proactive, knowledge-driven decisions to be made.

  15. Automated Menu Recommendation System Based on Past Preferences

    OpenAIRE

    Daniel Simon Sanz; Ankur Agrawal

    2014-01-01

    Data mining plays an important role in ecommerce in today’s world. Time is critical when it comes to shopping as options are unlimited and making a choice can be tedious. This study presents an application of data mining in the form of an Android application that can provide users with automated suggestions based on past preferences. The application helps a person to choose what food they might want to order in a specific restaurant. The application learns user behavior with each order - what t...

  16. Improvement of Test Automation

    OpenAIRE

    Räsänen, Timo

    2013-01-01

    The purpose of this study was to find out how to ensure that the automated testing of MME in the Network Verification will remain smooth and reliable while using the in-house developed test automation framework. The goal of this thesis was to reveal the reasons for the currently challenging situation and to find the key elements to be improved in the MME testing carried out by the test automation. Also a reason for the study was to get solutions as to how to change the current procedures and wa...

  17. Chef infrastructure automation cookbook

    CERN Document Server

    Marschall, Matthias

    2013-01-01

    Chef Infrastructure Automation Cookbook contains practical recipes on everything you will need to automate your infrastructure using Chef. The book is packed with illustrated code examples to automate your server and cloud infrastructure. The book first shows you the simplest way to achieve a certain task. Then it explains every step in detail, so that you can build your knowledge about how things work. Eventually, the book shows you additional things to consider for each approach. That way, you can learn step-by-step and build profound knowledge on how to go about your configuration management.

  18. A mining pearl of Saxon ore mining

    International Nuclear Information System (INIS)

    Mine 371 of the Soviet-German WISMUT company, i.e.: Half of the capital with the unissued shares has now been inherited, according to the Russo-German Treaty of 1953 and in connection with German reunification by the Federal German Republic, with all the consequences with regard to mining supervision, results of mining, mining/social aspects and looking after the spoil. In addition, there will probably be the marketing of remaining uranium concentrates, because the Soviet Union will leave the company at the end of 1990. (orig./HS)

  19. Text Classification using Data Mining

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of using words, word relation i.e. association rules from these words is used to derive feature set from pre-classified text documents. The concept of Naive Bayes classifier is then used on derived features and finally only a single concept of Genetic Algorithm has been added for final classification. A system based on the...

  20. Text mining for the biocuration workflow.

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  1. Typical Genomic Framework on Disease Analysis

    OpenAIRE

    J. Stanly Thomas; Dr.N Rajkumar

    2015-01-01

    The challenging and major role of the doctor in human life is to predict as well as diagnose the disease that has infected the human body. This typical genomic framework on disease analysis algorithm is designed to store and derive the characteristics of each gene, such as shape, weight, location and normal growth culture. Whenever a disease report is fed into it, this data mining algorithm triggers a similarity test built upon data mining classification rules. A gene is usually co...

  2. Automated Vehicles Symposium 2014

    CERN Document Server

    Beiker, Sven; Road Vehicle Automation 2

    2015-01-01

    This paper collection is the second volume of the LNMOB series on Road Vehicle Automation. The book contains a comprehensive review of current technical, socio-economic, and legal perspectives written by experts from public authorities, companies and universities in the U.S., Europe and Japan. It originates from the Automated Vehicles Symposium 2014, which was jointly organized by the Association for Unmanned Vehicle Systems International (AUVSI) and the Transportation Research Board (TRB) in Burlingame, CA, in July 2014. The contributions discuss the challenges arising from the integration of highly automated and self-driving vehicles into the transportation system, with a focus on human factors and different deployment scenarios. This book is an indispensable source of information for academic researchers, industrial engineers, and policy makers interested in the topic of road vehicle automation.

  3. I-94 Automation FAQs

    Data.gov (United States)

    Department of Homeland Security — In order to increase efficiency, reduce operating costs and streamline the admissions process, U.S. Customs and Border Protection has automated Form I-94 at air and...

  4. Automated Vehicles Symposium 2015

    CERN Document Server

    Beiker, Sven

    2016-01-01

    This edited book comprises papers about the impacts, benefits and challenges of connected and automated cars. It is the third volume of the LNMOB series dealing with Road Vehicle Automation. The book comprises contributions from researchers, industry practitioners and policy makers, covering perspectives from the U.S., Europe and Japan. It is based on the Automated Vehicles Symposium 2015, which was jointly organized by the Association for Unmanned Vehicle Systems International (AUVSI) and the Transportation Research Board (TRB) in Ann Arbor, Michigan, in July 2015. The topical spectrum includes, but is not limited to, public sector activities, human factors, ethical and business aspects, energy and technological perspectives, vehicle systems and transportation infrastructure. This book is an indispensable source of information for academic researchers, industrial engineers and policy makers interested in the topic of road vehicle automation.

  5. Hydrometeorological Automated Data System

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Office of Hydrologic Development of the National Weather Service operates HADS, the Hydrometeorological Automated Data System. This data set contains the last...

  6. An automated Certification Authority

    CERN Document Server

    Shamardin, L V

    2002-01-01

    This note describes an approach to building an automated Certification Authority. It is compatible with the basic requirements of RFC 2527. It also supports Registration Authorities and automatic certificate renewal with the Globus Toolkit grid-cert-renew.

  7. Disassembly automation: automated systems with cognitive abilities

    CERN Document Server

    Vongbunyong, Supachai

    2015-01-01

    This book presents a number of aspects to be considered in the development of disassembly automation, including the mechanical system, vision system and intelligent planner. The implementation of cognitive robotics increases the flexibility and degree of autonomy of the disassembly system. Disassembly, as a step in the treatment of end-of-life products, can allow the recovery of embodied value left within disposed products, as well as the appropriate separation of potentially hazardous components. In the end-of-life treatment industry, disassembly has largely been limited to manual labor, which is expensive in developed countries. Automation is one possible route to economic feasibility. The target audience primarily comprises researchers and experts in the field, but the book may also be beneficial for graduate students.

  8. Automated security management

    CERN Document Server

    Al-Shaer, Ehab; Xie, Geoffrey

    2013-01-01

    In this contributed volume, leading international researchers explore configuration modeling and checking, vulnerability and risk assessment, configuration analysis, and diagnostics and discovery. The authors equip readers to understand automated security management systems and techniques that increase overall network assurability and usability. These constantly changing networks defend against cyber attacks by integrating hundreds of security devices such as firewalls, IPSec gateways, IDS/IPS, authentication servers, authorization/RBAC servers, and crypto systems. Automated Security Managemen

  9. Automating Supplier Selection Procedures

    OpenAIRE

    Davidrajuh, Reggie

    2001-01-01

    This dissertation describes a methodology, tools, and implementation techniques for automating the supplier selection procedures of a small and medium-sized agile virtual enterprise. First, a modeling approach is devised that can be used to model the supplier selection procedures of an enterprise. This modeling approach divides the supplier selection procedures broadly into three stages: the pre-selection, selection, and post-selection stages. Second, a methodology is presented for automating ...

  10. Taiwan Automated Telescope Network

    OpenAIRE

    Shuhrat Ehgamberdiev; Alexander Serebryanskiy; Antonio Jimenez; Li-Han Wang; Ming-Tsung Sun; Javier Fernandez Fernandez; Dean-Yi Chou

    2010-01-01

    A global network of small automated telescopes, the Taiwan Automated Telescope (TAT) network, dedicated to photometric measurements of stellar pulsations, is under construction. Two telescopes have been installed in Teide Observatory, Tenerife, Spain and Maidanak Observatory, Uzbekistan. The third telescope will be installed at Mauna Loa Observatory, Hawaii, USA. Each system uses a 9-cm Maksutov-type telescope. The effective focal length is 225 cm, corresponding to an f-ratio of 25. The field...
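    As a quick consistency check of the telescope figures quoted above, the f-ratio N is the effective focal length f divided by the aperture diameter D (here assumed to be the 9-cm aperture of the Maksutov telescope):

```latex
N = \frac{f}{D} = \frac{225\ \text{cm}}{9\ \text{cm}} = 25
```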

  11. Automated Lattice Perturbation Theory

    Energy Technology Data Exchange (ETDEWEB)

    Monahan, Christopher

    2014-11-01

    I review recent developments in automated lattice perturbation theory. Starting with an overview of lattice perturbation theory, I focus on the three automation packages currently "on the market": HiPPy/HPsrc, Pastor and PhySyCAl. I highlight some recent applications of these methods, particularly in B physics. In the final section I briefly discuss the related, but distinct, approach of numerical stochastic perturbation theory.

  12. Automated functional software testing

    OpenAIRE

    Jelnikar, Kristina

    2009-01-01

    The following work describes an approach to the automation of functional software testing. The introductory part presents the testing problems that development companies face. The second chapter describes some testing methods, the role testing plays in software development, some approaches to software development and the meaning of a testing environment. Chapter 3 is all about test automation. After a brief historical presentation, we demonstrate through s...

  13. Instant Sikuli test automation

    CERN Document Server

    Lau, Ben

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. A concise guide written in an easy-to-follow style using the Starter guide approach. This book is aimed at automation and testing professionals who want to use Sikuli to automate GUIs. Some Python programming experience is assumed.

  14. Linked Data approach for selection process automation in Systematic Reviews

    OpenAIRE

    Torchiano, Marco; Morisio, Maurizio; Tomassetti, Federico Cesare Argentino; Ardito, Luca; Vetro, Antonio; Rizzo, Giuseppe

    2011-01-01

    Background: a systematic review identifies, evaluates and synthesizes the available literature on a given topic using scientific and repeatable methodologies. The significant workload required and the subjectivity bias could affect results. Aim: semi-automate the selection process to reduce the amount of manual work needed and the consequent subjectivity bias. Method: extend and enrich the selection of primary studies using the existing technologies in the field of Linked Data and text mining...

  15. Large-scale automated synthesis of human functional neuroimaging data

    OpenAIRE

    Yarkoni, Tal; Poldrack, Russell A.; Nichols, Thomas E.; Van Essen, David C; Wager, Tor D.

    2011-01-01

    The explosive growth of the human neuroimaging literature has led to major advances in understanding of human brain function, but has also made aggregation and synthesis of neuroimaging findings increasingly difficult. Here we describe and validate an automated brain mapping framework that uses text mining, meta-analysis and machine learning techniques to generate a large database of mappings between neural and cognitive states. We demonstrate the capacity of our approach to automatically con...

  16. Mining Ostrava '93

    International Nuclear Information System (INIS)

    Part I of the Proceedings contains 55 contributions, of which two deal with the environmental impacts of undermining during coal mining and with shocks and vibrations during underground coal mining. (Z.S.)

  17. Mines and Mineral Resources

    Data.gov (United States)

    Department of Homeland Security — Mines in the United States According to the Homeland Security Infrastructure Program Tiger Team Report Table E-2.V.1 Sub-Layer Geographic Names, a mine is defined...

  18. A Framework to Support Automated Classification and Labeling of Brain Electromagnetic Patterns

    OpenAIRE

    Gwen A. Frishkoff; Robert M. Frank; Jiawei Rong; Dejing Dou; Joseph Dien; Laura K. Halderman

    2007-01-01

    This paper describes a framework for automated classification and labeling of patterns in electroencephalographic (EEG) and magnetoencephalographic (MEG) data. We describe recent progress on four goals: 1) specification of rules and concepts that capture expert knowledge of event-related potentials (ERP) patterns in visual word recognition; 2) implementation of rules in an automated data processing and labeling stream; 3) data mining techniques that lead to r...

  19. Recent advances in remote coal mining machine sensing, guidance, and teleoperation

    Energy Technology Data Exchange (ETDEWEB)

    Ralston, J.C.; Hainsworth, D.W.; Reid, D.C.; Anderson, D.L.; McPhee, R.J. [CSIRO Exploration & Minerals, Kenmore, Qld. (Australia)]

    2001-10-01

    Some recent applications of sensing, guidance and telerobotic technology in the coal mining industry are presented. Of special interest is the development of semi- or fully autonomous systems to provide remote guidance and communications for coal mining equipment. The use of radar and inertial-based sensors is considered in an attempt to solve the horizontal and lateral guidance problems associated with mining equipment automation. Also described is a novel teleoperated robot vehicle with unique communications capabilities, called the Numbat, which is used in underground mine safety and reconnaissance missions.

  20. Mining in El Salvador

    DEFF Research Database (Denmark)

    Pacheco Cueva, Vladimir

    2014-01-01

    In this guest article, Vladimir Pacheco, a social scientist who has worked on mining and human rights, shares his perspectives on a current campaign against mining in El Salvador, Central America's smallest but most densely populated country.

  1. Towards semantic web mining

    OpenAIRE

    Berendt, Bettina; Hotho, Andreas; Stumme, Gerd

    2002-01-01

    Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is, on the one hand, to improve the results of Web Mining by exploiting the new semantic structures in the Web and, on the other hand, to make use of Web Mining for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches ways in which a closer integration could be profitable.

  2. Mine drainage treatment

    OpenAIRE

    Golomeova, Mirjana; Zendelska, Afrodita; Krstev, Boris; Golomeov, Blagoj; Krstev, Aleksandar

    2012-01-01

    Water that flows from underground and surface mines and contains high concentrations of dissolved metals is called mine drainage. Mine drainage can be categorized into several basic types by its alkalinity or acidity. Sulfide-rich and carbonate-poor materials are expected to produce acidic drainage, and alkaline-rich materials, even with significant sulfide concentrations, often produce net alkaline water. Mine drainages are dangerous because pollutants may decompose in the environment. In...

  3. Web Usage Mining

    OpenAIRE

    Bari, Pranit; P M Chawan

    2013-01-01

    The paper discusses web usage mining, which involves the automatic discovery of user access patterns from one or more Web servers. This article provides a survey and analysis of current Web usage mining systems and technologies. The paper also discusses the procedure by which web usage mining of the data sets is carried out. Finally, the paper concludes with the areas in which web usage mining is applied.

  4. Uranium mining: Saskatchewan status

    International Nuclear Information System (INIS)

    This paper gives the status of uranium mining by Areva in Saskatchewan. Uranium production now meets 85% of world demand for power generation. 80% of world uranium production comes from the top five countries: Kazakhstan, Canada, Australia, Niger and Namibia. Saskatchewan is currently the only Canadian province with active uranium mines and mills and the largest exploration programs. Several mine projects are going through the environmental assessment process. Public opinion is in favour of mining activities in Saskatchewan.

  5. A MINE alternative to D-optimal designs for the linear model

    Directory of Open Access Journals (Sweden)

    Amanda M Bouffier

    Full Text Available Doing large-scale genomics experiments can be expensive, so experimenters want to get the most information out of each experiment. To this end, the Maximally Informative Next Experiment (MINE) criterion for experimental design was developed. Here we explore this idea in a simplified context, the linear model. Four variations of the MINE method for the linear model were created: MINE-like, MINE, MINE with random orthonormal basis, and MINE with random rotation. Each method varies in how it maximizes the MINE criterion. Theorem 1 establishes sufficient conditions for the maximization of the MINE criterion under the linear model. Theorem 2 establishes when the MINE criterion is equivalent to the classic design criterion of D-optimality. By simulation under the linear model, we establish that MINE with random orthonormal basis and MINE with random rotation are faster to discover the true linear relation with p regression coefficients and n observations when p >> n. We also establish in simulations with n < 100, p = 1000, σ = 0.01 and 1000 replicates that these two variations of MINE also display a lower false positive rate than the MINE-like method and, for a majority of the experiments, than the MINE method.
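    The abstract compares MINE with the classic D-optimality criterion. As background only: a D-optimal design for the linear model picks the n design points (rows of the design matrix X) that maximize det(XᵀX). The sketch below is not the MINE method or the authors' code; the function names are hypothetical, and it uses exhaustive search over subsets, which is only practical for small candidate sets.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    sign, result = 1.0, 1.0
    for c in range(n):
        # Choose the row with the largest pivot in column c for stability.
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result


def information_matrix(rows):
    """Compute X^T X for a design whose rows are the given points."""
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def d_optimal_subset(candidates, n):
    """Brute-force D-optimal choice of n rows from the candidate points."""
    best = max(combinations(candidates, n),
               key=lambda s: det(information_matrix(list(s))))
    return list(best)


if __name__ == "__main__":
    # Straight-line model y = b0 + b1*x: each row is (1, x).
    cands = [(1.0, x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(d_optimal_subset(cands, 2))
```

    For the two-point straight-line case, det(XᵀX) reduces to (x1 − x2)², so the criterion favors the two extreme points x = −1 and x = 1.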

  6. The mining methods at the Fraisse mine

    International Nuclear Information System (INIS)

    The Fraisse mine is one of the four underground mines of the La Crouzille mining division of Cogema. Faced with the necessity to mechanize its workings, this mine also had to satisfy a number of stringent demands. This has led to the design of four different mining methods for the four workings currently in active operation at this pit, which nevertheless preserve the basic ideas of the methods of top slicing under concrete slabs (TSS) and horizontal cut-and-fill stopes (CFS). An electric scooptram is used. With this type of vehicle, the stringent demands for fire fighting and prevention measures are reduced to a minimum. Finally, the dimensions of the vehicles and the operation of these methods result in a net-to-gross tonnage ratio close to 1, i.e. maximum output combined with minimum contamination.

  7. Data Mining for CRM

    Science.gov (United States)

    Thearling, Kurt

    Data Mining technology allows marketing organizations to better understand their customers and respond to their needs. This chapter describes how Data Mining can be combined with customer relationship management to help drive improved interactions with customers. An example showing how to use Data Mining to drive customer acquisition activities is presented.

  8. Mined-out land

    International Nuclear Information System (INIS)

    Estonian mineral resources lie at shallow depth and mining fields are large, so vast areas are affected by mining. There are at least 800 deposits with a total area of 6,000 km², and about the same number of underground mines, surface mines, peat fields, quarries, and sand and gravel pits. The deposits cover more than 10% of the Estonian mainland. The total area of operating mine claims exceeds 150 km², which is 0.3% of Estonia's area. The book is written mainly for people who live or work in areas influenced by mining. The observations and research could benefit those who are interested in geography and the environment and who follow the formation and appearance of mined-out landscapes. The book also contains warnings for careless people on and under the surface of mined-out land. Part of the book contains results of research carried out in 1968-1993 by the first two authors at the Estonian branch of the A. Skochinsky Institute of Mining. Since 1990, Arvi Toomik continued this study at the Northeastern section of the Institute of Ecology of Tallinn Pedagogical University. Enno Reinsalu studied the aftereffects of mining at the Mining Department of Tallinn Technical University from 1998 to 2000. A Geographical Information System for Mining was studied by Ingo Valgma in his doctoral dissertation, and this book is one of the applications of his study.

  9. Web Usage Mining

    OpenAIRE

    Benkovská, Petra

    2007-01-01

    General characteristic of web mining including methodology and procedures incorporated into this term. Relation to other areas (data mining, artificial intelligence, statistics, databases, internet technologies, management etc.) Web usage mining - data sources, data pre-processing, characterization of analytical methods and tools, interpretation of outputs (results), and possible areas of usage including examples. Suggestion of solution method, realization and a concrete example's outputs int...

  10. Mine waste management

    International Nuclear Information System (INIS)

    This book reports on mine waste management. Topics covered include: performance review of modern mine waste management units; mine waste management requirements; prediction of acid generation potential; attenuation of chemical constituents; climatic considerations; liner system design; closure requirements; heap leaching; ground water monitoring; and economic impact evaluation.

  11. Gold-Mining

    DEFF Research Database (Denmark)

    Raaballe, J.; Grundy, B.D.

    2002-01-01

    operating gold mines. Asymmetric information on the reserves in the mine implies that, at a high enough price of gold, the manager of high type finds the extraction value of the company to be higher than the current market value of the non-operating gold mine. Due to this undervaluation, the maxim of market... structure, objectives of the manager, and convenience yield)....

  12. Automation of analytical work at a chemical extraction mine

    International Nuclear Information System (INIS)

    Uranium in the solution is determined using an automatic analyzer operating in a discontinuous mode, based on the principle of direct uranium photometry using the Arsenazo III reagent. An envisaged development is discussed of automatic analyzers using data processing which would eliminate the human factor effect. (J.C.)

  13. Distributed Framework for Data Mining As a Service on Private Cloud

    Directory of Open Access Journals (Sweden)

    Shraddha Masih

    2014-11-01

    Full Text Available Data mining research faces two great challenges: (i) automated mining and (ii) mining of distributed data. Conventional mining techniques are centralized, and the data needs to be accumulated at a central location. A mining tool needs to be installed on the computer before performing data mining, so extra time is incurred in collecting the data. Mining is done by specialized analysts who have access to mining tools. This technique is not optimal when the data is distributed over the network. To perform data mining in a distributed scenario, we need to design a different framework to improve efficiency. Also, the size of accumulated data grows exponentially with time and is difficult to mine using a single computer. Personal computers have limitations in terms of computation capability and storage capacity. Cloud computing can be exploited for compute-intensive and data-intensive applications. Data mining algorithms are both compute- and data-intensive, so cloud-based tools can provide an infrastructure for distributed data mining. This paper uses cloud computing to support distributed data mining. We propose a cloud-based data mining model that provides mass data storage along with a distributed data mining facility. This paper provides a solution for distributed data mining on the Hadoop framework, using an interface to run the algorithm on a specified number of nodes without any user-level configuration. Hadoop is configured over private servers, and clients can process their data through a common framework from anywhere in the private network. Data to be mined can either be chosen from the cloud data server or uploaded from private computers on the network. It is observed that the framework helps process large data in less time than a single system.

  14. Genome bioinformatics of tomato and potato

    OpenAIRE

    Datema, E.

    2011-01-01

    In the past two decades genome sequencing has developed from a laborious and costly technology employed by large international consortia to a widely used, automated and affordable tool used worldwide by many individual research groups. Genome sequences of many food animals and crop plants have been deciphered and are being exploited for fundamental research and applied to improve their breeding programs. The developments in sequencing technologies have also impacted the associated bioinformat...

  15. Data mining meets economic analysis: opportunities and challenges

    Directory of Open Access Journals (Sweden)

    Baicoianu, A.

    2010-12-01

    Full Text Available Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to the effective and efficient use of massive amounts of economic data, in order to support both companies' and individuals' strategic planning and investment decision-making. The goal of this paper is to illustrate the impact of data mining techniques on sales, customer satisfaction and corporate profits. To this end, we present different data mining techniques and discuss important data mining issues involved in specific economic applications. In addition, we discuss a new method based on Boolean functions, Logical Analysis of Data (LAD), which has been successfully applied to data analysis. Finally, we highlight a number of challenges and opportunities for future research.

  16. Application of fuzzy logic for determining of coal mine mechanization

    Institute of Scientific and Technical Information of China (English)

    HOSSEINI SAA; ATAEI M; HOSSEINI S M; AKHYANI M

    2012-01-01

    The fundamental task of mining engineers is to produce more coal at a given level of labour input and material costs, for optimum quality and maximum efficiency. To achieve these goals, it is necessary to automate and mechanize mining operations. Mechanization is an objective that can result in significant cost reduction and higher levels of profitability for underground mines. To analyze the potential of mechanization, some important factors such as seam inclination and thickness, geological disturbances, seam floor conditions and roof conditions should be considered. In this study we have used fuzzy logic and membership functions, and created fuzzy rule-based methods with the ultimate objective of mechanization of mining. As a case study, the mechanization of the Tazare coal seams in the Shahroud area of Iran was investigated. The results show a low potential for mechanization in most of the Tazare coal seams.

  17. The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis

    OpenAIRE

    Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna; Inzé, Dirk; Van de Peer, Yves

    2013-01-01

    Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology res...

  18. Mined area detection overview

    Science.gov (United States)

    Burch, Ian A.; Deas, Robert M.; Port, Daniel M.

    2002-08-01

    An overview of the progress on the UK MOD Applied Research Program for Land Mine Detection. The Defense Science and Technology Laboratory (Dstl) carries out and manages the whole of the UK MOD's Mined Area Detection Applied Research Program both within its own laboratories and in partnership with industrial and academic research organizations. This paper will address two specific areas of Applied Research: hand held mine detection and vehicle mounted mine detection in support of the Mine Detection Neutralization and Route Marking System which started in April 1997. Both are multi-sensor systems, incorporating between them metal detection, ground penetrating radar, nuclear quadrupole resonance, ultra-wideband radar, and polarized thermal imaging.

  19. Mining planning introduction

    International Nuclear Information System (INIS)

    Basic concepts concerning mining parameters, plan establishment and typical procedure methods applied throughout the physical execution of mining operations are here determined, analyzed and discussed. Technological and economic aspects of the exploration phase are presented as well as general mathematical and statistical methods for estimating, analyzing and representing mineral deposits which are virtually essential for good mining project execution. The characterization of important mineral substances and the basic parameters of mining works are emphasized in conjunction with long, medium and short term mining planning. Finally, geological modelling, ore reserves calculations and final economic evaluations are considered using a hypothetical example in order to consolidate the main elaborated ideas. (D.J.M.)

  20. Mining text data

    CERN Document Server

    Aggarwal, Charu C

    2012-01-01

    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. "Mining Text Data" introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book covers a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including

  1. Data Mining Model Comparison

    Science.gov (United States)

    Giudici, Paolo

    The aim of this contribution is to illustrate the role of statistical models and, more generally, of statistics, in choosing a Data Mining model. After a preliminary introduction on the distinction between Data Mining and statistics, we will focus on the issue of how to choose a Data Mining methodology. This well illustrates how statistical thinking can bring real added value to a Data Mining analysis, as otherwise it becomes rather difficult to make a reasoned choice. In the third part of the paper we will present, by means of a case study in credit risk management, how Data Mining and statistics can profitably interact.

  2. Geochemistry and mineralogy of arsenic in mine wastes and stream sediments in a historic metal mining area in the UK

    International Nuclear Information System (INIS)

    Mining generates large amounts of waste which may contain potentially toxic elements (PTE), which, if released into the wider environment, can cause air, water and soil pollution long after mining operations have ceased. The fate and toxicological impact of PTEs are determined by their partitioning and speciation and in this study, the concentrations and mineralogy of arsenic in mine wastes and stream sediments in a former metal mining area of the UK are investigated. Pseudo-total (aqua-regia extractable) arsenic concentrations in all samples from the mining area exceeded background and guideline values by 1–5 orders of magnitude, with a maximum concentration in mine wastes of 1.8 × 10^5 mg kg^−1 As and concentrations in stream sediments of up to 2.5 × 10^4 mg kg^−1 As, raising concerns over potential environmental impacts. Mineralogical analysis of the wastes and sediments was undertaken by scanning electron microscopy (SEM) and automated SEM-EDS based quantitative evaluation (QEMSCAN®). The main arsenic mineral in the mine waste was scorodite and this was significantly correlated with pseudo-total As concentrations and significantly inversely correlated with potentially mobile arsenic, as estimated from the sum of exchangeable, reducible and oxidisable arsenic fractions obtained from a sequential extraction procedure; these findings correspond with the low solubility of scorodite in acidic mine wastes. The work presented shows that the study area remains grossly polluted by historical mining and processing and illustrates the value of combining mineralogical data with acid and sequential extractions to increase our understanding of potential environmental threats. - Highlights: • Stream sediments in a former mining area remain polluted with up to 25 g As per kg. • The main arsenic mineral in adjacent mine wastes appears to be scorodite. • Low solubility scorodite was inversely correlated with potentially mobile As. • Combining mineralogical and

  3. Geochemistry and mineralogy of arsenic in mine wastes and stream sediments in a historic metal mining area in the UK

    Energy Technology Data Exchange (ETDEWEB)

    Rieuwerts, J.S., E-mail: jrieuwerts@plymouth.ac.uk [School of Geography, Earth and Environmental Sciences, Plymouth University, Plymouth PL4 8AA (United Kingdom); Mighanetara, K.; Braungardt, C.B. [School of Geography, Earth and Environmental Sciences, Plymouth University, Plymouth PL4 8AA (United Kingdom); Rollinson, G.K. [Camborne School of Mines, CEMPS, University of Exeter, Tremough Campus, Penryn, Cornwall TR10 9EZ (United Kingdom); Pirrie, D. [Helford Geoscience LLP, Menallack Farm, Treverva, Penryn, Cornwall TR10 9BP (United Kingdom); Azizi, F. [School of Geography, Earth and Environmental Sciences, Plymouth University, Plymouth PL4 8AA (United Kingdom)

    2014-02-01

    Mining generates large amounts of waste which may contain potentially toxic elements (PTE), which, if released into the wider environment, can cause air, water and soil pollution long after mining operations have ceased. The fate and toxicological impact of PTEs are determined by their partitioning and speciation and in this study, the concentrations and mineralogy of arsenic in mine wastes and stream sediments in a former metal mining area of the UK are investigated. Pseudo-total (aqua-regia extractable) arsenic concentrations in all samples from the mining area exceeded background and guideline values by 1–5 orders of magnitude, with a maximum concentration in mine wastes of 1.8 × 10⁵ mg kg⁻¹ As and concentrations in stream sediments of up to 2.5 × 10⁴ mg kg⁻¹ As, raising concerns over potential environmental impacts. Mineralogical analysis of the wastes and sediments was undertaken by scanning electron microscopy (SEM) and automated SEM-EDS based quantitative evaluation (QEMSCAN®). The main arsenic mineral in the mine waste was scorodite and this was significantly correlated with pseudo-total As concentrations and significantly inversely correlated with potentially mobile arsenic, as estimated from the sum of exchangeable, reducible and oxidisable arsenic fractions obtained from a sequential extraction procedure; these findings correspond with the low solubility of scorodite in acidic mine wastes. The work presented shows that the study area remains grossly polluted by historical mining and processing and illustrates the value of combining mineralogical data with acid and sequential extractions to increase our understanding of potential environmental threats. - Highlights: • Stream sediments in a former mining area remain polluted with up to 25 g As per kg. • The main arsenic mineral in adjacent mine wastes appears to be scorodite. • Low solubility scorodite was inversely correlated with potentially mobile As. • Combining

  4. Cancer genomics

    DEFF Research Database (Denmark)

    Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner

    2007-01-01

    Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...

  5. Informationization of coal enterprises and digital mine

    Institute of Scientific and Technical Information of China (English)

    LU Jian-jun; WANG Xiao-lu; MA Li; ZHAO An-xin

    2008-01-01

    The paper analyzes the main problems found in the current state of informationization in coal enterprises. It clarifies how to achieve informationization in coal mines and puts forward a general configuration for informationization construction, in which informationization in coal enterprises is divided into two parts: informationization of safety production and informationization of management. It plans a platform for the integrated management of informationization in coal enterprises. Finally, it proposes that an overall integrated digital mine is the way to achieve the goal of informationization in coal enterprises, which can promote the progression from automation, digitalization, networking and informationization to intellectualization. At the same time, the overall competitiveness of enterprises can be improved, and a new type of coal industry can be supported by information technology.

  6. Opinion mining and summarization for customer reviews

    Directory of Open Access Journals (Sweden)

    Sanjeev kumar Chauhan

    2012-08-01

    Full Text Available Opinion mining is concerned with detecting the opinion that an author expresses in a document. The primary task in opinion mining is subjectivity analysis, which determines whether a document is subjective or objective. A subjective document contains some opinionated content, while an objective document contains none, i.e. it expresses no sentiment. The next task is sentiment polarity analysis, which classifies documents as positive or negative. At present, however, there is no automated system that can perform this task. We are developing a system that can find the degree of polarity of each document and, based on it, assign a human-like rating to that document. Finally, it generates a summary of the review that contains only the highly subjective, feature-related parts of the document.
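    The pipeline described above runs subjectivity analysis, then polarity, then a human-like rating, then a summary of the subjective sentences. It can be illustrated with a toy word lexicon. This is a minimal sketch and not the author's system. The lexicon, the scoring rule and the linear mapping onto a 1 to 5 star scale are all hypothetical assumptions.

```python
# Toy opinion-mining pipeline. The lexicon and the rating rule are hypothetical,
# chosen only to illustrate subjectivity -> polarity -> rating -> summary.
import re

POSITIVE = {"good", "great", "excellent", "love", "sharp", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow", "blurry"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def sentence_scores(sentence: str) -> tuple[int, int]:
    """Return (positive hits, negative hits) for one sentence."""
    words = tokenize(sentence)
    return sum(w in POSITIVE for w in words), sum(w in NEGATIVE for w in words)


def is_subjective(sentence: str) -> bool:
    # Subjectivity analysis: a sentence counts as opinionated if it
    # contains at least one sentiment-bearing word from the lexicon.
    pos, neg = sentence_scores(sentence)
    return pos + neg > 0


def polarity(document: str) -> float:
    """Polarity in [-1, 1] as (pos - neg) / (pos + neg); 0.0 if objective."""
    pos = neg = 0
    for sentence in re.split(r"[.!?]+", document):
        p, n = sentence_scores(sentence)
        pos += p
        neg += n
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def rating(document: str) -> int:
    # Map polarity [-1, 1] linearly onto a 1-5 star scale.
    return round(3 + 2 * polarity(document))


def summarize(document: str) -> list[str]:
    # Keep only the subjective sentences as the review summary.
    return [s.strip() for s in re.split(r"[.!?]+", document)
            if s.strip() and is_subjective(s)]
```

    For example, `summarize("The screen is sharp. The battery is poor. It ships in a box.")` keeps only the first two sentences. A real system would use a much larger lexicon or a trained classifier in place of these word sets.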

  7. Data mining in radiology.

    Science.gov (United States)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-04-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure the success of data mining. PMID:25024513

  8. Mining and nature conservation

    International Nuclear Information System (INIS)

    To an increasing degree the permissibility of mining projects is coming under the purview of nature conservation law. This field of law owes its current prominence largely to the amended Federal Nature Conservation Law and the recurrent effects of the communal habitat protection guidelines on mining operations. This has had momentous consequences for all sectors of the mining industry. The same applies to the rehabilitation of areas formerly used for mining, a very visible example of which is the remediation of former Wismut mines. This congress report on the Third Colloquium on Mining and Environmental Protection and the Tenth Aachen Environmental Meeting contains examples of rehabilitation measures that have been successfully implemented in various branches of mining.

  9. Automated Camera Calibration

    Science.gov (United States)

    Chen, Siqi; Cheng, Yang; Willson, Reg

    2006-01-01

    Automated Camera Calibration (ACAL) is a computer program that automates the generation of calibration data for camera models used in machine vision systems. Machine vision camera models describe the mapping between points in three-dimensional (3D) space in front of the camera and the corresponding points in two-dimensional (2D) space in the camera's image. Calibrating a camera model requires a set of calibration data containing known 3D-to-2D point correspondences for the given camera system. Generating calibration data typically involves taking images of a calibration target where the 3D locations of the target's fiducial marks are known, and then measuring the 2D locations of the fiducial marks in the images. ACAL automates the analysis of calibration target images and greatly speeds the overall calibration process.
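    The 3D-to-2D mapping that a calibration fits can be illustrated with the simplest camera model: an ideal pinhole with no lens distortion, and 3D points already expressed in camera coordinates. This is a minimal sketch under those assumptions. The parameter names `fx`, `fy`, `cx`, `cy` and the helper `rms_reprojection_error` are illustrative and are not ACAL's camera models or interface. Given the known 3D fiducial locations and their measured 2D image locations, the RMS reprojection error is the quantity a calibration routine would typically try to minimize.

```python
# Minimal pinhole camera model (no lens distortion, no extrinsics): maps a 3D
# point in camera coordinates to a 2D pixel. The parameters are illustrative
# assumptions; ACAL's own camera models may differ.
from dataclasses import dataclass
import math


@dataclass
class PinholeCamera:
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)

    def project(self, point3d: tuple[float, float, float]) -> tuple[float, float]:
        x, y, z = point3d
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


def rms_reprojection_error(camera: PinholeCamera, correspondences) -> float:
    """RMS pixel distance between projected 3D fiducials and measured 2D marks.

    `correspondences` is a list of ((X, Y, Z), (u, v)) pairs: the known 3D
    location of each fiducial and its measured location in the image.
    """
    total = 0.0
    for point3d, (u, v) in correspondences:
        pu, pv = camera.project(point3d)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(correspondences))
```

    A calibration procedure would adjust `fx`, `fy`, `cx` and `cy` (plus pose and distortion terms in a realistic model) until this error is as small as possible over all measured fiducials.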

  10. Automated telescope scheduling

    Science.gov (United States)

    Johnston, Mark D.

    1988-08-01

    With the ever increasing level of automation of astronomical telescopes the benefits and feasibility of automated planning and scheduling are becoming more apparent. Improved efficiency and increased overall telescope utilization are the most obvious goals. Automated scheduling at some level has been done for several satellite observatories, but the requirements on these systems were much less stringent than on modern ground or satellite observatories. The scheduling problem is particularly acute for Hubble Space Telescope: virtually all observations must be planned in excruciating detail weeks to months in advance. Space Telescope Science Institute has recently made significant progress on the scheduling problem by exploiting state-of-the-art artificial intelligence software technology. What is especially interesting is that this effort has already yielded software that is well suited to scheduling groundbased telescopes, including the problem of optimizing the coordinated scheduling of more than one telescope.

  11. The UCSC Genome Browser Database: update 2006

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R;

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering...

  12. Economics of mine water treatment

    OpenAIRE

    Dvořáček, Jaroslav; Vidlář, Jiří; Štěrba, Jiří; Heviánková, Silvie; Vaněk, Michal; Barták, Pavel

    2012-01-01

    Mine water poses a significant problem in lignite coal mining. The drainage of mine water is the fundamental prerequisite of mining operations. Under the legislation of the Czech Republic, mine water that discharges into surface watercourse is subject to the permission of the state administration body in the water management sector. The permission also stipulates the limits for mine water pollution. Therefore, mine water has to be purified prior to discharge. Although all...

  13. Myths in test automation

    OpenAIRE

    Jazmine Francis

    2015-01-01

    Myths in software test automation are a recurring topic of discussion across the software validation industry. Probably the first thought that occurs to a knowledgeable reader is: why this old topic again? What is new to discuss? But everyone agrees that test automation today is undoubtedly not what it was ten or fifteen years ago, because it has evolved in scope and magnitude. What began as simple linear scripts...

  14. Automated phantom assay system

    International Nuclear Information System (INIS)

    This paper describes an automated phantom assay system developed for assaying phantoms spiked with minute quantities of radionuclides. The system includes a computer-controlled linear-translation table that positions the phantom at exact distances from a spectrometer. A multichannel analyzer (MCA) interfaces with a computer to collect gamma spectral data. Signals transmitted between the controller and MCA synchronize data collection and phantom positioning. Measured data are then stored on disk for subsequent analysis. The automated system allows continuous unattended operation and ensures reproducible results.

  15. Design and implementation of the cacao genome database

    Science.gov (United States)

    The Cacao Genome Database (CGD, www.cacaogenomedb.org) is being developed to provide a comprehensive data mining resource of genomic, genetic and breeding data for Theobroma cacao. Designed using Chado and a collection of Drupal modules, known as Tripal, CGD currently contains the genetically anchor...

  16. Mine your own business! Mine other's news!

    OpenAIRE

    Pham, Quang-Khai; Saint-Paul, Régis; Benatallah, Boualem; Mouaddib, Noureddine; Raschia, Guillaume

    2008-01-01

    Major media companies such as The Financial Times, the Wall Street Journal or Reuters generate huge amounts of textual news data on a daily basis. Mining frequent patterns in this mass of information is critical for knowledge workers such as financial analysts, stock traders or economists. Using existing frequent pattern mining (FPM) algorithms for the analysis of news data is difficult because of the size and lack of structuring of the free text news content. In this article, we demonstrate ...
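    The basic idea of frequent pattern mining (FPM) over news can be shown with a very small example. Each news item is treated as a set of terms, and the sketch counts the term pairs that co-occur in at least `min_support` items. This covers only the first two levels of a generic Apriori-style search and is not the method of the article above. The tokenization and the stop-word list are assumptions, and the article's point is precisely that free-text news makes real FPM much harder than this.

```python
# Count frequent terms and term pairs across news items, keeping only those
# that reach a minimum support (number of items containing them). This is the
# first two levels of an Apriori-style search, not the article's method; the
# tokenization and stop-word list are illustrative assumptions.
from collections import Counter
from itertools import combinations
import re

STOP_WORDS = {"the", "a", "of", "in", "on", "and", "to", "as"}


def terms(item: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", item.lower()) if w not in STOP_WORDS}


def frequent_pairs(items: list[str], min_support: int) -> dict[tuple[str, str], int]:
    transactions = [terms(item) for item in items]
    # Level 1: frequent single terms.
    single = Counter(t for tx in transactions for t in tx)
    frequent = {t for t, c in single.items() if c >= min_support}
    # Level 2: only pairs of frequent terms can be frequent (Apriori property).
    pairs: Counter = Counter()
    for tx in transactions:
        kept = sorted(tx & frequent)
        pairs.update(combinations(kept, 2))
    return {p: c for p, c in pairs.items() if c >= min_support}
```

    Pruning infrequent single terms before pair counting is what keeps the search tractable. Scaling this step to the volume of real news feeds is where dedicated FPM algorithms come in.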

  17. Text Mining Perspectives in Microarray Data Mining

    OpenAIRE

    Natarajan, Jeyakumar

    2013-01-01

    Current microarray data mining methods such as clustering, classification, and association analysis heavily rely on statistical and machine learning algorithms for analysis of large sets of gene expression data. In recent years, there has been a growing interest in methods that attempt to discover patterns based on multiple but related data sources. Gene expression data and the corresponding literature data are one such example. This paper suggests a new approach to microarray data mining as ...

  18. Data mining, mining data : energy consumption modelling

    Energy Technology Data Exchange (ETDEWEB)

    Dessureault, S. [Arizona Univ., Tucson, AZ (United States)

    2007-09-15

    Most modern mining operations are accumulating large amounts of data on production and business processes. Data, however, provides value only if it can be translated into information that appropriate users can utilize. This paper emphasized that a new technological focus should emerge, notably how to concentrate data into information; analyze information sufficiently to become knowledge; and, act on that knowledge. Researchers at the Mining Information Systems and Operations Management (MISOM) laboratory at the University of Arizona have created a method to transform data into action. The data-to-action approach was exercised in the development of an energy consumption model (ECM), in partnership with a major US-based copper mining company, 2 software companies, and the MISOM laboratory. The approach begins by integrating several key data sources using data warehousing techniques, and increasing the existing level of integration and data cleaning. An online analytical processing (OLAP) cube was also created to investigate the data and identify a subset of several million records. Data mining algorithms were applied using the information that was isolated by the OLAP cube. The data mining results showed that traditional cost drivers of energy consumption are poor predictors. A comparison was made between traditional methods of predicting energy consumption and the prediction formed using data mining. Traditionally, in the mines for which data were available, monthly averages of tons and distance are used to predict diesel fuel consumption. However, this article showed that new information technology can be used to incorporate many more variables into the budgeting process, resulting in more accurate predictions. The ECM helped mine planners improve the prediction of energy use through more data integration, measure development, and workflow analysis. 5 refs., 11 figs.
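    The paper contrasts the traditional predictor (monthly tons and distance) with a model that incorporates more variables. That contrast can be illustrated with ordinary least squares. This is a minimal, stdlib-only sketch on synthetic numbers, not the ECM. The third variable (idle hours) and all coefficients are hypothetical, chosen so that fuel use depends on a factor the two-variable model cannot see.

```python
# Ordinary least squares via the normal equations (X^T X) b = X^T y, solved
# with Gauss-Jordan elimination. Stdlib only; the variables and data used with
# it are hypothetical, not the MISOM energy consumption model.
import math


def fit_ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Return coefficients [intercept, b1, ..., bk] for y ~ rows."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k, n = len(X[0]), len(X)
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(X[m][i] * X[m][j] for m in range(n)) for j in range(k)]
         + [sum(X[m][i] * y[m] for m in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The coefficient block is now diagonal.
    return [A[i][k] / A[i][i] for i in range(k)]


def rmse(coefs: list[float], rows: list[list[float]], y: list[float]) -> float:
    preds = [coefs[0] + sum(c * x for c, x in zip(coefs[1:], r)) for r in rows]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))
```

    On synthetic data generated as fuel = 2 + 0.5·tons + 1.5·distance + 3·idle_hours, the three-variable fit recovers these coefficients almost exactly. The tons-and-distance fit leaves a clearly larger residual error because it cannot account for idle time.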

  19. Data mining approach to model the diagnostic service management.

    Science.gov (United States)

    Lee, Sun-Mi; Lee, Ae-Kyung; Park, Il-Su

    2006-01-01

    Korea has a National Health Insurance Program operated by the government-owned National Health Insurance Corporation, and diagnostic services are provided every two years for the insured and their family members. Developing a customer relationship management (CRM) system using data mining technology would be useful for improving the performance of diagnostic service programs. In this context, this study used a data mining approach to develop a model for diagnostic service management that takes into account the characteristics of subjects. The model could be further used to develop an automated CRM system that helps increase the rate at which diagnostic services are received. PMID:17102454

  20. Mining and processing of uranium ores in the USSR

    International Nuclear Information System (INIS)

    Experience gained in uranium ore mining by modern methods in combination with underground and heap leaching is summarized. More intensive processing of low-grade ores has been achieved through the use of autoclave leaching, sorptive treatment of thick pulps, extractive separation of pure uranium compounds, automated continuous sorption devices of high efficiency for processing the underground- and heap-leaching liquors, natural and mine water, and recovery of molybdenum, vanadium, scandium, rare earths and phosphate fertilizers from low-grade ores. Production of ion-exchangers and extractants has been developed and processes for concomitant recovery of copper, gold, ionium, tungsten, caesium, zirconium, tantalum, nickel and cobalt have been designed. (author)

  1. Allele mining and enhanced genetic recombination for rice breeding.

    Science.gov (United States)

    Leung, Hei; Raghavan, Chitra; Zhou, Bo; Oliva, Ricardo; Choi, Il Ryong; Lacorte, Vanica; Jubay, Mona Liza; Cruz, Casiana Vera; Gregorio, Glenn; Singh, Rakesh Kumar; Ulat, Victor Jun; Borja, Frances Nikki; Mauleon, Ramil; Alexandrov, Nickolai N; McNally, Kenneth L; Sackville Hamilton, Ruaraidh

    2015-12-01

    Traditional rice varieties harbour a large store of genetic diversity with potential to accelerate rice improvement. For a long time, this diversity maintained in the International Rice Genebank has not been fully used because of a lack of genome information. The publication of the first reference genome of Nipponbare by the International Rice Genome Sequencing Project (IRGSP) marked the beginning of a systematic exploration and use of rice diversity for genetic research and breeding. Since then, the Nipponbare genome has served as the reference for the assembly of many additional genomes. The recently completed 3000 Rice Genomes Project together with the public database (SNP-Seek) provides a new genomic and data resource that enables the identification of useful accessions for breeding. Using disease resistance traits as case studies, we demonstrated the power of allele mining in the 3,000 genomes for extracting accessions from the Genebank for targeted phenotyping. Although potentially useful landraces can now be identified, their use in breeding is often hindered by unfavourable linkages. Efficient breeding designs are much needed to transfer the useful diversity to breeding. Multi-parent Advanced Generation InterCross (MAGIC) is a breeding design to produce highly recombined populations. The MAGIC approach can be used to generate pre-breeding populations with increased genotypic diversity and reduced linkage drag. Allele mining combined with a multi-parent breeding design can help convert useful diversity into breeding-ready genetic resources. PMID:26606925
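    At its simplest, allele mining means scanning a genotype table for accessions that carry the target alleles at positions of interest, then shortlisting those accessions for targeted phenotyping. The sketch below assumes a hypothetical in-memory table. It does not reflect SNP-Seek's data format or API, and the accession IDs and positions in the usage example are made up.

```python
# Shortlist accessions that carry every target allele. The genotype table is a
# hypothetical in-memory stand-in; it does not reflect SNP-Seek's format or API.
# A position that is absent, or called "N" (missing), does not match a
# nucleotide target.
Genotypes = dict[str, dict[int, str]]  # accession -> {position: allele}


def mine_alleles(genotypes: Genotypes, targets: dict[int, str]) -> list[str]:
    """Return the sorted accession IDs matching all (position, allele) targets."""
    hits = []
    for accession, calls in genotypes.items():
        if all(calls.get(pos) == allele for pos, allele in targets.items()):
            hits.append(accession)
    return sorted(hits)
```

    For example, with made-up accessions `"ACC-001"` to `"ACC-004"`, asking for `{1001: "A", 2002: "G"}` returns only those accessions called A and G at both positions. Accessions with a missing call at either position are left out.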

  2. Automated conflict resolution issues

    Science.gov (United States)

    Wike, Jeffrey S.

    1991-01-01

    A discussion is presented of how conflicts for Space Network resources should be resolved in the ATDRSS era. The following topics are presented: a description of how resource conflicts are currently resolved; a description of issues associated with automated conflict resolution; present conflict resolution strategies; and topics for further discussion.

  3. Protocols for Home Automation

    DEFF Research Database (Denmark)

    Kjær, Kristian Ellebæk

    2008-01-01

    computer that can switch between predefined settings. Sometimes the computer can be controlled remotely over the internet, so that the status of the home can be checked from a computer or perhaps even from a mobile phone. While these applications are classic within home automation, further functionality has emerged...

  4. Myths in test automation

    Directory of Open Access Journals (Sweden)

    Jazmine Francis

    2015-01-01

    Full Text Available Myths in software test automation are a recurring topic of discussion across the software validation industry. Probably the first thought that occurs to a knowledgeable reader is: why this old topic again? What is new to discuss? But everyone agrees that test automation today is undoubtedly not what it was ten or fifteen years ago, because it has evolved in scope and magnitude. What began as simple linear scripts for web applications now has a complex architecture and a hybrid framework that facilitates testing applications developed on various platforms and technologies. Automation has undoubtedly advanced, but so have the myths associated with it. The change in people's perspective on and knowledge of automation has altered the terrain. This article reflects the author's points of view and experience regarding the transformation of the original myths into new versions and how they are derived; it also provides his thoughts on the new generation of myths.

  5. Automated data model evaluation

    International Nuclear Information System (INIS)

    The modeling process is an essential phase of information systems development and implementation. This paper presents methods and techniques for analyzing and evaluating data model correctness. Recent methodologies and development results regarding the automation of data model correctness analysis, and its relation to ontology tools, are presented. Key words: Database modeling, Data model correctness, Evaluation
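    Automated correctness analysis of a data model can be pictured as a set of rules run over a machine-readable description of the model. This is a minimal sketch and not the paper's method. The `Entity` representation and the three rules (every entity has a primary key, key attributes exist, foreign keys point at an existing entity's primary key) are common sanity checks chosen here for illustration.

```python
# Rule-based checks on a hypothetical entity/relationship description.
# The rules are common sanity checks chosen for illustration only.
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list[str]
    primary_key: list[str]
    # Foreign keys: local attribute -> (referenced entity, referenced attribute)
    foreign_keys: dict[str, tuple[str, str]] = field(default_factory=dict)


def check_model(entities: list[Entity]) -> list[str]:
    """Return human-readable problems; an empty list means no rule fired."""
    problems = []
    by_name = {e.name: e for e in entities}
    for e in entities:
        if not e.primary_key:
            problems.append(f"{e.name}: no primary key")
        for key in e.primary_key:
            if key not in e.attributes:
                problems.append(f"{e.name}: primary key '{key}' is not an attribute")
        for local, (ref_entity, ref_attr) in e.foreign_keys.items():
            if local not in e.attributes:
                problems.append(f"{e.name}: foreign key '{local}' is not an attribute")
            target = by_name.get(ref_entity)
            if target is None:
                problems.append(f"{e.name}.{local}: references unknown entity '{ref_entity}'")
            elif ref_attr not in target.primary_key:
                problems.append(f"{e.name}.{local}: '{ref_entity}.{ref_attr}' is not its primary key")
    return problems
```

    Because each rule only reports problems and never changes the model, new rules (naming conventions, normalization checks, ontology-based constraints) can be added independently.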

  6. Automated solvent concentrator

    Science.gov (United States)

    Griffith, J. S.; Stuart, J. L.

    1976-01-01

    Designed for the automated drug identification system (AUDRI), the device increases sample concentration by a factor of 100. The sample is first filtered, which removes particulate contaminants and reduces its water content. The sample is then extracted from the filtered residue with a specific solvent. The concentrator provides input material to the analysis subsystem.

  7. ELECTROPNEUMATIC AUTOMATION EDUCATIONAL LABORATORY

    OpenAIRE

    Dolgorukov, S. O.; National Aviation University; Roman, B. V.; National Aviation University

    2013-01-01

    The article reflects the current situation in education regarding difficulties in learning mechatronics. A complex of laboratory test benches for electropneumatic automation is considered as a tool for advancing in technical science. A course of laboratory work, developed to provide an efficient and reliable way of acquiring practical skills, is regarded as the simplest way for students to learn the basics of mechatronics.

  8. Automating Shallow Seismic Imaging

    Energy Technology Data Exchange (ETDEWEB)

    Steeples, Don W.

    2004-12-09

    This seven-year, shallow-seismic reflection research project had the aim of improving geophysical imaging of possible contaminant flow paths. Thousands of chemically contaminated sites exist in the United States, including at least 3,700 at Department of Energy (DOE) facilities. Imaging technologies such as shallow seismic reflection (SSR) and ground-penetrating radar (GPR) sometimes are capable of identifying geologic conditions that might indicate preferential contaminant-flow paths. Historically, SSR has been used very little at depths shallower than 30 m, and even more rarely at depths of 10 m or less. Conversely, GPR is rarely useful at depths greater than 10 m, especially in areas where clay or other electrically conductive materials are present near the surface. Efforts to image the cone of depression around a pumping well using seismic methods were only partially successful (for complete references of all research results, see the full Final Technical Report, DOE/ER/14826-F), but peripheral results included development of SSR methods for depths shallower than one meter, a depth range that had not been achieved before. Imaging at such shallow depths, however, requires geophone intervals of the order of 10 cm or less, which makes such surveys very expensive in terms of human time and effort. We also showed that SSR and GPR could be used in a complementary fashion to image the same volume of earth at very shallow depths. The primary research focus of the second three-year period of funding was to develop and demonstrate an automated method of conducting two-dimensional (2D) shallow-seismic surveys with the goal of saving time, effort, and money. Tests involving the second generation of the hydraulic geophone-planting device dubbed the "Autojuggie" showed that large numbers of geophones can be placed quickly and automatically and can acquire high-quality data, although not under rough topographic conditions. In some easy

  9. Automating spectral measurements

    Science.gov (United States)

    Goldstein, Fred T.

    2008-09-01

    This paper discusses the architecture of software utilized in spectroscopic measurements. As optical coatings become more sophisticated, there is mounting need to automate data acquisition (DAQ) from spectrophotometers. Such need is exacerbated when 100% inspection is required, ancillary devices are utilized, cost reduction is crucial, or security is vital. While instrument manufacturers normally provide point-and-click DAQ software, an application programming interface (API) may be missing. In such cases automation is impossible or expensive. An API is typically provided in libraries (*.dll, *.ocx) which may be embedded in user-developed applications. Users can thereby implement DAQ automation in several Windows languages. Another possibility, developed by FTG as an alternative to instrument manufacturers' software, is the ActiveX application (*.exe). ActiveX, a component of many Windows applications, provides means for programming and interoperability. This architecture permits a point-and-click program to act as automation client and server. Excel, for example, can control and be controlled by DAQ applications. Most importantly, ActiveX permits ancillary devices such as barcode readers and XY-stages to be easily and economically integrated into scanning procedures. Since an ActiveX application has its own user-interface, it can be independently tested. The ActiveX application then runs (visibly or invisibly) under DAQ software control. Automation capabilities are accessed via a built-in spectro-BASIC language with industry-standard (VBA-compatible) syntax. Supplementing ActiveX, spectro-BASIC also includes auxiliary serial port commands for interfacing programmable logic controllers (PLC). A typical application is automatic filter handling.
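    The architecture described above has an automation client that drives a spectrophotometer together with ancillary devices such as a barcode reader, for example during 100% inspection. Its control flow can be sketched with abstract interfaces. Everything here is hypothetical. The `Spectrophotometer` and `BarcodeReader` protocols and their method names are placeholders, not FTG's ActiveX interface or spectro-BASIC. The fake classes stand in for hardware so that the loop can be exercised.

```python
# Hypothetical automation-client sketch: read a sample ID from a barcode
# reader, trigger a scan, and apply a pass/fail transmittance spec to every
# part (100% inspection). Interface and method names are placeholders.
from typing import Protocol


class Spectrophotometer(Protocol):
    def scan(self, start_nm: int, stop_nm: int, step_nm: int) -> dict[int, float]: ...


class BarcodeReader(Protocol):
    def read(self) -> str: ...


def inspect_batch(spectro: Spectrophotometer, reader: BarcodeReader, count: int,
                  band: tuple[int, int], min_transmittance: float) -> dict[str, bool]:
    """Scan `count` parts; a part passes if every point in `band` meets the spec."""
    results = {}
    for _ in range(count):
        sample_id = reader.read()
        spectrum = spectro.scan(band[0], band[1], 10)  # 10 nm step, illustrative
        results[sample_id] = all(t >= min_transmittance for t in spectrum.values())
    return results


# Fakes standing in for hardware so the control flow can be tested.
class FakeReader:
    def __init__(self, ids: list[str]):
        self._ids = iter(ids)

    def read(self) -> str:
        return next(self._ids)


class FakeSpectro:
    def __init__(self, levels: list[float]):
        self._levels = iter(levels)  # one flat transmittance level per scan

    def scan(self, start_nm: int, stop_nm: int, step_nm: int) -> dict[int, float]:
        level = next(self._levels)
        return {nm: level for nm in range(start_nm, stop_nm + 1, step_nm)}
```

    The inspection loop depends only on the two interfaces, so the same logic could be kept while the instrument or ancillary device behind each interface changes.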

  10. Collaborative Data Mining

    Science.gov (United States)

    Moyle, Steve

    Collaborative Data Mining is a setting where the Data Mining effort is distributed to multiple collaborating agents - human or software. The objective of the collaborative Data Mining effort is to produce solutions to the tackled Data Mining problem which are considered better by some metric, with respect to those solutions that would have been achieved by individual, non-collaborating agents. The solutions require evaluation, comparison, and approaches for combination. Collaboration requires communication, and implies some form of community. The human form of collaboration is a social task. Organizing communities in an effective manner is non-trivial and often requires well defined roles and processes. Data Mining, too, benefits from a standard process. This chapter explores the standard Data Mining process CRISP-DM utilized in a collaborative setting.
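    CRISP-DM defines six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. The collaborative concerns named above (evaluating, comparing and combining solutions from several agents) fall mainly in the Evaluation phase. The sketch below is a minimal illustration of that step. Agents are modeled as plain lists of predicted labels, and majority voting is used as one simple combination scheme. The chapter does not prescribe this scheme.

```python
# Sketch of the evaluation/combination step when several agents submit
# solutions to the same data mining problem. Majority voting is one simple
# combination scheme, chosen for illustration.
from collections import Counter

CRISP_DM_PHASES = ["Business Understanding", "Data Understanding", "Data Preparation",
                   "Modeling", "Evaluation", "Deployment"]


def accuracy(predictions: list[str], truth: list[str]) -> float:
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def majority_vote(solutions: dict[str, list[str]]) -> list[str]:
    """Combine per-agent predictions item by item.

    Ties go to the label of the earliest-listed agent among the top counts.
    """
    combined = []
    for labels in zip(*solutions.values()):
        counts = Counter(labels)
        best = max(counts.values())
        combined.append(next(label for label in labels if counts[label] == best))
    return combined


def compare(solutions: dict[str, list[str]], truth: list[str]) -> dict[str, float]:
    """Score each agent and the combined solution by the same metric."""
    scores = {agent: accuracy(preds, truth) for agent, preds in solutions.items()}
    scores["majority_vote"] = accuracy(majority_vote(solutions), truth)
    return scores
```

    When agents make their errors on different items, the combined solution can score better than any individual one. This is the kind of improvement "by some metric" that the chapter describes as the goal of collaboration.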

  11. Implementation of Paste Backfill Mining Technology in Chinese Coal Mines

    Directory of Open Access Journals (Sweden)

    Qingliang Chang

    2014-01-01

    Full Text Available Implementation of clean mining technology at coal mines is crucial to protect the environment and maintain balance among energy resources, consumption, and ecology. After reviewing present coal clean mining technology, we introduce the technology principles and technological process of paste backfill mining in coal mines and discuss the components and features of backfill materials, the constitution of the backfill system, and the backfill process. Specific implementation of this technology and its application are analyzed for paste backfill mining in Daizhuang Coal Mine; a practical implementation shows that paste backfill mining can improve the safety and excavation rate of coal mining, which can effectively resolve surface subsidence problems caused by underground mining activities, by utilizing solid waste such as coal gangues as a resource. Therefore, paste backfill mining is an effective clean coal mining technology, which has widespread application.

  12. Treatment of mine-water from decommissioning uranium mines

    International Nuclear Information System (INIS)

    Treatment methods for mine water from decommissioned uranium mines are introduced and classified. Suggestions on optimal treatment methods are presented based on experience with the decommissioned Chenzhou Uranium Mine.

  13. Implementation of Paste Backfill Mining Technology in Chinese Coal Mines

    Science.gov (United States)

    Chang, Qingliang; Zhou, Huaqiang; Bai, Jianbiao

    2014-01-01

    Implementation of clean mining technology at coal mines is crucial to protect the environment and maintain balance among energy resources, consumption, and ecology. After reviewing present coal clean mining technology, we introduce the technology principles and technological process of paste backfill mining in coal mines and discuss the components and features of backfill materials, the constitution of the backfill system, and the backfill process. Specific implementation of this technology and its application are analyzed for paste backfill mining in Daizhuang Coal Mine; a practical implementation shows that paste backfill mining can improve the safety and excavation rate of coal mining, which can effectively resolve surface subsidence problems caused by underground mining activities, by utilizing solid waste such as coal gangues as a resource. Therefore, paste backfill mining is an effective clean coal mining technology, which has widespread application. PMID:25258737

  14. Mining in South Africa

    Energy Technology Data Exchange (ETDEWEB)

    Brewis, T.

    1992-09-01

    Poor metals prices, high inflation, rapid political change and (in the case of gold) hot, deep mines all pose challenging problems. Nevertheless, mining plays an essential role in the South African economy and the long-term outlook is positive. The article discusses recent developments in technology and gives production figures for mining of gold, coal, platinum, diamonds, ferrous metals, non-ferrous metals and industrial minerals. 4 refs., 1 tab., 5 photos.

  15. DATA MINING TECHNOLOGIES

    OpenAIRE

    Titrade Cristina-Maria

    2010-01-01

    Knowledge discovery and data mining (KDD) has grown rapidly as an interdisciplinary field that merges databases, statistics and closely related industries, driven by the desire to extract as much valuable information and knowledge as possible. "Knowledge discovery" and "data mining" are understood differently. Knowledge discovery in databases is a process to identify patterns / templates of valid da...

  16. Identification and evolutionary genomics of novel LTR retrotransposons in Brassica

    OpenAIRE

    NOUROZ, FAISAL; NOREEN, SHUMAILA; HESLOP-HARRISON, JOHN SEYMOUR

    2015-01-01

    Abstract: Retrotransposons (REs) are the most abundant and diverse elements identified in eukaryotic genomes. Using computational and molecular methods, 262 intact LTR retrotransposons were identified from Brassica genomes by dot plot analysis and data mining. The Copia superfamily was dominant (206 elements) over Gypsy (56), with an estimated ~1596 intact Copia and 540 intact Gypsy copies in the Brassica rapa whole genome and ~7540 Copia and 780 Gypsy copies in the Brassica oleracea whole genome. Canonical...

  17. BRAD, the genetics and genomics database for Brassica plants

    OpenAIRE

    Li Pingxia; Liu Bo; Sun Silong; Fang Lu; Wu Jian; Liu Shengyi; Cheng Feng; Hua Wei; Wang Xiaowu

    2011-01-01

    Abstract Background Brassica species include both vegetable and oilseed crops, which are very important in everyday human life. Brassica species also represent an excellent system for studying numerous aspects of plant biology, particularly genome evolution following polyploidy, which makes them very important for scientific research. Now that the genome of Brassica rapa has been assembled, it is time to mine the genome data in depth. D...

  18. The Genome Database for Rosaceae (GDR): year 10 update

    OpenAIRE

    Jung, Sook; Stephen P Ficklin; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G; Mueller, Lukas A.; Olmstead, Mercy A.

    2013-01-01

    The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae ge...

  19. Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities

    OpenAIRE

    Chen, Kevin; Pachter, Lior

    2005-01-01

    The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fe...

  20. Uranium mining and milling

    International Nuclear Information System (INIS)

    In this report uranium mining and milling are reviewed. The fuel cycle, different types of uranium geological deposits, blending of ores, open cast and underground mining, mining costs, and radiation protection in mines are treated in the first part of this report. The second part treats the milling of uranium ores, including process technology, acid and alkaline leaching, process design for physical and chemical treatment of the ores, and the cost. Each chapter is illustrated with figures, diagrams, tables, and flowsheets. (HK)

  1. Responsible Mining: A Human Resources Strategy for Mine Development Project

    OpenAIRE

    Sampathkumar, Sriram (Ram)

    2012-01-01

    Mining is a global industry. Most mining companies operate internationally, often in remote, challenging environments, and consequently they frequently have to respond to unusual and demanding Human Resource (HR) requirements. It is my opinion that the strategic imperative behind success in the mining industry is responsible mining. The purpose of this paper is to examine how an effective HR strategy can be a competitive advantage that contributes to the success of a mining project in the global mining in...

  2. Physics Mining of Multi-Source Data Sets

    Science.gov (United States)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. By fusing synoptic imagery and time-series measurements, these techniques yield higher-resolution measures of environmental parameters than ever before. These techniques are general and relevant to observational data, including raster, vector, and scalar data, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as Artificial Neural Nets, which yield a black-box solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, to handle multi-type data sets, and to run in parallel.
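The abstract does not describe MineTool's own algorithms, so the sketch below substitutes plain ordinary least squares (not MineTool) purely to illustrate what "an analytical model in parametric form" means: the fitted result is an explicit equation whose coefficients can be inspected, unlike the weights of a neural network. The function names `fit_linear_model` and `describe` are hypothetical.

```python
from typing import List, Sequence


def fit_linear_model(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns [b0, b1, ..., bk], i.e. an explicit parametric equation.
    """
    # Design matrix with a leading column of ones for the intercept b0.
    rows = [[1.0] + [float(v) for v in row] for row in X]
    n_params = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = []
    for i in range(n_params):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(n_params)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_params):
        pivot = max(range(col, n_params), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n_params):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][-1] for i in range(n_params)]


def describe(coeffs: Sequence[float], names: Sequence[str]) -> str:
    """Render the fitted coefficients as a human-readable equation."""
    terms = [f"{coeffs[0]:.3f}"] + [f"{c:+.3f}*{n}" for c, n in zip(coeffs[1:], names)]
    return "y = " + " ".join(terms)
```

For example, `describe(fit_linear_model([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, -2, 0, 2]), ["x1", "x2"])` recovers the generating equation `y = 1.000 +2.000*x1 -3.000*x2`, whose coefficients can be read off directly.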

  3. Statistical data analytics foundations for data mining, informatics, and knowledge discovery

    CERN Document Server

    Piegorsch, Walter W

    2015-01-01

    A comprehensive introduction to statistical methods for data mining and knowledge discovery. Applications of data mining and 'big data' increasingly take center stage in our modern, knowledge-driven society, supported by advances in computing power, automated data acquisition, social media development and interactive, linkable internet software. This book presents a coherent, technical introduction to modern statistical learning and analytics, starting from the core foundations of statistics and probability. It includes an overview of probability and statistical distributions, basic

  4. Advanced Text Mining Methods for the Financial Markets and Forecasting of Intraday Volatility

    OpenAIRE

    Pieper, Michael J.

    2011-01-01

    The flow of information in financial markets is covered in two parts. A high-order estimator of intraday volatility is introduced in order to boost risk forecasts. Over the last decade, text mining of news and its application to finance have been a vibrant topic both in research and in the finance industry. This thesis develops a coherent approach to financial text mining that can be utilized for automated trading.

  5. Developing Image Processing Meta-Algorithms with Data Mining of Multiple Metrics

    OpenAIRE

    Kelvin Leung; Alexandre Cunha; TOGA, A. W.; D. Stott Parker

    2014-01-01

    People often use multiple metrics in image processing, but here we take a novel approach of mining the values of batteries of metrics on image processing results. We present a case for extending image processing methods to incorporate automated mining of multiple image metric values. Here by a metric we mean any image similarity or distance measure, and in this paper we consider intensity-based and statistical image measures and focus on registration as an image processing problem. We show ho...

  6. Automation of a single-DNA molecule stretching device

    DEFF Research Database (Denmark)

    Sørensen, Kristian Tølbøl; Lopacinska, Joanna M.; Tommerup, Niels;

    2015-01-01

    We automate the manipulation of genomic-length DNA in a nanofluidic device based on real-time analysis of fluorescence images. In our protocol, individual molecules are picked from a microchannel and stretched with pN forces using pressure driven flows. The millimeter-long DNA fragments free......, we demonstrate how to estimate the length of molecules by continuous real-time image stitching and how to increase the effective resolution of a pressure controller by pulse width modulation. The sequence of image-processing steps addresses the challenges of genomic-length DNA visualization; however...
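The abstract mentions raising the effective resolution of a pressure controller by pulse width modulation but gives no details. A minimal sketch of the general idea, assuming a controller that can only output multiples of a fixed `step`: toggling between the two nearest levels with a chosen duty cycle gives a time-averaged value in between. `pwm_schedule` and `average` are hypothetical names, and the slot-based timing is an assumption.

```python
import math
from typing import List, Tuple


def pwm_schedule(target: float, step: float, slots: int) -> Tuple[float, float, List[float]]:
    """Approximate `target` with a controller limited to multiples of `step`.

    The output alternates between the two nearest levels; the fraction of
    `slots` spent at the upper level (the duty cycle) sets the average.
    """
    if step <= 0 or slots <= 0:
        raise ValueError("step and slots must be positive")
    lower = math.floor(target / step) * step
    upper = lower + step
    duty = (target - lower) / step        # fraction of time at the upper level
    n_high = round(duty * slots)          # duty cycle quantised to whole slots
    pattern = [upper] * n_high + [lower] * (slots - n_high)
    return lower, upper, pattern


def average(values: List[float]) -> float:
    """Time-averaged output over one PWM period."""
    return sum(values) / len(values)
```

With `step=1.0` the controller alone can only hold 12 or 13 units, but `pwm_schedule(12.3, 1.0, 10)` spends 3 of 10 slots at 13 and 7 at 12, averaging 12.3. The finer the slot division, the finer the achievable average.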

  7. Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics

    OpenAIRE

    Fei, Zhangjun; Joung, Je-Gun; Tang, Xuemei; Zheng, Yi; Huang, Mingyun; Lee, Je Min; McQuinn, Ryan; Tieman, Denise M.; Alba, Rob; Klee, Harry J.; Giovannoni, James J

    2010-01-01

    Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process microarray, metabolite and sRNA data sets archived in the database, respe...

  8. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context...... enabling automation of repeated tasks. This tool can be relevant in many pangenomic as well as in metagenomic studies, by giving a quick overview of clusters of insertion sites, genomic islands and overall homology between a reference sequence and a data set.......The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...

  9. Using data mining to predict secondary school student performance

    OpenAIRE

    Cortez, Paulo; Silva, Alice Maria Gonçalves

    2008-01-01

    Although the educational level of the Portuguese population has improved in the last decades, the statistics keep Portugal at Europe’s tail end due to its high student failure rates. In particular, lack of success in the core classes of Mathematics and the Portuguese language is extremely serious. On the other hand, the fields of Business Intelligence (BI)/Data Mining (DM), which aim at extracting high-level knowledge from raw data, offer interesting automated tools that can aid the educat...

  10. Data Mining Cultural Aspects of Social Media Marketing

    OpenAIRE

    Hochreiter, Ronald; Waldhauser, Christoph

    2014-01-01

    For marketing to function in a globalized world it must respect a diverse set of local cultures. With marketing efforts extending to social media platforms, the crossing of cultural boundaries can happen in an instant. In this paper we examine how culture influences the popularity of marketing messages in social media platforms. Text mining, automated translation and sentiment analysis contribute largely to our research. From our analysis of 400 posts on the localized Google+ pages of German ...

  11. MTGD: The Medicago truncatula genome database.

    Science.gov (United States)

    Krishnakumar, Vivek; Kim, Maria; Rosen, Benjamin D; Karamycheva, Svetlana; Bidwell, Shelby L; Tang, Haibao; Town, Christopher D

    2015-01-01

    Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year and currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant 'mines' such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community. PMID:25432968

  12. Automated Wildfire Detection Through Artificial Neural Networks

    Science.gov (United States)

    Miller, Jerry; Borne, Kirk; Thomas, Brian; Huang, Zhenping; Chi, Yuechen

    2005-01-01

    We have tested and deployed Artificial Neural Network (ANN) data mining techniques to analyze remotely sensed multi-channel imaging data from MODIS, GOES, and AVHRR. The goal is to train the ANN to learn the signatures of wildfires in remotely sensed data in order to automate the detection process. We train the ANN using the set of human-detected wildfires in the U.S., which are provided by the Hazard Mapping System (HMS) wildfire detection group at NOAA/NESDIS. The ANN is trained to mimic the behavior of fire detection algorithms and the subjective decision-making of NOAA HMS Fire Analysts. We use a local extremum search in order to isolate fire pixels, and then we extract a 7x7 pixel array around that location in 3 spectral channels. The corresponding 147 pixel values are used to populate a 147-dimensional input vector that is fed into the ANN. The ANN accuracy is tested, and overfitting is avoided, by using a subset of the training data that is set aside as a test data set. We have achieved an automated fire detection accuracy of 80-92%, depending on a variety of ANN parameters and on the instrument channels used from the 3 satellites. We believe that this system can be deployed worldwide or for any region to detect wildfires automatically in satellite imagery of those regions. These detections can ultimately be used to provide thermal inputs to climate models.
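The feature-extraction step described above (local extremum search, then a 7x7 window in 3 channels, giving 3 x 49 = 147 inputs) can be sketched as follows. This is an illustration under assumptions, not the authors' code: images are plain lists of rows, the extremum search runs on a single channel chosen by the caller, and "local maximum" means strictly greater than all 8 neighbours. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # one spectral channel as rows of pixel values


def local_maxima(channel: Image, half: int = 3) -> List[Tuple[int, int]]:
    """Pixels strictly greater than their 8 neighbours, far enough from the
    border for a full (2*half+1) x (2*half+1) window."""
    peaks = []
    rows, cols = len(channel), len(channel[0])
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            v = channel[r][c]
            neighbours = [channel[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def feature_vector(channels: Sequence[Image], r: int, c: int, half: int = 3) -> List[float]:
    """Flatten the window centred on (r, c) in every channel into one vector.
    With 3 channels and a 7x7 window this yields 3 * 49 = 147 values."""
    vec: List[float] = []
    for ch in channels:
        for dr in range(-half, half + 1):
            vec.extend(ch[r + dr][c - half:c + half + 1])
    return vec
```

Each candidate pixel from `local_maxima` yields one 147-dimensional vector that would be passed to the classifier. The centre pixel of the first channel sits at index 24 (row 3, column 3 of the 7x7 window).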

  13. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatis, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, polyphasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cut-offs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.
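A minimal sketch of how hard gANI and AF cut-offs can turn pairwise comparisons into species-level clusters. Assumptions, not taken from the abstract: the cut-off values below are illustrative placeholders (the abstract does not state the values it derived), each pair is treated as symmetric even though gANI and AF are directional in practice, and clusters are formed by single linkage (connected components) using union-find.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative placeholder cut-offs (assumed; not stated in the abstract).
GANI_CUTOFF = 96.5   # percent identity
AF_CUTOFF = 0.6      # fraction of the genome covered by the alignment


def species_clusters(genomes: Iterable[str],
                     pairs: Dict[Tuple[str, str], Tuple[float, float]]) -> List[List[str]]:
    """Group genomes whose pairwise (gANI, AF) both pass the cut-offs.

    `pairs` maps (genome_a, genome_b) to (gANI, AF); every genome named in
    `pairs` must also appear in `genomes`. Clusters are the connected
    components of the "same species" graph, found with union-find.
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for (a, b), (gani, af) in pairs.items():
        if gani >= GANI_CUTOFF and af >= AF_CUTOFF:
            parent[find(a)] = find(b)      # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

For instance, if A-B and B-C pass both cut-offs while C-D fails on AF and A-D fails on gANI, the result is `[["A", "B", "C"], ["D"]]`. Single linkage is only one possible design; it can chain distant genomes together, which a stricter scheme would avoid.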

  14. Rumen microbial genomics

    International Nuclear Information System (INIS)

    cellulase systems are employed by at least some ruminal bacteria. But is that enough? 'Metagenomics' is a term coined with reference to the genetic potential resident within an entire microbial community, and is dependent upon high throughput DNA sequencing, advances in recombinant DNA technologies, and computational biology. It is anticipated that metagenomics will significantly augment the rumen genome studies that are already underway, and allow for the genetic characterization of microbes that cannot currently be cultured in the laboratory. The genetic potential of these species, which undoubtedly make a significant contribution to the ecology of the rumen environment, has until now escaped attention. The '-omics' technologies also offer exciting new opportunities to investigate microbial diversity and physiology in ruminants, other herbivorous animals, and humans. Hopefully, the current model that has been established by the North American Consortium will be just the beginning, but we are aware that many challenges lie ahead in terms of funding, data acquisition, data mining, and data interpretation. The benefits from these studies will, however, have global implications for animal productivity. (author)

  15. Text Mining for Protein Docking.

    Directory of Open Access Journals (Sweden)

    Varsha D Badal

    2015-12-01

    Full Text Available The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures have transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, that can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound...
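The filtering step combines bag-of-words features with a Support Vector Machine. The abstract does not say which SVM implementation or settings were used, so the sketch below is an assumption throughout: a tiny linear SVM trained by Pegasos-style sub-gradient descent on the hinge loss (no bias term), over lower-cased word counts. The toy abstracts and labels are invented for illustration, and all function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Sequence


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; word order is discarded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def dot(w: Dict[str, float], x: Counter) -> float:
    return sum(w.get(term, 0.0) * count for term, count in x.items())


def train_linear_svm(docs: Sequence[str], labels: Sequence[int],
                     lam: float = 0.1, epochs: int = 20) -> Dict[str, float]:
    """Pegasos-style sub-gradient descent on the hinge loss (no bias term).

    Labels are +1 (relevant to docking) or -1 (irrelevant). Examples are
    visited in a fixed order rather than sampled at random, for reproducibility.
    """
    xs = [bag_of_words(d) for d in docs]
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)          # uses the weights before this step
            for term in w:                  # regularisation: shrink all weights
                w[term] *= (1.0 - eta * lam)
            if margin < 1.0:                # hinge loss active: move towards y*x
                for term, count in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * count
    return w


def predict(w: Dict[str, float], text: str) -> int:
    return 1 if dot(w, bag_of_words(text)) > 0 else -1
```

Usage: `w = train_linear_svm(docs, labels)` and then `predict(w, abstract)` keeps an abstract when it returns `1`. A production pipeline would more likely use an established SVM library and held-out validation, as the abstract describes.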

  16. Automated Preferences Elicitation

    Czech Academy of Sciences Publication Activity Database

    Kárný, Miroslav; Guy, Tatiana Valentine

    Prague : Institute of Information Theory and Automation, 2011, s. 20-25. ISBN 978-80-903834-6-3. [The 2nd International Workshop on Decision Making with Multiple Imperfect Decision Makers. Held in Conjunction with the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011). Sierra Nevada (ES), 16.12.2011-16.12.2011] R&D Projects: GA MŠk 1M0572; GA ČR GA102/08/0567 Institutional research plan: CEZ:AV0Z10750506 Keywords : elicitation * decision making * Bayesian decision making * fully probabilistic design Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2011/AS/karny-automated preferences elicitation.pdf

  17. Automated drawing generation system

    International Nuclear Information System (INIS)

    Since automated CAD drawing generation systems still required human intervention, improvements focused on the interactive processing section (data input and correction operations), which requires a vast amount of work. As a result, human intervention was eliminated, achieving the original objective of a computerized system. This is the first step towards complete automation. The effects of the development and commercialization of the system are described below. (1) The interactive processing time required for generating drawings was improved; introduction of the CAD system was found to have reduced the time required for generating drawings. (2) Differences in skill between workers preparing drawings have been eliminated, and the quality of drawings has been made uniform. (3) The extent of knowledge and experience demanded of workers has been reduced. (author)

  18. Terminal automation system maintenance

    Energy Technology Data Exchange (ETDEWEB)

    Coffelt, D.; Hewitt, J. [Engineered Systems Inc., Tempe, AZ (United States)]

    1997-01-01

    Nothing has improved petroleum product loading in recent years more than terminal automation systems. The presence of terminal automation systems (TAS) at loading racks has increased operational efficiency and safety and enhanced accounting and management capabilities. However, like all finite systems, they occasionally malfunction or fail. Proper servicing and maintenance can minimize this. And in the unlikely event that a TAS breakdown does occur, prompt and effective troubleshooting can reduce its impact on terminal productivity. To accommodate around-the-clock loading at racks increasingly unattended by terminal personnel, TAS maintenance, servicing and troubleshooting have become increasingly demanding. They have also become increasingly important. After 15 years of trial and error at petroleum and petrochemical storage and transfer terminals, a number of successful troubleshooting programs have been developed. These include 24-hour "help hotlines," internal (terminal company) and external (supplier) support staff, and "layered" support. These programs are described.

  19. ATLAS Distributed Computing Automation

    CERN Document Server

    Schovancova, J; The ATLAS collaboration; Borrego, C; Campana, S; Di Girolamo, A; Elmsheuser, J; Hejbal, J; Kouba, T; Legger, F; Magradze, E; Medrano Llamas, R; Negri, G; Rinaldi, L; Sciacca, G; Serfon, C; Van Der Ster, D C

    2012-01-01

    The ATLAS Experiment benefits from computing resources distributed worldwide at more than 100 WLCG sites. The ATLAS Grid sites provide over 100k CPU job slots and over 100 PB of storage space on disk or tape. Monitoring the status of such a complex infrastructure is essential. The ATLAS Grid infrastructure is monitored 24/7 by two teams of shifters distributed worldwide, by the ATLAS Distributed Computing experts, and by site administrators. In this paper we summarize automation efforts performed within the ATLAS Distributed Computing team in order to reduce manpower costs and improve the reliability of the system. Different aspects of the automation process are described: from the ATLAS Grid site topology provided by the ATLAS Grid Information System, via automatic site testing by HammerCloud, to automatic exclusion from production or analysis activities.
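The abstract does not state the rule used to exclude sites automatically. As a minimal sketch under an assumed, hypothetical policy (not HammerCloud's actual criteria), a site could be excluded when the failure rate of its recent test jobs exceeds a threshold, provided enough tests have run to justify acting without a human. `MIN_TESTS`, `MAX_FAILURE_RATE` and `sites_to_exclude` are illustrative names.

```python
from typing import Dict, List

# Hypothetical policy parameters; not taken from the ATLAS paper.
MIN_TESTS = 5            # need at least this many recent results to act
MAX_FAILURE_RATE = 0.5   # exclude when more than half of recent tests fail


def sites_to_exclude(results: Dict[str, List[bool]]) -> List[str]:
    """`results` maps site name -> recent test outcomes (True = success)."""
    excluded = []
    for site, outcomes in sorted(results.items()):
        if len(outcomes) < MIN_TESTS:
            continue  # not enough evidence to exclude automatically
        failure_rate = outcomes.count(False) / len(outcomes)
        if failure_rate > MAX_FAILURE_RATE:
            excluded.append(site)
    return excluded
```

Requiring a minimum number of results is a simple guard against excluding a site because of one or two transient test failures.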

  20. Rapid automated nuclear chemistry

    International Nuclear Information System (INIS)

    Rapid Automated Nuclear Chemistry (RANC) can be thought of as the Z-separation of Neutron-rich Isotopes by Automated Methods. The range of RANC studies of fission and its products is large. In a sense, the studies can be categorized into various energy ranges from the highest where the fission process and particle emission are considered, to low energies where nuclear dynamics are being explored. This paper presents a table which gives examples of current research using RANC on fission and fission products. The remainder of this text is divided into three parts. The first contains a discussion of the chemical methods available for the fission product elements, the second describes the major techniques, and in the last section, examples of recent results are discussed as illustrations of the use of RANC.

  1. Rapid automated nuclear chemistry

    Energy Technology Data Exchange (ETDEWEB)

    Meyer, R.A.

    1979-05-31

    Rapid Automated Nuclear Chemistry (RANC) can be thought of as the Z-separation of Neutron-rich Isotopes by Automated Methods. The range of RANC studies of fission and its products is large. In a sense, the studies can be categorized into various energy ranges from the highest where the fission process and particle emission are considered, to low energies where nuclear dynamics are being explored. This paper presents a table which gives examples of current research using RANC on fission and fission products. The remainder of this text is divided into three parts. The first contains a discussion of the chemical methods available for the fission product elements, the second describes the major techniques, and in the last section, examples of recent results are discussed as illustrations of the use of RANC.

  2. Automated Alphabet Reduction for Protein Datasets

    Directory of Open Access Journals (Sweden)

    Valencia Alfonso

    2009-01-01

    Full Text Available Abstract Background We investigate automated and generic alphabet reduction techniques for protein structure prediction datasets. Reducing alphabet cardinality without losing key biochemical information opens the door to potentially faster machine learning, data mining and optimization applications in structural bioinformatics. Furthermore, reduced but informative alphabets often result in, e.g., more compact and human-friendly classification/clustering rules. In this paper we propose a robust and sophisticated alphabet reduction protocol based on mutual information and state-of-the-art optimization techniques. Results We applied this protocol to the prediction of two protein structural features: contact number and relative solvent accessibility. For both features we generated alphabets of two, three, four and five letters. The five-letter alphabets gave prediction accuracies statistically similar to those obtained using the full amino acid alphabet. Moreover, the automatically designed alphabets were compared against other reduced alphabets, either taken from the literature or human-designed, and outperformed them. The differences between our alphabets and the alphabets taken from the literature were quantitatively analyzed. All of the above was performed using a primary sequence representation of proteins. As a final experiment, we extrapolated the obtained five-letter alphabet to reduce a much richer protein representation based on evolutionary information for the prediction of the same two features. Again, the performance gap between the full representation and the reduced representation was small, showing that the alphabets produced by our automated alphabet reduction protocol, even though they were obtained using a simple representation, are also able to capture the crucial information needed for state-of-the-art protein representations. Conclusion Our automated alphabet reduction protocol generates competent reduced alphabets tailored specifically for a...
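The protocol scores reduced alphabets with mutual information. As a minimal sketch of that quantity (not the paper's optimizer), the code maps residues through a reduced alphabet and computes the mutual information, in bits, between the reduced symbols X and per-residue class labels Y: I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]. The two-letter hydrophobic/polar grouping is a hypothetical example, not the alphabet derived in the paper.

```python
import math
from collections import Counter
from typing import Dict, Sequence

# Hypothetical two-letter alphabet: "h" (hydrophobic) vs "p" (polar).
# Illustrative only; not the alphabet derived in the paper.
TWO_LETTER: Dict[str, str] = {aa: "h" for aa in "AVLIMFWC"}
TWO_LETTER.update({aa: "p" for aa in "GSTYNQDEKRHP"})


def reduce_sequence(seq: str, alphabet: Dict[str, str]) -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(alphabet[aa] for aa in seq)


def mutual_information(symbols: Sequence[str], labels: Sequence[str]) -> float:
    """I(X;Y) in bits between per-residue symbols X and class labels Y."""
    n = len(symbols)
    joint = Counter(zip(symbols, labels))
    px = Counter(symbols)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi
```

An alphabet search would try many groupings of the 20 amino acids and keep those whose reduced symbols retain the most information about the target feature. If the reduced symbols perfectly predict a balanced binary label, the score is 1 bit; if they are independent of it, the score is 0.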

  3. Automated Microbial Metabolism Laboratory

    Science.gov (United States)

    1973-01-01

    Development of the automated microbial metabolism laboratory (AMML) concept is reported. The focus of effort of AMML was on the advanced labeled release experiment. Labeled substrates, inhibitors, and temperatures were investigated to establish a comparative biochemical profile. Profiles at three time intervals on soil and pure cultures of bacteria isolated from soil were prepared to establish a complete library. The development of a strategy for the return of a soil sample from Mars is also reported.

  4. Components for automated microscopy

    Science.gov (United States)

    Determann, H.; Hartmann, H.; Schade, K. H.; Stankewitz, H. W.

    1980-12-01

    A number of devices aimed at automated analysis of microscopic objects, with respect to their morphometric parameters or their photometric values, were developed. These comprise: (1) a device for automatic focusing tuned to maximum contrast; (2) a feedback system for automatic optimization of microscope illumination; and (3) microscope lenses with adjustable pupil distances for use in the two previous devices. An extensive test program on histological and cytological applications demonstrates the wide range of possible applications of the autofocusing device.
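Autofocusing "tuned to maximum contrast" picks the focus position whose image is sharpest by some contrast measure. The abstract does not specify the measure. This sketch assumes intensity variance, one common choice, and a focus stack given as a dict from focus position to image. `contrast` and `best_focus` are hypothetical names.

```python
from typing import Dict, Sequence

Image = Sequence[Sequence[float]]


def contrast(img: Image) -> float:
    """Intensity variance, used here as a simple contrast score."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def best_focus(stack: Dict[float, Image]) -> float:
    """Return the focus position whose image has the highest contrast."""
    return max(stack, key=lambda z: contrast(stack[z]))
```

A real device would search focus positions incrementally (for example, by hill climbing) rather than acquiring a full stack, but the selection criterion stays the same.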

  5. Automation of dissolution tests

    OpenAIRE

    Rolf Rolli

    2003-01-01

    Dissolution testing of drug formulations was introduced in the 1960s and accepted by health regulatory authorities in the 1970s. Since then, the importance of dissolution testing has grown rapidly, as have the number of tests and the demands on quality-control laboratories. Recent research has led to the development of in-vitro dissolution tests as replacements for human and animal bioequivalence studies. For many years, a lot of time and effort has been invested in automation of dissolution tests. The...

  6. Automated uranium assays

    International Nuclear Information System (INIS)

    Precise, timely inventories of enriched uranium stocks are vital to help prevent the loss, theft, or diversion of this material for illicit use. A wet-chemistry analyzer has been developed at LLL to assist in these inventories by performing automated analyses of uranium samples from different stages in the nuclear fuel cycle. These assays offer improved accuracy, reduced costs, significant savings in manpower, and lower radiation exposure for personnel compared with present techniques.

  7. Construction Automation and Robotics

    OpenAIRE

    Bock, Thomas

    2008-01-01

    Due to the high complexity of the construction process and stagnating technological development, long-term preparation is necessary to adapt the construction process to advanced construction methods. Architects, engineers and all other participants in the construction process have to be integrated into this adaptation process. The short- and long-term development of automation will take place step by step and will be oriented to the respective application and requirements. In the initial phase existing building...

  8. Shielded cells transfer automation

    International Nuclear Information System (INIS)

    Nuclear waste from shielded cells is removed, packaged, and transferred manually in many nuclear facilities. To reduce radiation exposure to operators, technological advances in remote handling and automation were employed. An industrial robot and a specially designed end effector, access port, and sealing machine were used to remotely bag waste containers out of a glove box. The system is operated from a control panel outside the work area via television cameras.

  9. Automated protein-DNA interaction screening of Drosophila regulatory elements

    OpenAIRE

    Hens, Korneel; Feuz, Jean-Daniel; Isakova, Alina; Iagovitina, Antonina; Massouras, Andreas; Bryois, Julien; Callaerts, Patrick; Celniker, Susan E.; Deplancke, Bart

    2011-01-01

    Drosophila melanogaster has one of the best characterized metazoan genomes in terms of functionally annotated regulatory elements. To explore how these elements contribute to gene regulation in the context of gene regulatory networks, we need convenient tools to identify the proteins that bind to them. Here, we present the development and validation of a highly automated protein-DNA interaction detection method, enabling the high-throughput yeast one-hybrid-based screening of DNA elements ver...

  10. LINAC control automation system

    International Nuclear Information System (INIS)

    A 7 MeV electron beam linear accelerator (LINAC) used for pulse radiolysis experiments at RC and CDD, B.A.R.C., has been automated with a PLC-based control panel designed and developed by Computer Division, B.A.R.C. After power-on, a single turn of the START key from OFF to ON switches on the various units in a predefined sequence and at predefined intervals. The control panel also generates various ramp signals, in a predefined sequence and at predefined rates, together with steady values, and feeds them to the LINAC, bringing it to the ready-for-experiment condition. Similarly, a single turn of the STOP key from OFF to ON makes the panel ramp down the various signals in a predefined manner and switch off the units in a predefined sequence and timing, protecting the machine. The steady values of the various signals can be set online whenever required. This automation system relieves the operator of the fatigue of time-consuming manual ramping up or down of the signals and of running around four rooms to switch the units on or off, enhancing efficiency and safety. It also allows the user scientist to carry out start-up and shutdown in the absence of skilled operators, adding flexibility for working extended hours. The unit has been working satisfactorily since August 2002. For abnormal conditions, changeover from automatic to manual operation and vice versa has been provided. (author)
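The start/stop behaviour described above (units switched on in a fixed order, signals ramped at a set rate to settable steady values, and the reverse on shutdown) can be illustrated with a small simulation. This is a hypothetical sketch, not the B.A.R.C. PLC logic: the unit names, signal names, rates and setpoints are invented, and time is simulated as discrete ticks rather than real intervals.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    setpoint: float   # steady value (settable online in the described system)
    rate: float       # maximum change per simulated tick (hypothetical units)
    value: float = 0.0

def ramp(signal, target):
    """Move a signal linearly toward target at its rate; return the values visited."""
    trace = []
    while signal.value != target:
        diff = target - signal.value
        if abs(diff) <= signal.rate:
            signal.value = target              # final step lands exactly on target
        else:
            signal.value += signal.rate if diff > 0 else -signal.rate
        trace.append(signal.value)
    return trace

def start_up(units, signals, log):
    """START key: switch units ON in order, then ramp each signal to its setpoint."""
    for unit in units:
        log.append(f"ON {unit}")
    for s in signals:
        ramp(s, s.setpoint)
        log.append(f"RAMPED {s.name} to {s.value}")

def shut_down(units, signals, log):
    """STOP key: ramp signals down in reverse order, then switch units OFF in reverse."""
    for s in reversed(signals):
        ramp(s, 0.0)
        log.append(f"RAMPED {s.name} to {s.value}")
    for unit in reversed(units):
        log.append(f"OFF {unit}")
```

For example, `start_up(["vacuum", "modulator", "electron_gun"], [Signal("filament", 5.0, 1.0)], log)` appends the three ON entries followed by `"RAMPED filament to 5.0"`. Clamping the last step to the target keeps the loop from overshooting or oscillating around the setpoint.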

  11. Challenges in mine safety

    International Nuclear Information System (INIS)

    Some of the key issues discussed are: new technology in mining and its impact on safety, risk assessment and accident prevention, information technology and mine safety, quality assurance and safety, workplace monitoring and occupational health surveys, disaster prevention and management plans, environmental concerns and public acceptance, and human resource development.

  12. The Penarroya mining railway

    International Nuclear Information System (INIS)

    The mining railway of the French Societe Miniere et Metallurgique de Penarroya, 241 km long, was the second-largest private narrow-gauge railway in Spain. Located inland, it linked the coal and galena mines with foundries and with the national railway network, carrying the minerals to national and foreign markets. (Author)

  13. Ghana - Mining and Development

    OpenAIRE

    P. C. Mohan

    2004-01-01

    The objectives of the project ($9.37 million, 1996-2001) were to (a) enhance the capacity of mining sector institutions to carry out their functions of encouraging and regulating investment in the mining sector in an environmentally sound manner and (b) support the use of techniques and mechanisms that improve productivity and financial viability and reduce the environmental impact of ...

  14. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
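To make the phrase "weighted consensus of all available evidence" concrete, here is a deliberately simplified toy. Each evidence source votes for coding bases with a fixed weight, and a base is called coding when its summed weight reaches a threshold. EVM itself scores candidate exons and selects a consensus gene structure with dynamic programming. This per-base vote is a stand-in for illustration only, and the source names, weights and threshold are invented.

```python
def weighted_coding_calls(length, evidence, weights, threshold):
    """Toy weighted consensus over a sequence of `length` bases.

    evidence: {source_name: [(start, end), ...]} half-open coding intervals
    weights:  {source_name: weight} (hypothetical per-source trust)
    Returns merged intervals whose per-base weighted support >= threshold.
    """
    score = [0.0] * length
    for source, intervals in evidence.items():
        w = weights[source]
        for start, end in intervals:
            for i in range(start, end):
                score[i] += w
    calls = [s >= threshold for s in score]
    # Collapse runs of consecutive coding positions into (start, end) intervals.
    segments, start = [], None
    for i, coding in enumerate(calls):
        if coding and start is None:
            start = i
        elif not coding and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, length))
    return segments
```

Raising a source's weight lets it override weaker sources. In the same spirit, EVM lets users weight evidence types such as ab initio predictions and spliced alignments differently.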

  15. Boosting association rule mining in large datasets via Gibbs sampling.

    Science.gov (United States)

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-05-01

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. A general rule importance measure is also proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm. PMID:27091963
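For reference, the Apriori algorithm that the abstract combines with its stochastic search works level by level: it counts single items, joins frequent (k-1)-itemsets into k-item candidates, prunes candidates that have an infrequent subset, and repeats. The minimal sketch below shows only plain Apriori plus rule confidence. It does not implement the authors' Gibbs-sampling search, and the function names and toy transactions are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

def confidence(result, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from frequent-itemset counts."""
    a = frozenset(antecedent)
    return result[a | frozenset(consequent)] / result[a]
```

The full enumeration in the join step is the cost the abstract describes as intractable for large item spaces. Sampling rules first and then mining a reduced dataset is the authors' way around it.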

  16. Geochemistry and mineralogy of arsenic in mine wastes and stream sediments in a historic metal mining area in the UK.

    Science.gov (United States)

    Rieuwerts, J S; Mighanetara, K; Braungardt, C B; Rollinson, G K; Pirrie, D; Azizi, F

    2014-02-15

    Mining generates large amounts of waste which may contain potentially toxic elements (PTE), which, if released into the wider environment, can cause air, water and soil pollution long after mining operations have ceased. The fate and toxicological impact of PTEs are determined by their partitioning and speciation and in this study, the concentrations and mineralogy of arsenic in mine wastes and stream sediments in a former metal mining area of the UK are investigated. Pseudo-total (aqua-regia extractable) arsenic concentrations in all samples from the mining area exceeded background and guideline values by 1-5 orders of magnitude, with a maximum concentration in mine wastes of 1.8 × 10⁵ mg kg⁻¹ As and concentrations in stream sediments of up to 2.5 × 10⁴ mg kg⁻¹ As, raising concerns over potential environmental impacts. Mineralogical analysis of the wastes and sediments was undertaken by scanning electron microscopy (SEM) and automated SEM-EDS based quantitative evaluation (QEMSCAN®). The main arsenic mineral in the mine waste was scorodite and this was significantly correlated with pseudo-total As concentrations and significantly inversely correlated with potentially mobile arsenic, as estimated from the sum of exchangeable, reducible and oxidisable arsenic fractions obtained from a sequential extraction procedure; these findings correspond with the low solubility of scorodite in acidic mine wastes. The work presented shows that the study area remains grossly polluted by historical mining and processing and illustrates the value of combining mineralogical data with acid and sequential extractions to increase our understanding of potential environmental threats. PMID:24295744

  17. Data mining for service

    CERN Document Server

    2014-01-01

    Virtually all nontrivial and modern service-related problems and systems involve data volumes and types that clearly fall into what is presently meant by "big data", that is, data that are huge, heterogeneous, complex, distributed, etc. Data mining is a series of processes that includes collecting and accumulating data, modeling phenomena, and discovering new information, and it is one of the most important steps in the scientific analysis of service processes. Data mining application in services requires a thorough understanding of the characteristics of each service and knowledge of the compatibility of data mining technology within each particular service, rather than knowledge of calculation speed and prediction accuracy alone. The varied examples of services provided in this book will help readers understand the relation between services and data mining technology. This book is intended to stimulate interest among researchers and practitioners in the relation between data mining technology and its application to ...

  18. Automated expert modeling for automated student evaluation.

    Energy Technology Data Exchange (ETDEWEB)

    Abbott, Robert G.

    2006-01-01

    The 8th International Conference on Intelligent Tutoring Systems provides a leading international forum for the dissemination of original results in the design, implementation, and evaluation of intelligent tutoring systems and related areas. The conference draws researchers from a broad spectrum of disciplines ranging from artificial intelligence and cognitive science to pedagogy and educational psychology. The conference explores the increasing real-world impact of intelligent tutoring systems on an increasingly global scale. Improved authoring tools and learning object standards enable fielding systems and curricula in real-world settings on an unprecedented scale. Researchers deploy ITSs in ever larger studies and increasingly use data from real students, tasks, and settings to guide new research. With high volumes of student interaction data, data mining, and machine learning, tutoring systems can learn from experience and improve their teaching performance. The increasing number of realistic evaluation studies also broadens researchers' knowledge about the educational contexts for which ITSs are best suited. At the same time, researchers explore how to expand and improve ITS/student communication, for example, how to achieve more flexible and responsive discourse with students, help students integrate Web resources into learning, use mobile technologies and games to enhance student motivation and learning, and address multicultural perspectives.

  19. Fuel cell mining vehicles: design, performance and advantages

    International Nuclear Information System (INIS)

    The potential for using fuel cell technology in underground mining equipment is discussed with reference to the risks associated with the operation of hydrogen vehicles, hydrogen production and hydrogen delivery systems. This paper presents some of the initiatives for mine locomotives and fuel cell stacks for underground environments. In particular, it presents test results for the world's first applied industrial fuel cell vehicle, a mining and tunneling locomotive. This study was part of an international initiative managed by the Fuel Cell Propulsion Institute, which consists of several mining companies, mining equipment manufacturers, and fuel cell technology developers. Some of the obvious benefits of fuel cells for underground mining operations include no exhaust gases, lower electrical costs, significantly reduced maintenance, and lower ventilation costs. Another advantage is that the technology can readily be automated and computerized for tele-remote operation. This study also quantified the cost and operational benefits of fuel cell vehicles compared with diesel vehicles. It is expected that higher vehicle productivity could make fuel cell underground vehicles cost-competitive. 6 refs., 1 tab

  20. A Noun Phrase Analysis Tool for Mining Online Community Conversations

    Science.gov (United States)

    Haythornthwaite, Caroline; Gruzd, Anatoliy

    Online communities are creating a growing legacy of texts in online bulletin board postings, chat, blogs, etc. These texts record conversation, knowledge exchange, and variation in focus as groups grow, mature, and decline; they represent a rich history of group interaction and an opportunity to explore the purpose and development of online communities. However, the quantity of data created by these communities is vast, and addressing their processes in a timely manner requires automated processes. This raises questions about how to conduct automated analyses and what we can gain from them: Can we gain an idea of community interests, priorities, and operation from automated examination of the texts of postings and of patterns of posting behavior? Can we mine stored texts to discover patterns of language and interaction that characterize a community?

  1. Tellurium Mobility Through Mine Environments

    Science.gov (United States)

    Dorsk, M.

    2015-12-01

    Tellurium is a rare metalloid that has received little research attention regarding its environmental mobility. Observations of tellurium mobility are mainly based on observations of related metalloids such as selenium and beryllium, yet little research has been done on the specific behavior of tellurium. This laboratory work established the environmental controls that influence tellurium mobility and chemical speciation in aqueous-driven systems. Theoretical simulations show possible mobility of Te as Te(OH)₃⁺ under highly oxidizing and acidic conditions. Movement as TeO₃²⁻ under more basic conditions may also be possible at elevated Eh. Mobility in reducing environments is theoretically less likely. To investigate the conditions for Te mobility in practice, a site in Colorado with known tellurium content was chosen. Composite samples were selected from the top, center and bottom of a tailings pile for elution experiments. These samples were crushed using a rock crusher and pulverized with an automated mortar and pestle. The material was then classified to 70 microns. A 10 g split was digested in concentrated HNO3 and HF and analyzed by atomic absorption spectroscopy to determine initial Te concentrations. Additional 10 g splits from each location were subjected to elution in 100 mL of each of the following solutions: nitric acid at pH 1.0, sulfuric acid at pH 2.0, sodium hydroxide at pH 12, ammonium hydroxide at pH 10, a pine needle/soil tea made from material in the vicinity of the collection site at pH 3.5, and distilled water at pH 7 as a control. Sulfuric acid was deliberately chosen to simulate acid mine drainage from the decomposition of pyrite within the mine tailings. Sample subsets were also inundated with 10 mL of a 3% hydrogen peroxide solution to induce oxidizing conditions. All collected eluates were then analyzed by atomic absorption spectroscopy (AAS) to measure tellurium concentrations in

  2. Spatiotemporal Data Mining: A Computational Perspective

    Directory of Open Access Journals (Sweden)

    Shashi Shekhar

    2015-10-01

    Full Text Available Explosive growth in geospatial and temporal data as well as the emergence of new technologies emphasize the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful patterns from large spatiotemporal databases. It has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. The complexity of spatiotemporal data and intrinsic relationships limits the usefulness of conventional data science techniques for extracting spatiotemporal patterns. In this survey, we review recent computational techniques and tools in spatiotemporal data mining, focusing on several major pattern families: spatiotemporal outlier, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change detection. Compared with other surveys in the literature, this paper emphasizes the statistical foundations of spatiotemporal data mining and provides comprehensive coverage of computational approaches for various pattern families. We also list popular software tools for spatiotemporal data analysis. The survey concludes with a look at future research needs. (ISPRS Int. J. Geo-Inf. 2015, 4, 2307)

  3. M2m Automation: Matlab-To-Map Reduce Automation

    Directory of Open Access Journals (Sweden)

    Archana C S

    2014-06-01

    Full Text Available Abstract- MapReduce is a very popular parallel programming model for cloud computing platforms and has become an effective method for processing massive data using a cluster of computers. A programming-language-to-MapReduce automator is a possible way to help traditional programmers easily deploy an application to cloud systems by translating sequential code into MapReduce code. M2M Automation mainly focuses on automating numerical computations using Hadoop at the back end: M2M automates Hadoop for faster execution of Matlab commands using MapReduce code.
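The MapReduce model that M2M targets can be illustrated in plain Python, without Hadoop. A map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The sketch computes per-column sums of a matrix split into row chunks, the same result as MATLAB's `sum(A, 1)`. It is not M2M's translator, and the function names are hypothetical. On a real cluster, each chunk's map would run on a different node.

```python
from collections import defaultdict
from functools import reduce

def map_phase(chunk):
    """Emit (column_index, value) pairs for every entry of a block of rows."""
    for row in chunk:
        for j, value in enumerate(row):
            yield j, value

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values for one key (here: a sum)."""
    return key, reduce(lambda a, b: a + b, values, 0)

def column_sums(matrix, chunk_size):
    """Split rows into chunks (one per hypothetical mapper) and run map/shuffle/reduce."""
    chunks = [matrix[i:i + chunk_size] for i in range(0, len(matrix), chunk_size)]
    pairs = (pair for chunk in chunks for pair in map_phase(chunk))
    groups = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))
```

For example, `column_sums([[1, 2], [3, 4], [5, 6]], 2)` returns `{0: 9, 1: 12}`. The chunk size does not change the result, only how the work would be split across mappers.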

  4. Ideate about building green mine of uranium mining and metallurgy

    International Nuclear Information System (INIS)

    The current situation of uranium mining and metallurgy is analysed, and goals for green uranium mining and metallurgy are set out, together with their fundamental conditions, contents and measures. The idea of combining green uranium mining and metallurgy with the state target for green mining, while keeping its own characteristics, is put forward. (author)

  5. Advanced Data Mining of Leukemia Cells Micro-Arrays

    Directory of Open Access Journals (Sweden)

    Ryan M. Pierce

    2009-12-01

    Full Text Available This paper provides a continuation and extension of previous research by Segall and Pierce (2009a) that discussed data mining of micro-array databases of leukemia cells, primarily with self-organized maps (SOM). As in Segall and Pierce (2009a) and Segall and Pierce (2009b), the results of applying data mining are shown and discussed for the microarray databases of HL60, Jurkat, NB4 and U937 leukemia cells, which are also described in this article. First, a background section is provided on the work of others pertaining to applications of data mining to micro-array databases of leukemia cells and to micro-array databases in general. As noted in the predecessor article by Segall and Pierce (2009a), micro-array databases are one of the most popular functional genomics tools in use today. The research in this paper is intended to use advanced data mining technologies for better interpretation and knowledge discovery from the patterns of gene expression of HL60, Jurkat, NB4 and U937 leukemia cells. The advanced data mining performed entailed using other data mining tools such as the cubic clustering criterion, variable importance rankings, decision trees, and more detailed examinations of data mining statistics, as well as study of other self-organized map (SOM) clustering regions of the workspace generated by SAS Enterprise Miner version 4. Conclusions and future directions of the research are also presented.

  6. Treatment of mine waters discharged from underground uranium mines

    International Nuclear Information System (INIS)

    Treatment of contaminated mine water before discharge into surface water streams is mandatory for the uranium mines of the National Uranium Company SA - Romania, in order to limit additional exposure of the population living downstream of the mine sites. The present mine water treatment plants have to be upgraded to meet the stringent limits for uranium and radium concentrations even when processing water resulting from the mine flooding process. Ion exchange is used for uranium removal, while radium is separated by adsorption on activated carbon. The separation process and its performance are presented for the water treatment plants at an active mine and at a closed mine. (author)

  7. Recent Developments of Genomic Research in Soybean

    Institute of Scientific and Technical Information of China (English)

    Ching Chan; Xinpeng Qi; Man-Wah Li; Fuk-Ling Wong; Hon-Ming Lam

    2012-01-01

    Soybean is an important cash crop with unique and important traits such as high seed protein and oil contents and the ability to perform symbiotic nitrogen fixation. A reference genome of cultivated soybean was established in 2010, followed by whole-genome re-sequencing of wild and cultivated soybean accessions. These efforts revealed unique features of the soybean genome and helped to clarify its evolution. Variations between wild and cultivated soybean genomes were mapped; these genomic variations may be related to the process of domestication and human selection. Wild soybean germplasm exhibits high genomic diversity and hence may be an important source of novel genes/alleles. Accumulation of genomic data will help to refine genetic maps and expedite the identification of functional genes. In this review, we summarize the major findings from the whole-genome sequencing projects and discuss their possible impacts on soybean research and breeding programs. Some emerging areas such as transcriptomic and epigenomic studies are introduced. In addition, we tabulate some useful bioinformatics tools that will help in mining soybean genomic data.

  8. The UCSC Genome Browser database: 2015 update.

    Science.gov (United States)

    Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  9. Automated Assessment, Face to Face

    OpenAIRE

    Rizik M. H. Al-Sayyed; Amjad Hudaib; Muhannad AL-Shboul; Yousef Majdalawi; Mohammed Bataineh

    2010-01-01

    This research paper evaluates the usability of automated exams and compares them with traditional paper-and-pencil exams. It presents the results of a detailed study conducted at The University of Jordan (UoJ) that comprised students from 15 faculties. A total of 613 students were asked for their opinions of automated exams, and their responses were analyzed in detail. The results indicate that most students reported being satisfied with automated exams, but they have sugg...

  10. Automation System Products and Research

    OpenAIRE

    Rintala, Mikko; Sormunen, Jussi; Kuisma, Petri; Rahkala, Matti

    2014-01-01

    Automation systems are used in most buildings nowadays. In the past they were mainly used in industry to control and monitor critical systems. During the past few decades the automation systems have become more common and are used today from big industrial solutions to homes of private customers. With the growing need for ecologic and cost-efficient management systems, home and building automation systems are becoming a standard way of controlling lighting, ventilation, heating etc. Auto...

  11. Embedded system for building automation

    OpenAIRE

    Rolih, Andrej

    2014-01-01

    Home automation is a fast-developing field of computer science and electronics. Companies offer many different home automation products, ranging from complete systems for building management and control to simple smart lights that can be connected to the internet. These products offer the user greater living comfort and lower expenses by reducing energy usage. This thesis shows the development of a simple home automation system that focuses mainly on the enhance...

  12. Implementation of Paste Backfill Mining Technology in Chinese Coal Mines

    OpenAIRE

    Qingliang Chang; Jianhang Chen; Huaqiang Zhou; Jianbiao Bai

    2014-01-01

    Implementation of clean mining technology at coal mines is crucial to protect the environment and maintain balance among energy resources, consumption, and ecology. After reviewing present coal clean mining technology, we introduce the technology principles and technological process of paste backfill mining in coal mines and discuss the components and features of backfill materials, the constitution of the backfill system, and the backfill process. Specific implementation of this technology a...

  13. Rising Standards for Data Mining

    OpenAIRE

    Kalyankar, N. V.

    2010-01-01

    This paper presents an overview of data mining, then discusses existing and proposed standards that are relevant to data mining. These include standards that affect several stages of a data mining project. Summaries of several emerging standards are given, as well as proposals that have the potential to change the way data mining tools are built.

  14. Environmental aspects of mining

    International Nuclear Information System (INIS)

    More soil is nowadays removed from the earth's surface by mining than by natural erosion from all of the world's rivers. Few people realise that mines and smelting works account for up to a tenth of our total energy consumption, and that mining leaves behind thousands of tons of waste, next to which the everyday waste accumulated in towns worldwide almost pales into insignificance. The question is, can we really afford the price of our consumption of raw materials in terms of the consequential harm to our environment? New strategies and possible solutions are discussed in the text. (orig./HP)

  15. World-wide distribution automation systems

    Energy Technology Data Exchange (ETDEWEB)

    Devaney, T.M.

    1994-12-31

    A worldwide power distribution automation system is outlined. Distribution automation is defined and the status of utility automation is discussed. Other topics discussed include distribution management systems; substation, feeder, and customer functions; potential benefits; automation costs; planning and engineering considerations; automation trends; databases; system operation; and computer modeling of the system.

  16. SorghumFDB: sorghum functional genomics database with multidimensional network analysis

    OpenAIRE

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB inte...

  17. AUTOMATED API TESTING APPROACH

    Directory of Open Access Journals (Sweden)

    SUNIL L. BANGARE

    2012-02-01

    Full Text Available Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing is used to verify or validate a software product. Testing is normally done after the software has been developed, but it can also be performed during the development process. This paper gives a brief introduction to an automated API testing tool. Such a tool can save much trouble once development is complete, saving both time and money. This type of testing is useful in industry and in colleges.

  18. Automated radioimmunoassay of nicotine

    International Nuclear Information System (INIS)

    The authors have developed an automated nonequilibrium procedure for the radioimmunoassay of nicotine. The use of a unique iodinated nicotine derivative in this procedure gave a sensitivity of 10 μg/l for nicotine with a between-run precision of 7.4% and within-run precision of 6.0%. Nicotine levels of 60 to 67 μg/l were found in subjects 15 min after smoking one standard cigarette. The technique reported here is a very rapid and sensitive radioimmunoassay for nicotine and facilitates the determination of nicotine in smoking subjects during the actual process of smoking. (Auth.)

  19. Automated Motivic Analysis

    DEFF Research Database (Denmark)

    Lartillot, Olivier

    2016-01-01

    Motivic analysis provides a very detailed understanding of musical compositions, but is also particularly difficult to formalize and systematize. A computational automation of the discovery of motivic patterns cannot be reduced to a mere extraction of all possible sequences of descriptions....... The systematic approach inexorably leads to a proliferation of redundant structures that needs to be addressed properly. Global filtering techniques cause a drastic elimination of interesting structures that damages the quality of the analysis. On the other hand, a selection of closed patterns allows...

  20. Mechatronic Design Automation

    DEFF Research Database (Denmark)

    Fan, Zhun

    This book proposes a novel design method that combines genetic programming (GP), to automatically explore the open-ended design space, with bond graphs (BG), to unify design representations of multi-domain Mechatronic systems. Results show that the method, formally called the GPBG method, can...... successfully design analogue filters, vibration absorbers, micro-electro-mechanical systems, and vehicle suspension systems, all in an automatic or semi-automatic way. It also investigates the very important issue of co-designing plant-structures and dynamic controllers in automated design of Mechatronic...

  1. AUTOMATION OF REMEDY TICKETS CATEGORIZATION USING BUSINESS INTELLIGENCE TOOLS

    Directory of Open Access Journals (Sweden)

    DR. M RAJASEKHARA BABU

    2012-06-01

    Full Text Available The work log of an issue is often the primary source of information for predicting its cause. Mining patterns from work logs is an important issue-management task. This paper aims at developing an application that categorizes issues into problem areas using a clustering algorithm, which groups the issues by mining patterns from the work log files. Standard reports can be generated for root cause analysis. The whole process is automated using Business Intelligence tools. This work can help minimize the recurrence of issues by informing technical decision makers about the impact of the issues on the system and thus enabling a permanent fix.
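The abstract does not name its clustering algorithm. As one possible illustration only, the sketch below groups work-log texts by the Jaccard similarity of their word sets, using simple single-pass "leader" clustering: each log joins the first cluster whose leader is similar enough, otherwise it starts a new cluster. The tokenisation, the similarity threshold and the sample logs are all assumptions, not the paper's method.

```python
import re

def tokens(text):
    """Lower-cased alphanumeric word set of a work-log entry."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(logs, threshold):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader_token_set, [indices of member logs])
    for i, text in enumerate(logs):
        t = tokens(text)
        for leader, members in clusters:
            if jaccard(t, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return [members for _, members in clusters]
```

For example, with a threshold of 0.3, two database-timeout logs and two printer logs fall into two separate problem areas. Leader clustering is order-dependent, so a production tool would more likely use k-means or hierarchical clustering on TF-IDF vectors.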

  2. Automated Menu Recommendation System Based on Past Preferences

    Directory of Open Access Journals (Sweden)

    Daniel Simon Sanz

    2014-08-01

    Full Text Available Data mining plays an important role in e-commerce today. Time is critical when shopping, as options are unlimited and making a choice can be tedious. This study presents an application of data mining in the form of an Android application that provides the user with automated suggestions based on past preferences. The application helps a person choose what food they might want to order in a specific restaurant. It learns user behavior with each order: what they order for each kind of meal and which products they select together. After gathering enough information, the application can suggest to the user the most frequently selected dish in the recent past and since the application started learning. Applications such as these can play a major role in helping make decisions based on past preferences, thereby reducing user involvement in decision making.
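The recommendation idea described in this abstract (learn from each order, then surface the most frequently selected dish for a meal type, both recently and since learning started, plus dishes ordered together) can be sketched in Python as below. This is a hypothetical illustration, not the authors' Android implementation: the class name `MenuRecommender`, the `recent_window` parameter, and the pairwise co-occurrence counting are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


class MenuRecommender:
    """Hypothetical sketch: learns from each order and suggests frequently chosen dishes."""

    def __init__(self, recent_window=10):
        # Number of most recent orders (per meal type) treated as "the recent past"; an assumption.
        self.recent_window = recent_window
        self.history = defaultdict(list)  # meal type -> list of orders (each a list of dishes)
        self.pairs = Counter()            # (dish_a, dish_b) -> times ordered together

    def record_order(self, meal, dishes):
        """Learn from one order: store it and update co-occurrence counts."""
        self.history[meal].append(list(dishes))
        for a, b in combinations(sorted(set(dishes)), 2):
            self.pairs[(a, b)] += 1

    @staticmethod
    def _most_selected(orders):
        counts = Counter(dish for order in orders for dish in order)
        return counts.most_common(1)[0][0] if counts else None

    def suggest(self, meal):
        """Most selected dish for this meal type, recently and since learning started."""
        orders = self.history.get(meal, [])
        return {
            "recent": self._most_selected(orders[-self.recent_window:]),
            "all_time": self._most_selected(orders),
        }

    def often_with(self, dish):
        """Dishes ordered together with `dish`, most frequent first."""
        partners = Counter()
        for (a, b), n in self.pairs.items():
            if a == dish:
                partners[b] += n
            elif b == dish:
                partners[a] += n
        return [d for d, _ in partners.most_common()]
```

After a few calls to `record_order("lunch", [...])`, `suggest("lunch")` returns a dict with `recent` and `all_time` keys; an unseen meal type yields `None` for both.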

  3. Closedure - Mine Closure Technologies Resource

    Science.gov (United States)

    Kauppila, Päivi; Kauppila, Tommi; Pasanen, Antti; Backnäs, Soile; Liisa Räisänen, Marja; Turunen, Kaisa; Karlsson, Teemu; Solismaa, Lauri; Hentinen, Kimmo

    2015-04-01

    Closure of mining operations is an essential part of the development of eco-efficient mining and the Green Mining concept in Finland to reduce the environmental footprint of mining. Closedure is a 2-year joint research project between Geological Survey of Finland and Technical Research Centre of Finland that aims at developing accessible tools and resources for planning, executing and monitoring mine closure. The main outcome of the Closedure project is an updatable wiki technology-based internet platform (http://mineclosure.gtk.fi) in which comprehensive guidance on the mine closure is provided and main methods and technologies related to mine closure are evaluated. Closedure also provides new data on the key issues of mine closure, such as performance of passive water treatment in Finland, applicability of test methods for evaluating cover structures for mining wastes, prediction of water effluents from mine wastes, and isotopic and geophysical methods to recognize contaminant transport paths in crystalline bedrock.

  4. Literature classification for semi-automated updating of biological knowledgebases

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Winther, Ole;

    2013-01-01

    abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and...... types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results: We defined and applied a machine...

  5. Mining a Web Citation Database for Author Co-Citation Analysis.

    Science.gov (United States)

    He, Yulan; Hui, Siu Cheung

    2002-01-01

    Proposes a mining process to automate author co-citation analysis based on the Web Citation Database, a data warehouse for storing citation indices of Web publications. Describes the use of agglomerative hierarchical clustering for author clustering and multidimensional scaling for displaying author cluster maps, and explains PubSearch, a…
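Author co-citation analysis counts how often two authors are cited together by the same paper; agglomerative hierarchical clustering then merges authors bottom-up. The sketch below assumes single linkage on raw co-citation counts and a hypothetical `min_cocitations` stopping threshold. The abstract does not state which linkage criterion or stopping rule PubSearch uses, and the multidimensional scaling step is omitted.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers):
    """Count author pairs cited together; each element is the set of authors one paper cites."""
    counts = Counter()
    for cited in citing_papers:
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def agglomerate(authors, counts, min_cocitations):
    """Single-linkage agglomerative clustering on co-citation counts (higher = closer).

    Starts with one cluster per author and repeatedly merges the most strongly
    linked pair of clusters until no pair reaches `min_cocitations`.
    """
    clusters = [{a} for a in sorted(authors)]

    def link(c1, c2):
        # Single linkage: the strongest co-citation between any two members.
        return max(counts.get(tuple(sorted((a, b))), 0) for a in c1 for b in c2)

    while len(clusters) > 1:
        strength, i, j = max(
            (link(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if strength < min_cocitations:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i] |= merged
    return clusters
```

Single linkage is chosen here only because it is the simplest to state; complete or average linkage would change `link` but not the merge loop.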

  6. Using Semantic Annotation for Mining Privacy and Security Requirements from European Union Directives

    OpenAIRE

    Guarda, Paolo; Kiyavitskaya, Nadzeya; Zannone, Nicola

    2008-01-01

    The increasing complexity of software systems and the growing demand for regulatory compliance require effective methods and tools to support requirements analysts' activities. To facilitate the alignment of software system requirements with regulations, systematic methods and tools that automate regulations analysis must be developed. This work explores the applicability of the semantic annotation tool Cerno to mining rights and obligations from European privacy directives.

  7. Data mining in Cloud Computing

    OpenAIRE

    Ruxandra-Ştefania PETRE

    2012-01-01

    This paper describes how data mining is used in cloud computing. Data mining is used to extract potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day, people are confronted with targeted advertising, and data mining techniques help businesses become more efficient by reducing costs. Data mining techniques and applications are very much needed in the cloud computing paradigm. The implem...

  8. Security Measures in Data Mining

    OpenAIRE

    Anish Gupta; Vimal Bibhu; Rashid Hussain

    2012-01-01

    Data mining is a technique for extracting data from large databases for analysis and executive decision making. Security is one of the major requirements for data mining applications. In this paper we present security requirement measures for data mining. We summarize the security requirements for data mining in tabular format, organized by the different aspects of security measures for data mining. The performances and outcomes are determin...

  9. Concept of Web Usage Mining

    OpenAIRE

    Istrate Mihai

    2011-01-01

    Web mining is the use of data mining techniques to automatically discover and extract information from World Wide Web documents and services. This article considers the question: is effective Web mining possible? Skeptics believe that the Web is too unstructured for Web mining to succeed. Indeed, data mining has been applied to databases traditionally, yet much of the information on the Web lies buried in documents designed for human consumption such as home pages or product catalogs. Further...

  10. Mine Maps as Grey Literature

    OpenAIRE

    Musser, Linda R. (Pennsylvania State University); GreyNet, Grey Literature Network Service

    2000-01-01

    Mine maps are extremely useful resources for determining regional and local hydrogeologic conditions as well as mineral resources and reserves. Users include mining companies, property owners concerned about risk factors related to mining activities, government inspectors, engineers and planners. Hundreds of thousands of mine maps exist yet how many are collected or cataloged by libraries or archives? This paper examines the characteristics of mine maps, how they are published, their value, a...

  11. Languages for Mining and Learning

    OpenAIRE

    De Raedt, Luc

    2015-01-01

    Applying machine learning and data mining to novel applications is cumbersome. This observation is the prime motivation for the interest in languages for learning and mining. In this talk, I shall provide a gentle introduction to three types of languages that support machine learning and data mining: inductive query languages, which extend database query languages with primitives for mining and learning, modelling languages, which allow users to declaratively specify and solve mining and learning p...

  12. Maneuver Automation Software

    Science.gov (United States)

    Uffelman, Hal; Goodson, Troy; Pellegrin, Michael; Stavert, Lynn; Burk, Thomas; Beach, David; Signorelli, Joel; Jones, Jeremy; Hahn, Yungsun; Attiyah, Ahlam; Illsley, Jeannette

    2009-01-01

    The Maneuver Automation Software (MAS) automates the process of generating commands for maneuvers to keep the spacecraft of the Cassini-Huygens mission on a predetermined prime mission trajectory. Before MAS became available, a team of approximately 10 members had to work about two weeks to design, test, and implement each maneuver in a process that involved running many maneuver-related application programs and then serially handing off data products to other parts of the team. MAS enables a three-member team to design, test, and implement a maneuver in about one-half hour after Navigation has processed tracking data. MAS accepts more than 60 parameters and 22 files as input directly from users. MAS consists of Practical Extraction and Reporting Language (PERL) scripts that link, sequence, and execute the maneuver-related application programs: "Pushing a single button" on a graphical user interface causes MAS to run navigation programs that design a maneuver; programs that create sequences of commands to execute the maneuver on the spacecraft; and a program that generates predictions about maneuver performance and generates reports and other files that enable users to quickly review and verify the maneuver design. MAS can also generate presentation materials, initiate electronic command request forms, and archive all data products for future reference.

  13. Automated Test Case Generation

    CERN Document Server

    CERN. Geneva

    2015-01-01

    I would like to present the concept of automated test case generation. I work on it as part of my PhD and I think it would be interesting for other people as well. It is also the topic of a workshop paper that I am presenting in Paris (abstract below). Please note that the talk itself would be more general and not about the specifics of my PhD, but about the broad field of Automated Test Case Generation. I would introduce the main approaches (combinatorial testing, symbolic execution, adaptive random testing) and their advantages and problems (oracle problem, combinatorial explosion, ...). Abstract of the paper: Over the last decade, code-based test case generation techniques such as combinatorial testing or dynamic symbolic execution have seen growing research popularity. Most algorithms and tool implementations are based on finding assignments for input parameter values in order to maximise execution branch coverage. Only a few of them consider dependencies from outside the Code Under Test’s scope such...

  14. Automation from pictures

    International Nuclear Information System (INIS)

    The state transition diagram (STD) model has been helpful in the design of real time software, especially with the emergence of graphical computer aided software engineering (CASE) tools. Nevertheless, the translation of the STD to real time code has in the past been primarily a manual task. At Los Alamos we have automated this process. The designer constructs the STD using a CASE tool (Cadre Teamwork) using a special notation for events and actions. A translator converts the STD into an intermediate state notation language (SNL), and this SNL is compiled directly into C code (a state program). Execution of the state program is driven by external events, allowing multiple state programs to effectively share the resources of the host processor. Since the design and the code are tightly integrated through the CASE tool, the design and code never diverge, and we avoid design obsolescence. Furthermore, the CASE tool automates the production of formal technical documents from the graphic description encapsulated by the CASE tool. (author)

  15. Automated digital magnetofluidics

    Energy Technology Data Exchange (ETDEWEB)

    Schneider, J; Garcia, A A; Marquez, M [Harrington Department of Bioengineering Arizona State University, Tempe AZ 85287-9709 (United States)], E-mail: tony.garcia@asu.edu

    2008-08-15

    Drops can be moved in complex patterns on superhydrophobic surfaces using a reconfigured, computer-controlled x-y metrology stage with a high degree of accuracy, flexibility, and reconfigurability. The stage employs a DMC-4030 controller, which has a RISC-based, clock-multiplying processor with DSP functions, accepts encoder inputs up to 22 MHz, provides servo update rates as high as 32 kHz, and processes commands in as little as 40 milliseconds. A 6.35 mm diameter cylindrical NdFeB magnet is translated by the stage, causing water drops to move through the induced magnetization of coated iron microspheres that remain in the drop and are attracted to the rare earth magnet through digital magnetofluidics. Water drops are easily moved in complex patterns by automated digital magnetofluidics at an average speed of 2.8 cm/s over a superhydrophobic polyethylene surface created by solvent casting. With additional components, potential uses for this automated microfluidic system include characterization of superhydrophobic surfaces, water quality analysis, and medical diagnostics.

  16. Automated Postediting of Documents

    CERN Document Server

    Knight, K; Knight, Kevin; Chander, Ishwar

    1994-01-01

    Large amounts of low- to medium-quality English texts are now being produced by machine translation (MT) systems, optical character readers (OCR), and non-native speakers of English. Most of this text must be postedited by hand before it sees the light of day. Improving text quality is tedious work, but its automation has not received much research attention. Anyone who has postedited a technical report or thesis written by a non-native speaker of English knows the potential of an automated postediting system. For the case of MT-generated text, we argue for the construction of postediting modules that are portable across MT systems, as an alternative to hardcoding improvements inside any one system. As an example, we have built a complete self-contained postediting module for the task of article selection (a, an, the) for English noun phrases. This is a notoriously difficult problem for Japanese-English MT. Our system contains over 200,000 rules derived automatically from online text resources. We report on l...

  17. Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    CERN Document Server

    Birkholtz, L -M; Wells, G; Grando, D; Joubert, F; Kasam, V; Zimmermann, M; Ortet, P; Jacq, N; Roy, S; Hoffmann-Apitius, M; Breton, V; Louw, A I; Maréchal, E

    2006-01-01

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained fro...

  18. Ensemble Data Mining Methods

    Data.gov (United States)

    National Aeronautics and Space Administration — Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve...
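As a minimal illustration of a model combiner, the sketch below merges the predictions of several classifiers by plurality (majority) vote. Voting is only one of several combination schemes (averaging, weighted voting, and stacking are others), and the toy threshold classifiers are hypothetical stand-ins for real base models.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(models: Sequence[Callable[[float], Hashable]], x: float) -> Hashable:
    """Combine classifiers by plurality vote; ties go to the label seen first."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def threshold_classifier(cutoff: float) -> Callable[[float], str]:
    """Toy base model: labels x as 'pos' when it exceeds `cutoff`."""
    return lambda x: "pos" if x > cutoff else "neg"


# A committee of three slightly different base models.
committee = [threshold_classifier(c) for c in (-1.0, 0.0, 1.0)]
```

For example, `majority_vote(committee, 0.5)` collects two "pos" votes and one "neg" vote. The gain from ensembles generally depends on the base models making different errors, which identical models would not.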

  19. Mining activities at Neyveli

    International Nuclear Information System (INIS)

    Mining activities at lignite areas around Neyveli are described. Measures taken to safeguard the environment from despoliation of land, air pollution, noise pollution and effluents are described. (M.G.B.)

  20. Data mining in agriculture

    CERN Document Server

    Mucherino, Antonio; Pardalos, Panos M

    2009-01-01

    Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environmental related fields. This book presents both theoretical and practical insights with a focus on presenting the context of each data mining technique rather intuitively with ample concrete examples represented graphically and with algorithms written in MATLAB®. Examples and exercises with solutions are provided at the end of each chapter to facilitate the comprehension of the material. For each data mining technique described in the book variants and improvements of the basic algorithm are also given. Also by P.J. Papajorgji and P.M. Pardalos: Advances in Modeling Agricultural Systems, 'Springer Optimization and its Applications' vol. 25, ©2009.

  1. The Pacesetter mine model

    International Nuclear Information System (INIS)

    As part of Syncrude Canada's continuing efforts to improve operations, its oil sands mine was compared to other surface mines in order to determine possible areas of improvement. The two basic tools used in this process were influence diagrams, which capture pictorially all of the factors that may affect a particular outcome, and benchmarking, which allows mathematical relationships to be developed for the various influences. These relationships estimate the magnitude of the influence on an outcome and allow prediction of changes in outcomes when an influence is changed. A model was developed, using data from four different mines, to make the comparisons with Syncrude's operation. The results show that Syncrude could improve costs by approximately 10% if all of the superior methods and ideas used at other mines could be adopted. 1 fig

  2. Coal mine subsidence

    International Nuclear Information System (INIS)

    This paper examines the efficacy of the Department of the Interior's Office of Surface Mining Reclamation and Enforcement's (OSMRE) efforts to implement the federally assisted coal mine subsidence insurance program. Coal mine subsidence, a gradual settling of the earth's surface above an underground mine, can damage nearby land and property. To help protect property owners from subsidence-related damage, the Congress passed legislation in 1984 authorizing OSMRE to make grants of up to $3 million to each state to help the states establish self-sustaining, state-administered insurance programs. Of the 21 eligible states, six (Colorado, Indiana, Kentucky, Ohio, West Virginia, and Wyoming) applied for grants. This paper reviews the efforts of these six states to develop self-sustaining insurance programs and assesses OSMRE's oversight of those efforts.

  3. VRLane: a desktop virtual safety management program for underground coal mine

    Science.gov (United States)

    Li, Mei; Chen, Jingzhu; Xiong, Wei; Zhang, Pengpeng; Wu, Daozheng

    2008-10-01

    VR technologies, which generate immersive, interactive, three-dimensional (3D) environments, are seldom applied to coal mine safety management. In this paper, a new method that combines VR technologies with an underground mine safety management system is explored. A desktop virtual safety management program for underground coal mines, called VRLane, was developed. The paper mainly covers current research advances in VR, the system design, key techniques, and system applications. Two important techniques are introduced. First, an algorithm was designed and implemented that builds 3D laneway models and equipment models automatically from the latest 2D mine drawings, whereas common VR programs establish 3D environments using 3DS Max or other 3D modeling software packages, with which laneway models are built manually and laboriously. Second, VRLane achieves system integration with underground industrial automation. VRLane not only depicts a realistic 3D laneway environment but also describes the status of coal mining, with functions for displaying the run states and related parameters of equipment, pre-alarming abnormal mining events, and animating mine cars, mine workers, or longwall shearers. The system, which has the advantages of being inexpensive, dynamic, and easy to maintain, provides a useful tool for safety production management in coal mines.

  4. Web Mining: An Overview

    Directory of Open Access Journals (Sweden)

    P. V. G. S. Mudiraj; B. Jabber; K. David Raju

    2011-12-01

    Full Text Available Web usage mining is a main research area in Web mining, focused on learning about Web users and their interactions with Web sites. The goal of mining is to find users' access patterns automatically and quickly from vast Web log data, such as frequent access paths, frequent access page groups, and user clusters. Through web usage mining, the server log, registration information, and other related information left by users provide a foundation for organizational decision making. This article provides a survey and analysis of current Web usage mining systems and technologies. There are generally three tasks in Web Usage Mining: preprocessing, pattern analysis, and knowledge discovery. Preprocessing cleans the server log file by removing entries such as errors or failures and repeated requests for the same URL from the same host. The main task of pattern analysis is to filter out uninteresting information and to visualize and interpret interesting patterns for users. The statistics collected from the log file can help discover knowledge. This knowledge can be used to make decisions on various factors, such as rating users and web pages as Excellent, Medium, or Weak based on the hit counts of each page in the web site. The design of the website is restructured based on user behavior or hit counts, which provides quick responses to web users, saves server memory space, and thus reduces HTTP requests and bandwidth utilization. This paper addresses challenges in the three phases of Web Usage Mining along with Web Structure Mining. This paper also discusses an application of WUM, an online recommender system that dynamically generates links to pages that have not yet been visited by a user and might be of potential interest to them. Unlike the recommender systems proposed so far, ONLINE MINER does not use any off-line component and is able to manage Web sites made up of dynamically generated pages.
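The preprocessing step and the hit-count rating described in this abstract can be sketched as follows. The `LogEntry` fields, treating HTTP status codes of 400 or above as error/failure entries, and the Excellent/Medium/Weak thresholds are all hypothetical choices. Deduplicating repeated host/URL requests across the whole log is also a simplification; real systems usually do this within a session or time window.

```python
from collections import Counter
from typing import NamedTuple


class LogEntry(NamedTuple):
    host: str
    url: str
    status: int  # HTTP status code


def preprocess(entries):
    """Drop error/failure entries and repeated requests for the same URL from the same host."""
    seen = set()
    cleaned = []
    for e in entries:
        if e.status >= 400:       # error or failure entry (assumed cutoff)
            continue
        key = (e.host, e.url)
        if key in seen:           # repeated request from the same host
            continue
        seen.add(key)
        cleaned.append(e)
    return cleaned


def rate_pages(entries, high=3, low=1):
    """Rate pages by hit count: >= high is Excellent, > low is Medium, otherwise Weak."""
    hits = Counter(e.url for e in entries)

    def label(n):
        if n >= high:
            return "Excellent"
        if n > low:
            return "Medium"
        return "Weak"

    return {url: label(n) for url, n in hits.items()}
```

Running `rate_pages(preprocess(log))` on a parsed log yields one rating per URL; the same counting could be keyed by host to rate users instead of pages.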

  5. Applied data mining

    CERN Document Server

    Xu, Guandong

    2013-01-01

    Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.

  6. Data mining in radiology

    OpenAIRE

    Amit T Kharat; Amarjit Singh; Kulkarni, Vilas M; Digish Shah

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed dec...

  7. MINING INDUSTRY IN CROATIA

    OpenAIRE

    Slavko Vujec

    1996-01-01

    Trends in the world and European mining industry are presented in a short introductory review. The mining industry is very important to the economy of Croatia because it covers most of the needed petroleum and natural gas, all construction raw materials, and industrial non-metallic raw minerals. A detailed quantitative presentation of mineral raw material production is compared with the pre-war situation. The value of annual production is given for each raw mineral (the paper is published in Cro...

  8. MINING INDUSTRY IN CROATIA

    Directory of Open Access Journals (Sweden)

    Slavko Vujec

    1996-12-01

    Full Text Available Trends in the world and European mining industry are presented in a short introductory review. The mining industry is very important to the economy of Croatia because it covers most of the needed petroleum and natural gas, all construction raw materials, and industrial non-metallic raw minerals. A detailed quantitative presentation of mineral raw material production is compared with the pre-war situation. The value of annual production is given for each raw mineral (the paper is published in Croatian).

  9. Data Stream Mining

    Science.gov (United States)

    Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

    Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).

  10. Visual Data Mining Techniques

    OpenAIRE

    Keim, Daniel A.; Ward, Matthew O.

    2002-01-01

    Never before in history has data been generated at such high volumes as it is today. Exploring and analyzing the vast volumes of data has become increasingly difficult. Information visualization and visual data mining can help to deal with the flood of information. The advantage of visual data exploration is that the user is directly involved in the data mining process. There are a large number of information visualization techniques that have been developed over the last two decades to suppo...

  11. Coal Mines Security System

    OpenAIRE

    Ankita Guhe; Shruti Deshmukh; Bhagyashree Borekar; Apoorva Kailaswar; Milind E. Rane

    2012-01-01

    Geological circumstances in mines seem to be extremely complicated, and there are many hidden troubles. Coal is wrongfully lifted by musclemen from coal stocks, coal washeries, coal transfer and loading points, and also along transport routes by tampering with the weighing of trucks. CIL (Coal India Ltd) is under the control of the mafia, and a large number of irregularities can be attributed to the coal mafia. An Intelligent Coal Mine Security System using a data acquisition method utilizes sensor, auto...

  12. Privacy Preserving Data Mining

    OpenAIRE

    A.T. Ravi; Chitra, S.

    2011-01-01

    Recent interest in data collection and monitoring using data mining for security and business-related applications has raised privacy concerns. Privacy Preserving Data Mining (PPDM) techniques require data modification to sanitize the data of sensitive information or to anonymize it at an uncertainty level. This study uses PPDM with the Adult dataset to investigate the effects of K-anonymization on evaluation metrics. This study uses the Artificial Bee Colony (ABC) algorithm for feature generalization and suppr...
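K-anonymity requires that every combination of quasi-identifier values in a released table be shared by at least k records. The sketch below checks that property and applies one fixed generalization (bucketing ages into 10-year ranges). It does not reproduce the study's Artificial Bee Colony search for generalizations and suppressions; the column names and the bucket width are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(n >= k for n in groups.values())


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '20-29' (one simple generalization)."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"
```

In a search-based approach such as the one in this study, a check like `is_k_anonymous` would act as a constraint, while the optimizer chooses which attributes to generalize or suppress so that data utility is preserved.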

  13. Python data mining environments

    OpenAIRE

    Mrak, Aleš

    2012-01-01

    In the thesis we compare data mining systems that have an interface in the Python programming language. Many open-source data mining systems and libraries have implemented interfaces to the Python programming language. They chose Python because it is fast, provides object-oriented programming, allows the integration of other software libraries in Python, and is available on all major operating systems (Windows, Linux/Unix, OS/2, Mac, etc.). Our analysis s...

  14. KREK: Minding your mines

    OpenAIRE

    Nygård, Jardar; Brødreskift, Kenneth; Zhang, Yin Yin; Kontny, Sandra Elisabeth

    2010-01-01

    This thesis presents the findings from an in-depth analysis of the Chinese market for environmental mining management, with the aim of exploring the possibility of offering consulting and educational services in this market. The analysis is carried out on behalf of Kjeøy Research and Education Center (KREC), a Norwegian company providing education and consulting services on environmental management in the mining industry. KREC is a small business founded in 2003 and is located in Northern No...

  15. HIGH UTILITY ITEMSETS MINING

    OpenAIRE

    YING LIU; JIANWEI LI; WEI-KENG LIAO; ALOK CHOUDHARY; YONG SHI

    2010-01-01

    High utility itemsets mining identifies itemsets whose utility satisfies a given threshold. It allows users to quantify the usefulness or preferences of items using different values. Thus, it reflects the impact of different items. High utility itemsets mining is useful in decision-making process of many applications, such as retail marketing and Web service, since items are actually different in many aspects in real applications. However, due to the lack of "downward closure property", the c...
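The utility of an itemset is commonly defined as the sum, over all transactions containing every item of the set, of quantity times unit profit for those items. Because this measure lacks the downward closure property, a superset can be high-utility even when one of its subsets is not, so the naive sketch below enumerates every itemset. It is exponential in the number of items and serves only to make the definition concrete; it is not the algorithm proposed in this paper.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit of the itemset's items over transactions containing all of them."""
    total = 0
    for t in transactions:  # each transaction: dict item -> purchased quantity
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Exhaustively return every itemset whose utility is at least min_utility."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_utility:
                result[itemset] = u
    return result
```

With unit profits A = 1 and B = 10 and transactions {A: 1, B: 1} and {A: 1}, item A alone has utility 2 while {A, B} has utility 11, so with a threshold of 5 the pair qualifies even though one of its subsets does not. This is why Apriori-style pruning cannot be applied directly.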

  16. International mining forum 2004, new technologies in underground mining, safety in mines proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Jerzy Kicki; Eugeniusz Sobczyk (eds.)

    2004-01-15

    The book comprises technical papers that were presented at the International Mining Forum 2004. This event aims to bring together scientists and engineers in mining, rock mechanics, and computer engineering, with a view to exploring and discussing international developments in the field. Topics discussed in this book are: trends in the mining industry; new solutions and tendencies in underground mines; rock engineering problems in underground mines; utilization and exploitation of methane; prevention measures for the control of rock bursts in Polish mines; and current problems in Ukrainian coal mines.

  17. A gene pattern mining algorithm using interchangeable gene sets for prokaryotes

    Directory of Open Access Journals (Sweden)

    Kim Sun

    2008-02-01

    Full Text Available Abstract Background Mining gene patterns that are common to multiple genomes is an important biological problem that can lead to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well-developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use an optimization problem formulation that is solved using dynamic programming. Unfortunately, extending these algorithms to multiple genomes is not trivial due to the rapid increase in time and space complexity. Results In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community, using interchangeable sets, to handle the situation where family classification information is not available. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene patterns can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. Conclusion The discovered gene patterns can be used to detect orthologs and genes that collaborate for a common biological function.
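For comparison, the bi-directional best hit (BBH) baseline mentioned in the abstract pairs gene a in genome A with gene b in genome B when each is the other's top-scoring match. Below is a minimal sketch that assumes similarity scores (for example, alignment bit scores) have already been computed and are given as dictionaries keyed by (query, subject). It does not implement the authors' interchangeable-set pattern mining.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A gene whose best hit prefers some other gene (a2 below, whose best hit b1 points back to a1) is left unpaired. Methods that use gene context, such as the pattern-based approach in this paper, aim to recover some of those missed orthologs.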

  18. An Introduction to Internet Data Mining

    OpenAIRE

    Sumit Ahlawat

    2014-01-01

    In this paper we discuss mining of web data, referred to here as web data mining. We have categorized web data mining into three areas: web content mining, web structure mining and web usage mining. We highlight and discuss various research issues involved in each of these web data mining categories. We believe that web data mining will be a topic of exploratory research in the near future. Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD...

  19. Mining environmental handbook: effects of mining on the environment and American environmental controls on mining

    International Nuclear Information System (INIS)

    This mining environmental handbook examines and defines the dual effect of mining on the environment and the relatively new environmental controls on mining. The US federal and state laws dealing with environmental control of mining are outlined. Mining-related environmental impacts on land, water, air, plants, wildlife and society are reviewed, and the technologies, practices and standards available to prevent or mitigate each of these adverse environmental impacts are described. Environmental permitting and systems design for environmental protection are covered. Specialised or individual mining situations such as coal mining, solution mining, acid mine drainage and the use of surface mines as landfills and repositories are discussed. Costs and financial assurance requirements are also covered.

  20. Investigation and characterization of mining subsidence in Kaiyang Phosphorus Mine

    Institute of Scientific and Technical Information of China (English)

    DENG Jian; BIAN Li

    2007-01-01

    In Kaiyang Phosphorus Mine, serious environmental and safety problems have been caused by large-scale mining activities over the past 40 years. These problems include mining subsidence, a low recovery ratio, too much dead ore left in pillars, and pollution from phosphogypsum. Mining subsidence falls into four categories: curved ground and mesas, ground cracks and collapse holes, spalling and eboulement, and slope sliding and creep. The following measures to treat the mining subsidence were proposed: locating and managing abandoned stopes, optimizing the mining method (cut-and-fill mining), selecting proper backfilling materials (phosphogypsum mixtures), avoiding disorderly mining operations, and treating highway slopes. These investigations and engineering treatment methods are expected to contribute to the safe extraction of ore and the sustainable development of Kaiyang Phosphorus Mine.

  1. Mine water sustainability assessment in a tungsten mine

    International Nuclear Information System (INIS)

    Across the world, increasing importance is being given to the sustainable development of human activities. Mining operations play a vital role in the economic and social growth of a nation, and it is therefore necessary to develop SMP (Sustainable Mining Practices) to achieve sustainable development. One of the major components of sustainable mining practices is the management of the water regime during mining operations for present and future generations. For effective management of the water regime from the pre-mining to the post-mining stage, a sustainability index has been developed. It follows a mathematical model based on three environmental indicators, namely physico-chemical properties, toxic elements and other components. This paper is specifically concerned with the assessment of mine water sustainability in relation to tungsten mining. It describes the availability of tungsten resources throughout the world, together with the mine water sustainability results of an underground tungsten mine in Portugal. (author)

  2. Testing automation of projects in telecommunication domain

    OpenAIRE

    Alexey, Veselov; Vsevolod, Kotlyarov

    2010-01-01

    This paper presents an integrated approach to testing automation of telecommunication projects, along with proposals for automating conformance testing. The underlying idea is to combine formal verification and testing automation techniques in order to improve product quality.

  3. WEB STRUCTURE MINING

    Directory of Open Access Journals (Sweden)

    CLAUDIA ELENA DINUCĂ

    2011-01-01

    Full Text Available The World Wide Web has become one of the most valuable resources for information retrieval and knowledge discovery because of the steadily increasing amount of data available online. Given the size of the Web, users easily get lost in its rich hyperlink structure. Applying data mining methods is the right solution for knowledge discovery on the Web. The knowledge extracted from the Web can be used to improve the performance of Web information retrieval, question answering and Web-based data warehousing. In this paper, I provide an introduction to the Web mining categories and focus on one of them: Web structure mining. Web structure mining, one of the three categories of Web data mining, is used to identify the relationships between Web pages linked by information or by direct links. It offers information about how different pages are linked together to form the Web as a whole. Web structure mining uncovers hidden underlying structures and uses hyperlinks for further Web applications such as Web search.

  4. Coal mining in Ramagundam

    Energy Technology Data Exchange (ETDEWEB)

    Chakraberty, S.

    1979-07-01

    The Ramagundam area in the South Godavari Coalfield is one of the most promising coal-bearing belts in India. It contains total coal reserves of about 1,132,000,000 tons in an area of approximately 150 square kilometers, and has high potential for development into a vast industrial center. Over the past four years, production has doubled, reaching 3,500,000 tons in 1978 to 1979. Annual output is planned to double again by 1983 to 1984. Increased mechanization and the introduction of more advanced mining techniques will help to achieve this goal. In addition to the present face machinery (gathering-arm loaders with shuttle cars, and side-dump loaders with chain conveyors), the latest Voest-Alpine AM50 tunneling and roadheading machines have been commissioned for development work. Load-haul-dump machines will be introduced in the near future to provide higher loading and transport capacities. A double-drum shearer loader with self-advancing supports is due to be commissioned shortly for faster, more efficient longwall mining to supplement conventional bord and pillar mining. In addition, a mechanized opencast mine has come on stream, and a walking dragline will soon be delivered to the mine for removing overburden. The projected annual output from this mine will be about 2,000,000 tons. (LTN)

  5. Automated Methods of Corrosion Measurements

    DEFF Research Database (Denmark)

    Andersen, Jens Enevold Thaulov

    Mechanical control, recording, and data processing must therefore be automated to a high level of precision and reliability. These general techniques and the apparatus involved have been described extensively. The automated methods of such high-resolution microscopy coordinated with computerized...

  6. Opening up Library Automation Software

    Science.gov (United States)

    Breeding, Marshall

    2009-01-01

    Throughout the history of library automation, the author has seen a steady advancement toward more open systems. In the early days of library automation, when proprietary systems dominated, the need for standards was paramount since other means of inter-operability and data exchange weren't possible. Today's focus on Application Programming…

  7. Automation, Performance and International Competition

    DEFF Research Database (Denmark)

    Kromann, Lene; Sørensen, Anders

    productivity growth than other firms. Moreover, automation improves the efficiency of all stages of the production process by reducing setup time, run time, and inspection time and increasing uptime and quantity produced per worker. The efficiency improvement varies by type of automation....

  8. Automated separation for heterogeneous immunoassays

    OpenAIRE

    Truchaud, A.; Barclay, J; Yvert, J. P.; Capolaghi, B.

    1991-01-01

    Besides the general requirements for modern automated systems, immunoassay automation involves specific requirements, such as a separation step for heterogeneous immunoassays. Systems are designed according to the solid phase selected: dedicated or open robots for coated tubes and wells, systems closely resembling chemistry analysers in the case of magnetic particles, and a completely original design for those using porous and film materials.

  9. Automated Test-Form Generation

    Science.gov (United States)

    van der Linden, Wim J.; Diao, Qi

    2011-01-01

    In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
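    The selection problem described above can be illustrated on a toy scale. This is a sketch under stated assumptions, not the authors' method. Exhaustive enumeration stands in for a mixed-integer programming solver, which is only feasible for a tiny item bank. The objective is the sum of hypothetical per-item information values at a target ability level. The only specification is a minimum number of items per content area. The `Item` fields, item identifiers, and content areas are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Optional


class Item(NamedTuple):
    item_id: str
    info: float  # hypothetical: item information at a target ability level
    area: str    # hypothetical content-area label


def assemble_form(bank: List[Item], length: int,
                  min_per_area: Dict[str, int]) -> Optional[List[str]]:
    """Pick `length` items maximising total information, subject to a
    minimum count per content area, by exhaustive enumeration.

    Returns None if no combination satisfies the constraints.
    """
    best_total: Optional[float] = None
    best_ids: Optional[List[str]] = None
    for combo in combinations(bank, length):
        counts: Dict[str, int] = {}
        for item in combo:
            counts[item.area] = counts.get(item.area, 0) + 1
        # Content constraints: each listed area needs at least k items.
        if any(counts.get(area, 0) < k for area, k in min_per_area.items()):
            continue
        total = sum(item.info for item in combo)
        if best_total is None or total > best_total:
            best_total, best_ids = total, [item.item_id for item in combo]
    return best_ids


bank = [Item("i1", 0.9, "algebra"), Item("i2", 0.8, "algebra"),
        Item("i3", 0.7, "algebra"), Item("i4", 0.3, "geometry"),
        Item("i5", 0.2, "geometry")]
print(assemble_form(bank, 3, {"algebra": 1, "geometry": 1}))  # ['i1', 'i2', 'i4']
```

    The three most informative items are all algebra items, so the content constraint forces the lower-information item i4 into the form. A MIP formulation encodes the same objective and constraints with binary decision variables, one per item. This lets a solver handle realistic bank sizes, whereas enumeration grows combinatorially.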

  10. Automated Methods Of Corrosion Measurements

    DEFF Research Database (Denmark)

    Bech-Nielsen, Gregers; Andersen, Jens Enevold Thaulov; Reeve, John Ch;

    1997-01-01

    The chapter describes the following automated measurements: Corrosion Measurements by Titration, Imaging Corrosion by Scanning Probe Microscopy, Critical Pitting Temperature and Application of the Electrochemical Hydrogen Permeation Cell.

  11. GarlicESTdb: an online database and mining tool for garlic EST sequences

    Directory of Open Access Journals (Sweden)

    Choi Sang-Haeng

    2009-05-01

    Full Text Available Abstract Background Allium sativum, commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse genus containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary and medicinal purposes and for its health benefits. Interest in garlic is currently increasing because of its nutritional and pharmaceutical value, including against high blood pressure and cholesterol, atherosclerosis and cancer. Nevertheless, there are no comprehensive databases available for Expressed Sequence Tags (ESTs) of garlic for gene discovery and future genome annotation efforts. We therefore developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. Description GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in Java. It consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information in MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, creating dynamic web pages on the client side, and mapping annotated enzymes to KEGG pathways; the AJAX framework was also used in part. The online resources, such as putative annotation, single nucleotide polymorphism (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To achieve more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation

  12. Automated Training for Algorithms That Learn from Genomic Data

    OpenAIRE

    Gokcen Cilingir; Broschat, Shira L.

    2015-01-01

    Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, the updates are generally not incorporated into published machine learning algorithms, which can therefore become outdated soon after their introduction. In this paper, we propose a new model of operation for supervis...

  13. Data mining in healthcare: decision making and precision

    Directory of Open Access Journals (Sweden)

    Ionuţ ŢĂRANU

    2016-05-01

    Full Text Available The application of data mining in healthcare is increasing, because the health sector is rich in information and data mining has become a necessity. Healthcare organizations generate and collect large volumes of information on a daily basis. Information technology enables the automation of data mining and knowledge discovery, which can reveal interesting patterns. This eliminates manual tasks and allows easy data extraction directly from electronic records. It also supports electronic transfer systems that secure medical records, save lives and reduce the cost of medical services, and it enables early detection of infectious diseases on the basis of advanced data collection. Data mining can enable healthcare organizations to anticipate trends in patients' medical conditions and behaviour by analysing different perspectives and by making connections between seemingly unrelated information. The raw data from healthcare organizations are voluminous and heterogeneous. They need to be collected and stored in an organized form, and their integration allows the formation of a unified medical information system. Data mining in health offers unlimited possibilities for analyzing data models that are less visible or hidden to common analysis techniques. These patterns can be used by healthcare practitioners to make forecasts, establish diagnoses, and set treatments for patients in healthcare organizations.

  14. Automated Standard Hazard Tool

    Science.gov (United States)

    Stebler, Shane

    2014-01-01

    The current system used to generate standard hazard reports is considered cumbersome and iterative. This study defines a structure for this system's process in a clear, algorithmic way so that standard hazard reports and basic hazard analysis may be completed using a centralized, web-based computer application. To accomplish this task, a test server is used to host a prototype of the tool during development. The prototype is configured to integrate easily into NASA's current server systems with minimal alteration. Additionally, the tool is easily updated and provides NASA with a system that may grow to accommodate future requirements and, possibly, different applications. The project's success is reflected in positive, subjective reviews completed by payload providers and NASA Safety and Mission Assurance personnel. Ideally, this prototype will increase interest in the concept of standard hazard automation and lead to the full-scale production of a user-ready application.

  15. Expedition automated flow fluorometer

    Science.gov (United States)

    Krikun, V. A.; Salyuk, P. A.

    2015-11-01

    This paper describes the design and operation of an automated flow-through dual-channel fluorometer for studying the fluorescence of dissolved organic matter and the fluorescence of phytoplankton cells with open and closed reaction centers, in sea areas with oligotrophic and eutrophic water types. The device implements step-by-step excitation by two semiconductor lasers or two light-emitting diodes. The excitation wavelengths are 405 nm and 532 nm in the default configuration. The duration, intensity and repetition rate of the excitation from each light source can be varied. The fluorescence signal is recorded by two photomultipliers fitted with different band-pass optical filters covering 580-600 nm and 680-700 nm. The configuration of the excitation sources and the spectral ranges of the recorded radiation can be changed depending on the task.

  16. Robust automated knowledge capture.

    Energy Technology Data Exchange (ETDEWEB)

    Stevens-Adams, Susan Marie; Abbott, Robert G.; Forsythe, James Chris; Trumbo, Michael Christopher Stefan; Haass, Michael Joseph; Hendrickson, Stacey M. Langfitt

    2011-10-01

    This report summarizes research conducted through the Sandia National Laboratories Robust Automated Knowledge Capture Laboratory Directed Research and Development project. The objective of this project was to advance scientific understanding of the influence of individual cognitive attributes on decision making. The project has developed a quantitative model known as RumRunner that has proven effective in predicting the propensity of an individual to shift strategies on the basis of task- and experience-related parameters. Three separate studies that have validated the basic RumRunner model are described. This work provides a basis for better understanding human decision making in high-consequence national security applications, and in particular the individual characteristics that underlie adaptive thinking.

  17. Berkeley automated supernova search

    International Nuclear Information System (INIS)

    The Berkeley automated supernova search employs a computer-controlled 36-inch telescope and a charge-coupled device (CCD) detector to image 2500 galaxies per night. A dedicated minicomputer compares each galaxy image with stored reference data to identify supernovae in real time. The detection threshold is $m_v = 18.8$. We plan to monitor roughly 500 galaxies in Virgo and closer every night, and an additional 6000 galaxies out to 70 Mpc on a three-night cycle. This should yield very early detection of several supernovae per year for detailed study, and reliable premaximum detection of roughly 100 supernovae per year for statistical studies. The search should be operational in mid-1982.

  18. Automated synthetic scene generation

    Science.gov (United States)

    Givens, Ryan N.

    Physics-based simulations generate synthetic imagery to help organizations anticipate the performance of proposed remote sensing systems. However, manually constructing synthetic scenes that are sophisticated enough to capture the complexity of real-world sites can take days to months, depending on the size of the site and the desired fidelity of the scene. This research, sponsored by the Air Force Research Laboratory's Sensors Directorate, successfully developed an automated approach to fuse high-resolution RGB imagery, lidar data, and hyperspectral imagery and then extract the necessary scene components. The method greatly reduces the time and money required to generate realistic synthetic scenes, and the research also developed new approaches to improve material identification using information from all three input datasets.

  19. Automated Electrostatics Environmental Chamber

    Science.gov (United States)

    Calle, Carlos; Lewis, Dean C.; Buchanan, Randy K.; Buchanan, Aubri

    2005-01-01

    The Mars Electrostatics Chamber (MEC) is an environmental chamber designed primarily to create atmospheric conditions like those at the surface of Mars to support experiments on electrostatic effects in the Martian environment. The chamber is equipped with a vacuum system, a cryogenic cooling system, an atmospheric-gas replenishing and analysis system, and a computerized control system that can be programmed by the user and that provides both automation and options for manual control. The control system can be set to maintain steady Mars-like conditions or to impose temperature and pressure variations of a Mars diurnal cycle at any given season and latitude. In addition, the MEC can be used in other areas of research because it can create steady or varying atmospheric conditions anywhere within the wide temperature, pressure, and composition ranges between the extremes of Mars-like and Earth-like conditions.

  20. [From automation to robotics].

    Science.gov (United States)

    1985-01-01

    The introduction of automation into the biology laboratory seems to be unavoidable. But at what cost, if it is necessary to purchase a new machine for every new application? Fortunately, the same image processing techniques, belonging to a theoretical framework called Mathematical Morphology, may be used in visual inspection tasks both in the car industry and in the biology lab. Since the market for industrial robotics applications is much larger than the market for biomedical applications, the price of image processing devices is dropping and is sometimes less than the price of a complete microscope setup. The power of the image processing methods of Mathematical Morphology will be illustrated by various examples, such as automatic silver grain counting in autoradiography, determination of HLA genotype, electrophoretic gel analysis, automatic screening of cervical smears... Thus several heterogeneous applications may share the same image processing device, provided there is a separate, dedicated workstation for each of them. PMID:4091303