WorldWideScience

Sample records for machine learning ml

  1. OpenML : An R package to connect to the machine learning platform OpenML

    NARCIS (Netherlands)

    Casalicchio, G.; Bossek, J.; Lang, M.; Kirchhoff, D.; Kerschke, P.; Hofner, B.; Seibold, H.; Vanschoren, J.; Bischl, B.

    2017-01-01

    OpenML is an online machine learning platform where researchers can easily share data, machine learning tasks and experiments as well as organize them online to work and collaborate more efficiently. In this paper, we present an R package to interface with the OpenML platform and illustrate its

  2. ML Confidential : machine learning on encrypted data

    NARCIS (Netherlands)

    Graepel, T.; Lauter, K.; Naehrig, M.

    2012-01-01

    We demonstrate that by using a recently proposed somewhat homomorphic encryption (SHE) scheme it is possible to delegate the execution of a machine learning (ML) algorithm to a compute service while retaining confidentiality of the training and test data. Since the computational complexity of the

  3. AstroML: "better, faster, cheaper" towards state-of-the-art data mining and machine learning

    Science.gov (United States)

    Ivezic, Zeljko; Connolly, Andrew J.; Vanderplas, Jacob

    2015-01-01

    We present AstroML, a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, matplotlib, and astropy, and distributed under an open license. AstroML contains a growing library of statistical and machine learning routines for analyzing astronomical data in Python, loaders for several open astronomical datasets (such as SDSS and other recent major surveys), and a large suite of examples of analyzing and visualizing astronomical datasets. AstroML is especially suitable for introducing undergraduate students to numerical research projects and for graduate students to rapidly undertake cutting-edge research. The long-term goal of astroML is to provide a community repository for fast Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics (see http://www.astroml.org).

  4. Machine learning (ML)-guided OPC using basis functions of polar Fourier transform

    Science.gov (United States)

    Choi, Suhyeong; Shim, Seongbo; Shin, Youngsoo

    2016-03-01

    With shrinking feature size, runtime has become a limitation of model-based OPC (MB-OPC). A few machine learning-guided OPC (ML-OPC) have been studied as candidates for next-generation OPC, but they all employ too many parameters (e.g. local densities), which set their own limitations. We propose to use basis functions of polar Fourier transform (PFT) as parameters of ML-OPC. Since PFT functions are orthogonal each other and well reflect light phenomena, the number of parameters can significantly be reduced without loss of OPC accuracy. Experiments demonstrate that our new ML-OPC achieves 80% reduction in OPC time and 35% reduction in the error of predicted mask bias when compared to conventional ML-OPC.

  5. ML Confidential : machine learning on encrypted data

    NARCIS (Netherlands)

    Graepel, T.; Lauter, K.; Naehrig, M.; Kwon, T.; Lee, M.-K.; Kwon, D.

    2013-01-01

    We demonstrate that, by using a recently proposed leveled homomorphic encryption scheme, it is possible to delegate the execution of a machine learning algorithm to a computing service while retaining con¿dentiality of the training and test data. Since the computational complexity of the homomorphic

  6. Quantum Machine Learning

    OpenAIRE

    Romero García, Cristian

    2017-01-01

    [EN] In a world in which accessible information grows exponentially, the selection of the appropriate information turns out to be an extremely relevant problem. In this context, the idea of Machine Learning (ML), a subfield of Artificial Intelligence, emerged to face problems in data mining, pattern recognition, automatic prediction, among others. Quantum Machine Learning is an interdisciplinary research area combining quantum mechanics with methods of ML, in which quantum properties allow fo...

  7. Machine Learning of Musical Gestures

    OpenAIRE

    Caramiaux, Baptiste; Tanaka, Atau

    2013-01-01

    We present an overview of machine learning (ML) techniques and theirapplication in interactive music and new digital instruments design. We firstgive to the non-specialist reader an introduction to two ML tasks,classification and regression, that are particularly relevant for gesturalinteraction. We then present a review of the literature in current NIMEresearch that uses ML in musical gesture analysis and gestural sound control.We describe the ways in which machine learning is useful for cre...

  8. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  9. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Full Text Available Learning incorporates a broad range of complex procedures. Machine learning (ML is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc. that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  10. ML-o-Scope: A Diagnostic Visualization System for Deep Machine Learning Pipelines

    Science.gov (United States)

    2014-05-16

    Huawei , Intel, Microsoft, NetApp, Pivotal, Splunk, Virdata, VMware, WANdisco and Yahoo!. ML-o-scope: a diagnostic visualization system for deep machine...Facebook, GameOnTalis, Guavus, HP, Huawei , Intel, Microsoft, NetApp, Pivotal, Splunk, Virdata, VMware, WANdisco and Yahoo!. References [1] Bruna, J., and

  11. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them.This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  12. Introduction to Machine Learning: Class Notes 67577

    OpenAIRE

    Shashua, Amnon

    2009-01-01

    Introduction to Machine learning covering Statistical Inference (Bayes, EM, ML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering), and PAC learning (the Formal model, VC dimension, Double Sampling theorem).

  13. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This  book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The  multidisciplinary nature of machine learning makes it a very fascinating and popular area for research.  The book is aiming at students, practitioners and researchers and captures the diversity and richness of the field of machine learning and intelligent systems.  Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems, spam classification including an extremely well-written survey on Bayesian networks shed light on the strengths and weaknesses of the methods. Practical studies yielding insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and on-line mining of non-stationary ...

  14. Machine learning for epigenetics and future medical applications.

    Science.gov (United States)

    Holder, Lawrence B; Haque, M Muksitul; Skinner, Michael K

    2017-07-03

    Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.

  15. Machine learning for healthcare technologies

    CERN Document Server

    Clifton, David A

    2016-01-01

    This book brings together chapters on the state-of-the-art in machine learning (ML) as it applies to the development of patient-centred technologies, with a special emphasis on 'big data' and mobile data.

  16. Machine learning for adaptive many-core machines a practical approach

    CERN Document Server

    Lopes, Noel

    2015-01-01

    The overwhelming data produced everyday and the increasing performance and cost requirements of applications?are transversal to a wide range of activities in society, from science to industry. In particular, the magnitude and complexity of the tasks that Machine Learning (ML) algorithms have to solve are driving the need to devise adaptive many-core machines that scale well with the volume of data, or in other words, can handle Big Data.This book gives a concise view on how to extend the applicability of well-known ML algorithms in Graphics Processing Unit (GPU) with data scalability in mind.

  17. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  18. Quantum machine learning: a classical perspective

    Science.gov (United States)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed. PMID:29434508

  19. Quantum machine learning: a classical perspective.

    Science.gov (United States)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  20. Quantum machine learning: a classical perspective

    Science.gov (United States)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  1. Machine learning for epigenetics and future medical applications

    OpenAIRE

    Holder, Lawrence B.; Haque, M. Muksitul; Skinner, Michael K.

    2017-01-01

    ABSTRACT Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems w...

  2. Using Machine Learning in Adversarial Environments.

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Warren Leon [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-02-01

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approaches only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers reaction, with the goal of computing optimal moving target defense. One important challenge is to construct a realistic model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.

  3. The ATLAS Higgs Machine Learning Challenge

    CERN Document Server

    Cowan, Glen; The ATLAS collaboration; Bourdarios, Claire

    2015-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s with Artificial Neural Net and more recently with Boosted Decision Trees, Random Forest etc. Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, data scientists are developing new Machine Learning algorithms to extract meaning from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, and at the same time data scientists have advanced algorithms: the goal of the HiggsML project was to bring the two together by a “challenge”: participants from all over the world and any scientific background could compete online to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Instead of HEP physicists browsing through machine learning papers and trying to infer which new algorithms might be useful for HEP, then c...

  4. Higgs Machine Learning Challenge 2014

    CERN Document Server

    Olivier, A-P; Bourdarios, C ; LAL / Orsay; Goldfarb, S ; University of Michigan

    2014-01-01

    High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees (paper) and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics. We are now exploring new ways to improve the cross-fertilization of the two fields by setting up a data challenge, following the footsteps of, among others, the astrophysics community (dark matter and galaxy zoo challenges) and neurobiology (connectomics and decoding the human brain). The organization committee consists of ATLAS physicists and machine learning researchers. The Challenge will run from Monday 12th to September 2014.

  5. Strategies and Principles of Distributed Machine Learning on Big Data

    Directory of Open Access Journals (Sweden)

    Eric P. Xing

    2016-06-01

    Full Text Available The rise of big data has led to new demands for machine learning (ML systems to learn complex models, with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions thereupon. In order to run ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, it is often the case that significant engineering efforts are required—and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that “big” ML systems can benefit greatly from ML-rooted statistical and algorithmic insights—and that ML researchers should therefore not shy away from such systems design—we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These principles and strategies span a continuum from application, to engineering, and to theoretical research and development of big ML systems and architectures, with the goal of understanding how to make them efficient, generally applicable, and supported with convergence and scaling guarantees. They concern four key questions that traditionally receive little attention in ML research: How can an ML program be distributed over a cluster? How can ML computation be bridged with inter-machine communication? How can such communication be performed? What should be communicated between machines? By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typically seen in traditional computer programs, and by dissecting successful cases to reveal how we have harnessed these principles to design and develop both high-performance distributed ML software as well as general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and enlarge the area

  6. Interactive machine learning for health informatics: when do we need the human-in-the-loop?

    Science.gov (United States)

    Holzinger, Andreas

    2016-06-01

    Machine learning (ML) is the fastest growing field in computer science, and health informatics is among the greatest challenges. The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. Most ML researchers concentrate on automatic machine learning (aML), where great advances have been made, for example, in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from big data with many training sets. However, in the health domain, sometimes we are confronted with a small number of data sets or rare events, where aML-approaches suffer of insufficient training samples. Here interactive machine learning (iML) may be of help, having its roots in reinforcement learning, preference learning, and active learning. The term iML is not yet well used, so we define it as "algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human." This "human-in-the-loop" can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem, reduces greatly in complexity through the input and the assistance of a human agent involved in the learning phase.

  7. Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods.

    Science.gov (United States)

    Luo, Gang; Stone, Bryan L; Johnson, Michael D; Tarczy-Hornoch, Peter; Wilcox, Adam B; Mooney, Sean D; Sheng, Xiaoming; Haug, Peter J; Nkoy, Flory L

    2017-08-29

    To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets. This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care

  8. In silico machine learning methods in drug development.

    Science.gov (United States)

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures, "noisy" and high-dimensional data to complement QSAR methods, and in cases of unavailable receptor 3D structure to complement structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also regard a number of applications of the machine learning algorithms based on common classes of diseases.

  9. Unintended consequences of machine learning in medicine?

    Science.gov (United States)

    McDonald, Laura; Ramagopalan, Sreeram V; Cox, Andrew P; Oguz, Mustafa

    2017-01-01

    Machine learning (ML) has the potential to significantly aid medical practice. However, a recent article highlighted some negative consequences that may arise from using ML decision support in medicine. We argue here that whilst the concerns raised by the authors may be appropriate, they are not specific to ML, and thus the article may lead to an adverse perception about this technique in particular. Whilst ML is not without its limitations like any methodology, a balanced view is needed in order to not hamper its use in potentially enabling better patient care.

  10. Machine learning versus knowledge based classification of legal texts

    NARCIS (Netherlands)

    de Maat, E.; Krabben, K.; Winkels, R.; Winkels, R.G.F.

    2010-01-01

    This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurate (>90%) as the pattern based one, but

  11. MLitB: machine learning in the browser

    Directory of Open Access Journals (Sweden)

    Edward Meeds

    2015-07-01

    Full Text Available With few exceptions, the field of Machine Learning (ML research has largely ignored the browser as a computational engine. Beyond an educational resource for ML, the browser has vast potential to not only improve the state-of-the-art in ML research, but also, inexpensively and on a massive scale, to bring sophisticated ML learning and prediction to the public at large. This paper introduces MLitB, a prototype ML framework written entirely in Javascript, capable of performing large-scale distributed computing with heterogeneous classes of devices. The development of MLitB has been driven by several underlying objectives whose aim is to make ML learning and usage ubiquitous (by using ubiquitous compute devices, cheap and effortlessly distributed, and collaborative. This is achieved by allowing every internet capable device to run training algorithms and predictive models with no software installation and by saving models in universally readable formats. Our prototype library is capable of training deep neural networks with synchronized, distributed stochastic gradient descent. MLitB offers several important opportunities for novel ML research, including: development of distributed learning algorithms, advancement of web GPU algorithms, novel field and mobile applications, privacy preserving computing, and green grid-computing. MLitB is available as open source software.

  12. Machine Learning of Fault Friction

    Science.gov (United States)

    Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.

    2017-12-01

    We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025

  13. Towards a Future Reallocation of Work between Humans and Machines – Taxonomy of Tasks and Interaction Types in the Context of Machine Learning

    OpenAIRE

    Traumer, Fabian; Oeste-Reiß, Sarah; Leimeister, Jan Marco

    2017-01-01

    In today’s race for competitive advantages, more and more companies implement innovations in artificial intelligence and machine learning (ML). Although these machines take over tasks that have been executed by humans, they will not make human workforce obsolete. To leverage the potentials of ML, collaboration between humans and machines is necessary. Before collaboration processes can be developed, a classification of tasks in the field of ML is needed. Therefore, we present a taxonomy for t...

  14. The ATLAS Higgs machine learning challenge

    CERN Document Server

    Davey, W; The ATLAS collaboration; Rousseau, D; Cowan, G; Kegl, B; Germain-Renaud, C; Guyon, I

    2014-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 90's with Artificial Neural Net for example, more recently with Boosted Decision Trees, Random Forest etc... Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, Data Scientists are developing new Machine Learning algorithms to extract sense from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, data scientists have advanced algorithms: the goal of the HiggsML project is to bring the two together by a “challenge”: participants from all over the world and any scientific background can compete online ( https://www.kaggle.com/c/higgs-boson ) to obtain the best Higgs to tau tau signal significance on a set of ATLAS full simulated Monte Carlo signal and background. Winners with the best scores will receive money prizes ; authors of the best method (most usable) will be invited t...

  15. Supporting visual quality assessment with machine learning

    NARCIS (Netherlands)

    Gastaldo, P.; Zunino, R.; Redi, J.

    2013-01-01

    Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms allow to tackle the quality

  16. RED-ML

    DEFF Research Database (Denmark)

    Xiong, Heng; Liu, Dongbing; Li, Qiye

    2017-01-01

    using diverse RNA-seq datasets, we have developed a software tool, RED-ML: RNA Editing Detection based on Machine learning (pronounced as "red ML"). The input to RED-ML can be as simple as a single BAM file, while it can also take advantage of matched genomic variant information when available...... accurately detect novel RNA editing sites without relying on curated RNA editing databases. We have also made this tool freely available via GitHub . We have developed a highly accurate, speedy and general-purpose tool for RNA editing detection using RNA-seq data....... With the availability of RED-ML, it is now possible to conveniently make RNA editing a routine analysis of RNA-seq. We believe this can greatly benefit the RNA editing research community and has profound impact to accelerate our understanding of this intriguing posttranscriptional modification process....

  17. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective

    International Nuclear Information System (INIS)

    Kang, John; Schwartz, Russell; Flickinger, John; Beriwal, Sushil

    2015-01-01

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, “spam” filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the “barrier to entry” for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods—logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)—and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation.

  18. Acceleration of saddle-point searches with machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, Andrew A., E-mail: andrew-peterson@brown.edu [School of Engineering, Brown University, Providence, Rhode Island 02912 (United States)

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  19. Acceleration of saddle-point searches with machine learning

    International Nuclear Information System (INIS)

    Peterson, Andrew A.

    2016-01-01

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  20. Acceleration of saddle-point searches with machine learning.

    Science.gov (United States)

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  1. Improving orbit prediction accuracy through supervised machine learning

    Science.gov (United States)

    Peng, Hao; Bai, Xiaoli

    2018-05-01

    Due to the lack of information such as the space environment condition and resident space objects' (RSOs') body characteristics, current orbit predictions that are solely grounded on physics-based models may fail to achieve required accuracy for collision avoidance and have led to satellite collisions already. This paper presents a methodology to predict RSOs' trajectories with higher accuracy than that of the current methods. Inspired by the machine learning (ML) theory through which the models are learned based on large amounts of observed data and the prediction is conducted without explicitly modeling space objects and space environment, the proposed ML approach integrates physics-based orbit prediction algorithms with a learning-based process that focuses on reducing the prediction errors. Using a simulation-based space catalog environment as the test bed, the paper demonstrates three types of generalization capability for the proposed ML approach: (1) the ML model can be used to improve the same RSO's orbit information that is not available during the learning process but shares the same time interval as the training data; (2) the ML model can be used to improve predictions of the same RSO at future epochs; and (3) the ML model based on a RSO can be applied to other RSOs that share some common features.

  2. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective.

    Science.gov (United States)

    Kang, John; Schwartz, Russell; Flickinger, John; Beriwal, Sushil

    2015-12-01

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, "spam" filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the "barrier to entry" for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods--logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)--and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. International Conference ML4CPS 2016

    CERN Document Server

    Niggemann, Oliver; Kühnert, Christian

    2017-01-01

    The work presents new approaches to Machine Learning for Cyber Physical Systems, experiences and visions. It contains some selected papers from the international Conference ML4CPS – Machine Learning for Cyber Physical Systems, which was held in Karlsruhe, September 29th, 2016. Cyber Physical Systems are characterized by their ability to adapt and to learn: They analyze their environment and, based on observations, they learn patterns, correlations and predictive models. Typical applications are condition monitoring, predictive maintenance, image processing and diagnosis. Machine Learning is the key technology for these developments. The Editors Prof. Dr.-Ing. Jürgen Beyerer is Professor at the Department for Interactive Real-Time Systems at the Karlsruhe Institute of Technology. In addition he manages the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB. Prof. Dr. Oliver Niggemann is Professor for Embedded Software Engineering. His research interests are in the field of Di...

  4. Exploration of Machine Learning Approaches to Predict Pavement Performance

    Science.gov (United States)

    2018-03-23

    Machine learning (ML) techniques were used to model and predict pavement condition index (PCI) for various pavement types using a variety of input variables. The primary objective of this research was to develop and assess PCI predictive models for t...

  5. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    Science.gov (United States)

    S K, Somasundaram; P, Alli

    2017-11-09

    The main complication of diabetes is Diabetic retinopathy (DR), retinal vascular disease and it leads to the blindness. Regular screening for early DR disease detection is considered as an intensive labor and resource oriented task. Therefore, automatic detection of DR diseases is performed only by using the computational technique is the great solution. An automatic method is more reliable to determine the presence of an abnormality in Fundus images (FI) but, the classification process is poorly performed. Recently, few research works have been designed for analyzing texture discrimination capacity in FI to distinguish the healthy images. However, the feature extraction (FE) process was not performed well, due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection using Machine Learning and Ensemble Classification method, called, Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed. The ML-BEC method comprises of two stages. The first stage in ML-BEC method comprises extraction of the candidate objects from Retinal Images (RI). The candidate objects or the features for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying Machine Learning technique called, t-distributed Stochastic Neighbor Embedding (t-SNE). Besides, t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence among two distributions regarding the locations of the points on the map. The second stage comprises of application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection

  6. Machine learning \\& artificial intelligence in the quantum domain

    OpenAIRE

    Dunjko, Vedran; Briegel, Hans J.

    2017-01-01

    Quantum information technologies, and intelligent learning systems, are both emergent technologies that will likely have a transforming impact on our society. The respective underlying fields of research -- quantum information (QI) versus machine learning (ML) and artificial intelligence (AI) -- have their own specific challenges, which have hitherto been investigated largely independently. However, in a growing body of recent work, researchers have been probing the question to what extent th...

  7. 1st International Conference on Machine Learning for Cyber Physical Systems and Industry 4.0

    CERN Document Server

    Beyerer, Jürgen

    2016-01-01

    The work presents new approaches to Machine Learning for Cyber Physical Systems, experiences and visions. It contains some selected papers from the international Conference ML4CPS – Machine Learning for Cyber Physical Systems, which was held in Lemgo, October 1-2, 2015. Cyber Physical Systems are characterized by their ability to adapt and to learn: They analyze their environment and, based on observations, they learn patterns, correlations and predictive models. Typical applications are condition monitoring, predictive maintenance, image processing and diagnosis. Machine Learning is the key technology for these developments.

  8. Machine Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  9. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective

    Energy Technology Data Exchange (ETDEWEB)

    Kang, John [Medical Scientist Training Program, University of Pittsburgh-Carnegie Mellon University, Pittsburgh, Pennsylvania (United States); Schwartz, Russell [Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania (United States); Flickinger, John [Departments of Radiation Oncology and Neurological Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania (United States); Beriwal, Sushil, E-mail: beriwals@upmc.edu [Department of Radiation Oncology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania (United States)

    2015-12-01

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, “spam” filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the “barrier to entry” for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods—logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)—and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation.

  10. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  11. Robustness and prediction accuracy of machine learning for objective visual quality assessment

    OpenAIRE

    HINES, ANDREW

    2014-01-01

    PUBLISHED Lisbon, Portugal Machine Learning (ML) is a powerful tool to support the development of objective visual quality assessment metrics, serving as a substitute model for the perceptual mechanisms acting in visual quality appreciation. Nevertheless, the reli- ability of ML-based techniques within objective quality as- sessment metrics is often questioned. In this study, the ro- bustness of ML in supporting objective quality assessment is investigated, specific...

  12. Machine Learning Techniques for Modelling Short Term Land-Use Change

    Directory of Open Access Journals (Sweden)

    Mileva Samardžić-Petrović

    2017-11-01

    Full Text Available The representation of land use change (LUC is often achieved by using data-driven methods that include machine learning (ML techniques. The main objectives of this research study are to implement three ML techniques, Decision Trees (DT, Neural Networks (NN, and Support Vector Machines (SVM for LUC modeling, in order to compare these three ML techniques and to find the appropriate data representation. The ML techniques are applied on the case study of LUC in three municipalities of the City of Belgrade, the Republic of Serbia, using historical geospatial data sets and considering nine land use classes. The ML models were built and assessed using two different time intervals. The information gain ranking technique and the recursive attribute elimination procedure were implemented to find the most informative attributes that were related to LUC in the study area. The results indicate that all three ML techniques can be used effectively for short-term forecasting of LUC, but the SVM achieved the highest agreement of predicted changes.

  13. Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

    Directory of Open Access Journals (Sweden)

    Jiali Du

    2014-12-01

    Full Text Available This paper discusses the application of computational linguistics in the machine learning (ML system for the processing of garden path sentences. ML is closely related to artificial intelligence and linguistic cognition. The rapid and efficient processing of the complex structures is an effective method to test the system. By means of parsing the garden path sentence, we draw a conclusion that the integration of theoretical and statistical methods is helpful for the development of ML system.

  14. ROBUSTNESS AND PREDICTION ACCURACY OF MACHINE LEARNING FOR OBJECTIVE VISUAL QUALITY ASSESSMENT

    OpenAIRE

    Hines, Andrew; Kendrick, Paul; Barri, Adriaan; Narwaria, Manish; Redi, Judith A.

    2014-01-01

    Machine Learning (ML) is a powerful tool to support the development of objective visual quality assessment metrics, serving as a substitute model for the perceptual mechanisms acting in visual quality appreciation. Nevertheless, the reliability of ML-based techniques within objective quality assessment metrics is often questioned. In this study, the robustness of ML in supporting objective quality assessment is investigated, specifically when the feature set adopted for prediction is suboptim...

  15. Machine learning in computational biology to accelerate high-throughput protein expression

    DEFF Research Database (Denmark)

    Sastry, Anand; Monk, Jonathan M.; Tegel, Hanna

    2017-01-01

    and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide...... the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. Availability and implementation: We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template...

  16. Supervised Machine Learning for Population Genetics: A New Paradigm

    Science.gov (United States)

    Schrider, Daniel R.; Kern, Andrew D.

    2018-01-01

    As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics. PMID:29331490

  17. Wind Power Ramp Events Prediction with Hybrid Machine Learning Regression Techniques and Reanalysis Data

    Directory of Open Access Journals (Sweden)

    Laura Cornejo-Bueno

    2017-11-01

    Full Text Available Wind Power Ramp Events (WPREs are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Its accurate prediction is important in the effort of efficiently integrating wind energy in the electric system, without affecting considerably its stability, robustness and resilience. In this paper, we tackle the problem of predicting WPREs by applying Machine Learning (ML regression techniques. Our approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPREs prediction in real systems. Specifically, we have explored the feasibility of a number of state-of-the-art ML regression techniques, such as support vector regression, artificial neural networks (multi-layer perceptrons and extreme learning machines and Gaussian processes to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Center for Medium-Range Weather Forecasts is the one used in this paper because of its accuracy and high resolution (in both spatial and temporal domains. Aiming at validating the feasibility of our predicting approach, we have carried out an extensive experimental work using real data from three wind farms in Spain, discussing the performance of the different ML regression tested in this wind power ramp event prediction problem.

  18. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    Directory of Open Access Journals (Sweden)

    Philippe Burlina

    Full Text Available To evaluate the use of ultrasound coupled with machine learning (ML and deep learning (DL techniques for automated or semi-automated classification of myositis.Eighty subjects comprised of 19 with inclusion body myositis (IBM, 14 with polymyositis (PM, 14 with dermatomyositis (DM, and 33 normal (N subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally were acquired. We considered three problems of classification including (A normal vs. affected (DM, PM, IBM; (B normal vs. IBM patients; and (C IBM vs. other types of myositis (DM or PM. We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification.The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A, 86.6% ± 2.4% for (B and 74.8% ± 3.9% for (C, while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A, 84.3% ± 2.3% for (B and 68.9% ± 2.5% for (C.This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  19. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    Science.gov (United States)

    Burlina, Philippe; Billings, Seth; Joshi, Neil; Albayda, Jemima

    2017-01-01

    To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  20. Stacking machine learning classifiers to identify Higgs bosons at the LHC

    International Nuclear Information System (INIS)

    Alves, A.

    2017-01-01

    Machine learning (ML) algorithms have been employed in the problem of classifying signal and background events with high accuracy in particle physics. In this paper, we compare the performance of a widespread ML technique, namely, stacked generalization , against the results of two state-of-art algorithms: (1) a deep neural network (DNN) in the task of discovering a new neutral Higgs boson and (2) a scalable machine learning system for tree boosting, in the Standard Model Higgs to tau leptons channel, both at the 8 TeV LHC. In a cut-and-count analysis, stacking three algorithms performed around 16% worse than DNN but demanding far less computation efforts, however, the same stacking outperforms boosted decision trees. Using the stacked classifiers in a multivariate statistical analysis (MVA), on the other hand, significantly enhances the statistical significance compared to cut-and-count in both Higgs processes, suggesting that combining an ensemble of simpler and faster ML algorithms with MVA tools is a better approach than building a complex state-of-art algorithm for cut-and-count.

  1. Machine Learning.

    Science.gov (United States)

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  2. Use of machine learning approaches for novel drug discovery.

    Science.gov (United States)

    Lima, Angélica Nakagawa; Philot, Eric Allison; Trossini, Gustavo Henrique Goulart; Scott, Luis Paulo Barbour; Maltarollo, Vinícius Gonçalves; Honorio, Kathia Maria

    2016-01-01

    The use of computational tools in the early stages of drug development has increased in recent decades. Machine learning (ML) approaches have been of special interest, since they can be applied in several steps of the drug discovery methodology, such as prediction of target structure, prediction of biological activity of new ligands through model construction, discovery or optimization of hits, and construction of models that predict the pharmacokinetic and toxicological (ADMET) profile of compounds. This article presents an overview on some applications of ML techniques in drug design. These techniques can be employed in ligand-based drug design (LBDD) and structure-based drug design (SBDD) studies, such as similarity searches, construction of classification and/or prediction models of biological activity, prediction of secondary structures and binding sites docking and virtual screening. Successful cases have been reported in the literature, demonstrating the efficiency of ML techniques combined with traditional approaches to study medicinal chemistry problems. Some ML techniques used in drug design are: support vector machine, random forest, decision trees and artificial neural networks. Currently, an important application of ML techniques is related to the calculation of scoring functions used in docking and virtual screening assays from a consensus, combining traditional and ML techniques in order to improve the prediction of binding sites and docking solutions.

  3. HLS4ML: deploying deep learning on FPGAs for L1 trigger and Data Acquisition

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Machine learning is becoming ubiquitous across HEP. There is great potential to improve trigger and DAQ performances with it. However, the exploration of such techniques within the field in low latency/power FPGAs has just begun. We present HLS4ML, a user-friendly software, based on High-Level Synthesis (HLS), designed to deploy network architectures on FPGAs. As a case study, we use HLS4ML for boosted-jet tagging with deep networks at the LHC. We show how neural networks can be made fit the resources available on modern FPGAs, thanks to network pruning and quantization. We map out resource usage and latency versus network architectures, to identify the typical problem complexity that HLS4ML could deal with. We discuss possible applications in current and future HEP experiments.

  4. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  5. Can machine learning explain human learning?

    NARCIS (Netherlands)

    Vahdat, M.; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

    2016-01-01

    Learning Analytics (LA) has a major interest in exploring and understanding the learning process of humans and, for this purpose, benefits from both Cognitive Science, which studies how humans learn, and Machine Learning, which studies how algorithms learn from data. Usually, Machine Learning is

  6. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.

  7. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  8. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing.

    Science.gov (United States)

    Oyetunde, Tolutola; Bao, Forrest Sheng; Chen, Jiung-Wen; Martin, Hector Garcia; Tang, Yinjie J

    2018-05-03

    Genome scale modeling (GSM) predicts the performance of microbial workhorses and helps identify beneficial gene targets. GSM integrated with intracellular flux dynamics, omics, and thermodynamics have shown remarkable progress in both elucidating complex cellular phenomena and computational strain design (CSD). Nonetheless, these models still show high uncertainty due to a poor understanding of innate pathway regulations, metabolic burdens, and other factors (such as stress tolerance and metabolite channeling). Besides, the engineered hosts may have genetic mutations or non-genetic variations in bioreactor conditions and thus CSD rarely foresees fermentation rate and titer. Metabolic models play important role in design-build-test-learn cycles for strain improvement, and machine learning (ML) may provide a viable complementary approach for driving strain design and deciphering cellular processes. In order to develop quality ML models, knowledge engineering leverages and standardizes the wealth of information in literature (e.g., genomic/phenomic data, synthetic biology strategies, and bioprocess variables). Data driven frameworks can offer new constraints for mechanistic models to describe cellular regulations, to design pathways, to search gene targets, and to estimate fermentation titer/rate/yield under specified growth conditions (e.g., mixing, nutrients, and O 2 ). This review highlights the scope of information collections, database constructions, and machine learning techniques (such as deep learning and transfer learning), which may facilitate "Learn and Design" for strain development. Copyright © 2018. Published by Elsevier Inc.

  9. An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.

    Science.gov (United States)

    Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein

    2017-12-22

    The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.

  10. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  11. Machine Learning and Inverse Problem in Geodynamics

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R.

    2017-12-01

    During the past few decades numerical modeling and traditional HPC have been widely deployed in many diverse fields for problem solutions. However, in recent years the rapid emergence of machine learning (ML), a subfield of the artificial intelligence (AI), in many fields of sciences, engineering, and finance seems to mark a turning point in the replacement of traditional modeling procedures with artificial intelligence-based techniques. The study of the circulation in the interior of Earth relies on the study of high pressure mineral physics, geochemistry, and petrology where the number of the mantle parameters is large and the thermoelastic parameters are highly pressure- and temperature-dependent. More complexity arises from the fact that many of these parameters that are incorporated in the numerical models as input parameters are not yet well established. In such complex systems the application of machine learning algorithms can play a valuable role. Our focus in this study is the application of supervised machine learning (SML) algorithms in predicting mantle properties with the emphasis on SML techniques in solving the inverse problem. As a sample problem we focus on the spin transition in ferropericlase and perovskite that may cause slab and plume stagnation at mid-mantle depths. The degree of the stagnation depends on the degree of negative density anomaly at the spin transition zone. The training and testing samples for the machine learning models are produced by the numerical convection models with known magnitudes of density anomaly (as the class labels of the samples). The volume fractions of the stagnated slabs and plumes which can be considered as measures for the degree of stagnation are assigned as sample features. The machine learning models can determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine (SVM) algorithms we show that SML techniques

  12. Advanced methods in NDE using machine learning approaches

    Science.gov (United States)

    Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

    2018-04-01

    Machine learning (ML) methods and algorithms have been applied recently with great success in quality control and predictive maintenance. Its goal to build new and/or leverage existing algorithms to learn from training data and give accurate predictions, or to find patterns, particularly with new and unseen similar data, fits perfectly to Non-Destructive Evaluation. The advantages of ML in NDE are obvious in such tasks as pattern recognition in acoustic signals or automated processing of images from X-ray, Ultrasonics or optical methods. Fraunhofer IKTS is using machine learning algorithms in acoustic signal analysis. The approach had been applied to such a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and secondary analysis step followed by a cognitive system to create model data. Already in the second analysis steps unsupervised learning algorithms as principal component analysis are used to simplify data structures. In the cognitive part of the software further unsupervised and supervised learning algorithms will be trained. Later the sensor signals from unknown samples can be recognized and classified automatically by the algorithms trained before. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). Still, algorithms will be trained on an ordinary PC; however, trained algorithms run on the Digital Signal Processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however. A sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components are required. The automated testing can be done subsequently by the machine. By integrating the test data of many components along the value chain further optimization including lifetime and durability

  13. Quantitative forecasting of PTSD from early trauma responses: A Machine Learning application

    DEFF Research Database (Denmark)

    Galatzer-Levy, I. R.; Karstoft, K. I.; Statnikov, A.

    2014-01-01

    -traumatic stress disorder (PTSD) is plausible given the disorder's salient onset and the abundance of putative biological and clinical risk indicators. This work evaluates the ability of Machine Learning (ML) forecasting approaches to identify and integrate a panel of unique predictive characteristics...... algorithm identified a set of predictors that rendered all others redundant. Support Vector Machines (SVMs) as well as other ML classification algorithms were used to evaluate the forecasting accuracy of i) ML selected features, ii) all available features without selection, and iii) Acute Stress Disorder......). The feature selection algorithm identified 16 predictors, present in >= 95% cross-validation trials. The accuracy of predicting non-remitting PTSD from that set (AUC = .77) did not differ from predicting from all available information (AUC = .78). Predicting from ASD symptoms was not better then chance (AUC...

  14. Machine learning and radiology.

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  15. Modeling Music Emotion Judgments Using Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Naresh N. Vempala

    2018-01-01

    Full Text Available Emotion judgments and five channels of physiological data were obtained from 60 participants listening to 60 music excerpts. Various machine learning (ML methods were used to model the emotion judgments inclusive of neural networks, linear regression, and random forests. Input for models of perceived emotion consisted of audio features extracted from the music recordings. Input for models of felt emotion consisted of physiological features extracted from the physiological recordings. Models were trained and interpreted with consideration of the classic debate in music emotion between cognitivists and emotivists. Our models supported a hybrid position wherein emotion judgments were influenced by a combination of perceived and felt emotions. In comparing the different ML approaches that were used for modeling, we conclude that neural networks were optimal, yielding models that were flexible as well as interpretable. Inspection of a committee machine, encompassing an ensemble of networks, revealed that arousal judgments were predominantly influenced by felt emotion, whereas valence judgments were predominantly influenced by perceived emotion.

  16. Machine Learning for High-Throughput Stress Phenotyping in Plants.

    Science.gov (United States)

    Singh, Arti; Ganapathysubramanian, Baskar; Singh, Asheesh Kumar; Sarkar, Soumik

    2016-02-01

    Advances in automated and high-throughput imaging technologies have resulted in a deluge of high-resolution images and sensor data of plants. However, extracting patterns and features from this large corpus of data requires the use of machine learning (ML) tools to enable data assimilation and feature identification for stress phenotyping. Four stages of the decision cycle in plant stress phenotyping and plant breeding activities where different ML approaches can be deployed are (i) identification, (ii) classification, (iii) quantification, and (iv) prediction (ICQP). We provide here a comprehensive overview and user-friendly taxonomy of ML tools to enable the plant community to correctly and easily apply the appropriate ML tools and best-practice guidelines for various biotic and abiotic stress traits. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide that covers all of the need to know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks.Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or

  18. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  19. A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

    NARCIS (Netherlands)

    Melo, Rita; Fieldhouse, Robert; Melo, André; Correia, João D G; Cordeiro, Maria Natália D S; Gümüş, Zeynep H; Costa, Joaquim; Bonvin, Alexandre M J J; de Sousa Moreira, Irina

    2016-01-01

    Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a more accurate methodology to predict Hot-Spots (HS) in protein-protein interfaces from their native complex structure compared to previous published Machine Learning (ML) techniques. Our model

  20. Coupling Matched Molecular Pairs with Machine Learning for Virtual Compound Optimization.

    Science.gov (United States)

    Turk, Samo; Merget, Benjamin; Rippmann, Friedrich; Fulle, Simone

    2017-12-26

    Matched molecular pair (MMP) analyses are widely used in compound optimization projects to gain insights into structure-activity relationships (SAR). The analysis is traditionally done via statistical methods but can also be employed together with machine learning (ML) approaches to extrapolate to novel compounds. The here introduced MMP/ML method combines a fragment-based MMP implementation with different machine learning methods to obtain automated SAR decomposition and prediction. To test the prediction capabilities and model transferability, two different compound optimization scenarios were designed: (1) "new fragments" which occurs when exploring new fragments for a defined compound series and (2) "new static core and transformations" which resembles for instance the identification of a new compound series. Very good results were achieved by all employed machine learning methods especially for the new fragments case, but overall deep neural network models performed best, allowing reliable predictions also for the new static core and transformations scenario, where comprehensive SAR knowledge of the compound series is missing. Furthermore, we show that models trained on all available data have a higher generalizability compared to models trained on focused series and can extend beyond chemical space covered in the training data. Thus, coupling MMP with deep neural networks provides a promising approach to make high quality predictions on various data sets and in different compound optimization scenarios.

  1. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn.If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this the book for you. No previous experience with machine-learning algorithms is required.

  2. Machine learning applications in cancer prognosis and prediction.

    Science.gov (United States)

    Kourou, Konstantina; Exarchos, Themis P; Exarchos, Konstantinos P; Karamouzis, Michalis V; Fotiadis, Dimitrios I

    2015-01-01

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as it can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high or low risk groups has led many research teams, from the biomedical and the bioinformatics field, to study the application of machine learning (ML) methods. Therefore, these techniques have been utilized as an aim to model the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs) have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed in order for these methods to be considered in the everyday clinical practice. In this work, we present a review of recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend on the application of ML methods in cancer research, we present here the most recent publications that employ these techniques as an aim to model cancer risk or patient outcomes.

  3. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artifical intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. Basic for various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  4. Video Quality Assessment and Machine Learning: Performance and Interpretability

    DEFF Research Database (Denmark)

    Søgaard, Jacob; Forchhammer, Søren; Korhonen, Jari

    2015-01-01

    In this work we compare a simple and a complex Machine Learning (ML) method used for the purpose of Video Quality Assessment (VQA). The simple ML method chosen is the Elastic Net (EN), which is a regularized linear regression model and easier to interpret. The more complex method chosen is Support...... Vector Regression (SVR), which has gained popularity in VQA research. Additionally, we present an ML-based feature selection method. Also, it is investigated how well the methods perform when tested on videos from other datasets. Our results show that content-independent cross-validation performance...... on a single dataset can be misleading and that in the case of very limited training and test data, especially in regards to different content as is the case for many video datasets, a simple ML approach is the better choice....

  5. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  6. Introduction to machine learning

    OpenAIRE

    Baştanlar, Yalın; Özuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning app...

  7. Machine learning methods for clinical forms analysis in mental health.

    Science.gov (United States)

    Strauss, John; Peguero, Arturo Martinez; Hirst, Graeme

    2013-01-01

    In preparation for a clinical information system implementation, the Centre for Addiction and Mental Health (CAMH) Clinical Information Transformation project completed multiple preparation steps. An automated process was desired to supplement the onerous task of manual analysis of clinical forms. We used natural language processing (NLP) and machine learning (ML) methods for a series of 266 separate clinical forms. For the investigation, documents were represented by feature vectors. We used four ML algorithms for our examination of the forms: cluster analysis, k-nearest neigh-bours (kNN), decision trees and support vector machines (SVM). Parameters for each algorithm were optimized. SVM had the best performance with a precision of 64.6%. Though we did not find any method sufficiently accurate for practical use, to our knowledge this approach to forms has not been used previously in mental health.

  8. Deep learning: Using machine learning to study biological vision

    OpenAIRE

    Majaj, Najib; Pelli, Denis

    2017-01-01

    Today most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a reference of attainable performance based on learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

  9. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users or consumers are now demanding smarter devices, intelligent systems are revolutionizing by utilizing machine learning. Machine learning as part of intelligent systems is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. You can train machines to perform some things, so that they can automatically detect, diagnose, and solve a variety of problems. The intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, the intelligent systems make widely applications in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines and to improve decision making models. And this book specializes in the developments of advanced intelligent systems through machine learning. It...

  10. Machine learning and medical imaging

    CERN Document Server

    Shen, Dinggang; Sabuncu, Mert

    2016-01-01

    Machine Learning and Medical Imaging presents state-of- the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, a...

  11. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning-a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  12. Using Machine Learning to Predict MCNP Bias

    Energy Technology Data Exchange (ETDEWEB)

    Grechanuk, Pavel Aleksandrovi [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2018-01-09

    For many real-world applications in radiation transport where simulations are compared to experimental measurements, like in nuclear criticality safety, the bias (simulated - experimental keff) in the calculation is an extremely important quantity used for code validation. The objective of this project is to accurately predict the bias of MCNP6 [1] criticality calculations using machine learning (ML) algorithms, with the intention of creating a tool that can complement the current nuclear criticality safety methods. In the latest release of MCNP6, the Whisper tool is available for criticality safety analysts and includes a large catalogue of experimental benchmarks, sensitivity profiles, and nuclear data covariance matrices. This data, coming from 1100+ benchmark cases, is used in this study of ML algorithms for criticality safety bias predictions.

  13. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  14. Model-based machine learning.

    Science.gov (United States)

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

  15. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  16. Machine Learning and Neurosurgical Outcome Prediction: A Systematic Review.

    Science.gov (United States)

    Senders, Joeky T; Staples, Patrick C; Karhade, Aditya V; Zaki, Mark M; Gormley, William B; Broekman, Marike L D; Smith, Timothy R; Arnaout, Omar

    2018-01-01

    Accurate measurement of surgical outcomes is highly desirable to optimize surgical decision-making. An important element of surgical decision making is identification of the patient cohort that will benefit from surgery before the intervention. Machine learning (ML) enables computers to learn from previous data to make accurate predictions on new data. In this systematic review, we evaluate the potential of ML for neurosurgical outcome prediction. A systematic search in the PubMed and Embase databases was performed to identify all potential relevant studies up to January 1, 2017. Thirty studies were identified that evaluated ML algorithms used as prediction models for survival, recurrence, symptom improvement, and adverse events in patients undergoing surgery for epilepsy, brain tumor, spinal lesions, neurovascular disease, movement disorders, traumatic brain injury, and hydrocephalus. Depending on the specific prediction task evaluated and the type of input features included, ML models predicted outcomes after neurosurgery with a median accuracy and area under the receiver operating curve of 94.5% and 0.83, respectively. Compared with logistic regression, ML models performed significantly better and showed a median absolute improvement in accuracy and area under the receiver operating curve of 15% and 0.06, respectively. Some studies also demonstrated a better performance in ML models compared with established prognostic indices and clinical experts. In the research setting, ML has been studied extensively, demonstrating an excellent performance in outcome prediction for a wide range of neurosurgical conditions. However, future studies should investigate how ML can be implemented as a practical tool supporting neurosurgical care. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Student Modeling and Machine Learning

    OpenAIRE

    Sison , Raymund; Shimura , Masamichi

    1998-01-01

    After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine le...

  18. Online transfer learning with extreme learning machine

    Science.gov (United States)

    Yin, Haibo; Yang, Yun-an

    2017-05-01

    In this paper, we propose a new transfer learning algorithm for online training. The proposed algorithm, which is called Online Transfer Extreme Learning Machine (OTELM), is based on Online Sequential Extreme Learning Machine (OSELM) while it introduces Semi-Supervised Extreme Learning Machine (SSELM) to transfer knowledge from the source to the target domain. With the manifold regularization, SSELM picks out instances from the source domain that are less relevant to those in the target domain to initialize the online training, so as to improve the classification performance. Experimental results demonstrate that the proposed OTELM can effectively use instances in the source domain to enhance the learning performance.

  19. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

    Science.gov (United States)

    Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao

    2017-11-01

    Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Creativity in Machine Learning

    OpenAIRE

    Thoma, Martin

    2016-01-01

    Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; it is not a trivial combination of the data which was fed into the machine learning system. The obtained results come in multiple forms: As images, as text and as audio. This paper gives a high level overview of how they are created and gives some examples. It is meant to be a summary of the current work and give people who are new to machine learning some starting points.

  1. An introduction and overview of machine learning in neurosurgical care.

    Science.gov (United States)

    Senders, Joeky T; Zaki, Mark M; Karhade, Aditya V; Chang, Bliss; Gormley, William B; Broekman, Marike L; Smith, Timothy R; Arnaout, Omar

    2018-01-01

    Machine learning (ML) is a branch of artificial intelligence that allows computers to learn from large complex datasets without being explicitly programmed. Although ML is already widely manifest in our daily lives in various forms, the considerable potential of ML has yet to find its way into mainstream medical research and day-to-day clinical care. The complex diagnostic and therapeutic modalities used in neurosurgery provide a vast amount of data that is ideally suited for ML models. This systematic review explores ML's potential to assist and improve neurosurgical care. A systematic literature search was performed in the PubMed and Embase databases to identify all potentially relevant studies up to January 1, 2017. All studies were included that evaluated ML models assisting neurosurgical treatment. Of the 6,402 citations identified, 221 studies were selected after subsequent title/abstract and full-text screening. In these studies, ML was used to assist surgical treatment of patients with epilepsy, brain tumors, spinal lesions, neurovascular pathology, Parkinson's disease, traumatic brain injury, and hydrocephalus. Across multiple paradigms, ML was found to be a valuable tool for presurgical planning, intraoperative guidance, neurophysiological monitoring, and neurosurgical outcome prediction. ML has started to find applications aimed at improving neurosurgical care by increasing the efficiency and precision of perioperative decision-making. A thorough validation of specific ML models is essential before implementation in clinical neurosurgical care. To bridge the gap between research and clinical care, practical and ethical issues should be considered parallel to the development of these techniques.

  2. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  3. Geminivirus data warehouse: a database enriched with machine learning approaches.

    Science.gov (United States)

    Silva, Jose Cleydson F; Carvalho, Thales F M; Basso, Marcos F; Deguchi, Michihito; Pereira, Welison A; Sobrinho, Roberto R; Vidigal, Pedro M P; Brustolini, Otávio J B; Silva, Fabyano F; Dal-Bianco, Maximiller; Fontes, Renildes L F; Santos, Anésia A; Zerbini, Francisco Murilo; Cerqueira, Fabio R; Fontes, Elizabeth P B

    2017-05-05

    The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.

  4. Separation of pulsar signals from noise using supervised machine learning algorithms

    Science.gov (United States)

    Bethapudi, S.; Desai, S.

    2018-04-01

    We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for the cross-validation of the SPINN-based machine learning engine, obtained from the reprocessing of the HTRU-S survey data (Morello et al., 2014). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with high-class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean for both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find that the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in Morello et al. (2014), for the same recall value.

  5. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  6. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Louppe, Gilles; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu

    2012-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings....

  7. On-the-Fly Machine Learning of Atomic Potential in Density Functional Theory Structure Optimization

    Science.gov (United States)

    Jacobsen, T. L.; Jørgensen, M. S.; Hammer, B.

    2018-01-01

    Machine learning (ML) is used to derive local stability information for density functional theory calculations of systems in relation to the recently discovered SnO2 (110 )-(4 ×1 ) reconstruction. The ML model is trained on (structure, total energy) relations collected during global minimum energy search runs with an evolutionary algorithm (EA). While being built, the ML model is used to guide the EA, thereby speeding up the overall rate by which the EA succeeds. Inspection of the local atomic potentials emerging from the model further shows chemically intuitive patterns.

  8. Enhancement of plant metabolite fingerprinting by machine learning.

    Science.gov (United States)

    Scott, Ian M; Vermeer, Cornelia P; Liakata, Maria; Corol, Delia I; Ward, Jane L; Lin, Wanchang; Johnson, Helen E; Whitehead, Lynne; Kular, Baldeep; Baker, John M; Walsh, Sean; Dave, Anuja; Larson, Tony R; Graham, Ian A; Wang, Trevor L; King, Ross D; Draper, John; Beale, Michael H

    2010-08-01

    Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by (1)H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, (1)H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted.

  9. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  10. A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    Directory of Open Access Journals (Sweden)

    Li Zhen

    2008-05-01

    Full Text Available Abstract Background Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML methods. Results The classification performance of Artificial Neural Networks (ANN, K-Nearest Neighbors (KNN, Linear Discriminant Analysis (LDA, Naïve Bayes (NB, Recursive Partitioning and Regression Trees (RPART, and Support Vector Machines (SVM in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5 were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA. Conclusion We have developed a novel simulation model to evaluate machine learning methods for the

  11. Application of Machine Learning Techniques in Aquaculture

    OpenAIRE

    Rahman, Akhlaqur; Tasnim, Sumaira

    2014-01-01

    In this paper we present applications of different machine learning algorithms in aquaculture. Machine learning algorithms learn models from historical data. In aquaculture historical data are obtained from farm practices, yields, and environmental data sources. Associations between these different variables can be obtained by applying machine learning algorithms to historical data. In this paper we present applications of different machine learning algorithms in aquaculture applications.

  12. Machine Learning and Applied Linguistics

    OpenAIRE

    Vajjala, Sowmya

    2018-01-01

    This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.

  13. TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning

    OpenAIRE

    Tang, Yuan

    2016-01-01

    TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to simplify the process of creating, configuring, training, evaluating, and experimenting a machine learning model. TF.Learn integrates a wide range of state-of-art machine learning algorithms built on top of TensorFlow's low level APIs for small to large-scale supervised and unsupervised problems. This module focuses on bringing machine learning t...

  14. Machine learning & artificial intelligence in the quantum domain: a review of recent progress.

    Science.gov (United States)

    Dunjko, Vedran; Briegel, Hans J

    2018-03-05

    Quantum information technologies, on the one hand, and intelligent learning systems, on the other, are both emergent technologies that are likely to have a transformative impact on our society in the future. The respective underlying fields of basic research-quantum information versus machine learning (ML) and artificial intelligence (AI)-have their own specific questions and challenges, which have hitherto been investigated largely independently. However, in a growing body of recent work, researchers have been probing the question of the extent to which these fields can indeed learn and benefit from each other. Quantum ML explores the interaction between quantum computing and ML, investigating how results and techniques from one field can be used to solve the problems of the other. Recently we have witnessed significant breakthroughs in both directions of influence. For instance, quantum computing is finding a vital application in providing speed-ups for ML problems, critical in our 'big data' world. Conversely, ML already permeates many cutting-edge technologies and may become instrumental in advanced quantum technologies. Aside from quantum speed-up in data analysis, or classical ML optimization used in quantum experiments, quantum enhancements have also been (theoretically) demonstrated for interactive learning tasks, highlighting the potential of quantum-enhanced learning agents. Finally, works exploring the use of AI for the very design of quantum experiments and for performing parts of genuine research autonomously, have reported their first successes. Beyond the topics of mutual enhancement-exploring what ML/AI can do for quantum physics and vice versa-researchers have also broached the fundamental issue of quantum generalizations of learning and AI concepts. This deals with questions of the very meaning of learning and intelligence in a world that is fully described by quantum mechanics. In this review, we describe the main ideas, recent developments and

  15. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection......* Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection...

  16. Machine-Learning Research

    OpenAIRE

    Dietterich, Thomas G.

    1997-01-01

    Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

  17. An introduction to machine learning with Scikit-Learn

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.

  18. Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications

    OpenAIRE

    Okamoto, Hironao; Nishio, Takayuki; Nakashima, Kota; Koda, Yusuke; Yamamoto, Koji; Morikura, Masahiro; Asai, Yusuke; Miyatake, Ryo

    2018-01-01

    This paper discusses a machine-learning (ML)-based future received signal strength (RSS) prediction scheme using depth camera images for millimeter-wave (mmWave) networks. The scheme provides the future RSS prediction of any mmWave links within the camera's view, including links where nodes are not transmitting frames. This enables network controllers to conduct network operations before line-of-sight path blockages degrade the RSS. Using the ML techniques, the prediction scheme automatically...

  19. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. The book provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity and the depth and breath of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  20. Machine learning systems

    Energy Technology Data Exchange (ETDEWEB)

    Forsyth, R

    1984-05-01

    With the dramatic rise of expert systems has come a renewed interest in the fuel that drives them-knowledge. For it is specialist knowledge which gives expert systems their power. But extracting knowledge from human experts in symbolic form has proved arduous and labour-intensive. So the idea of machine learning is enjoying a renaissance. Machine learning is any automatic improvement in the performance of a computer system over time, as a result of experience. Thus a learning algorithm seeks to do one or more of the following: cover a wider range of problems, deliver more accurate solutions, obtain answers more cheaply, and simplify codified knowledge. 6 references.

  1. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predi......Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility...... of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We...... suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate...

  2. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  3. Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels

    Science.gov (United States)

    Dral, Pavlo O.; Owens, Alec; Yurchenko, Sergei N.; Thiel, Walter

    2017-06-01

    We present an efficient approach for generating highly accurate molecular potential energy surfaces (PESs) using self-correcting, kernel ridge regression (KRR) based machine learning (ML). We introduce structure-based sampling to automatically assign nuclear configurations from a pre-defined grid to the training and prediction sets, respectively. Accurate high-level ab initio energies are required only for the points in the training set, while the energies for the remaining points are provided by the ML model with negligible computational cost. The proposed sampling procedure is shown to be superior to random sampling and also eliminates the need for training several ML models. Self-correcting machine learning has been implemented such that each additional layer corrects errors from the previous layer. The performance of our approach is demonstrated in a case study on a published high-level ab initio PES of methyl chloride with 44 819 points. The ML model is trained on sets of different sizes and then used to predict the energies for tens of thousands of nuclear configurations within seconds. The resulting datasets are utilized in variational calculations of the vibrational energy levels of CH3Cl. By using both structure-based sampling and self-correction, the size of the training set can be kept small (e.g., 10% of the points) without any significant loss of accuracy. In ab initio rovibrational spectroscopy, it is thus possible to reduce the number of computationally costly electronic structure calculations through structure-based sampling and self-correcting KRR-based machine learning by up to 90%.

  4. Machine Learning an algorithmic perspective

    CERN Document Server

    Marsland, Stephen

    2009-01-01

    Traditional books on machine learning can be divided into two groups - those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.Theory Backed up by Practical ExamplesThe book covers neural networks, graphical models, reinforcement le

  5. Machine learning improves the accuracy of myocardial perfusion scintigraphy results

    International Nuclear Information System (INIS)

    Groselj, C.; Kukar, M.

    2002-01-01

    Objective: Machine learning (ML) an artificial intelligence method has in last decade proved to be an useful tool in many fields of decision making, also in some fields of medicine. By reports, its decision accuracy usually exceeds the human one. Aim: To assess applicability of ML in interpretation of the stress myocardial perfusion scintigraphy results in coronary artery disease diagnostic process. Patients and methods: The 327 patient's data of planar stress myocardial perfusion scintigraphy were reevaluated in usual way. Comparing them with the results of coronary angiography the sensitivity, specificity and accuracy of the investigation were computed. The data were digitized and the decision procedure repeated by ML program 'Naive Bayesian classifier'. As the ML is able to simultaneously manipulate with whatever number of data, all reachable disease connected data (regarding history, habitus, risk factors, stress results) were added. The sensitivity, specificity and accuracy of scintigraphy were expressed in this way. The results of both decision procedures were compared. Conclusion: Using ML method, 19 more patients out of 327 (5.8%) were correctly diagnosed by stress myocardial perfusion scintigraphy. In this way ML could be an important tool for myocardial perfusion scintigraphy decision making

  6. Reverse hypothesis machine learning a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge- acquisition -based learning is constrained by large volumes of data and is time consuming. Hence Knowledge innovation based learning is the need of time. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change the paradigm remains the same—the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  7. The Three Pillars of Machine Programming

    OpenAIRE

    Gottschlich, Justin; Solar-Lezama, Armando; Tatbul, Nesime; Carbin, Michael; Rinard, Martin; Barzilay, Regina; Amarasinghe, Saman; Tenenbaum, Joshua B; Mattson, Tim

    2018-01-01

    In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and(iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in t...

  8. The influence of negative training set size on machine learning-based virtual screening.

    Science.gov (United States)

    Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J

    2014-01-01

    The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.

  9. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.

  10. Machine learning for medical ultrasound: status, methods, and future opportunities.

    Science.gov (United States)

    Brattain, Laura J; Telfer, Brian A; Dhyani, Manish; Grajo, Joseph R; Samir, Anthony E

    2018-04-01

    Ultrasound (US) imaging is the most commonly performed cross-sectional diagnostic imaging modality in the practice of medicine. It is low-cost, non-ionizing, portable, and capable of real-time image acquisition and display. US is a rapidly evolving technology with significant challenges and opportunities. Challenges include high inter- and intra-operator variability and limited image quality control. Tremendous opportunities have arisen in the last decade as a result of exponential growth in available computational power coupled with progressive miniaturization of US devices. As US devices become smaller, enhanced computational capability can contribute significantly to decreasing variability through advanced image processing. In this paper, we review leading machine learning (ML) approaches and research directions in US, with an emphasis on recent ML advances. We also present our outlook on future opportunities for ML techniques to further improve clinical workflow and US-based disease diagnosis and characterization.

  11. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal and a majority of non-Higgs boson “background” events. No knowledge of particle physics was required for the challenge but skills in machine learning - the training of computers to recognise patterns in data – were essential. The Challenge, hosted by Ka...

  12. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  13. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

  14. Quantum Machine Learning

    Science.gov (United States)

    Biswas, Rupak

    2018-01-01

    Quantum computing promises an unprecedented ability to solve intractable problems by harnessing quantum mechanical effects such as tunneling, superposition, and entanglement. The Quantum Artificial Intelligence Laboratory (QuAIL) at NASA Ames Research Center is the space agency's primary facility for conducting research and development in quantum information sciences. QuAIL conducts fundamental research in quantum physics but also explores how best to exploit and apply this disruptive technology to enable NASA missions in aeronautics, Earth and space sciences, and space exploration. At the same time, machine learning has become a major focus in computer science and captured the imagination of the public as a panacea to myriad big data problems. In this talk, we will discuss how classical machine learning can take advantage of quantum computing to significantly improve its effectiveness. Although we illustrate this concept on a quantum annealer, other quantum platforms could be used as well. If explored fully and implemented efficiently, quantum machine learning could greatly accelerate a wide range of tasks leading to new technologies and discoveries that will significantly change the way we solve real-world problems.

  15. Lamb wave based automatic damage detection using matching pursuit and machine learning

    International Nuclear Information System (INIS)

    Agarwal, Sushant; Mitra, Mira

    2014-01-01

    In this study, matching pursuit (MP) has been tested with machine learning algorithms such as artificial neural networks (ANNs) and support vector machines (SVMs) to automate the process of damage detection in metallic plates. Here, damage detection is done using the Lamb wave response in a thin aluminium plate simulated using a finite element (FE) method. To reduce the complexity of the Lamb wave response, only the A 0 mode is excited and sensed. The procedure adopted for damage detection consists of three major steps, involving signal processing and machine learning (ML). In the first step, MP is used for de-noising and enhancing the sparsity of the database. In the existing literature, MP is used to decompose any signal into a linear combination of waveforms that are selected from a redundant dictionary. In this work, MP is deployed in two stages to make the database sparse as well as to de-noise it. After using MP on the database, it is then passed as input data for ML classifiers. ANN and SVM are used to detect the location of the potential damage from the reduced data. The study demonstrates that the SVM is a robust classifier in the presence of noise and is more efficient than the ANN. Out-of-sample data are used for the validation of the trained and tested classifier. Trained classifiers are found to be successful in the detection of damage with a detection rate of more than 95%. (paper)

  16. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre eAbraham

    2014-02-01

    Full Text Available Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  17. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  18. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  19. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  20. Machine Learning in Medicine

    Science.gov (United States)

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  1. Machine learning techniques for optical communication system optimization

    DEFF Research Database (Denmark)

    Zibar, Darko; Wass, Jesper; Thrane, Jakob

    In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction.......In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction....

  2. Machine vision systems using machine learning for industrial product inspection

    Science.gov (United States)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF), and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we will present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Board (PCB) and Vacuum Florescent Displaying (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its displaying patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies and the PCB inspection system is the process of being deployed in a manufacturing plant.

  3. Multivariate Analysis and Machine Learning in Cerebral Palsy Research.

    Science.gov (United States)

    Zhang, Jing

    2017-01-01

    Cerebral palsy (CP), a common pediatric movement disorder, causes the most severe physical disability in children. Early diagnosis in high-risk infants is critical for early intervention and possible early recovery. In recent years, multivariate analytic and machine learning (ML) approaches have been increasingly used in CP research. This paper aims to identify such multivariate studies and provide an overview of this relatively young field. Studies reviewed in this paper have demonstrated that multivariate analytic methods are useful in identification of risk factors, detection of CP, movement assessment for CP prediction, and outcome assessment, and ML approaches have made it possible to automatically identify movement impairments in high-risk infants. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these multivariate methods in risk factor identification, CP detection, movement assessment, and outcome evaluation or prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data of this century, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP.

  4. Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project.

    Science.gov (United States)

    Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad M; Qureshi, Waqas T; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J; Al-Mallah, Mouaz H

    2017-12-19

    Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how machine learning techniques can be applied on medical records of cardiorespiratory fitness and how the various techniques differ in terms of capabilities of predicting medical outcomes (e.g. mortality). We use data of 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems Between 1991 and 2009 and had a complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). In order to handle the imbalanced dataset used, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. Two set of experiments have been conducted with and without the SMOTE sampling technique. On average over different evaluation metrics, SVM Classifier has shown the lowest performance while other models like BN, BC and DT performed better. The RF classifier has shown the best performance (AUC = 0.97) among all models trained using the SMOTE sampling. The results show that various ML techniques can significantly vary in terms of its performance for the different evaluation metrics. It is also not necessarily that the more complex the ML model, the more prediction accuracy can be achieved. The prediction performance of all models trained with SMOTE is much better than the performance of models trained without SMOTE. The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness

  5. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  6. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  7. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance and that each MeSH heading presents a different behavior......Map and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI...

  8. Learning About Climate and Atmospheric Models Through Machine Learning

    Science.gov (United States)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  9. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  10. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    International Nuclear Information System (INIS)

    Dral, Pavlo O.; Lilienfeld, O. Anatole von; Thiel, Walter

    2015-01-01

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C 7 H 10 O 2 , for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules

  11. Book review: A first course in Machine Learning

    DEFF Research Database (Denmark)

    Ortiz-Arroyo, Daniel

    2016-01-01

    "The new edition of A First Course in Machine Learning by Rogers and Girolami is an excellent introduction to the use of statistical methods in machine learning. The book introduces concepts such as mathematical modeling, inference, and prediction, providing ‘just in time’ the essential background...... to change models and parameter values to make [it] easier to understand and apply these models in real applications. The authors [also] introduce more advanced, state-of-the-art machine learning methods, such as Gaussian process models and advanced mixture models, which are used across machine learning....... This makes the book interesting not only to students with little or no background in machine learning but also to more advanced graduate students interested in statistical approaches to machine learning." —Daniel Ortiz-Arroyo, Associate Professor, Aalborg University Esbjerg, Denmark...

  12. Machine learning methods for metabolic pathway prediction

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2010-01-01

    Full Text Available Abstract Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.

  13. Machine learning methods for metabolic pathway prediction

    Science.gov (United States)

    2010-01-01

    Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations. PMID:20064214

  14. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at its earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing objectively the clinical decision-making process. This manuscript presents a review in multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be present. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allows creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionally (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  15. Classifying bent radio galaxies from a mixture of point-like/extended images with Machine Learning.

    Science.gov (United States)

    Bastien, David; Oozeer, Nadeem; Somanah, Radhakrishna

    2017-05-01

    The hypothesis that bent radio sources are supposed to be found in rich, massive galaxy clusters and the avalibility of huge amount of data from radio surveys have fueled our motivation to use Machine Learning (ML) to identify bent radio sources and as such use them as tracers for galaxy clusters. The shapelet analysis allowed us to decompose radio images into 256 features that could be fed into the ML algorithm. Additionally, ideas from the field of neuro-psychology helped us to consider training the machine to identify bent galaxies at different orientations. From our analysis, we found that the Random Forest algorithm was the most effective with an accuracy rate of 92% for a classification of point and extended sources as well as an accuracy of 80% for bent and unbent classification.

  16. Using Machine Learning to Advance Personality Assessment and Theory.

    Science.gov (United States)

    Bleidorn, Wiebke; Hopwood, Christopher James

    2018-05-01

    Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have focused on the associations between social media and other digital records with established personality measures. The goal of this article is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation, and provide recommendations for how to use machine learning to advance our understanding of personality.

  17. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence, and has attracted considerable attentions. An extreme learning machine (ELM) is a single layer feed-forward neural network alike learning system, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of extreme learning machine (ELM) are its fast learning speed, trivial human intervene and high scalability.   This book contains some selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing China, October 15-17, 2013. This conference aims to bring together the researchers and practitioners of extreme learning machine from a variety of fields including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...

  18. Yield curve and Recession Forecasting in a Machine Learning Framework

    OpenAIRE

    Theophilos Papadimitriou; Periklis Gogas; Maria Matthaiou; Efthymia Chrysanthidou

    2014-01-01

    In this paper, we investigate the forecasting ability of the yield curve in terms of the U.S. real GDP cycle. More specifically, within a Machine Learning (ML) framework, we use data from a variety of short (treasury bills) and long term interest rates (bonds) for the period from 1976:Q3 to 2011:Q4 in conjunction with the real GDP for the same period, to create a model that can successfully forecast output fluctuations (inflation and output gaps) around its long-run trend. We focus our attent...

  19. Gaussian processes for machine learning.

    Science.gov (United States)

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations.13,78,31 The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.

  20. MLnet report: training in Europe on machine learning

    OpenAIRE

    Ellebrecht, Mario; Morik, Katharina

    1999-01-01

    Machine learning techniques offer opportunities for a variety of applications and the theory of machine learning investigates problems that are of interest for other fields of computer science (e.g., complexity theory, logic programming, pattern recognition). However, the impacts of machine learning can only be recognized by those who know the techniques and are able to apply them. Hence, teaching machine learning is necessary before this field can diversify computer science. In order ...

  1. An introduction to quantum machine learning

    Science.gov (United States)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  2. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vectors Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such a

  3. Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

    Science.gov (United States)

    Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

    2011-01-01

    Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433

  4. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM...

  5. Considerations upon the Machine Learning Technologies

    OpenAIRE

    Alin Munteanu; Cristina Ofelia Sofran

    2006-01-01

    Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  6. Machine learning of molecular properties: Locality and active learning

    Science.gov (United States)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, the machine learning techniques have shown great potent1ial in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on another hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach the chemical accuracy and also show large errors for the so-called outliers—the out-of-sample molecules, not well-represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  7. Clojure for machine learning

    CERN Document Server

    Wali, Akhil

    2014-01-01

    A book that brings out the strengths of Clojure programming that have to facilitate machine learning. Each topic is described in substantial detail, and examples and libraries in Clojure are also demonstrated.This book is intended for Clojure developers who want to explore the area of machine learning. Basic understanding of the Clojure programming language is required, but thorough acquaintance with the standard Clojure library or any libraries are not required. Familiarity with theoretical concepts and notation of mathematics and statistics would be an added advantage.

  8. Considerations upon the Machine Learning Technologies

    Directory of Open Access Journals (Sweden)

    Alin Munteanu

    2006-01-01

    Full Text Available Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  9. Multivariate Analysis and Machine Learning in Cerebral Palsy Research

    Directory of Open Access Journals (Sweden)

    Jing Zhang

    2017-12-01

    Full Text Available Cerebral palsy (CP, a common pediatric movement disorder, causes the most severe physical disability in children. Early diagnosis in high-risk infants is critical for early intervention and possible early recovery. In recent years, multivariate analytic and machine learning (ML approaches have been increasingly used in CP research. This paper aims to identify such multivariate studies and provide an overview of this relatively young field. Studies reviewed in this paper have demonstrated that multivariate analytic methods are useful in identification of risk factors, detection of CP, movement assessment for CP prediction, and outcome assessment, and ML approaches have made it possible to automatically identify movement impairments in high-risk infants. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these multivariate methods in risk factor identification, CP detection, movement assessment, and outcome evaluation or prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data of this century, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP.

  10. Predicting activities of daily living for cancer patients using an ontology-guided machine learning methodology.

    Science.gov (United States)

    Min, Hua; Mobahi, Hedyeh; Irvin, Katherine; Avramovic, Sanja; Wojtusiak, Janusz

    2017-09-16

    Bio-ontologies are becoming increasingly important in knowledge representation and in the machine learning (ML) fields. This paper presents a ML approach that incorporates bio-ontologies and its application to the SEER-MHOS dataset to discover patterns of patient characteristics that impact the ability to perform activities of daily living (ADLs). Bio-ontologies are used to provide computable knowledge for ML methods to "understand" biomedical data. This retrospective study included 723 cancer patients from the SEER-MHOS dataset. Two ML methods were applied to create predictive models for ADL disabilities for the first year after a patient's cancer diagnosis. The first method is a standard rule learning algorithm; the second is that same algorithm additionally equipped with methods for reasoning with ontologies. The models showed that a patient's race, ethnicity, smoking preference, treatment plan and tumor characteristics including histology, staging, cancer site, and morphology were predictors for ADL performance levels one year after cancer diagnosis. The ontology-guided ML method was more accurate at predicting ADL performance levels (P ontologies. This study demonstrated that bio-ontologies can be harnessed to provide medical knowledge for ML algorithms. The presented method demonstrates that encoding specific types of hierarchical relationships to guide rule learning is possible, and can be extended to other types of semantic relationships present in biomedical ontologies. The ontology-guided ML method achieved better performance than the method without ontologies. The presented method can also be used to promote the effectiveness and efficiency of ML in healthcare, in which use of background knowledge and consistency with existing clinical expertise is critical.

  11. Using human brain activity to guide machine learning.

    Science.gov (United States)

    Fong, Ruth C; Scheirer, Walter J; Cox, David D

    2018-03-29

    Machine learning is a field of computer science that builds algorithms that learn. In many cases, machine learning algorithms are used to recreate a human ability like adding a caption to a photo, driving a car, or playing a game. While the human brain has long served as a source of inspiration for machine learning, little effort has been made to directly use data collected from working brains as a guide for machine learning algorithms. Here we demonstrate a new paradigm of "neurally-weighted" machine learning, which takes fMRI measurements of human brain activity from subjects viewing images, and infuses these data into the training process of an object recognition learning algorithm to make it more consistent with the human brain. After training, these neurally-weighted classifiers are able to classify images without requiring any additional neural data. We show that our neural-weighting approach can lead to large performance gains when used with traditional machine vision features, as well as to significant improvements with already high-performing convolutional neural network features. The effectiveness of this approach points to a path forward for a new class of hybrid machine learning algorithms which take both inspiration and direct constraints from neuronal data.

  12. A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES

    International Nuclear Information System (INIS)

    Xu Xiaoying; Ho, Shirley; Trac, Hy; Schneider, Jeff; Ntampaka, Michelle; Poczos, Barnabas

    2013-01-01

    We investigate machine learning (ML) techniques for predicting the number of galaxies (N gal ) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N gal . In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N gal by training our algorithms on the following six halo properties: number of particles, M 200 , σ v , v max , half-mass radius, and spin. For Millennium, our predicted N gal values have a mean-squared error (MSE) of ∼0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ∼5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N gal . Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M star , low M star ). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs

  13. Machine learning applications in genetics and genomics.

    Science.gov (United States)

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  14. Teaching machine learning to design students

    NARCIS (Netherlands)

    Vlist, van der B.J.J.; van de Westelaken, H.F.M.; Bartneck, C.; Hu, J.; Ahn, R.M.C.; Barakova, E.I.; Delbressine, F.L.M.; Feijs, L.M.G.; Pan, Z.; Zhang, X.; El Rhalibi, A.

    2008-01-01

    Machine learning is a key technology to design and create intelligent systems, products, and related services. Like many other design departments, we are faced with the challenge to teach machine learning to design students, who often do not have an inherent affinity towards technology. We

  15. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology

    Directory of Open Access Journals (Sweden)

    Ashley I. Heinson

    2017-02-01

    Full Text Available Reverse vaccinology (RV is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML techniques to distinguish bacterial protective antigens (BPAs from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM classifier that could discriminate BPAs (n = 200 from non-BPAs (n = 200 with an area under the curve (AUC of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  16. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology

    KAUST Repository

    Heinson, Ashley

    2017-02-01

    Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  17. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology

    KAUST Repository

    Heinson, Ashley; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher

    2017-01-01

    Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  18. Adaptive Machine Aids to Learning.

    Science.gov (United States)

    Starkweather, John A.

    With emphasis on man-machine relationships and on machine evolution, computer-assisted instruction (CAI) is examined in this paper. The discussion includes the background of machine assistance to learning, the current status of CAI, directions of development, the development of criteria for successful instruction, meeting the needs of users,…

  19. Using Machine Learning to Predict Student Performance

    OpenAIRE

    Pojon, Murat

    2017-01-01

    This thesis examines the application of machine learning algorithms to predict whether a student will be successful or not. The specific focus of the thesis is the comparison of machine learning methods and feature engineering techniques in terms of how much they improve the prediction performance. Three different machine learning methods were used in this thesis. They are linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modification ...

  20. Improving the accuracy of myocardial perfusion scintigraphy results by machine learning method

    International Nuclear Information System (INIS)

    Groselj, C.; Kukar, M.

    2002-01-01

    Full text: Machine learning (ML) as rapidly growing artificial intelligence subfield has already proven in last decade to be a useful tool in many fields of decision making, also in some fields of medicine. Its decision accuracy usually exceeds the human one. To assess applicability of ML in interpretation the results of stress myocardial perfusion scintigraphy for CAD diagnosis. The 327 patient's data of planar stress myocardial perfusion scintigraphy were reevaluated in usual way. Comparing them with the results of coronary angiography the sensitivity, specificity and accuracy for the investigation was computed. The data were digitized and the decision procedure repeated by ML program 'Naive Bayesian classifier'. As the ML is able to simultaneously manipulate of whatever number of data, all reachable disease connected data (regarding history, habitus, risk factors, stress results) were added. The sensitivity, specificity and accuracy for scintigraphy were expressed in this way. The results of both decision procedures were compared. With ML method 19 patients more out of 327 (5.8 %) were correctly diagnosed by stress myocardial perfusion scintigraphy. ML could be an important tool for decision making in myocardial perfusion scintigraphy. (author)

  1. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  2. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  3. Promises of Machine Learning Approaches in Prediction of Absorption of Compounds.

    Science.gov (United States)

    Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar

    2018-01-01

    The Machine Learning (ML) is one of the fastest developing techniques in the prediction and evaluation of important pharmacokinetic properties such as absorption, distribution, metabolism and excretion. The availability of a large number of robust validation techniques for prediction models devoted to pharmacokinetics has significantly enhanced the trust and authenticity in ML approaches. There is a series of prediction models generated and used for rapid screening of compounds on the basis of absorption in last one decade. Prediction of absorption of compounds using ML models has great potential across the pharmaceutical industry as a non-animal alternative to predict absorption. However, these prediction models still have to go far ahead to develop the confidence similar to conventional experimental methods for estimation of drug absorption. Some of the general concerns are selection of appropriate ML methods and validation techniques in addition to selecting relevant descriptors and authentic data sets for the generation of prediction models. The current review explores published models of ML for the prediction of absorption using physicochemical properties as descriptors and their important conclusions. In addition, some critical challenges in acceptance of ML models for absorption are also discussed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  4. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  5. Research on machine learning framework based on random forest algorithm

    Science.gov (United States)

    Ren, Qiong; Cheng, Hui; Han, Hai

    2017-03-01

    With the continuous development of machine learning, industry and academia have released a lot of machine learning frameworks based on distributed computing platform, and have been widely used. However, the existing framework of machine learning is limited by the limitations of machine learning algorithm itself, such as the choice of parameters and the interference of noises, the high using threshold and so on. This paper introduces the research background of machine learning framework, and combined with the commonly used random forest algorithm in machine learning classification algorithm, puts forward the research objectives and content, proposes an improved adaptive random forest algorithm (referred to as ARF), and on the basis of ARF, designs and implements the machine learning framework.

  6. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    Science.gov (United States)

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  7. Statistical and Machine Learning forecasting methods: Concerns and ways forward.

    Science.gov (United States)

    Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.

  8. BENCHMARKING MACHINE LEARNING TECHNIQUES FOR SOFTWARE DEFECT DETECTION

    OpenAIRE

    Saiqa Aleem; Luiz Fernando Capretz; Faheem Ahmed

    2015-01-01

    Machine Learning approaches are good in solving problems that have less information. In most cases, the software domain problems characterize as a process of learning that depend on the various circumstances and changes accordingly. A predictive model is constructed by using machine learning approaches and classified them into defective and non-defective modules. Machine learning techniques help developers to retrieve useful information after the classification and enable them to analyse data...

  9. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols......, learning from weak labels, and interpretation and evaluation of results....

  10. Machine learning algorithms for the creation of clinical healthcare enterprise systems

    Science.gov (United States)

    Mandal, Indrajit

    2017-10-01

    Clinical recommender systems are increasingly becoming popular for improving modern healthcare systems. Enterprise systems are persuasively used for creating effective nurse care plans to provide nurse training, clinical recommendations and clinical quality control. A novel design of a reliable clinical recommender system based on multiple classifier system (MCS) is implemented. A hybrid machine learning (ML) ensemble based on random subspace method and random forest is presented. The performance accuracy and robustness of proposed enterprise architecture are quantitatively estimated to be above 99% and 97%, respectively (above 95% confidence interval). The study then extends to experimental analysis of the clinical recommender system with respect to the noisy data environment. The ranking of items in nurse care plan is demonstrated using machine learning algorithms (MLAs) to overcome the drawback of the traditional association rule method. The promising experimental results are compared against the sate-of-the-art approaches to highlight the advancement in recommendation technology. The proposed recommender system is experimentally validated using five benchmark clinical data to reinforce the research findings.

  11. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  12. Machine learning enhanced optical distance sensor

    Science.gov (United States)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized training model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces a training set and testing set distance measurement errors of learning. Applications for the proposed sensor include industrial scenario distance sensing where target material specific training models can be generated to realize low <1% measurement error distance measurements.

  13. An introduction to quantum machine learning

    OpenAIRE

    Schuld, M.; Sinayskiy, I.; Petruccione, F.

    2014-01-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum compute...

  14. Machine learning, social learning and the governance of self-driving cars.

    Science.gov (United States)

    Stilgoe, Jack

    2018-02-01

    Self-driving cars, a quintessentially 'smart' technology, are not born smart. The algorithms that control their movements are learning as the technology emerges. Self-driving cars represent a high-stakes test of the powers of machine learning, as well as a test case for social learning in technology governance. Society is learning about the technology while the technology learns about society. Understanding and governing the politics of this technology means asking 'Who is learning, what are they learning and how are they learning?' Focusing on the successes and failures of social learning around the much-publicized crash of a Tesla Model S in 2016, I argue that trajectories and rhetorics of machine learning in transport pose a substantial governance challenge. 'Self-driving' or 'autonomous' cars are misnamed. As with other technologies, they are shaped by assumptions about social needs, solvable problems, and economic opportunities. Governing these technologies in the public interest means improving social learning by constructively engaging with the contingencies of machine learning.

  15. Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database.

    Science.gov (United States)

    Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee

    2017-07-01

    Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep-learning in healthcare have just gained attention recently. Deep learning, such as deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve a better performance comparing to other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The result shows that DNN and gradient boosting decision tree (GBDT) can result in similarly high prediction accuracies that are better compared to logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results by using lesser amounts of patient data when comparing to GBDT method.

  16. Trends in Machine Learning for Signal Processing

    DEFF Research Database (Denmark)

    Adali, Tulay; Miller, David J.; Diamantaras, Konstantinos I.

    2011-01-01

    By putting the accent on learning from the data and the environment, the Machine Learning for SP (MLSP) Technical Committee (TC) provides the essential bridge between the machine learning and SP communities. While the emphasis in MLSP is on learning and data-driven approaches, SP defines the main...... applications of interest, and thus the constraints and requirements on solutions, which include computational efficiency, online adaptation, and learning with limited supervision/reference data....

  17. Downscaling Coarse Scale Microwave Soil Moisture Product using Machine Learning

    Science.gov (United States)

    Abbaszadeh, P.; Moradkhani, H.; Yan, H.

    2016-12-01

    Soil moisture (SM) is a key variable in partitioning and examining the global water-energy cycle, agricultural planning, and water resource management. It is also strongly coupled with climate change, playing an important role in weather forecasting and drought monitoring and prediction, flood modeling and irrigation management. Although satellite retrievals can provide an unprecedented information of soil moisture at a global-scale, the products might be inadequate for basin scale study or regional assessment. To improve the spatial resolution of SM, this work presents a novel approach based on Machine Learning (ML) technique that allows for downscaling of the satellite soil moisture to fine resolution. For this purpose, the SMAP L-band radiometer SM products were used and conditioned on the Variable Infiltration Capacity (VIC) model prediction to describe the relationship between the coarse and fine scale soil moisture data. The proposed downscaling approach was applied to a western US basin and the products were compared against the available SM data from in-situ gauge stations. The obtained results indicated a great potential of the machine learning technique to derive the fine resolution soil moisture information that is currently used for land data assimilation applications.

  18. Status Checking System of Home Appliances using machine learning

    Directory of Open Access Journals (Sweden)

    Yoon Chi-Yurl

    2017-01-01

    Full Text Available This paper describes status checking system of home appliances based on machine learning, which can be applied to existing household appliances without networking function. Designed status checking system consists of sensor modules, a wireless communication module, cloud server, android application and a machine learning algorithm. The developed system applied to washing machine analyses and judges the four-kinds of appliance’s status such as staying, washing, rinsing and spin-drying. The measurements of sensor and transmission of sensing data are operated on an Arduino board and the data are transmitted to cloud server in real time. The collected data are parsed by an Android application and injected into the machine learning algorithm for learning the status of the appliances. The machine learning algorithm compares the stored learning data with collected real-time data from the appliances. Our results are expected to contribute as a base technology to design an automatic control system based on machine learning technology for household appliances in real-time.

  19. Trustless Machine Learning Contracts; Evaluating and Exchanging Machine Learning Models on the Ethereum Blockchain

    OpenAIRE

    Kurtulmus, A. Besir; Daniel, Kenny

    2018-01-01

    Using blockchain technology, it is possible to create contracts that offer a reward in exchange for a trained machine learning model for a particular data set. This would allow users to train machine learning models for a reward in a trustless manner. The smart contract will use the blockchain to automatically validate the solution, so there would be no debate about whether the solution was correct or not. Users who submit the solutions won't have counterparty risk that they won't get paid fo...

  20. A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Xu Xiaoying; Ho, Shirley; Trac, Hy; Schneider, Jeff; Ntampaka, Michelle [McWilliams Center for Cosmology, Department of Physics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (United States); Poczos, Barnabas [School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (United States)

    2013-08-01

    We investigate machine learning (ML) techniques for predicting the number of galaxies (N{sub gal}) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N{sub gal}. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N{sub gal} by training our algorithms on the following six halo properties: number of particles, M{sub 200}, {sigma}{sub v}, v{sub max}, half-mass radius, and spin. For Millennium, our predicted N{sub gal} values have a mean-squared error (MSE) of {approx}0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to {approx}5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N{sub gal}. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M{sub star}, low M{sub star}). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.

  1. Machine learning in cardiovascular medicine: are we there yet?

    Science.gov (United States)

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  2. Machine Learning: developing an image recognition program : with Python, Scikit Learn and OpenCV

    OpenAIRE

    Nguyen, Minh

    2016-01-01

    Machine Learning is one of the most debated topic in computer world these days, especially after the first Computer Go program has beaten human Go world champion. Among endless application of Machine Learning, image recognition, which problem is processing enormous amount of data from dynamic input. This thesis will present the basic concept of Machine Learning, Machine Learning algorithms, Python programming language and Scikit Learn – a simple and efficient tool for data analysis in P...

  3. Bypassing the Kohn-Sham equations with machine learning.

    Science.gov (United States)

    Brockherde, Felix; Vogt, Leslie; Li, Li; Tuckerman, Mark E; Burke, Kieron; Müller, Klaus-Robert

    2017-10-11

    Last year, at least 30,000 scientific papers used the Kohn-Sham scheme of density functional theory to solve electronic structure problems in a wide variety of scientific fields. Machine learning holds the promise of learning the energy functional via examples, bypassing the need to solve the Kohn-Sham equations. This should yield substantial savings in computer time, allowing larger systems and/or longer time-scales to be tackled, but attempts to machine-learn this functional have been limited by the need to find its derivative. The present work overcomes this difficulty by directly learning the density-potential and energy-density maps for test systems and various molecules. We perform the first molecular dynamics simulation with a machine-learned density functional on malonaldehyde and are able to capture the intramolecular proton transfer process. Learning density models now allows the construction of accurate density functionals for realistic molecular systems.Machine learning allows electronic structure calculations to access larger system sizes and, in dynamical simulations, longer time scales. Here, the authors perform such a simulation using a machine-learned density functional that avoids direct solution of the Kohn-Sham equations.

  4. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    Science.gov (United States)

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  5. Machine learning in radiation oncology theory and applications

    CERN Document Server

    El Naqa, Issam; Murphy, Martin J

    2015-01-01

    ​This book provides a complete overview of the role of machine learning in radiation oncology and medical physics, covering basic theory, methods, and a variety of applications in medical physics and radiotherapy. An introductory section explains machine learning, reviews supervised and unsupervised learning methods, discusses performance evaluation, and summarizes potential applications in radiation oncology. Detailed individual sections are then devoted to the use of machine learning in quality assurance; computer-aided detection, including treatment planning and contouring; image-guided rad

  6. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem being discussed about for a decade that has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learning capability of the particle swarm optimization (PSO algorithm with the extreme learning machine (ELM classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN, proved to be an excellent classifier with large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is experimented on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  7. MLBCD: a machine learning tool for big clinical data.

    Science.gov (United States)

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.

  8. Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM).

    Science.gov (United States)

    Dipnall, J F; Pasco, J A; Berk, M; Williams, L J; Dodd, S; Jacka, F N; Meyer, D

    2017-01-01

    Key lifestyle-environ risk factors are operative for depression, but it is unclear how risk factors cluster. Machine-learning (ML) algorithms exist that learn, extract, identify and map underlying patterns to identify groupings of depressed individuals without constraints. The aim of this research was to use a large epidemiological study to identify and characterise depression clusters through "Graphing lifestyle-environs using machine-learning methods" (GLUMM). Two ML algorithms were implemented: unsupervised Self-organised mapping (SOM) to create GLUMM clusters and a supervised boosted regression algorithm to describe clusters. Ninety-six "lifestyle-environ" variables were used from the National health and nutrition examination study (2009-2010). Multivariate logistic regression validated clusters and controlled for possible sociodemographic confounders. The SOM identified two GLUMM cluster solutions. These solutions contained one dominant depressed cluster (GLUMM5-1, GLUMM7-1). Equal proportions of members in each cluster rated as highly depressed (17%). Alcohol consumption and demographics validated clusters. Boosted regression identified GLUMM5-1 as more informative than GLUMM7-1. Members were more likely to: have problems sleeping; unhealthy eating; ≤2 years in their home; an old home; perceive themselves underweight; exposed to work fumes; experienced sex at ≤14 years; not perform moderate recreational activities. A positive relationship between GLUMM5-1 (OR: 7.50, Pdepression was found, with significant interactions with those married/living with partner (P=0.001). Using ML based GLUMM to form ordered depressive clusters from multitudinous lifestyle-environ variables enabled a deeper exploration of the heterogeneous data to uncover better understandings into relationships between the complex mental health factors. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  9. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    Science.gov (United States)

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-12-01

    Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records make applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality.Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.

  10. Machine learning and medicine: book review and commentary.

    Science.gov (United States)

    Koprowski, Robert; Foster, Kenneth R

    2018-02-01

    This article is a review of the book "Master machine learning algorithms, discover how they work and implement them from scratch" (ISBN: not available, 37 USD, 163 pages) edited by Jason Brownlee published by the Author, edition, v1.10 http://MachineLearningMastery.com . An accompanying commentary discusses some of the issues that are involved with use of machine learning and data mining techniques to develop predictive models for diagnosis or prognosis of disease, and to call attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. Appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.

  11. Deep learning versus traditional machine learning methods for aggregated energy demand prediction

    NARCIS (Netherlands)

    Paterakis, N.G.; Mocanu, E.; Gibescu, M.; Stappers, B.; van Alst, W.

    2018-01-01

    In this paper the more advanced, in comparison with traditional machine learning approaches, deep learning methods are explored with the purpose of accurately predicting the aggregated energy consumption. Despite the fact that a wide range of machine learning methods have been applied to

  12. Learning Activity Packets for Milling Machines. Unit I--Introduction to Milling Machines.

    Science.gov (United States)

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to identify parts and attachments of vertical and horizontal milling machines, identify work-holding devices, state safety rules, and…

  13. Prediction of revascularization after myocardial perfusion SPECT by machine learning in a large population.

    Science.gov (United States)

    Arsanjani, Reza; Dey, Damini; Khachatryan, Tigran; Shalev, Aryeh; Hayes, Sean W; Fish, Mathews; Nakanishi, Rine; Germano, Guido; Berman, Daniel S; Slomka, Piotr

    2015-10-01

    We aimed to investigate if early revascularization in patients with suspected coronary artery disease can be effectively predicted by integrating clinical data and quantitative image features derived from perfusion SPECT (MPS) by machine learning (ML) approach. 713 rest (201)Thallium/stress (99m)Technetium MPS studies with correlating invasive angiography with 372 revascularization events (275 PCI/97 CABG) within 90 days after MPS (91% within 30 days) were considered. Transient ischemic dilation, stress combined supine/prone total perfusion deficit (TPD), supine rest and stress TPD, exercise ejection fraction, and end-systolic volume, along with clinical parameters including patient gender, history of hypertension and diabetes mellitus, ST-depression on baseline ECG, ECG and clinical response during stress, and post-ECG probability by boosted ensemble ML algorithm (LogitBoost) to predict revascularization events. These features were selected using an automated feature selection algorithm from all available clinical and quantitative data (33 parameters). Tenfold cross-validation was utilized to train and test the prediction model. The prediction of revascularization by ML algorithm was compared to standalone measures of perfusion and visual analysis by two experienced readers utilizing all imaging, quantitative, and clinical data. The sensitivity of machine learning (ML) (73.6% ± 4.3%) for prediction of revascularization was similar to one reader (73.9% ± 4.6%) and standalone measures of perfusion (75.5% ± 4.5%). The specificity of ML (74.7% ± 4.2%) was also better than both expert readers (67.2% ± 4.9% and 66.0% ± 5.0%, P < .05), but was similar to ischemic TPD (68.3% ± 4.9%, P < .05). The receiver operator characteristics areas under curve for ML (0.81 ± 0.02) was similar to reader 1 (0.81 ± 0.02) but superior to reader 2 (0.72 ± 0.02, P < .01) and standalone measure of perfusion (0.77 ± 0.02, P < .01). ML approach is comparable or better than

  14. Machine Learning-based Virtual Screening and Its Applications to Alzheimer's Drug Discovery: A Review.

    Science.gov (United States)

    Carpenter, Kristy A; Huang, Xudong

    2018-06-07

    Virtual Screening (VS) has emerged as an important tool in the drug development process, as it conducts efficient in silico searches over millions of compounds, ultimately increasing yields of potential drug leads. As a subset of Artificial Intelligence (AI), Machine Learning (ML) is a powerful way of conducting VS for drug leads. ML for VS generally involves assembling a filtered training set of compounds, comprised of known actives and inactives. After training the model, it is validated and, if sufficiently accurate, used on previously unseen databases to screen for novel compounds with desired drug target binding activity. The study aims to review ML-based methods used for VS and applications to Alzheimer's disease (AD) drug discovery. To update the current knowledge on ML for VS, we review thorough backgrounds, explanations, and VS applications of the following ML techniques: Naïve Bayes (NB), k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Random Forests (RF), and Artificial Neural Networks (ANN). All techniques have found success in VS, but the future of VS is likely to lean more heavily toward the use of neural networks - and more specifically, Convolutional Neural Networks (CNN), which are a subset of ANN that utilize convolution. We additionally conceptualize a work flow for conducting ML-based VS for potential therapeutics of for AD, a complex neurodegenerative disease with no known cure and prevention. This both serves as an example of how to apply the concepts introduced earlier in the review and as a potential workflow for future implementation. Different ML techniques are powerful tools for VS, and they have advantages and disadvantages albeit. ML-based VS can be applied to AD drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. Machine Learning for Security

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques identifying failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  16. Organizational Learning Supported by Machine Learning Models Coupled with General Explanation Methods: A Case of B2B Sales Forecasting

    Directory of Open Access Journals (Sweden)

    Bohanec Marko

    2017-08-01

    Full Text Available Background and Purpose: The process of business to business (B2B sales forecasting is a complex decision-making process. There are many approaches to support this process, but mainly it is still based on the subjective judgment of a decision-maker. The problem of B2B sales forecasting can be modeled as a classification problem. However, top performing machine learning (ML models are black boxes and do not support transparent reasoning. The purpose of this research is to develop an organizational model using ML model coupled with general explanation methods. The goal is to support the decision-maker in the process of B2B sales forecasting.

  17. Machine learning in heart failure: ready for prime time.

    Science.gov (United States)

    Awan, Saqib Ejaz; Sohel, Ferdous; Sanfilippo, Frank Mario; Bennamoun, Mohammed; Dwivedi, Girish

    2018-03-01

    The aim of this review is to present an up-to-date overview of the application of machine learning methods in heart failure including diagnosis, classification, readmissions and medication adherence. Recent studies have shown that the application of machine learning techniques may have the potential to improve heart failure outcomes and management, including cost savings by improving existing diagnostic and treatment support systems. Recently developed deep learning methods are expected to yield even better performance than traditional machine learning techniques in performing complex tasks by learning the intricate patterns hidden in big medical data. The review summarizes the recent developments in the application of machine and deep learning methods in heart failure management.

  18. Novel Breast Imaging and Machine Learning: Predicting Breast Lesion Malignancy at Cone-Beam CT Using Machine Learning Techniques.

    Science.gov (United States)

    Uhlig, Johannes; Uhlig, Annemarie; Kunze, Meike; Beissbarth, Tim; Fischer, Uwe; Lotz, Joachim; Wienbeck, Susanne

    2018-05-24

    The purpose of this study is to evaluate the diagnostic performance of machine learning techniques for malignancy prediction at breast cone-beam CT (CBCT) and to compare them to human readers. Five machine learning techniques, including random forests, back propagation neural networks (BPN), extreme learning machines, support vector machines, and K-nearest neighbors, were used to train diagnostic models on a clinical breast CBCT dataset with internal validation by repeated 10-fold cross-validation. Two independent blinded human readers with profound experience in breast imaging and breast CBCT analyzed the same CBCT dataset. Diagnostic performance was compared using AUC, sensitivity, and specificity. The clinical dataset comprised 35 patients (American College of Radiology density type C and D breasts) with 81 suspicious breast lesions examined with contrast-enhanced breast CBCT. Forty-five lesions were histopathologically proven to be malignant. Among the machine learning techniques, BPNs provided the best diagnostic performance, with AUC of 0.91, sensitivity of 0.85, and specificity of 0.82. The diagnostic performance of the human readers was AUC of 0.84, sensitivity of 0.89, and specificity of 0.72 for reader 1 and AUC of 0.72, sensitivity of 0.71, and specificity of 0.67 for reader 2. AUC was significantly higher for BPN when compared with both reader 1 (p = 0.01) and reader 2 (p Machine learning techniques provide a high and robust diagnostic performance in the prediction of malignancy in breast lesions identified at CBCT. BPNs showed the best diagnostic performance, surpassing human readers in terms of AUC and specificity.

  19. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    Science.gov (United States)

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  20. Parsimonious Wavelet Kernel Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Wang Qin

    2015-11-01

    Full Text Available In this study, a parsimonious scheme for wavelet kernel extreme learning machine (named PWKELM was introduced by combining wavelet theory and a parsimonious algorithm into kernel extreme learning machine (KELM. In the wavelet analysis, bases that were localized in time and frequency to represent various signals effectively were used. Wavelet kernel extreme learning machine (WELM maximized its capability to capture the essential features in “frequency-rich” signals. The proposed parsimonious algorithm also incorporated significant wavelet kernel functions via iteration in virtue of Householder matrix, thus producing a sparse solution that eased the computational burden and improved numerical stability. The experimental results achieved from the synthetic dataset and a gas furnace instance demonstrated that the proposed PWKELM is efficient and feasible in terms of improving generalization accuracy and real time performance.

  1. Virtual screening for cytochromes p450: successes of machine learning filters.

    Science.gov (United States)

    Burton, Julien; Ijjaali, Ismail; Petitet, François; Michel, André; Vercauteren, Daniel P

    2009-05-01

    Cytochromes P450 (CYPs) are crucial targets when predicting the ADME properties (absorption, distribution, metabolism, and excretion) of drugs in development. Particularly, CYPs mediated drug-drug interactions are responsible for major failures in the drug design process. Accurate and robust screening filters are thus needed to predict interactions of potent compounds with CYPs as early as possible in the process. In recent years, more and more 3D structures of various CYP isoforms have been solved, opening the gate of accurate structure-based studies of interactions. Nevertheless, the ligand-based approach still remains popular. This success can be explained by the growing number of available data and the satisfying performances of existing machine learning (ML) methods. The aim of this contribution is to give an overview of the recent achievements in ML applications to CYP datasets. Particularly, popular methods such as support vector machine, decision trees, artificial neural networks, k-nearest neighbors, and partial least squares will be compared as well as the quality of the datasets and the descriptors used. Consensus of different methods will also be discussed. Often reaching 90% of accuracy, the models will be analyzed to highlight the key descriptors permitting the good prediction of CYPs binding.

  2. On the Conditioning of Machine-Learning-Assisted Turbulence Modeling

    Science.gov (United States)

    Wu, Jinlong; Sun, Rui; Wang, Qiqi; Xiao, Heng

    2017-11-01

    Recently, several researchers have demonstrated that machine learning techniques can be used to improve the RANS modeled Reynolds stress by training on available database of high fidelity simulations. However, obtaining improved mean velocity field remains an unsolved challenge, restricting the predictive capability of current machine-learning-assisted turbulence modeling approaches. In this work we define a condition number to evaluate the model conditioning of data-driven turbulence modeling approaches, and propose a stability-oriented machine learning framework to model Reynolds stress. Two canonical flows, the flow in a square duct and the flow over periodic hills, are investigated to demonstrate the predictive capability of the proposed framework. The satisfactory prediction performance of mean velocity field for both flows demonstrates the predictive capability of the proposed framework for machine-learning-assisted turbulence modeling. With showing the capability of improving the prediction of mean flow field, the proposed stability-oriented machine learning framework bridges the gap between the existing machine-learning-assisted turbulence modeling approaches and the demand of predictive capability of turbulence models in real applications.

  3. MACHINE LEARNING TECHNIQUES USED IN BIG DATA

    Directory of Open Access Journals (Sweden)

    STEFANIA LOREDANA NITA

    2016-07-01

    Full Text Available The classical tools used in data analysis are not enough in order to benefit of all advantages of big data. The amount of information is too large for a complete investigation, and the possible connections and relations between data could be missed, because it is difficult or even impossible to verify all assumption over the information. Machine learning is a great solution in order to find concealed correlations or relationships between data, because it runs at scale machine and works very well with large data sets. The more data we have, the more the machine learning algorithm is useful, because it “learns” from the existing data and applies the found rules on new entries. In this paper, we present some machine learning algorithms and techniques used in big data.

  4. Intellectual Property and Machine Learning: An exploratory study

    OpenAIRE

    Øverlier, Lasse

    2017-01-01

    Our research makes a contribution by exemplifying what controls the freedom-to-operate for a company operating in the area of machine learning. Through interviews we demonstrate the industry’s alternating viewpoints to whether copyrighted data used as input to machine learning systems should be viewed differently than copying the data for storage or reproduction. In addition we show that unauthorized use of copyrighted data in machine learning systems is hard to detect with the burden of proo...

  5. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights

  6. Machine Learning for Healthcare: On the Verge of a Major Shift in Healthcare Epidemiology.

    Science.gov (United States)

    Wiens, Jenna; Shenoy, Erica S

    2018-01-06

    The increasing availability of electronic health data presents a major opportunity in healthcare for both discovery and practical applications to improve healthcare. However, for healthcare epidemiologists to best use these data, computational techniques that can handle large complex datasets are required. Machine learning (ML), the study of tools and methods for identifying patterns in data, can help. The appropriate application of ML to these data promises to transform patient risk stratification broadly in the field of medicine and especially in infectious diseases. This, in turn, could lead to targeted interventions that reduce the spread of healthcare-associated pathogens. In this review, we begin with an introduction to the basics of ML. We then move on to discuss how ML can transform healthcare epidemiology, providing examples of successful applications. Finally, we present special considerations for those healthcare epidemiologists who want to use and apply ML. © The Author(s) 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

  7. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    Science.gov (United States)

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  8. Machine Learning and Conflict Prediction: A Use Case

    Directory of Open Access Journals (Sweden)

    Chris Perry

    2013-10-01

    Full Text Available For at least the last two decades, the international community in general and the United Nations specifically have attempted to develop robust, accurate and effective conflict early warning system for conflict prevention. One potential and promising component of integrated early warning systems lies in the field of machine learning. This paper aims at giving conflict analysis a basic understanding of machine learning methodology as well as to test the feasibility and added value of such an approach. The paper finds that the selection of appropriate machine learning methodologies can offer substantial improvements in accuracy and performance. It also finds that even at this early stage in testing machine learning on conflict prediction, full models offer more predictive power than simply using a prior outbreak of violence as the leading indicator of current violence. This suggests that a refined data selection methodology combined with strategic use of machine learning algorithms could indeed offer a significant addition to the early warning toolkit. Finally, the paper suggests a number of steps moving forward to improve upon this initial test methodology.

  9. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis.

    Science.gov (United States)

    Motwani, Manish; Dey, Damini; Berman, Daniel S; Germano, Guido; Achenbach, Stephan; Al-Mallah, Mouaz H; Andreini, Daniele; Budoff, Matthew J; Cademartiri, Filippo; Callister, Tracy Q; Chang, Hyuk-Jae; Chinnaiyan, Kavitha; Chow, Benjamin J W; Cury, Ricardo C; Delago, Augustin; Gomez, Millie; Gransar, Heidi; Hadamitzky, Martin; Hausleiter, Joerg; Hindoyan, Niree; Feuchtner, Gudrun; Kaufmann, Philipp A; Kim, Yong-Jin; Leipsic, Jonathon; Lin, Fay Y; Maffei, Erica; Marques, Hugo; Pontone, Gianluca; Raff, Gilbert; Rubinshtein, Ronen; Shaw, Leslee J; Stehli, Julia; Villines, Todd C; Dunning, Allison; Min, James K; Slomka, Piotr J

    2017-02-14

    Traditional prognostic risk assessment in patients undergoing non-invasive imaging is based upon a limited selection of clinical and imaging findings. Machine learning (ML) can consider a greater number and complexity of variables. Therefore, we investigated the feasibility and accuracy of ML to predict 5-year all-cause mortality (ACM) in patients undergoing coronary computed tomographic angiography (CCTA), and compared the performance to existing clinical or CCTA metrics. The analysis included 10 030 patients with suspected coronary artery disease and 5-year follow-up from the COronary CT Angiography EvaluatioN For Clinical Outcomes: An InteRnational Multicenter registry. All patients underwent CCTA as their standard of care. Twenty-five clinical and 44 CCTA parameters were evaluated, including segment stenosis score (SSS), segment involvement score (SIS), modified Duke index (DI), number of segments with non-calcified, mixed or calcified plaques, age, sex, gender, standard cardiovascular risk factors, and Framingham risk score (FRS). Machine learning involved automated feature selection by information gain ranking, model building with a boosted ensemble algorithm, and 10-fold stratified cross-validation. Seven hundred and forty-five patients died during 5-year follow-up. Machine learning exhibited a higher area-under-curve compared with the FRS or CCTA severity scores alone (SSS, SIS, DI) for predicting all-cause mortality (ML: 0.79 vs. FRS: 0.61, SSS: 0.64, SIS: 0.64, DI: 0.62; PMachine learning combining clinical and CCTA data was found to predict 5-year ACM significantly better than existing clinical or CCTA metrics alone. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2016. For permissions please email: journals.permissions@oup.com.

  10. Spatial extreme learning machines: An application on prediction of disease counts.

    Science.gov (United States)

    Prates, Marcos O

    2018-01-01

    Extreme learning machines have gained a lot of attention by the machine learning community because of its interesting properties and computational advantages. With the increase in collection of information nowadays, many sources of data have missing information making statistical analysis harder or unfeasible. In this paper, we present a new model, coined spatial extreme learning machine, that combine spatial modeling with extreme learning machines keeping the nice properties of both methodologies and making it very flexible and robust. As explained throughout the text, the spatial extreme learning machines have many advantages in comparison with the traditional extreme learning machines. By a simulation study and a real data analysis we present how the spatial extreme learning machine can be used to improve imputation of missing data and uncertainty prediction estimation.

  11. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making There are always difficulties in making machines that learn from experience. Complete information is not always available-or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm-creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  12. BEBP: An Poisoning Method Against Machine Learning Based IDSs

    OpenAIRE

    Li, Pan; Liu, Qiang; Zhao, Wentao; Wang, Dongxu; Wang, Siqi

    2018-01-01

    In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). However, practical IDSs generally update their decision module by feeding new data then retraining learning models in a periodical way. Hence, some attacks that comprise the data for training or testing classifiers significantly challenge the detecting capability of machine learning-based IDSs. Poisoning attack, which is one of the most recognized security threats towards machine learning...

  13. Comparison of four machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    Science.gov (United States)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2016-03-01

    Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.

  14. Revisit of Machine Learning Supported Biological and Biomedical Studies.

    Science.gov (United States)

    Yu, Xiang-Tian; Wang, Lu; Zeng, Tao

    2018-01-01

    Generally, machine learning includes many in silico methods to transform the principles underlying natural phenomenon to human understanding information, which aim to save human labor, to assist human judge, and to create human knowledge. It should have wide application potential in biological and biomedical studies, especially in the era of big biological data. To look through the application of machine learning along with biological development, this review provides wide cases to introduce the selection of machine learning methods in different practice scenarios involved in the whole biological and biomedical study cycle and further discusses the machine learning strategies for analyzing omics data in some cutting-edge biological studies. Finally, the notes on new challenges for machine learning due to small-sample high-dimension are summarized from the key points of sample unbalance, white box, and causality.

  15. Statistical and machine learning approaches for network analysis

    CERN Document Server

    Dehmer, Matthias

    2012-01-01

    Explore the multidisciplinary nature of complex networks through machine learning techniques Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by internation

  16. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning.Organized into 15 chapters, this book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credi

  17. The potential for machine learning algorithms to improve and reduce the cost of 3-dimensional printing for surgical planning.

    Science.gov (United States)

    Huff, Trevor J; Ludwig, Parker E; Zuniga, Jorge M

    2018-05-01

    3D-printed anatomical models play an important role in medical and research settings. The recent successes of 3D anatomical models in healthcare have led many institutions to adopt the technology. However, there remain several issues that must be addressed before it can become more wide-spread. Of importance are the problems of cost and time of manufacturing. Machine learning (ML) could be utilized to solve these issues by streamlining the 3D modeling process through rapid medical image segmentation and improved patient selection and image acquisition. The current challenges, potential solutions, and future directions for ML and 3D anatomical modeling in healthcare are discussed. Areas covered: This review covers research articles in the field of machine learning as related to 3D anatomical modeling. Topics discussed include automated image segmentation, cost reduction, and related time constraints. Expert commentary: ML-based segmentation of medical images could potentially improve the process of 3D anatomical modeling. However, until more research is done to validate these technologies in clinical practice, their impact on patient outcomes will remain unknown. We have the necessary computational tools to tackle the problems discussed. The difficulty now lies in our ability to collect sufficient data.

  18. Machine learning in the string landscape

    Science.gov (United States)

    Carifio, Jonathan; Halverson, James; Krioukov, Dmitri; Nelson, Brent D.

    2017-09-01

    We utilize machine learning to study the string landscape. Deep data dives and conjecture generation are proposed as useful frameworks for utilizing machine learning in the landscape, and examples of each are presented. A decision tree accurately predicts the number of weak Fano toric threefolds arising from reflexive polytopes, each of which determines a smooth F-theory compactification, and linear regression generates a previously proven conjecture for the gauge group rank in an ensemble of 4/3× 2.96× {10}^{755} F-theory compactifications. Logistic regression generates a new conjecture for when E 6 arises in the large ensemble of F-theory compactifications, which is then rigorously proven. This result may be relevant for the appearance of visible sectors in the ensemble. Through conjecture generation, machine learning is useful not only for numerics, but also for rigorous results.

  19. Evaluation on knowledge extraction and machine learning in ...

    African Journals Online (AJOL)

    Evaluation on knowledge extraction and machine learning in resolving Malay word ambiguity. ... No 5S (2017) >. Log in or Register to get access to full text downloads. ... Keywords: ambiguity; lexical knowledge; machine learning; Malay word ...

  20. Machine learning-based kinetic modeling: a robust and reproducible solution for quantitative analysis of dynamic PET data.

    Science.gov (United States)

    Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia

    2017-05-07

    A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.

  1. Machine learning-based kinetic modeling: a robust and reproducible solution for quantitative analysis of dynamic PET data

    Science.gov (United States)

    Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia

    2017-05-01

    A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.

  2. Source localization in an ocean waveguide using supervised machine learning.

    Science.gov (United States)

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  3. Machine Learning in Radiology: Applications Beyond Image Interpretation.

    Science.gov (United States)

    Lakhani, Paras; Prater, Adam B; Hutson, R Kent; Andriole, Kathy P; Dreyer, Keith J; Morey, Jose; Prevedello, Luciano M; Clark, Toshi J; Geis, J Raymond; Itri, Jason N; Hawkins, C Matthew

    2018-02-01

    Much attention has been given to machine learning and its perceived impact in radiology, particularly in light of recent success with image classification in international competitions. However, machine learning is likely to impact radiology outside of image interpretation long before a fully functional "machine radiologist" is implemented in practice. Here, we describe an overview of machine learning, its application to radiology and other domains, and many cases of use that do not involve image interpretation. We hope that better understanding of these potential applications will help radiology practices prepare for the future and realize performance improvement and efficiency gains. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  4. Prostate Cancer Probability Prediction By Machine Learning Technique.

    Science.gov (United States)

    Jović, Srđan; Miljković, Milica; Ivanović, Miljan; Šaranović, Milena; Arsić, Milena

    2017-11-26

    The main goal of the study was to explore possibility of prostate cancer prediction by machine learning techniques. In order to improve the survival probability of the prostate cancer patients it is essential to make suitable prediction models of the prostate cancer. If one make relevant prediction of the prostate cancer it is easy to create suitable treatment based on the prediction results. Machine learning techniques are the most common techniques for the creation of the predictive models. Therefore in this study several machine techniques were applied and compared. The obtained results were analyzed and discussed. It was concluded that the machine learning techniques could be used for the relevant prediction of prostate cancer.

  5. Applications of machine learning in cancer prediction and prognosis.

    Science.gov (United States)

    Cruz, Joseph A; Wishart, David S

    2007-02-11

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to "learn" from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression.

  6. Implementing Machine Learning in Radiology Practice and Research.

    Science.gov (United States)

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  7. International Conference on Extreme Learning Machines 2014

    CERN Document Server

    Mao, Kezhi; Cambria, Erik; Man, Zhihong; Toh, Kar-Ann

    2015-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2014, which was held in Singapore, December 8-10, 2014. This conference brought together the researchers and practitioners of Extreme Learning Machine (ELM) from a variety of fields to promote research and development of “learning without iterative tuning”.  The book covers theories, algorithms and applications of ELM. It gives the readers a glance of the most recent advances of ELM.  

  8. International Conference on Extreme Learning Machine 2015

    CERN Document Server

    Mao, Kezhi; Wu, Jonathan; Lendasse, Amaury; ELM 2015; Theory, Algorithms and Applications (I); Theory, Algorithms and Applications (II)

    2016-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2015, which was held in Hangzhou, China, December 15-17, 2015. This conference brought together researchers and engineers to share and exchange R&D experience on both theoretical studies and practical applications of the Extreme Learning Machine (ELM) technique and brain learning. This book covers theories, algorithms ad applications of ELM. It gives readers a glance of the most recent advances of ELM. .

  9. Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach.

    Science.gov (United States)

    Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James

    2017-11-01

    Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  10. Materials Screening for the Discovery of New Half-Heuslers: Machine Learning versus ab Initio Methods.

    Science.gov (United States)

    Legrain, Fleur; Carrete, Jesús; van Roekeghem, Ambroise; Madsen, Georg K H; Mingo, Natalio

    2018-01-18

    Machine learning (ML) is increasingly becoming a helpful tool in the search for novel functional compounds. Here we use classification via random forests to predict the stability of half-Heusler (HH) compounds, using only experimentally reported compounds as a training set. Cross-validation yields an excellent agreement between the fraction of compounds classified as stable and the actual fraction of truly stable compounds in the ICSD. The ML model is then employed to screen 71 178 different 1:1:1 compositions, yielding 481 likely stable candidates. The predicted stability of HH compounds from three previous high-throughput ab initio studies is critically analyzed from the perspective of the alternative ML approach. The incomplete consistency among the three separate ab initio studies and between them and the ML predictions suggests that additional factors beyond those considered by ab initio phase stability calculations might be determinant to the stability of the compounds. Such factors can include configurational entropies and quasiharmonic contributions.

  11. Active learning machine learns to create new quantum experiments.

    Science.gov (United States)

    Melnikov, Alexey A; Poulsen Nautrup, Hendrik; Krenn, Mario; Dunjko, Vedran; Tiersch, Markus; Zeilinger, Anton; Briegel, Hans J

    2018-02-06

    How useful can machine learning be in a quantum laboratory? Here we raise the question of the potential of intelligent machines in the context of scientific research. A major motivation for the present work is the unknown reachability of various entanglement classes in quantum experiments. We investigate this question by using the projective simulation model, a physics-oriented approach to artificial intelligence. In our approach, the projective simulation system is challenged to design complex photonic quantum experiments that produce high-dimensional entangled multiphoton states, which are of high interest in modern quantum experiments. The artificial intelligence system learns to create a variety of entangled states and improves the efficiency of their realization. In the process, the system autonomously (re)discovers experimental techniques which are only now becoming standard in modern quantum optical experiments-a trait which was not explicitly demanded from the system but emerged through the process of learning. Such features highlight the possibility that machines could have a significantly more creative role in future research.

  12. Virtual Things for Machine Learning Applications

    OpenAIRE

    Bovet , Gérôme; Ridi , Antonio; Hennebert , Jean

    2014-01-01

    International audience; Internet-of-Things (IoT) devices, especially sensors are pro-ducing large quantities of data that can be used for gather-ing knowledge. In this field, machine learning technologies are increasingly used to build versatile data-driven models. In this paper, we present a novel architecture able to ex-ecute machine learning algorithms within the sensor net-work, presenting advantages in terms of privacy and data transfer efficiency. We first argument that some classes of ...

  13. Model-Agnostic Interpretability of Machine Learning

    OpenAIRE

    Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos

    2016-01-01

    Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. In some applications, such models are as accurate as non-interpretable ones, and thus are preferred f...

  14. Building machine learning systems with Python

    CERN Document Server

    Coelho, Luis Pedro

    2015-01-01

    This book primarily targets Python developers who want to learn and use Python's machine learning capabilities and gain valuable insights from data to develop effective solutions for business problems.

  15. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology.

  16. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    International Nuclear Information System (INIS)

    Kanemoto, Shigeru; Watanabe, Masaya; Yusa, Noritaka

    2014-01-01

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology

  17. Semi-local machine-learned kinetic energy density functional with third-order gradients of electron density

    Science.gov (United States)

    Seino, Junji; Kageyama, Ryo; Fujinami, Mikito; Ikabata, Yasuhiro; Nakai, Hiromi

    2018-06-01

    A semi-local kinetic energy density functional (KEDF) was constructed based on machine learning (ML). The present scheme adopts electron densities and their gradients up to third-order as the explanatory variables for ML and the Kohn-Sham (KS) kinetic energy density as the response variable in atoms and molecules. Numerical assessments of the present scheme were performed in atomic and molecular systems, including first- and second-period elements. The results of 37 conventional KEDFs with explicit formulae were also compared with those of the ML KEDF with an implicit formula. The inclusion of the higher order gradients reduces the deviation of the total kinetic energies from the KS calculations in a stepwise manner. Furthermore, our scheme with the third-order gradient resulted in the closest kinetic energies to the KS calculations out of the presented functionals.

  18. Machine learning with quantum relative entropy

    International Nuclear Information System (INIS)

    Tsuda, Koji

    2009-01-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  19. Machine learning with quantum relative entropy

    Energy Technology Data Exchange (ETDEWEB)

    Tsuda, Koji [Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tuebingen, 72076 (Germany)], E-mail: koji.tsuda@tuebingen.mpg.de

    2009-12-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  20. Machine Learning Applications to Resting-State Functional MR Imaging Analysis.

    Science.gov (United States)

    Billings, John M; Eder, Maxwell; Flood, William C; Dhami, Devendra Singh; Natarajan, Sriraam; Whitlow, Christopher T

    2017-11-01

    Machine learning is one of the most exciting and rapidly expanding fields within computer science. Academic and commercial research entities are investing in machine learning methods, especially in personalized medicine via patient-level classification. There is great promise that machine learning methods combined with resting state functional MR imaging will aid in diagnosis of disease and guide potential treatment for conditions thought to be impossible to identify based on imaging alone, such as psychiatric disorders. We discuss machine learning methods and explore recent advances. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Learning as a Machine: Crossovers between Humans and Machines

    Science.gov (United States)

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  2. Machine Learning Phases of Strongly Correlated Fermions

    Directory of Open Access Journals (Sweden)

    Kelvin Ch’ng

    2017-08-01

    Full Text Available Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling. We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  3. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  4. Studying depression using imaging and machine learning methods.

    Science.gov (United States)

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  5. MoleculeNet: a benchmark for molecular machine learning.

    Science.gov (United States)

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay

    2018-01-14

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

  6. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, in geodynamical studies there are many properties from mineral physics, geochemistry, and petrology in these numerical models. Machine learning, as a computational statistic-related technique and a subfield of artificial intelligence, has rapidly emerged recently in many fields of sciences and engineering. We focus here on the application of supervised machine learning (SML) algorithms in predictions of mantle flow processes. Specifically, we emphasize on estimating mantle properties by employing machine learning techniques in solving an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at midmantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used in characterizing mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms for putting constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.

  7. Attacking Machine Learning models as part of a cyber kill chain

    OpenAIRE

    Nguyen, Tam N.

    2017-01-01

    Machine learning is gaining popularity in the network security domain as many more network-enabled devices get connected, as malicious activities become stealthier, and as new technologies like Software Defined Networking emerge. Compromising machine learning model is a desirable goal. In fact, spammers have been quite successful getting through machine learning enabled spam filters for years. While previous works have been done on adversarial machine learning, none has been considered within...

  8. Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis.

    Science.gov (United States)

    Gong, Xiajing; Hu, Meng; Zhao, Liang

    2018-05-01

    Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. © 2018 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  9. Adaptive Learning Systems: Beyond Teaching Machines

    Science.gov (United States)

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn, what instructor should do to help students during their learning process. We have adaptive learning technologies that can create much more student oriented learning environments. The purpose of this article is to present these changes and its…

  10. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    Science.gov (United States)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product that facilitates cut-out magnetograms of solar active regions (AR) in the Sun in near-realtime (NRT), taken over a five-year interval (2012 - 2016). Our approach utilizes a set of thirteen predictors, which are not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude {>} M1 and {>} C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for {>} M1 flare prediction with probability threshold 15% and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for {>} C1 flare prediction with probability threshold 35%.

  11. Predicting Solar Activity Using Machine-Learning Methods

    Science.gov (United States)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.

  12. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression......* Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression...

  13. Simulation-driven machine learning: Bearing fault classification

    Science.gov (United States)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data is often only valid for the specific circumstances and components for which it was collected. This work directly addresses these challenges for roller bearings with race faults by generating training data using information gained from high resolution simulations of roller bearing dynamics, which is used to train machine learning algorithms that are then validated against four experimental datasets. Several different machine learning methodologies are compared starting from well-established statistical feature-based methods to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter free method for race fault detection.

  14. IoT Security Techniques Based on Machine Learning

    OpenAIRE

    Xiao, Liang; Wan, Xiaoyue; Lu, Xiaozhen; Zhang, Yanyong; Wu, Di

    2018-01-01

    Internet of things (IoT) that integrate a variety of devices into networks to provide advanced and intelligent services have to protect user privacy and address attacks such as spoofing attacks, denial of service attacks, jamming and eavesdropping. In this article, we investigate the attack model for IoT systems, and review the IoT security solutions based on machine learning techniques including supervised learning, unsupervised learning and reinforcement learning. We focus on the machine le...

  15. Modern machine learning techniques and their applications in cartoon animation research

    CERN Document Server

    Yu, Jun

    2013-01-01

    The integration of machine learning techniques and cartoon animation research is fast becoming a hot topic. This book helps readers learn the latest machine learning techniques, including patch alignment framework; spectral clustering, graph cuts, and convex relaxation; ensemble manifold learning; multiple kernel learning; multiview subspace learning; and multiview distance metric learning. It then presents the applications of these modern machine learning techniques in cartoon animation research. With these techniques, users can efficiently utilize the cartoon materials to generate animations

  16. The application of machine learning to the modelling of percutaneous absorption: an overview and guide.

    Science.gov (United States)

    Ashrafi, P; Moss, G P; Wilkinson, S C; Davey, N; Sun, Y

    2015-01-01

    Machine learning (ML) methods have been applied to the analysis of a range of biological systems. This paper reviews the application of these methods to the problem domain of skin permeability and addresses critically some of the key issues. Specifically, ML methods offer great potential in both predictive ability and their ability to provide mechanistic insight to, in this case, the phenomena of skin permeation. However, they are beset by perceptions of a lack of transparency and, often, once a ML or related method has been published there is little impetus from other researchers to adopt such methods. This is usually due to the lack of transparency in some methods and the lack of availability of specific coding for running advanced ML methods. This paper reviews critically the application of ML methods to percutaneous absorption and addresses the key issue of transparency by describing in detail - and providing the detailed coding for - the process of running a ML method (in this case, a Gaussian process regression method). Although this method is applied here to the field of percutaneous absorption, it may be applied more broadly to any biological system.

  17. Implementing Machine Learning in the PCWG Tool

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    2016-12-13

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  18. Machine learning and data science in soft materials engineering

    Science.gov (United States)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  19. Machine learning and data science in soft materials engineering.

    Science.gov (United States)

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  20. Machine learning in computational biology to accelerate high-throughput protein expression.

    Science.gov (United States)

    Sastry, Anand; Monk, Jonathan; Tegel, Hanna; Uhlen, Mathias; Palsson, Bernhard O; Rockberg, Johan; Brunk, Elizabeth

    2017-08-15

    The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets. ebrunk@ucsd.edu or johanr@biotech.kth.se. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. Using Machine Learning to Search for MSSM Higgs Bosons

    CERN Document Server

    Diesing, Rebecca

    2016-01-01

    This paper examines the performance of machine learning in the identification of Minimally Su- persymmetric Standard Model (MSSM) Higgs Bosons, and compares this performance to that of traditional cut strategies. Two boosted decision tree algorithms were tested, scikit-learn and XGBoost. These tests indicated that machine learning can perform significantly better than traditional cuts. However, since machine learning in this form cannot be directly implemented in a real MSSM Higgs analysis, this performance information was instead used to better understand the relationships between training variables. Further studies might use this information to construct an improved cut strategy.

  2. Applications of Machine Learning in Cancer Prediction and Prognosis

    Directory of Open Access Journals (Sweden)

    Joseph A. Cruz

    2006-01-01

    Full Text Available Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such artificial neural networks (ANNs instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25% improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression.

  3. A review of machine learning in obesity.

    Science.gov (United States)

    DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M

    2018-05-01

    Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to National Health and Nutrition Examination Survey to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.

  4. Prediction of selective estrogen receptor beta agonist using open data and machine learning approach.

    Science.gov (United States)

    Niu, Ai-Qin; Xie, Liang-Jun; Wang, Hui; Zhu, Bing; Wang, Sheng-Qi

    2016-01-01

    Estrogen receptors (ERs) are nuclear transcription factors that are involved in the regulation of many complex physiological processes in humans. ERs have been validated as important drug targets for the treatment of various diseases, including breast cancer, ovarian cancer, osteoporosis, and cardiovascular disease. ERs have two subtypes, ER-α and ER-β. Emerging data suggest that the development of subtype-selective ligands that specifically target ER-β could be a more optimal approach to elicit beneficial estrogen-like activities and reduce side effects. Herein, we focused on ER-β and developed its in silico quantitative structure-activity relationship models using machine learning (ML) methods. The chemical structures and ER-β bioactivity data were extracted from public chemogenomics databases. Four types of popular fingerprint generation methods including MACCS fingerprint, PubChem fingerprint, 2D atom pairs, and Chemistry Development Kit extended fingerprint were used as descriptors. Four ML methods including Naïve Bayesian classifier, k-nearest neighbor, random forest, and support vector machine were used to train the models. The range of classification accuracies was 77.10% to 88.34%, and the range of area under the ROC (receiver operating characteristic) curve values was 0.8151 to 0.9475, evaluated by the 5-fold cross-validation. Comparison analysis suggests that both the random forest and the support vector machine are superior for the classification of selective ER-β agonists. Chemistry Development Kit extended fingerprints and MACCS fingerprint performed better in structural representation between active and inactive agonists. These results demonstrate that combining the fingerprint and ML approaches leads to robust ER-β agonist prediction models, which are potentially applicable to the identification of selective ER-β agonists.

  5. On the Use of Machine Learning for Identifying Botnet Network Traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    contemporary approaches use machine learning techniques for identifying malicious traffic. This paper presents a survey of contemporary botnet detection methods that rely on machine learning for identifying botnet network traffic. The paper provides a comprehensive overview on the existing scientific work thus...... contributing to the better understanding of capabilities, limitations and opportunities of using machine learning for identifying botnet traffic. Furthermore, the paper outlines possibilities for the future development of machine learning-based botnet detection systems....

  6. Proceedings of the IEEE Machine Learning for Signal Processing XVII

    DEFF Research Database (Denmark)

    The seventeenth of a series of workshops sponsored by the IEEE Signal Processing Society and organized by the Machine Learning for Signal Processing Technical Committee (MLSP-TC). The field of machine learning has matured considerably in both methodology and real-world application domains and has...... become particularly important for solution of problems in signal processing. As reflected in this collection, machine learning for signal processing combines many ideas from adaptive signal/image processing, learning theory and models, and statistics in order to solve complex real-world signal processing......, and two papers from the winners of the Data Analysis Competition. The program included papers in the following areas: genomic signal processing, pattern recognition and classification, image and video processing, blind signal processing, models, learning algorithms, and applications of machine learning...

  7. Studying depression using imaging and machine learning methods

    Directory of Open Access Journals (Sweden)

    Meenal J. Patel

    2016-01-01

    Full Text Available Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1 presents a background on depression, imaging, and machine learning methodologies; (2 reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3 suggests directions for future depression-related studies.

  8. Prediction of selective estrogen receptor beta agonist using open data and machine learning approach

    Directory of Open Access Journals (Sweden)

    Niu AQ

    2016-07-01

    Full Text Available Ai-qin Niu,1 Liang-jun Xie,2 Hui Wang,1 Bing Zhu,1 Sheng-qi Wang3 1Department of Gynecology, the First People’s Hospital of Shangqiu, Shangqiu, Henan, People’s Republic of China; 2Department of Image Diagnoses, the Third Hospital of Jinan, Jinan, Shandong, People’s Republic of China; 3Department of Mammary Disease, Guangdong Provincial Hospital of Chinese Medicine, the Second Clinical College of Guangzhou University of Chinese Medicine, Guangzhou, People’s Republic of China Background: Estrogen receptors (ERs are nuclear transcription factors that are involved in the regulation of many complex physiological processes in humans. ERs have been validated as important drug targets for the treatment of various diseases, including breast cancer, ovarian cancer, osteoporosis, and cardiovascular disease. ERs have two subtypes, ER-α and ER-β. Emerging data suggest that the development of subtype-selective ligands that specifically target ER-β could be a more optimal approach to elicit beneficial estrogen-like activities and reduce side effects. Methods: Herein, we focused on ER-β and developed its in silico quantitative structure-activity relationship models using machine learning (ML methods. Results: The chemical structures and ER-β bioactivity data were extracted from public chemogenomics databases. Four types of popular fingerprint generation methods including MACCS fingerprint, PubChem fingerprint, 2D atom pairs, and Chemistry Development Kit extended fingerprint were used as descriptors. Four ML methods including Naïve Bayesian classifier, k-nearest neighbor, random forest, and support vector machine were used to train the models. The range of classification accuracies was 77.10% to 88.34%, and the range of area under the ROC (receiver operating characteristic curve values was 0.8151 to 0.9475, evaluated by the 5-fold cross-validation. Comparison analysis suggests that both the random forest and the support vector machine are superior

  9. Learning Machines Implemented on Non-Deterministic Hardware

    OpenAIRE

    Gupta, Suyog; Sindhwani, Vikas; Gopalakrishnan, Kailash

    2014-01-01

    This paper highlights new opportunities for designing large-scale machine learning systems as a consequence of blurring traditional boundaries that have allowed algorithm designers and application-level practitioners to stay -- for the most part -- oblivious to the details of the underlying hardware-level implementations. The hardware/software co-design methodology advocated here hinges on the deployment of compute-intensive machine learning kernels onto compute platforms that trade-off deter...

  10. Computer vision and machine learning for robust phenotyping in genome-wide studies.

    Science.gov (United States)

    Zhang, Jiaoping; Naik, Hsiang Sing; Assefa, Teshale; Sarkar, Soumik; Reddy, R V Chowda; Singh, Arti; Ganapathysubramanian, Baskar; Singh, Asheesh K

    2017-03-08

    Traditional evaluation of crop biotic and abiotic stresses are time-consuming and labor-intensive limiting the ability to dissect the genetic basis of quantitative traits. A machine learning (ML)-enabled image-phenotyping pipeline for the genetic studies of abiotic stress iron deficiency chlorosis (IDC) of soybean is reported. IDC classification and severity for an association panel of 461 diverse plant-introduction accessions was evaluated using an end-to-end phenotyping workflow. The workflow consisted of a multi-stage procedure including: (1) optimized protocols for consistent image capture across plant canopies, (2) canopy identification and registration from cluttered backgrounds, (3) extraction of domain expert informed features from the processed images to accurately represent IDC expression, and (4) supervised ML-based classifiers that linked the automatically extracted features with expert-rating equivalent IDC scores. ML-generated phenotypic data were subsequently utilized for the genome-wide association study and genomic prediction. The results illustrate the reliability and advantage of ML-enabled image-phenotyping pipeline by identifying previously reported locus and a novel locus harboring a gene homolog involved in iron acquisition. This study demonstrates a promising path for integrating the phenotyping pipeline into genomic prediction, and provides a systematic framework enabling robust and quicker phenotyping through ground-based systems.

  11. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  12. Machine learning a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  13. Transductive and matched-pair machine learning for difficult target detection problems

    Science.gov (United States)

    Theiler, James

    2014-06-01

    This paper will describe the application of two non-traditional kinds of machine learning (transductive machine learning and the more recently proposed matched-pair machine learning) to the target detection problem. The approach combines explicit domain knowledge to model the target signal with a more agnostic machine-learning approach to characterize the background. The concept is illustrated with simulated data from an elliptically-contoured background distribution, on which a subpixel target of known spectral signature but unknown spatial extent has been implanted.

  14. Large-Scale Machine Learning for Classification and Search

    Science.gov (United States)

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  15. Quantitative forecasting of PTSD from early trauma responses: a Machine Learning application.

    Science.gov (United States)

    Galatzer-Levy, Isaac R; Karstoft, Karen-Inge; Statnikov, Alexander; Shalev, Arieh Y

    2014-12-01

    There is broad interest in predicting the clinical course of mental disorders from early, multimodal clinical and biological information. Current computational models, however, constitute a significant barrier to realizing this goal. The early identification of trauma survivors at risk of post-traumatic stress disorder (PTSD) is plausible given the disorder's salient onset and the abundance of putative biological and clinical risk indicators. This work evaluates the ability of Machine Learning (ML) forecasting approaches to identify and integrate a panel of unique predictive characteristics and determine their accuracy in forecasting non-remitting PTSD from information collected within 10 days of a traumatic event. Data on event characteristics, emergency department observations, and early symptoms were collected in 957 trauma survivors, followed for fifteen months. An ML feature selection algorithm identified a set of predictors that rendered all others redundant. Support Vector Machines (SVMs) as well as other ML classification algorithms were used to evaluate the forecasting accuracy of i) ML selected features, ii) all available features without selection, and iii) Acute Stress Disorder (ASD) symptoms alone. SVM also compared the prediction of a) PTSD diagnostic status at 15 months to b) posterior probability of membership in an empirically derived non-remitting PTSD symptom trajectory. Results are expressed as mean Area Under Receiver Operating Characteristics Curve (AUC). The feature selection algorithm identified 16 predictors, present in ≥ 95% cross-validation trials. The accuracy of predicting non-remitting PTSD from that set (AUC = .77) did not differ from predicting from all available information (AUC = .78). Predicting from ASD symptoms was not better then chance (AUC = .60). The prediction of PTSD status was less accurate than that of membership in a non-remitting trajectory (AUC = .71). ML methods may fill a critical gap in forecasting PTSD. The

  16. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    . Due to promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn a special attention of the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as the mean of inferring botnet......-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool of identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview on the field by summarizing current scientific efforts....... The contribution of the paper is three-fold. First, the paper provides a detailed insight on the existing detection methods by investigating which bot-related heuristic were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  17. A review of supervised machine learning applied to ageing research.

    Science.gov (United States)

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  18. Quantum machine learning what quantum computing means to data mining

    CERN Document Server

    Wittek, Peter

    2014-01-01

    Quantum Machine Learning bridges the gap between abstract developments in quantum computing and the applied research on machine learning. Paring down the complexity of the disciplines involved, it focuses on providing a synthesis that explains the most important machine learning algorithms in a quantum framework. Theoretical advances in quantum computing are hard to follow for computer scientists, and sometimes even for researchers involved in the field. The lack of a step-by-step guide hampers the broader understanding of this emergent interdisciplinary body of research. Quantum Machine L

  19. Machine learning models in breast cancer survival prediction.

    Science.gov (United States)

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With the early diagnosis of breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of Breast cancer survival. We use a dataset with eight attributes that include the records of 900 patients in which 876 patients (97.3%) and 24 (2.7%) patients were females and males respectively. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-cross fold technique were used with the proposed model for the prediction of breast cancer survival. The performance of machine learning techniques were evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, Trees Random Forest (TRF) technique showed better results in comparison to other techniques (NB, 1NN, AD, SVM and RBFN, MLP). The accuracy, sensitivity and the area under ROC curve of TRF are 96%, 96%, 93%, respectively. However, 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of

  20. Machine learning modelling for predicting soil liquefaction susceptibility

    Directory of Open Access Journals (Sweden)

    P. Samui

    2011-01-01

    Full Text Available This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN based on multi-layer perceptions (MLP that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N160] and cyclic stress ratio (CSR. Further, an attempt has been made to simplify the models, requiring only the two parameters [(N160 and peck ground acceleration (amax/g], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.

  1. A machine learning model with human cognitive biases capable of learning from small and biased datasets.

    Science.gov (United States)

    Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro

    2018-05-09

    Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.

  2. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  3. Teraflop-scale Incremental Machine Learning

    OpenAIRE

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-u...

  4. Recent Advances in Predictive (Machine) Learning

    Energy Technology Data Exchange (ETDEWEB)

    Friedman, J

    2004-01-24

    Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originated many years ago at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field. These are support vector machines and boosted decision trees. This paper provides an introduction to these two new methods tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.

  5. Distinguishing butchery cut marks from crocodile bite marks through machine learning methods.

    Science.gov (United States)

    Domínguez-Rodrigo, Manuel; Baquedano, Enrique

    2018-04-10

    All models of evolution of human behaviour depend on the correct identification and interpretation of bone surface modifications (BSM) on archaeofaunal assemblages. Crucial evolutionary features, such as the origin of stone tool use, meat-eating, food-sharing, cooperation and sociality can only be addressed through confident identification and interpretation of BSM, and more specifically, cut marks. Recently, it has been argued that linear marks with the same properties as cut marks can be created by crocodiles, thereby questioning whether secure cut mark identifications can be made in the Early Pleistocene fossil record. Powerful classification methods based on multivariate statistics and machine learning (ML) algorithms have previously successfully discriminated cut marks from most other potentially confounding BSM. However, crocodile-made marks were marginal to or played no role in these comparative analyses. Here, for the first time, we apply state-of-the-art ML methods on crocodile linear BSM and experimental butchery cut marks, showing that the combination of multivariate taphonomy and ML methods provides accurate identification of BSM, including cut and crocodile bite marks. This enables empirically-supported hominin behavioural modelling, provided that these methods are applied to fossil assemblages.

  6. Machine Learning-based discovery of closures for reduced models of dynamical systems

    Science.gov (United States)

    Pan, Shaowu; Duraisamy, Karthik

    2017-11-01

    Despite the successful application of machine learning (ML) in fields such as image processing and speech recognition, only a few attempts has been made toward employing ML to represent the dynamics of complex physical systems. Previous attempts mostly focus on parameter calibration or data-driven augmentation of existing models. In this work we present a ML framework to discover closure terms in reduced models of dynamical systems and provide insights into potential problems associated with data-driven modeling. Based on exact closure models for linear system, we propose a general linear closure framework from viewpoint of optimization. The framework is based on trapezoidal approximation of convolution term. Hyperparameters that need to be determined include temporal length of memory effect, number of sampling points, and dimensions of hidden states. To circumvent the explicit specification of memory effect, a general framework inspired from neural networks is also proposed. We conduct both a priori and posteriori evaluations of the resulting model on a number of non-linear dynamical systems. This work was supported in part by AFOSR under the project ``LES Modeling of Non-local effects using Statistical Coarse-graining'' with Dr. Jean-Luc Cambier as the technical monitor.

  7. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focussed on building models. We nearly process all information we perceive to a model. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called Machine Learning. Many research is devoted to develop

  8. Contemporary machine learning: techniques for practitioners in the physical sciences

    Science.gov (United States)

    Spears, Brian

    2017-10-01

    Machine learning is the science of using computers to find relationships in data without explicitly knowing or programming those relationships in advance. Often without realizing it, we employ machine learning every day as we use our phones or drive our cars. Over the last few years, machine learning has found increasingly broad application in the physical sciences. This most often involves building a model relationship between a dependent, measurable output and an associated set of controllable, but complicated, independent inputs. The methods are applicable both to experimental observations and to databases of simulated output from large, detailed numerical simulations. In this tutorial, we will present an overview of current tools and techniques in machine learning - a jumping-off point for researchers interested in using machine learning to advance their work. We will discuss supervised learning techniques for modeling complicated functions, beginning with familiar regression schemes, then advancing to more sophisticated decision trees, modern neural networks, and deep learning methods. Next, we will cover unsupervised learning and techniques for reducing the dimensionality of input spaces and for clustering data. We'll show example applications from both magnetic and inertial confinement fusion. Along the way, we will describe methods for practitioners to help ensure that their models generalize from their training data to as-yet-unseen test data. We will finally point out some limitations to modern machine learning and speculate on some ways that practitioners from the physical sciences may be particularly suited to help. This work was performed by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  9. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Full Text Available Abstract Background Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1, suggesting that redundancy is stable over long evolutionary periods. Conclusions Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  10. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.

    Science.gov (United States)

    Cuperlovic-Culf, Miroslava

    2018-01-11

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.

  11. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    Science.gov (United States)

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  12. Study of Environmental Data Complexity using Extreme Learning Machine

    Science.gov (United States)

    Leuenberger, Michael; Kanevski, Mikhail

    2017-04-01

    The main goals of environmental data science using machine learning algorithm deal, in a broad sense, around the calibration, the prediction and the visualization of hidden relationship between input and output variables. In order to optimize the models and to understand the phenomenon under study, the characterization of the complexity (at different levels) should be taken into account. Therefore, the identification of the linear or non-linear behavior between input and output variables adds valuable information for the knowledge of the phenomenon complexity. The present research highlights and investigates the different issues that can occur when identifying the complexity (linear/non-linear) of environmental data using machine learning algorithm. In particular, the main attention is paid to the description of a self-consistent methodology for the use of Extreme Learning Machines (ELM, Huang et al., 2006), which recently gained a great popularity. By applying two ELM models (with linear and non-linear activation functions) and by comparing their efficiency, quantification of the linearity can be evaluated. The considered approach is accompanied by simulated and real high dimensional and multivariate data case studies. In conclusion, the current challenges and future development in complexity quantification using environmental data mining are discussed. References - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press; Lausanne, Switzerland, p.392. - Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.

  13. Efficient tuning in supervised machine learning

    NARCIS (Netherlands)

    Koch, Patrick

    2013-01-01

    The tuning of learning algorithm parameters has become more and more important during the last years. With the fast growth of computational power and available memory databases have grown dramatically. This is very challenging for the tuning of parameters arising in machine learning, since the

  14. Combining Formal Logic and Machine Learning for Sentiment Analysis

    DEFF Research Database (Denmark)

    Petersen, Niklas Christoffer; Villadsen, Jørgen

    2014-01-01

    This paper presents a formal logical method for deep structural analysis of the syntactical properties of texts using machine learning techniques for efficient syntactical tagging. To evaluate the method it is used for entity level sentiment analysis as an alternative to pure machine learning...

  15. Machine Learning Principles Can Improve Hip Fracture Prediction

    DEFF Research Database (Denmark)

    Kruse, Christian; Eiken, Pia; Vestergaard, Peter

    2017-01-01

    Apply machine learning principles to predict hip fractures and estimate predictor importance in Dual-energy X-ray absorptiometry (DXA)-scanned men and women. Dual-energy X-ray absorptiometry data from two Danish regions between 1996 and 2006 were combined with national Danish patient data.......89 [0.82; 0.95], but with poor calibration in higher probabilities. A ten predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an “xgbTree” model. Machine learning can improve hip fracture...... prediction beyond logistic regression using ensemble models. Compiling data from international cohorts of longer follow-up and performing similar machine learning procedures has the potential to further improve discrimination and calibration....

  16. Machine learning for Big Data analytics in plants.

    Science.gov (United States)

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Advances in Machine Learning and Data Mining for Astronomy

    Science.gov (United States)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  18. Machine Learning via Mathematical Programming

    National Research Council Canada - National Science Library

    Mamgasarian, Olivi

    1999-01-01

    Mathematical programming approaches were applied to a variety of problems in machine learning in order to gain deeper understanding of the problems and to come up with new and more efficient computational algorithms...

  19. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that the INSPIRE is developing and integrating in order to automatize as much as possible content selection and curation in a subject based repository.

  20. Development of E-Learning Materials for Machining Safety Education

    Science.gov (United States)

    Nakazawa, Tsuyoshi; Mita, Sumiyoshi; Matsubara, Masaaki; Takashima, Takeo; Tanaka, Koichi; Izawa, Satoru; Kawamura, Takashi

    We developed two e-learning materials for Manufacturing Practice safety education: movie learning materials and hazard-detection learning materials. Using these video and sound media, students can learn how to operate machines safely with movie learning materials, which raise the effectiveness of preparation and review for manufacturing practice. Using these materials, students can realize safety operation well. Students can apply knowledge learned in lectures to the detection of hazards and use study methods for hazard detection during machine operation using the hazard-detection learning materials. Particularly, the hazard-detection learning materials raise students‧ safety consciousness and increase students‧ comprehension of knowledge from lectures and comprehension of operations during Manufacturing Practice.

  1. Modeling Geomagnetic Variations using a Machine Learning Framework

    Science.gov (United States)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  2. Interactive Algorithms for Unsupervised Machine Learning

    Science.gov (United States)

    2015-06-01

    in Neural Information Processing Systems, 2013. 14 [3] Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi. On combinato- rial...Myung Jin Choi, Vincent Y F Tan , Animashree Anandkumar, and Alan S Willsky. Learn- ing Latent Tree Graphical Models. Journal of Machine Learning

  3. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    Science.gov (United States)

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  4. Introduction to machine learning for brain imaging.

    Science.gov (United States)

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a working horse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally machine learning techniques should be usable for any non-expert, however, unfortunately they are typically not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.

  5. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents Applications in Recommender Systems which are making recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the books approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  6. Comparative Performance Analysis of Machine Learning Techniques for Software Bug Detection

    OpenAIRE

    Saiqa Aleem; Luiz Fernando Capretz; Faheem Ahmed

    2015-01-01

    Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. Machine learning techniques are proven to be useful in terms of software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques is explored f or software bug prediction on public available data sets. Results showed most of the mac ...

  7. Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse.

    Science.gov (United States)

    Garrard, Peter; Rentoumi, Vassiliki; Gesierich, Benno; Miller, Bruce; Gorno-Tempini, Maria Luisa

    2014-06-01

    Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can 'learn' from data: for instance a ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low frequency content words, generic terms and components of metanarrative statements. For the right versus left task the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA) may shed further light on this little understood distinction. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. Image Classification, Deep Learning and Convolutional Neural Networks : A Comparative Study of Machine Learning Frameworks

    OpenAIRE

    Airola, Rasmus; Hager, Kristoffer

    2017-01-01

    The use of machine learning and specifically neural networks is a growing trend in software development, and has grown immensely in the last couple of years in the light of an increasing need to handle big data and large information flows. Machine learning has a broad area of application, such as human-computer interaction, predicting stock prices, real-time translation, and self driving vehicles. Large companies such as Microsoft and Google have already implemented machine learning in some o...

  9. Machine learning molecular dynamics for the simulation of infrared spectra.

    Science.gov (United States)

    Gastegger, Michael; Behler, Jörg; Marquetand, Philipp

    2017-10-01

    Machine learning has emerged as an invaluable tool in many research areas. In the present work, we harness this power to predict highly accurate molecular infrared spectra with unprecedented computational efficiency. To account for vibrational anharmonic and dynamical effects - typically neglected by conventional quantum chemistry approaches - we base our machine learning strategy on ab initio molecular dynamics simulations. While these simulations are usually extremely time consuming even for small molecules, we overcome these limitations by leveraging the power of a variety of machine learning techniques, not only accelerating simulations by several orders of magnitude, but also greatly extending the size of systems that can be treated. To this end, we develop a molecular dipole moment model based on environment dependent neural network charges and combine it with the neural network potential approach of Behler and Parrinello. Contrary to the prevalent big data philosophy, we are able to obtain very accurate machine learning models for the prediction of infrared spectra based on only a few hundreds of electronic structure reference points. This is made possible through the use of molecular forces during neural network potential training and the introduction of a fully automated sampling scheme. We demonstrate the power of our machine learning approach by applying it to model the infrared spectra of a methanol molecule, n -alkanes containing up to 200 atoms and the protonated alanine tripeptide, which at the same time represents the first application of machine learning techniques to simulate the dynamics of a peptide. In all of these case studies we find an excellent agreement between the infrared spectra predicted via machine learning models and the respective theoretical and experimental spectra.

  10. Component Pin Recognition Using Algorithms Based on Machine Learning

    Science.gov (United States)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  11. Crystal structure representations for machine learning models of formation energies

    Energy Technology Data Exchange (ETDEWEB)

    Faber, Felix [Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel Switzerland; Lindmaa, Alexander [Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping Sweden; von Lilienfeld, O. Anatole [Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel Switzerland; Argonne Leadership Computing Facility, Argonne National Laboratory, 9700 S. Cass Avenue Lemont Illinois 60439; Armiento, Rickard [Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping Sweden

    2015-04-20

    We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules have been successful using a Coulomb matrix representation of the molecule. We consider three ways to generalize such representations to periodic systems: (i) a matrix where each element is related to the Ewald sum of the electrostatic interaction between two different atoms in the unit cell repeated over the lattice; (ii) an extended Coulomb-like matrix that takes into account a number of neighboring unit cells; and (iii) an ansatz that mimics the periodicity and the basic features of the elements in the Ewald sum matrix using a sine function of the crystal coordinates of the atoms. The representations are compared for a Laplacian kernel with Manhattan norm, trained to reproduce formation energies using a dataset of 3938 crystal structures obtained from the Materials Project. For training sets consisting of 3000 crystals, the generalization error in predicting formation energies of new structures corresponds to (i) 0.49, (ii) 0.64, and (iii) 0.37eV/atom for the respective representations.

  12. A Machine LearningFramework to Forecast Wave Conditions

    Science.gov (United States)

    Zhang, Y.; James, S. C.; O'Donncha, F.

    2017-12-01

    Recently, significant effort has been undertaken to quantify and extract wave energy because it is renewable, environmental friendly, abundant, and often close to population centers. However, a major challenge is the ability to accurately and quickly predict energy production, especially across a 48-hour cycle. Accurate forecasting of wave conditions is a challenging undertaking that typically involves solving the spectral action-balance equation on a discretized grid with high spatial resolution. The nature of the computations typically demands high-performance computing infrastructure. Using a case-study site at Monterey Bay, California, a machine learning framework was trained to replicate numerically simulated wave conditions at a fraction of the typical computational cost. Specifically, the physics-based Simulating WAves Nearshore (SWAN) model, driven by measured wave conditions, nowcast ocean currents, and wind data, was used to generate training data for machine learning algorithms. The model was run between April 1st, 2013 and May 31st, 2017 generating forecasts at three-hour intervals yielding 11,078 distinct model outputs. SWAN-generated fields of 3,104 wave heights and a characteristic period could be replicated through simple matrix multiplications using the mapping matrices from machine learning algorithms. In fact, wave-height RMSEs from the machine learning algorithms (9 cm) were less than those for the SWAN model-verification exercise where those simulations were compared to buoy wave data within the model domain (>40 cm). The validated machine learning approach, which acts as an accurate surrogate for the SWAN model, can now be used to perform real-time forecasts of wave conditions for the next 48 hours using available forecasted boundary wave conditions, ocean currents, and winds. This solution has obvious applications to wave-energy generation as accurate wave conditions can be forecasted with over a three-order-of-magnitude reduction in

  13. Learning Algorithm of Boltzmann Machine Based on Spatial Monte Carlo Integration Method

    Directory of Open Access Journals (Sweden)

    Muneki Yasuda

    2018-04-01

    Full Text Available The machine learning techniques for Markov random fields are fundamental in various fields involving pattern recognition, image processing, sparse modeling, and earth science, and a Boltzmann machine is one of the most important models in Markov random fields. However, the inference and learning problems in the Boltzmann machine are NP-hard. The investigation of an effective learning algorithm for the Boltzmann machine is one of the most important challenges in the field of statistical machine learning. In this paper, we study Boltzmann machine learning based on the (first-order spatial Monte Carlo integration method, referred to as the 1-SMCI learning method, which was proposed in the author’s previous paper. In the first part of this paper, we compare the method with the maximum pseudo-likelihood estimation (MPLE method using a theoretical and a numerical approaches, and show the 1-SMCI learning method is more effective than the MPLE. In the latter part, we compare the 1-SMCI learning method with other effective methods, ratio matching and minimum probability flow, using a numerical experiment, and show the 1-SMCI learning method outperforms them.

  14. Proceedings of IEEE Machine Learning for Signal Processing Workshop XVI

    DEFF Research Database (Denmark)

    Larsen, Jan

    These proceedings contains refereed papers presented at the sixteenth IEEE Workshop on Machine Learning for Signal Processing (MLSP'2006), held in Maynooth, Co. Kildare, Ireland, September 6-8, 2006. This is a continuation of the IEEE Workshops on Neural Networks for Signal Processing (NNSP......). The name of the Technical Committee, hence of the Workshop, was changed to Machine Learning for Signal Processing in September 2003 to better reflect the areas represented by the Technical Committee. The conference is organized by the Machine Learning for Signal Processing Technical Committee...... the same standard as the printed version and facilitates the reading and searching of the papers. The field of machine learning has matured considerably in both methodology and real-world application domains and has become particularly important for solution of problems in signal processing. As reflected...

  15. Machine-learning techniques for family demography: an application of random forests to the analysis of divorce determinants in Germany

    OpenAIRE

    Arpino, Bruno; Le Moglie, Marco; Mencarini, Letizia

    2018-01-01

    Demographers often analyze the determinants of life-course events with parametric regression-type approaches. Here, we present a class of nonparametric approaches, broadly defined as machine learning (ML) techniques, and discuss advantages and disadvantages of a popular type known as random forest. We argue that random forests can be useful either as a substitute, or a complement, to more standard parametric regression modeling. Our discussion of random forests is intuitive and...

  16. Comparison between extreme learning machine and wavelet neural networks in data classification

    Science.gov (United States)

    Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri

    2017-03-01

    Extreme learning Machine is a well known learning algorithm in the field of machine learning. It's about a feed forward neural network with a single-hidden layer. It is an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the Extreme learning Machine with wavelet neural networks, which is a very used algorithm. We have used six benchmark data sets to evaluate each technique. These datasets Including Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results have shown that both extreme learning machine and wavelet neural networks have reached good results.

  17. Distributed Extreme Learning Machine for Nonlinear Learning over Network

    Directory of Open Access Journals (Sweden)

    Songyan Huang

    2015-02-01

    Full Text Available Distributed data collection and analysis over a network are ubiquitous, especially over a wireless sensor network (WSN. To our knowledge, the data model used in most of the distributed algorithms is linear. However, in real applications, the linearity of systems is not always guaranteed. In nonlinear cases, the single hidden layer feedforward neural network (SLFN with radial basis function (RBF hidden neurons has the ability to approximate any continuous functions and, thus, may be used as the nonlinear learning system. However, confined by the communication cost, using the distributed version of the conventional algorithms to train the neural network directly is usually prohibited. Fortunately, based on the theorems provided in the extreme learning machine (ELM literature, we only need to compute the output weights of the SLFN. Computing the output weights itself is a linear learning problem, although the input-output mapping of the overall SLFN is still nonlinear. Using the distributed algorithmto cooperatively compute the output weights of the SLFN, we obtain a distributed extreme learning machine (dELM for nonlinear learning in this paper. This dELM is applied to the regression problem and classification problem to demonstrate its effectiveness and advantages.

  18. Financial signal processing and machine learning

    CERN Document Server

    Kulkarni,Sanjeev R; Dmitry M. Malioutov

    2016-01-01

    The modern financial industry has been required to deal with large and diverse portfolios in a variety of asset classes often with limited market data available. Financial Signal Processing and Machine Learning unifies a number of recent advances made in signal processing and machine learning for the design and management of investment portfolios and financial engineering. This book bridges the gap between these disciplines, offering the latest information on key topics including characterizing statistical dependence and correlation in high dimensions, constructing effective and robust risk measures, and their use in portfolio optimization and rebalancing. The book focuses on signal processing approaches to model return, momentum, and mean reversion, addressing theoretical and implementation aspects. It highlights the connections between portfolio theory, sparse learning and compressed sensing, sparse eigen-portfolios, robust optimization, non-Gaussian data-driven risk measures, graphical models, causal analy...

  19. Large-scale Machine Learning in High-dimensional Datasets

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen

    Over the last few decades computers have gotten to play an essential role in our daily life, and data is now being collected in various domains at a faster pace than ever before. This dissertation presents research advances in four machine learning fields that all relate to the challenges imposed...... are better at modeling local heterogeneities. In the field of machine learning for neuroimaging, we introduce learning protocols for real-time functional Magnetic Resonance Imaging (fMRI) that allow for dynamic intervention in the human decision process. Specifically, the model exploits the structure of f...

  20. Improved Extreme Learning Machine and Its Application in Image Quality Assessment

    Directory of Open Access Journals (Sweden)

    Li Mao

    2014-01-01

    Full Text Available Extreme learning machine (ELM is a new class of single-hidden layer feedforward neural network (SLFN, which is simple in theory and fast in implementation. Zong et al. propose a weighted extreme learning machine for learning data with imbalanced class distribution, which maintains the advantages from original ELM. However, the current reported ELM and its improved version are only based on the empirical risk minimization principle, which may suffer from overfitting. To solve the overfitting troubles, in this paper, we incorporate the structural risk minimization principle into the (weighted ELM, and propose a modified (weighted extreme learning machine (M-ELM and M-WELM. Experimental results show that our proposed M-WELM outperforms the current reported extreme learning machine algorithm in image quality assessment.

  1. Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions.

    Science.gov (United States)

    Kassahun, Yohannes; Yu, Bingbin; Tibebu, Abraham Temesgen; Stoyanov, Danail; Giannarou, Stamatia; Metzen, Jan Hendrik; Vander Poorten, Emmanuel

    2016-04-01

    Advances in technology and computing play an increasingly important role in the evolution of modern surgical techniques and paradigms. This article reviews the current role of machine learning (ML) techniques in the context of surgery with a focus on surgical robotics (SR). Also, we provide a perspective on the future possibilities for enhancing the effectiveness of procedures by integrating ML in the operating room. The review is focused on ML techniques directly applied to surgery, surgical robotics, surgical training and assessment. The widespread use of ML methods in diagnosis and medical image computing is beyond the scope of the review. Searches were performed on PubMed and IEEE Explore using combinations of keywords: ML, surgery, robotics, surgical and medical robotics, skill learning, skill analysis and learning to perceive. Studies making use of ML methods in the context of surgery are increasingly being reported. In particular, there is an increasing interest in using ML for developing tools to understand and model surgical skill and competence or to extract surgical workflow. Many researchers begin to integrate this understanding into the control of recent surgical robots and devices. ML is an expanding field. It is popular as it allows efficient processing of vast amounts of data for interpreting and real-time decision making. Already widely used in imaging and diagnosis, it is believed that ML will also play an important role in surgery and interventional treatments. In particular, ML could become a game changer into the conception of cognitive surgical robots. Such robots endowed with cognitive skills would assist the surgical team also on a cognitive level, such as possibly lowering the mental load of the team. For example, ML could help extracting surgical skill, learned through demonstration by human experts, and could transfer this to robotic skills. Such intelligent surgical assistance would significantly surpass the state of the art in surgical

  2. An active role for machine learning in drug development

    Science.gov (United States)

    Murphy, Robert F.

    2014-01-01

    Due to the complexity of biological systems, cutting-edge machine-learning methods will be critical for future drug development. In particular, machine-vision methods to extract detailed information from imaging assays and active-learning methods to guide experimentation will be required to overcome the dimensionality problem in drug development. PMID:21587249

  3. Newton Methods for Large Scale Problems in Machine Learning

    Science.gov (United States)

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  4. Machine learning-based dual-energy CT parametric mapping.

    Science.gov (United States)

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Al Helo, Rose; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C; Rassouli, Negin; Gilkeson, Robert C; Traughber, Bryan J; Cheng, Chee-Wai; Muzic, Raymond F

    2018-05-22

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Zeff), relative electron density (ρe), mean excitation energy (Ix), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 seconds. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency. . © 2018 Institute of Physics and Engineering in

  5. Mutual learning in a tree parity machine and its application to cryptography

    International Nuclear Information System (INIS)

    Rosen-Zvi, Michal; Klein, Einat; Kanter, Ido; Kinzel, Wolfgang

    2002-01-01

    Mutual learning of a pair of tree parity machines with continuous and discrete weight vectors is studied analytically. The analysis is based on a mapping procedure that maps the mutual learning in tree parity machines onto mutual learning in noisy perceptrons. The stationary solution of the mutual learning in the case of continuous tree parity machines depends on the learning rate where a phase transition from partial to full synchronization is observed. In the discrete case the learning process is based on a finite increment and a full synchronized state is achieved in a finite number of steps. The synchronization of discrete parity machines is introduced in order to construct an ephemeral key-exchange protocol. The dynamic learning of a third tree parity machine (an attacker) that tries to imitate one of the two machines while the two still update their weight vectors is also analyzed. In particular, the synchronization times of the naive attacker and the flipping attacker recently introduced in Ref. 9 are analyzed. All analytical results are found to be in good agreement with simulation results

  6. BELM: Bayesian extreme learning machine.

    Science.gov (United States)

    Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J

    2011-03-01

    The theory of extreme learning machine (ELM) has become very popular on the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need of applying methods that are computationally intensive, e.g., bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM in several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. Achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of probability of model overfitting, and use of a priori knowledge.

  7. Bidirectional extreme learning machine for regression problem and its learning effectiveness.

    Science.gov (United States)

    Yang, Yimin; Wang, Yaonan; Yuan, Xiaofang

    2012-09-01

    It is clear that the learning effectiveness and learning speed of neural networks are in general far slower than required, which has been a major bottleneck for many applications. Recently, a simple and efficient learning method, referred to as extreme learning machine (ELM), was proposed by Huang , which has shown that, compared to some conventional methods, the training time of neural networks can be reduced by a thousand times. However, one of the open problems in ELM research is whether the number of hidden nodes can be further reduced without affecting learning effectiveness. This brief proposes a new learning algorithm, called bidirectional extreme learning machine (B-ELM), in which some hidden nodes are not randomly selected. In theory, this algorithm tends to reduce network output error to 0 at an extremely early learning stage. Furthermore, we find a relationship between the network output error and the network output weights in the proposed B-ELM. Simulation results demonstrate that the proposed method can be tens to hundreds of times faster than other incremental ELM algorithms.

  8. Corporate Disruption in the Science of Machine Learning

    OpenAIRE

    Work, Sam

    2016-01-01

    This MSc dissertation considers the effects of the current corporate interest on researchers in the field of machine learning. Situated within the field's cyclical history of academic, public and corporate interest, this dissertation investigates how current researchers view recent developments and negotiate their own research practices within an environment of increased commercial interest and funding. The original research consists of in-depth interviews with 12 machine learning researchers...

  9. Application of machine learning methods in bioinformatics

    Science.gov (United States)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  10. Machine Learning Optimization of Evolvable Artificial Cells

    DEFF Research Database (Denmark)

    Caschera, F.; Rasmussen, S.; Hanczyc, M.

    2011-01-01

    can be explored. A machine learning approach (Evo-DoE) could be applied to explore this experimental space and define optimal interactions according to a specific fitness function. Herein an implementation of an evolutionary design of experiments to optimize chemical and biochemical systems based...... on a machine learning process is presented. The optimization proceeds over generations of experiments in iterative loop until optimal compositions are discovered. The fitness function is experimentally measured every time the loop is closed. Two examples of complex systems, namely a liposomal drug formulation...

  11. What is the machine learning?

    Science.gov (United States)

    Chang, Spencer; Cohen, Timothy; Ostdiek, Bryan

    2018-03-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables—aided by physical intuition—that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus nonlinear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

  12. Machine learning for micro-tomography

    Science.gov (United States)

    Parkinson, Dilworth Y.; Pelt, Daniël. M.; Perciano, Talita; Ushizima, Daniela; Krishnan, Harinarayan; Barnard, Harold S.; MacDowell, Alastair A.; Sethian, James

    2017-09-01

    Machine learning has revolutionized a number of fields, but many micro-tomography users have never used it for their work. The micro-tomography beamline at the Advanced Light Source (ALS), in collaboration with the Center for Applied Mathematics for Energy Research Applications (CAMERA) at Lawrence Berkeley National Laboratory, has now deployed a series of tools to automate data processing for ALS users using machine learning. This includes new reconstruction algorithms, feature extraction tools, and image classification and recommen- dation systems for scientific image. Some of these tools are either in automated pipelines that operate on data as it is collected or as stand-alone software. Others are deployed on computing resources at Berkeley Lab-from workstations to supercomputers-and made accessible to users through either scripting or easy-to-use graphical interfaces. This paper presents a progress report on this work.

  13. A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.

    Science.gov (United States)

    Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei

    2017-09-21

    In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.

  14. An efficient flow-based botnet detection using supervised machine learning

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2014-01-01

    Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper...... introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs...... to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates...

  15. ClearTK 2.0: Design Patterns for Machine Learning in UIMA

    OpenAIRE

    Bethard, Steven; Ogren, Philip; Becker, Lee

    2014-01-01

    ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles including: conceptually simple annotator interfaces, r...

  16. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    Science.gov (United States)

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. Hindrance of a diagnosis may have life-threatening consequences and could cause distrust. On the other hand, not only may a false diagnosis prompt users to distrust the machine-learning algorithm and even abandon the entire system but also such a false positive classification may cause patient distress. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data, which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks that are based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.

  17. Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    There is a deep connection between machine learning and jet physics - after all, jets are defined by unsupervised learning algorithms. Jet physics has been a driving force for studying modern machine learning in high energy physics. Domain specific challenges require new techniques to make full use of the algorithms. A key focus is on understanding how and what the algorithms learn. Modern machine learning techniques for jet physics are demonstrated for classification, regression, and generation. In addition to providing powerful baseline performance, we show how to train complex models directly on data and to generate sparse stacked images with non-uniform granularity.

  18. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning litterature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  19. Informatics and machine learning to define the phenotype.

    Science.gov (United States)

    Basile, Anna Okula; Ritchie, Marylyn DeRiggi

    2018-03-01

    For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns. Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed.Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data. Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.

  20. Machine learning in manufacturing: advantages, challenges, and applications

    Directory of Open Access Journals (Sweden)

    Thorsten Wuest

    2016-01-01

    Full Text Available The nature of manufacturing systems faces ever more complex, dynamic and at times even chaotic behaviors. In order to being able to satisfy the demand for high-quality products in an efficient manner, it is essential to utilize all means available. One area, which saw fast pace developments in terms of not only promising results but also usability, is machine learning. Promising an answer to many of the old and new challenges of manufacturing, machine learning is widely discussed by researchers and practitioners alike. However, the field is very broad and even confusing which presents a challenge and a barrier hindering wide application. Here, this paper contributes in presenting an overview of available machine learning techniques and structuring this rather complicated area. A special focus is laid on the potential benefit, and examples of successful applications in a manufacturing environment.

  1. Painting galaxies into dark matter halos using machine learning

    Science.gov (United States)

    Agarwal, Shankar; Davé, Romeel; Bassett, Bruce A.

    2018-05-01

    We develop a machine learning (ML) framework to populate large dark matter-only simulations with baryonic galaxies. Our ML framework takes input halo properties including halo mass, environment, spin, and recent growth history, and outputs central galaxy and halo baryonic properties including stellar mass (M*), star formation rate (SFR), metallicity (Z), neutral (H I) and molecular (H_2) hydrogen mass. We apply this to the MUFASA cosmological hydrodynamic simulation, and show that it recovers the mean trends of output quantities with halo mass highly accurately, including following the sharp drop in SFR and gas in quenched massive galaxies. However, the scatter around the mean relations is under-predicted. Examining galaxies individually, at z = 0 the stellar mass and metallicity are accurately recovered (σ ≲ 0.2 dex), but SFR and H I show larger scatter (σ ≳ 0.3 dex); these values improve somewhat at z = 1, 2. Remarkably, ML quantitatively recovers second parameter trends in galaxy properties, e.g. that galaxies with higher gas content and lower metallicity have higher SFR at a given M*. Testing various ML algorithms, we find that none perform significantly better than the others, nor does ensembling improve performance, likely because none of the algorithms reproduce the large observed scatter around the mean properties. For the random forest algorithm, we find that halo mass and nearby (˜200 kpc) environment are the most important predictive variables followed by growth history, while halo spin and ˜Mpc scale environment are not important. Finally we study the impact of additionally inputting key baryonic properties M*, SFR, and Z, as would be available e.g. from an equilibrium model, and show that particularly providing the SFR enables H I to be recovered substantially more accurately.

  2. Machine Learning Based Localization and Classification with Atomic Magnetometers

    Science.gov (United States)

    Deans, Cameron; Griffin, Lewis D.; Marmugi, Luca; Renzoni, Ferruccio

    2018-01-01

    We demonstrate identification of position, material, orientation, and shape of objects imaged by a Rb 85 atomic magnetometer performing electromagnetic induction imaging supported by machine learning. Machine learning maximizes the information extracted from the images created by the magnetometer, demonstrating the use of hidden data. Localization 2.6 times better than the spatial resolution of the imaging system and successful classification up to 97% are obtained. This circumvents the need of solving the inverse problem and demonstrates the extension of machine learning to diffusive systems, such as low-frequency electrodynamics in media. Automated collection of task-relevant information from quantum-based electromagnetic imaging will have a relevant impact from biomedicine to security.

  3. An Interactive Web-based Learning System for Assisting Machining Technology Education

    Directory of Open Access Journals (Sweden)

    Min Jou

    2008-05-01

    Full Text Available The key technique of manufacturing methods is machining. The degree of technique of machining directly affects the quality of the product. Therefore, the machining technique is of primary importance in promoting student practice ability during the training process. Currently, practical training is applied in shop floor to discipline student’s practice ability. Much time and cost are used to teach these techniques. Particularly, computerized machines are continuously increasing in use. The development of educating engineers on computerized machines becomes much more difficult than with traditional machines. This is because of the limitation of the extremely expensive cost of teaching. The quality and quantity of teaching cannot always be promoted in this respect. The traditional teaching methods can not respond well to the needs of the future. Therefore, this research aims to the following topics; (1.Propose the teaching strategies for the students to learning machining processing planning through web-based learning system. (2.Establish on-line teaching material for the computer-aided manufacturing courses including CNC coding method, CNC simulation. (3.Develop the virtual machining laboratory to bring the machining practical training to web-based learning system. (4.Integrate multi-media and virtual laboratory in the developed e-learning web-based system to enhance the effectiveness of machining education through web-based system.

  4. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.......Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA....

  5. A distributed algorithm for machine learning

    Science.gov (United States)

    Chen, Shihong

    2018-04-01

    This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.

  6. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  7. Utvärdering av Amazon Machine Learning för taggsystem

    OpenAIRE

    Madosh, Farzana; Lundsten, Erik

    2017-01-01

    How companies deal with machine learning is currently a highly-discussed topic, as it can facilitate corporate manual work by training computers to recognize patterns and thus automate the working procedure. However, this requires resources and knowledge in the field. As a result, various companies like Amazon and Google provide machine learning services without requiring the user to have deep knowledge in the area. This study evaluates Amazon Machine Learning program for a tag system with da...

  8. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    Science.gov (United States)

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  9. A comparative analysis of support vector machines and extreme learning machines.

    Science.gov (United States)

    Liu, Xueyi; Gao, Chuanhou; Li, Ping

    2012-09-01

    The theory of extreme learning machines (ELMs) has recently become increasingly popular. As a new learning algorithm for single-hidden-layer feed-forward neural networks, an ELM offers the advantages of low computational cost, good generalization ability, and ease of implementation. Hence the comparison and model selection between ELMs and other kinds of state-of-the-art machine learning approaches has become significant and has attracted many research efforts. This paper performs a comparative analysis of the basic ELMs and support vector machines (SVMs) from two viewpoints that are different from previous works: one is the Vapnik-Chervonenkis (VC) dimension, and the other is their performance under different training sample sizes. It is shown that the VC dimension of an ELM is equal to the number of hidden nodes of the ELM with probability one. Additionally, their generalization ability and computational complexity are exhibited with changing training sample size. ELMs have weaker generalization ability than SVMs for small sample but can generalize as well as SVMs for large sample. Remarkably, great superiority in computational speed especially for large-scale sample problems is found in ELMs. The results obtained can provide insight into the essential relationship between them, and can also serve as complementary knowledge for their past experimental and theoretical comparisons. Copyright © 2012 Elsevier Ltd. All rights reserved.

  10. 2015 International Conference on Machine Learning and Signal Processing

    CERN Document Server

    Woo, Wai; Sulaiman, Hamzah; Othman, Mohd; Saat, Mohd

    2016-01-01

    This book presents important research findings and recent innovations in the field of machine learning and signal processing. A wide range of topics relating to machine learning and signal processing techniques and their applications are addressed in order to provide both researchers and practitioners with a valuable resource documenting the latest advances and trends. The book comprises a careful selection of the papers submitted to the 2015 International Conference on Machine Learning and Signal Processing (MALSIP 2015), which was held on 15–17 December 2015 in Ho Chi Minh City, Vietnam with the aim of offering researchers, academicians, and practitioners an ideal opportunity to disseminate their findings and achievements. All of the included contributions were chosen by expert peer reviewers from across the world on the basis of their interest to the community. In addition to presenting the latest in design, development, and research, the book provides access to numerous new algorithms for machine learni...

  11. Machine Learning applications in CMS

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine Learning is used in many aspects of CMS data taking, monitoring, processing and analysis. We review a few of these use cases and the most recent developments, with an outlook to future applications in the LHC Run III and for the High-Luminosity phase.

  12. IEEE International Workshop on Machine Learning for Signal Processing: Preface

    DEFF Research Database (Denmark)

    Tao, Jianhua

    The 21st IEEE International Workshop on Machine Learning for Signal Processing will be held in Beijing, China, on September 18–21, 2011. The workshop series is the major annual technical event of the IEEE Signal Processing Society's Technical Committee on Machine Learning for Signal Processing...

  13. Application of Machine Learning to Rotorcraft Health Monitoring

    Science.gov (United States)

    Cody, Tyler; Dempsey, Paula J.

    2017-01-01

    Machine learning is a powerful tool for data exploration and model building with large data sets. This project aimed to use machine learning techniques to explore the inherent structure of data from rotorcraft gear tests, relationships between features and damage states, and to build a system for predicting gear health for future rotorcraft transmission applications. Classical machine learning techniques are difficult, if not irresponsible to apply to time series data because many make the assumption of independence between samples. To overcome this, Hidden Markov Models were used to create a binary classifier for identifying scuffing transitions and Recurrent Neural Networks were used to leverage long distance relationships in predicting discrete damage states. When combined in a workflow, where the binary classifier acted as a filter for the fatigue monitor, the system was able to demonstrate accuracy in damage state prediction and scuffing identification. The time dependent nature of the data restricted data exploration to collecting and analyzing data from the model selection process. The limited amount of available data was unable to give useful information, and the division of training and testing sets tended to heavily influence the scores of the models across combinations of features and hyper-parameters. This work built a framework for tracking scuffing and fatigue on streaming data and demonstrates that machine learning has much to offer rotorcraft health monitoring by using Bayesian learning and deep learning methods to capture the time dependent nature of the data. Suggested future work is to implement the framework developed in this project using a larger variety of data sets to test the generalization capabilities of the models and allow for data exploration.

  14. RG-inspired machine learning for lattice field theory

    Directory of Open Access Journals (Sweden)

    Foreman Sam

    2018-01-01

    Full Text Available Machine learning has been a fast growing field of research in several areas dealing with large datasets. We report recent attempts to use renormalization group (RG ideas in the context of machine learning. We examine coarse graining procedures for perceptron models designed to identify the digits of the MNIST data. We discuss the correspondence between principal components analysis (PCA and RG flows across the transition for worm configurations of the 2D Ising model. Preliminary results regarding the logarithmic divergence of the leading PCA eigenvalue were presented at the conference. More generally, we discuss the relationship between PCA and observables in Monte Carlo simulations and the possibility of reducing the number of learning parameters in supervised learning based on RG inspired hierarchical ansatzes.

  15. RG-inspired machine learning for lattice field theory

    Science.gov (United States)

    Foreman, Sam; Giedt, Joel; Meurice, Yannick; Unmuth-Yockey, Judah

    2018-03-01

    Machine learning has been a fast growing field of research in several areas dealing with large datasets. We report recent attempts to use renormalization group (RG) ideas in the context of machine learning. We examine coarse graining procedures for perceptron models designed to identify the digits of the MNIST data. We discuss the correspondence between principal components analysis (PCA) and RG flows across the transition for worm configurations of the 2D Ising model. Preliminary results regarding the logarithmic divergence of the leading PCA eigenvalue were presented at the conference. More generally, we discuss the relationship between PCA and observables in Monte Carlo simulations and the possibility of reducing the number of learning parameters in supervised learning based on RG inspired hierarchical ansatzes.

  16. Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

    Science.gov (United States)

    Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

    2017-08-01

    At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  17. A Teaching System To Learn Programming: the Programmer's Learning Machine

    OpenAIRE

    Quinson , Martin; Oster , Gérald

    2015-01-01

    International audience; The Programmer's Learning Machine (PLM) is an interactive exerciser for learning programming and algorithms. Using an integrated and graphical environment that provides a short feedback loop, it allows students to learn in a (semi)-autonomous way. This generic platform also enables teachers to create specific programming microworlds that match their teaching goals. This paper discusses our design goals and motivations, introduces the existing material and the proposed ...

  18. Machine learning methods to predict child posttraumatic stress: a proof of concept study.

    Science.gov (United States)

    Saxe, Glenn N; Ma, Sisi; Ren, Jiwen; Aliferis, Constantin

    2017-07-10

    The care of traumatized children would benefit significantly from accurate predictive models for Posttraumatic Stress Disorder (PTSD), using information available around the time of trauma. Machine Learning (ML) computational methods have yielded strong results in recent applications across many diseases and data types, yet they have not been previously applied to childhood PTSD. Since these methods have not been applied to this complex and debilitating disorder, there is a great deal that remains to be learned about their application. The first step is to prove the concept: Can ML methods - as applied in other fields - produce predictive classification models for childhood PTSD? Additionally, we seek to determine if specific variables can be identified - from the aforementioned predictive classification models - with putative causal relations to PTSD. ML predictive classification methods - with causal discovery feature selection - were applied to a data set of 163 children hospitalized with an injury and PTSD was determined three months after hospital discharge. At the time of hospitalization, 105 risk factor variables were collected spanning a range of biopsychosocial domains. Seven percent of subjects had a high level of PTSD symptoms. A predictive classification model was discovered with significant predictive accuracy. A predictive model constructed based on subsets of potentially causally relevant features achieves similar predictivity compared to the best predictive model constructed with all variables. Causal Discovery feature selection methods identified 58 variables of which 10 were identified as most stable. In this first proof-of-concept application of ML methods to predict childhood Posttraumatic Stress we were able to determine both predictive classification models for childhood PTSD and identify several causal variables. This set of techniques has great potential for enhancing the methodological toolkit in the field and future studies should seek to

  19. The application of machine learning techniques in the clinical drug therapy.

    Science.gov (United States)

    Meng, Huan-Yu; Jin, Wan-Lin; Yan, Cheng-Kai; Yang, Huan

    2018-05-25

    The development of a novel drug is an extremely complicated process that includes the target identification, design and manufacture, and proper therapy of the novel drug, as well as drug dose selection, drug efficacy evaluation, and adverse drug reaction control. Due to the limited resources, high costs, long duration, and low hit-to-lead ratio in the development of pharmacogenetics and computer technology, machine learning techniques have assisted novel drug development and have gradually received more attention by researchers. According to current research, machine learning techniques are widely applied in the process of the discovery of new drugs and novel drug targets, the decision surrounding proper therapy and drug dose, and the prediction of drug efficacy and adverse drug reactions. In this article, we discussed the history, workflow, and advantages and disadvantages of machine learning techniques in the processes mentioned above. Although the advantages of machine learning techniques are fairly obvious, the application of machine learning techniques is currently limited. With further research, the application of machine techniques in drug development could be much more widespread and could potentially be one of the major methods used in drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  20. Machine-Learning Algorithms to Code Public Health Spending Accounts.

    Science.gov (United States)

    Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

    Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.

  1. Machine learning strategies for systems with invariance properties

    Science.gov (United States)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.

  2. Machine learning methods without tears: a primer for ecologists.

    Science.gov (United States)

    Olden, Julian D; Lawler, Joshua J; Poff, N LeRoy

    2008-06-01

    Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machin learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

  3. Statistical and Machine Learning Models to Predict Programming Performance

    OpenAIRE

    Bergin, Susan

    2006-01-01

    This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, the models struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant...

  4. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional methods of data analysis have difficulty to identify outliers and patterns in big data and data with multiple exposure / outcome variables and analysis-rules for surveys and questionnaires, currently common methods of data collection, are, essentially, missing. Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning and the current 100 page cookbook should be helpful to that aim. It covers in a condensed form the subjects reviewed in the 750 page three volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (ed. by Springer, Heidelberg, Germany, 2013) and was written as a hand-hold presentation and must-read publication. It was written not only to investigators and students in the fields, but also to jaded clinicians new to the methods and lacking time to read the entire textbooks. General purposes ...

  5. Communication: Understanding molecular representations in machine learning: The role of uniqueness and target similarity

    Science.gov (United States)

    Huang, Bing; von Lilienfeld, O. Anatole

    2016-10-01

    The predictive accuracy of Machine Learning (ML) models of molecular properties depends on the choice of the molecular representation. Inspired by the postulates of quantum mechanics, we introduce a hierarchy of representations which meet uniqueness and target similarity criteria. To systematically control target similarity, we simply rely on interatomic many body expansions, as implemented in universal force-fields, including Bonding, Angular (BA), and higher order terms. Addition of higher order contributions systematically increases similarity to the true potential energy and predictive accuracy of the resulting ML models. We report numerical evidence for the performance of BAML models trained on molecular properties pre-calculated at electron-correlated and density functional theory level of theory for thousands of small organic molecules. Properties studied include enthalpies and free energies of atomization, heat capacity, zero-point vibrational energies, dipole-moment, polarizability, HOMO/LUMO energies and gap, ionization potential, electron affinity, and electronic excitations. After training, BAML predicts energies or electronic properties of out-of-sample molecules with unprecedented accuracy and speed.

  6. Machine Learning Techniques for Stellar Light Curve Classification

    Science.gov (United States)

    Hinners, Trisha A.; Tat, Kevin; Thorp, Rachel

    2018-07-01

    We apply machine learning techniques in an attempt to predict and classify stellar properties from noisy and sparse time-series data. We preprocessed over 94 GB of Kepler light curves from the Mikulski Archive for Space Telescopes (MAST) to classify according to 10 distinct physical properties using both representation learning and feature engineering approaches. Studies using machine learning in the field have been primarily done on simulated data, making our study one of the first to use real light-curve data for machine learning approaches. We tuned our data using previous work with simulated data as a template and achieved mixed results between the two approaches. Representation learning using a long short-term memory recurrent neural network produced no successful predictions, but our work with feature engineering was successful for both classification and regression. In particular, we were able to achieve values for stellar density, stellar radius, and effective temperature with low error (∼2%–4%) and good accuracy (∼75%) for classifying the number of transits for a given star. The results show promise for improvement for both approaches upon using larger data sets with a larger minority class. This work has the potential to provide a foundation for future tools and techniques to aid in the analysis of astrophysical data.

  7. Modelling tick abundance using machine learning techniques and satellite imagery

    DEFF Research Database (Denmark)

    Kjær, Lene Jung; Korslund, L.; Kjelland, V.

    satellite images to run Boosted Regression Tree machine learning algorithms to predict overall distribution (presence/absence of ticks) and relative tick abundance of nymphs and larvae in southern Scandinavia. For nymphs, the predicted abundance had a positive correlation with observed abundance...... the predicted distribution of larvae was mostly even throughout Denmark, it was primarily around the coastlines in Norway and Sweden. Abundance was fairly low overall except in some fragmented patches corresponding to forested habitats in the region. Machine learning techniques allow us to predict for larger...... the collected ticks for pathogens and using the same machine learning techniques to develop prevalence maps of the ScandTick region....

  8. Machine Learning Method Applied in Readout System of Superheated Droplet Detector

    Science.gov (United States)

    Liu, Yi; Sullivan, Clair Julia; d'Errico, Francesco

    2017-07-01

    Direct readability is one advantage of superheated droplet detectors in neutron dosimetry. Utilizing such a distinct characteristic, an imaging readout system analyzes image of the detector for neutron dose readout. To improve the accuracy and precision of algorithms in the imaging readout system, machine learning algorithms were developed. Deep learning neural network and support vector machine algorithms are applied and compared with generally used Hough transform and curvature analysis methods. The machine learning methods showed a much higher accuracy and better precision in recognizing circular gas bubbles.

  9. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    Science.gov (United States)

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  10. A machine learning approach to the accurate prediction of monitor units for a compact proton machine.

    Science.gov (United States)

    Sun, Baozhou; Lam, Dao; Yang, Deshan; Grantham, Kevin; Zhang, Tiezhi; Mutic, Sasa; Zhao, Tianyu

    2018-05-01

    Clinical treatment planning systems for proton therapy currently do not calculate monitor units (MUs) in passive scatter proton therapy due to the complexity of the beam delivery systems. Physical phantom measurements are commonly employed to determine the field-specific output factors (OFs) but are often subject to limited machine time, measurement uncertainties and intensive labor. In this study, a machine learning-based approach was developed to predict output (cGy/MU) and derive MUs, incorporating the dependencies on gantry angle and field size for a single-room proton therapy system. The goal of this study was to develop a secondary check tool for OF measurements and eventually eliminate patient-specific OF measurements. The OFs of 1754 fields previously measured in a water phantom with calibrated ionization chambers and electrometers for patient-specific fields with various range and modulation width combinations for 23 options were included in this study. The training data sets for machine learning models in three different methods (Random Forest, XGBoost and Cubist) included 1431 (~81%) OFs. Ten-fold cross-validation was used to prevent "overfitting" and to validate each model. The remaining 323 (~19%) OFs were used to test the trained models. The difference between the measured and predicted values from machine learning models was analyzed. Model prediction accuracy was also compared with that of the semi-empirical model developed by Kooy (Phys. Med. Biol. 50, 2005). Additionally, gantry angle dependence of OFs was measured for three groups of options categorized on the selection of the second scatters. Field size dependence of OFs was investigated for the measurements with and without patient-specific apertures. All three machine learning methods showed higher accuracy than the semi-empirical model which shows considerably large discrepancy of up to 7.7% for the treatment fields with full range and full modulation width. The Cubist-based solution

  11. A comparison of machine learning and Bayesian modelling for molecular serotyping.

    Science.gov (United States)

    Newton, Richard; Wernisch, Lorenz

    2017-08-11

    Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only few samples available, a model driven approach was the only option. In the meanwhile, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes; due to the large number of possible combinations. Most of the available training data comprises samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single serotype arrays. With the enhanced training set the machine learning algorithms out perform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best performing implementation was a combination of the results of the Bayesian Model and the Gradient Boosting Machine. As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological

  12. Machine learning in Python essential techniques for predictive analysis

    CERN Document Server

    Bowles, Michael

    2015-01-01

    Learn a simpler and more effective way to analyze data and predict outcomes with Python Machine Learning in Python shows you how to successfully analyze data using only two core machine learning algorithms, and how to apply them using Python. By focusing on two algorithm families that effectively predict outcomes, this book is able to provide full descriptions of the mechanisms at work, and the examples that illustrate the machinery with specific, hackable code. The algorithms are explained in simple terms with no complex math and applied using Python, with guidance on algorithm selection, d

  13. Research progress in machine learning methods for gene-gene interaction detection.

    Science.gov (United States)

    Peng, Zhe-Ye; Tang, Zi-Jun; Xie, Min-Zhu

    2018-03-20

    Complex diseases are results of gene-gene and gene-environment interactions. However, the detection of high-dimensional gene-gene interactions is computationally challenging. In the last two decades, machine-learning approaches have been developed to detect gene-gene interactions with some successes. In this review, we summarize the progress in research on machine learning methods, as applied to gene-gene interaction detection. It systematically examines the principles and limitations of the current machine learning methods used in genome wide association studies (GWAS) to detect gene-gene interactions, such as neural networks (NN), random forest (RF), support vector machines (SVM) and multifactor dimensionality reduction (MDR), and provides some insights on the future research directions in the field.

  14. Perspectives on Machine Learning for Classification of Schizotypy Using fMRI Data

    DEFF Research Database (Denmark)

    Madsen, Kristoffer Hougaard; Krohne, Laerke G; Cai, Xin-Lu

    2018-01-01

    Functional magnetic resonance imaging is capable of estimating functional activation and connectivity in the human brain, and lately there has been increased interest in the use of these functional modalities combined with machine learning for identification of psychiatric traits. While...... the use of machine learning schizotypy research. To this end, we describe common data processing steps while commenting on best practices and procedures. First, we introduce the important role of schizotypy to motivate the importance of reliable classification, and summarize existing machine learning....... We provide more detailed descriptions and software as supplementary material. Finally, we present current challenges in machine learning for classification of schizotypy and comment on future trends and perspectives....

  15. Multivariate Mapping of Environmental Data Using Extreme Learning Machines

    Science.gov (United States)

    Leuenberger, Michael; Kanevski, Mikhail

    2014-05-01

    In most real cases environmental data are multivariate, highly variable at several spatio-temporal scales, and are generated by nonlinear and complex phenomena. Mapping - spatial predictions of such data, is a challenging problem. Machine learning algorithms, being universal nonlinear tools, have demonstrated their efficiency in modelling of environmental spatial and space-time data (Kanevski et al. 2009). Recently, a new approach in machine learning - Extreme Learning Machine (ELM), has gained a great popularity. ELM is a fast and powerful approach being a part of the machine learning algorithm category. Developed by G.-B. Huang et al. (2006), it follows the structure of a multilayer perceptron (MLP) with one single-hidden layer feedforward neural networks (SLFNs). The learning step of classical artificial neural networks, like MLP, deals with the optimization of weights and biases by using gradient-based learning algorithm (e.g. back-propagation algorithm). Opposed to this optimization phase, which can fall into local minima, ELM generates randomly the weights between the input layer and the hidden layer and also the biases in the hidden layer. By this initialization, it optimizes just the weight vector between the hidden layer and the output layer in a single way. The main advantage of this algorithm is the speed of the learning step. In a theoretical context and by growing the number of hidden nodes, the algorithm can learn any set of training data with zero error. To avoid overfitting, cross-validation method or "true validation" (by randomly splitting data into training, validation and testing subsets) are recommended in order to find an optimal number of neurons. With its universal property and solid theoretical basis, ELM is a good machine learning algorithm which can push the field forward. The present research deals with an extension of ELM to multivariate output modelling and application of ELM to the real data case study - pollution of the sediments in

  16. Machine learning applied to the prediction of citrus production

    OpenAIRE

    Díaz Rodríguez, Susana Irene; Mazza, Silvia M.; Fernández-Combarro Álvarez, Elías; Giménez, Laura I.; Gaiad, José E.

    2017-01-01

    An in-depth knowledge about variables affecting production is required in order to predict global production and take decisions in agriculture. Machine learning is a technique used in agricultural planning and precision agriculture. This work (i) studies the effectiveness of machine learning techniques for predicting orchards production; and (ii) variables affecting this production were also identified. Data from 964 orchards of lemon, mandarin, and orange in Corrientes, Argentina are analyse...

  17. Identifying student stuck states in programmingassignments using machine learning

    OpenAIRE

    Lindell, Johan

    2014-01-01

    Intelligent tutors are becoming more popular with the increased use of computersand hand held devices in the education sphere. An area of research isinvestigating how machine learning can be used to improve the precision andfeedback of the tutor. This thesis compares machine learning clustering algorithmswith various distance functions in an attempt to cluster together codesnapshots of students solving a programming task. It investigates whethera general non-problem specific implementation of...

  18. Machine learning techniques applied to system characterization and equalization

    DEFF Research Database (Denmark)

    Zibar, Darko; Thrane, Jakob; Wass, Jesper

    2016-01-01

    Linear signal processing algorithms are effective in combating linear fibre channel impairments. We demonstrate the ability of machine learning algorithms to combat nonlinear fibre channel impairments and perform parameter extraction from directly detected signals.......Linear signal processing algorithms are effective in combating linear fibre channel impairments. We demonstrate the ability of machine learning algorithms to combat nonlinear fibre channel impairments and perform parameter extraction from directly detected signals....

  19. Machine learning application in the life time of materials

    OpenAIRE

    Yu, Xiaojiao

    2017-01-01

    Materials design and development typically takes several decades from the initial discovery to commercialization with the traditional trial and error development approach. With the accumulation of data from both experimental and computational results, data based machine learning becomes an emerging field in materials discovery, design and property prediction. This manuscript reviews the history of materials science as a disciplinary the most common machine learning method used in materials sc...

  20. ClearTK 2.0: Design Patterns for Machine Learning in UIMA.

    Science.gov (United States)

    Bethard, Steven; Ogren, Philip; Becker, Lee

    2014-05-01

    ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles including: conceptually simple annotator interfaces, readable pipeline descriptions, minimal collection readers, type system agnostic code, modules organized for ease of import, and assisting user comprehension of the complex UIMA framework.

  1. Predicting the concentration of residual methanol in industrial formalin using machine learning

    OpenAIRE

    Heidkamp, William

    2016-01-01

    In this thesis, a machine learning approach was used to develop a predictive model for residual methanol concentration in industrial formalin produced at the Akzo Nobel factory in Kristinehamn, Sweden. The MATLABTM computational environment supplemented with the Statistics and Machine LearningTM toolbox from the MathWorks were used to test various machine learning algorithms on the formalin production data from Akzo Nobel. As a result, the Gaussian Process Regression algorithm was found to pr...

  2. A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

    OpenAIRE

    Hayden Wimmer; Loreen Powell

    2014-01-01

    While research has been conducted in machine learning algorithms and in privacy preserving in data mining (PPDM), a gap in the literature exists which combines the aforementioned areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Baye...

  3. The use of machine learning and nonlinear statistical tools for ADME prediction.

    Science.gov (United States)

    Sakiyama, Yojiro

    2009-02-01

    Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning as well as nonlinear statistical tools has been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it would be a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machine displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.

  4. Machine learning techniques to examine large patient databases.

    Science.gov (United States)

    Meyfroidt, Geert; Güiza, Fabian; Ramon, Jan; Bruynooghe, Maurice

    2009-03-01

    Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential applications for these techniques in medicine is large, few medical doctors are familiar with their methodology, advantages and pitfalls. A general overview of machine learning techniques, with a more detailed discussion of some of these algorithms, is presented in this review.

  5. Advances in independent component analysis and learning machines

    CERN Document Server

    Bingham, Ella; Laaksonen, Jorma; Lampinen, Jouko

    2015-01-01

    In honour of Professor Erkki Oja, one of the pioneers of Independent Component Analysis (ICA), this book reviews key advances in the theory and application of ICA, as well as its influence on signal processing, pattern recognition, machine learning, and data mining. Examples of topics which have developed from the advances of ICA, which are covered in the book are: A unifying probabilistic model for PCA and ICA Optimization methods for matrix decompositions Insights into the FastICA algorithmUnsupervised deep learning Machine vision and image retrieval A review of developments in the t

  6. Quantum machine learning for quantum anomaly detection

    Science.gov (United States)

    Liu, Nana; Rebentrost, Patrick

    2018-04-01

    Anomaly detection is used for identifying data that deviate from "normal" data patterns. Its usage on classical data finds diverse applications in many important areas such as finance, fraud detection, medical diagnoses, data cleaning, and surveillance. With the advent of quantum technologies, anomaly detection of quantum data, in the form of quantum states, may become an important component of quantum applications. Machine-learning algorithms are playing pivotal roles in anomaly detection using classical data. Two widely used algorithms are the kernel principal component analysis and the one-class support vector machine. We find corresponding quantum algorithms to detect anomalies in quantum states. We show that these two quantum algorithms can be performed using resources that are logarithmic in the dimensionality of quantum states. For pure quantum states, these resources can also be logarithmic in the number of quantum states used for training the machine-learning algorithm. This makes these algorithms potentially applicable to big quantum data applications.

  7. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2015-01-01

    Techniques from the machine learning community are reviewed and employed for laser characterization, signal detection in the presence of nonlinear phase noise, and nonlinearity mitigation. Bayesian filtering and expectation maximization are employed within nonlinear state-space framework...

  8. Which Management Control System principles and aspects are relevant when deploying a learning machine?

    OpenAIRE

    Martin, Johansson; Mikael, Göthager

    2017-01-01

    How shall a business adapt its management control systems when learning machines enter the arena? Will the control system continue to focus on humans aspects and continue to consider a learning machine to be an automation tool as any other historically programmed computer? Learning machines introduces productivity capabilities that achieve very high levels of efficiency and quality. A learning machine can sort through large amounts of data and make conclusions difficult by a human mind. Howev...

  9. Promises, pitfalls, and basic guidelines for applying machine learning classifiers to psychiatric imaging data, with autism as an example

    Directory of Open Access Journals (Sweden)

    Pegah Kassraian Fard

    2016-12-01

    Full Text Available Most psychiatric disorders are associated with subtle alterations in brain function and are subject to large inter-individual differences. Typically the diagnosis of these disorders requires time-consuming behavioral assessments administered by a multi-disciplinary team with extensive experience. Whilst the application of machine learning classification methods (ML classifiers to neuroimaging data has the potential to speed and simplify diagnosis of psychiatric disorders, the methods, assumptions, and analytical steps are not currently opaque and accessible to researchers and clinicians outside the field. In this paper, we describe potential classification pipelines for Autism Spectrum Disorder, as an example of a psychiatric disorder. The analyses are based on resting-state fMRI data derived from a multi-site data repository (ABIDE. We compare several popular ML classifiers such as support vector machines, neural networks and regression approaches, among others. In a tutorial style, written to be equally accessible for researchers and clinicians, we explain the rationale of each classification approach, clarify the underlying assumptions, and discuss possible pitfalls and challenges. We also provide the data as well as the MATLAB code we used to achieve our results. We show that out-of-the-box ML classifiers can yield classification accuracies of about 60-70%. Finally, we discuss how classification accuracy can be further improved, and we mention methodological developments that are needed to pave the way for the use of ML classifiers in clinical practice.

  10. Overview of deep learning in medical imaging.

    Science.gov (United States)

    Suzuki, Kenji

    2017-09-01

    The use of machine learning (ML) has been increasing rapidly in the medical imaging field, including computer-aided diagnosis (CAD), radiomics, and medical image analysis. Recently, an ML area called deep learning emerged in the computer vision field and became very popular in many fields. It started from an event in late 2012, when a deep-learning approach based on a convolutional neural network (CNN) won an overwhelming victory in the best-known worldwide computer vision competition, ImageNet Classification. Since then, researchers in virtually all fields, including medical imaging, have started actively participating in the explosively growing field of deep learning. In this paper, the area of deep learning in medical imaging is overviewed, including (1) what was changed in machine learning before and after the introduction of deep learning, (2) what is the source of the power of deep learning, (3) two major deep-learning models: a massive-training artificial neural network (MTANN) and a convolutional neural network (CNN), (4) similarities and differences between the two models, and (5) their applications to medical imaging. This review shows that ML with feature input (or feature-based ML) was dominant before the introduction of deep learning, and that the major and essential difference between ML before and after deep learning is the learning of image data directly without object segmentation or feature extraction; thus, it is the source of the power of deep learning, although the depth of the model is an important attribute. The class of ML with image input (or image-based ML) including deep learning has a long history, but recently gained popularity due to the use of the new terminology, deep learning. There are two major models in this class of ML in medical imaging, MTANN and CNN, which have similarities as well as several differences. In our experience, MTANNs were substantially more efficient in their development, had a higher performance, and required a

  11. Predicting the dissolution kinetics of silicate glasses using machine learning

    Science.gov (United States)

    Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu

    2018-05-01

    Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.

  12. Different protein-protein interface patterns predicted by different machine learning methods.

    Science.gov (United States)

    Wang, Wei; Yang, Yongxiao; Yin, Jianxin; Gong, Xinqi

    2017-11-22

    Different types of protein-protein interactions make different protein-protein interface patterns. Different machine learning methods are suitable to deal with different types of data. Then, is it the same situation that different interface patterns are preferred for prediction by different machine learning methods? Here, four different machine learning methods were employed to predict protein-protein interface residue pairs on different interface patterns. The performances of the methods for different types of proteins are different, which suggest that different machine learning methods tend to predict different protein-protein interface patterns. We made use of ANOVA and variable selection to prove our result. Our proposed methods taking advantages of different single methods also got a good prediction result compared to single methods. In addition to the prediction of protein-protein interactions, this idea can be extended to other research areas such as protein structure prediction and design.

  13. Novel jet observables from machine learning

    Science.gov (United States)

    Datta, Kaustuv; Larkoski, Andrew J.

    2018-03-01

    Previous studies have demonstrated the utility and applicability of machine learning techniques to jet physics. In this paper, we construct new observables for the discrimination of jets from different originating particles exclusively from information identified by the machine. The approach we propose is to first organize information in the jet by resolved phase space and determine the effective N -body phase space at which discrimination power saturates. This then allows for the construction of a discrimination observable from the N -body phase space coordinates. A general form of this observable can be expressed with numerous parameters that are chosen so that the observable maximizes the signal vs. background likelihood. Here, we illustrate this technique applied to discrimination of H\\to b\\overline{b} decays from massive g\\to b\\overline{b} splittings. We show that for a simple parametrization, we can construct an observable that has discrimination power comparable to, or better than, widely-used observables motivated from theory considerations. For the case of jets on which modified mass-drop tagger grooming is applied, the observable that the machine learns is essentially the angle of the dominant gluon emission off of the b\\overline{b} pair.

  14. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down task driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse grained global features and a larger set of low...

  15. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2016-01-01

    Machine learning techniques relevant for nonlinearity mitigation, carrier recovery, and nanoscale device characterization are reviewed and employed. Markov Chain Monte Carlo in combination with Bayesian filtering is employed within the nonlinear state-space framework and demonstrated for parameter...

  16. Machine Learning Techniques for Optical Performance Monitoring from Directly Detected PDM-QAM Signals

    DEFF Research Database (Denmark)

    Thrane, Jakob; Wass, Jesper; Piels, Molly

    2017-01-01

    Linear signal processing algorithms are effective in dealing with linear transmission channel and linear signal detection, while the nonlinear signal processing algorithms, from the machine learning community, are effective in dealing with nonlinear transmission channel and nonlinear signal...... detection. In this paper, a brief overview of the various machine learning methods and their application in optical communication is presented and discussed. Moreover, supervised machine learning methods, such as neural networks and support vector machine, are experimentally demonstrated for in-band optical...

  17. Building machines that learn and think like people.

    Science.gov (United States)

    Lake, Brenden M; Ullman, Tomer D; Tenenbaum, Joshua B; Gershman, Samuel J

    2017-01-01

    Recent progress in artificial intelligence has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats that of humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

  18. Machine Shop I. Learning Activity Packets (LAPs). Section D--Power Saws and Drilling Machines.

    Science.gov (United States)

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This document contains two learning activity packets (LAPs) for the "power saws and drilling machines" instructional area of a Machine Shop I course. The two LAPs cover the following topics: power saws and drill press. Each LAP contains a cover sheet that describes its purpose, an introduction, and the tasks included in the LAP; learning…

  19. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    Science.gov (United States)

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

  20. Machine learning topological states

    Science.gov (United States)

    Deng, Dong-Ling; Li, Xiaopeng; Das Sarma, S.

    2017-11-01

    Artificial neural networks and machine learning have now reached a new era after several decades of improvement where applications are to explode in many fields of science, industry, and technology. Here, we use artificial neural networks to study an intriguing phenomenon in quantum physics—the topological phases of matter. We find that certain topological states, either symmetry-protected or with intrinsic topological order, can be represented with classical artificial neural networks. This is demonstrated by using three concrete spin systems, the one-dimensional (1D) symmetry-protected topological cluster state and the 2D and 3D toric code states with intrinsic topological orders. For all three cases, we show rigorously that the topological ground states can be represented by short-range neural networks in an exact and efficient fashion—the required number of hidden neurons is as small as the number of physical spins and the number of parameters scales only linearly with the system size. For the 2D toric-code model, we find that the proposed short-range neural networks can describe the excited states with Abelian anyons and their nontrivial mutual statistics as well. In addition, by using reinforcement learning we show that neural networks are capable of finding the topological ground states of nonintegrable Hamiltonians with strong interactions and studying their topological phase transitions. Our results demonstrate explicitly the exceptional power of neural networks in describing topological quantum states, and at the same time provide valuable guidance to machine learning of topological phases in generic lattice models.

  1. Conformal prediction for reliable machine learning theory, adaptations and applications

    CERN Document Server

    Balasubramanian, Vineeth; Vovk, Vladimir

    2014-01-01

    The conformal predictions framework is a recent development in machine learning that can associate a reliable measure of confidence with a prediction in any real-world pattern recognition application, including risk-sensitive applications such as medical diagnosis, face recognition, and financial risk prediction. Conformal Predictions for Reliable Machine Learning: Theory, Adaptations and Applications captures the basic theory of the framework, demonstrates how to apply it to real-world problems, and presents several adaptations, including active learning, change detection, and anomaly detecti

  2. Splendidly blended: a machine learning set up for CDU control

    Science.gov (United States)

    Utzny, Clemens

    2017-06-01

    As the concepts of machine learning and artificial intelligence continue to grow in importance in the context of internet related applications it is still in its infancy when it comes to process control within the semiconductor industry. Especially the branch of mask manufacturing presents a challenge to the concepts of machine learning since the business process intrinsically induces pronounced product variability on the background of small plate numbers. In this paper we present the architectural set up of a machine learning algorithm which successfully deals with the demands and pitfalls of mask manufacturing. A detailed motivation of this basic set up followed by an analysis of its statistical properties is given. The machine learning set up for mask manufacturing involves two learning steps: an initial step which identifies and classifies the basic global CD patterns of a process. These results form the basis for the extraction of an optimized training set via balanced sampling. A second learning step uses this training set to obtain the local as well as global CD relationships induced by the manufacturing process. Using two production motivated examples we show how this approach is flexible and powerful enough to deal with the exacting demands of mask manufacturing. In one example we show how dedicated covariates can be used in conjunction with increased spatial resolution of the CD map model in order to deal with pathological CD effects at the mask boundary. The other example shows how the model set up enables strategies for dealing tool specific CD signature differences. In this case the balanced sampling enables a process control scheme which allows usage of the full tool park within the specified tight tolerance budget. Overall, this paper shows that the current rapid developments off the machine learning algorithms can be successfully used within the context of semiconductor manufacturing.

  3. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    Science.gov (United States)

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.

  4. What can machine learning do for antimicrobial peptides, and what can antimicrobial peptides do for machine learning?

    Science.gov (United States)

    Lee, Ernest Y; Lee, Michelle W; Fulan, Benjamin M; Ferguson, Andrew L; Wong, Gerard C L

    2017-12-06

    Antimicrobial peptides (AMPs) are a diverse class of well-studied membrane-permeating peptides with important functions in innate host defense. In this short review, we provide a historical overview of AMPs, summarize previous applications of machine learning to AMPs, and discuss the results of our studies in the context of the latest AMP literature. Much work has been recently done in leveraging computational tools to design new AMP candidates with high therapeutic efficacies for drug-resistant infections. We show that machine learning on AMPs can be used to identify essential physico-chemical determinants of AMP functionality, and identify and design peptide sequences to generate membrane curvature. In a broader scope, we discuss the implications of our findings for the discovery of membrane-active peptides in general, and uncovering membrane activity in new and existing peptide taxonomies.

  5. Machine learning in laboratory medicine: waiting for the flood?

    Science.gov (United States)

    Cabitza, Federico; Banfi, Giuseppe

    2018-03-28

    This review focuses on machine learning and on how methods and models combining data analytics and artificial intelligence have been applied to laboratory medicine so far. Although still in its infancy, the potential for applying machine learning to laboratory data for both diagnostic and prognostic purposes deserves more attention by the readership of this journal, as well as by physician-scientists who will want to take advantage of this new computer-based support in pathology and laboratory medicine.

  6. MACHINE LEARNING TECHNIQUES APPLIED TO LIGNOCELLULOSIC ETHANOL IN SIMULTANEOUS HYDROLYSIS AND FERMENTATION

    Directory of Open Access Journals (Sweden)

    J. Fischer

    Full Text Available Abstract This paper investigates the use of machine learning (ML techniques to study the effect of different process conditions on ethanol production from lignocellulosic sugarcane bagasse biomass using S. cerevisiae in a simultaneous hydrolysis and fermentation (SHF process. The effects of temperature, enzyme concentration, biomass load, inoculum size and time were investigated using artificial neural networks, a C5.0 classification tree and random forest algorithms. The optimization of ethanol production was also evaluated. The results clearly depict that ML techniques can be used to evaluate the SHF (R2 between actual and model predictions higher than 0.90, absolute average deviation lower than 8.1% and RMSE lower than 0.80 and predict optimized conditions which are in close agreement with those found experimentally. Optimal conditions were found to be a temperature of 35 ºC, an SHF time of 36 h, enzymatic load of 99.8%, inoculum size of 29.5 g/L and bagasse concentration of 24.9%. The ethanol concentration and volumetric productivity for these conditions were 12.1 g/L and 0.336 g/L.h, respectively.

  7. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    Science.gov (United States)

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.

  8. Physics-informed machine learning for inorganic scintillator discovery

    Science.gov (United States)

    Pilania, G.; McClellan, K. J.; Stanek, C. R.; Uberuaga, B. P.

    2018-06-01

    Applications of inorganic scintillators—activated with lanthanide dopants, such as Ce and Eu—are found in diverse fields. As a strict requirement to exhibit scintillation, the 4f ground state (with the electronic configuration of [Xe]4fn 5d0) and 5d1 lowest excited state (with the electronic configuration of [Xe]4fn-1 5d1) levels induced by the activator must lie within the host bandgap. Here we introduce a new machine learning (ML) based search strategy for high-throughput chemical space explorations to discover and design novel inorganic scintillators. Building upon well-known physics-based chemical trends for the host dependent electron binding energies within the 4f and 5d1 energy levels of lanthanide ions and available experimental data, the developed ML model—coupled with knowledge of the vacuum referred valence and conduction band edges computed from first principles—can rapidly and reliably estimate the relative positions of the activator's energy levels relative to the valence and conduction band edges of any given host chemistry. Using perovskite oxides and elpasolite halides as examples, the presented approach has been demonstrated to be able to (i) capture systematic chemical trends across host chemistries and (ii) effectively screen promising compounds in a high-throughput manner. While a number of other application-specific performance requirements need to be considered for a viable scintillator, the scheme developed here can be a practically useful tool to systematically down-select the most promising candidate materials in a first line of screening for a subsequent in-depth investigation.

  9. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    Science.gov (United States)

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

    Directory of Open Access Journals (Sweden)

    Rita Melo

    2016-07-01

    Full Text Available Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a more accurate methodology to predict Hot-Spots (HS in protein-protein interfaces from their native complex structure compared to previous published Machine Learning (ML techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural- and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM, for a total of 79 features. We used twenty-seven algorithms from a simple linear-based function to support-vector machine models with different cost functions. The best model was achieved by the use of the conditional inference random forest (c-forest algorithm with a dataset pre-processed by the normalization of features and with up-sampling of the minor class. The method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82 for the independent test set.

  11. Visible Machine Learning for Biomedicine.

    Science.gov (United States)

    Yu, Michael K; Ma, Jianzhu; Fisher, Jasmin; Kreisberg, Jason F; Raphael, Benjamin J; Ideker, Trey

    2018-06-14

    A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology. Copyright © 2018. Published by Elsevier Inc.

  12. Outsmarting neural networks: an alternative paradigm for machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably and Approximately Correct (PAC) framework. We show that, under certain conditions, one can reduce the three problems above to the regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria, but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs to solve the regression estimation. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  13. Machine learning derived risk prediction of anorexia nervosa.

    Science.gov (United States)

    Guo, Yiran; Wei, Zhi; Keating, Brendan J; Hakonarson, Hakon

    2016-01-20

    Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics play an important role. In this study, we collected whole genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), Price Foundation Collaborative Group and the Children's Hospital of Philadelphia (CHOP), and applied machine learning methods for predicting AN disease risk. The prediction performance is measured by area under the receiver operating characteristic curve (AUC), indicating how well the model distinguishes cases from unaffected control subjects. Logistic regression model with the lasso penalty technique generated an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUC's of 0.691 and 0.623, respectively. Using different sample sizes, our results suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. To our knowledge, this is the first attempt to assess AN risk based on genome wide genotype level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting.

  14. Fifty years of computer analysis in chest imaging: rule-based, machine learning, deep learning.

    Science.gov (United States)

    van Ginneken, Bram

    2017-03-01

    Half a century ago, the term "computer-aided diagnosis" (CAD) was introduced in the scientific literature. Pulmonary imaging, with chest radiography and computed tomography, has always been one of the focus areas in this field. In this study, I describe how machine learning became the dominant technology for tackling CAD in the lungs, generally producing better results than do classical rule-based approaches, and how the field is now rapidly changing: in the last few years, we have seen how even better results can be obtained with deep learning. The key differences among rule-based processing, machine learning, and deep learning are summarized and illustrated for various applications of CAD in the chest.

  15. Machine Learning in Production Systems Design Using Genetic Algorithms

    OpenAIRE

    Abu Qudeiri Jaber; Yamamoto Hidehiko Rizauddin Ramli

    2008-01-01

    To create a solution for a specific problem in machine learning, the solution is constructed from the data or by use a search method. Genetic algorithms are a model of machine learning that can be used to find nearest optimal solution. While the great advantage of genetic algorithms is the fact that they find a solution through evolution, this is also the biggest disadvantage. Evolution is inductive, in nature life does not evolve towards a good solution but it evolves aw...

  16. Trip Travel Time Forecasting Based on Selective Forgetting Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Zhiming Gui

    2014-01-01

    Full Text Available Travel time estimation on road networks is a valuable traffic metric. In this paper, we propose a machine learning based method for trip travel time estimation in road networks. The method uses the historical trip information extracted from taxis trace data as the training data. An optimized online sequential extreme machine, selective forgetting extreme learning machine, is adopted to make the prediction. Its selective forgetting learning ability enables the prediction algorithm to adapt to trip conditions changes well. Experimental results using real-life taxis trace data show that the forecasting model provides an effective and practical way for the travel time forecasting.

  17. Introduction to machine learning: k-nearest neighbors.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-06-01

    Machine learning techniques have been widely used in many scientific fields, but its use in medical literature is limited partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. The article introduces some basic ideas underlying the kNN algorithm, and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After prediction of outcome with kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the mostly widely used statistic to reflect the kNN algorithm. Factors such as k value, distance calculation and choice of appropriate predictors all have significant impact on the model performance.

  18. CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

    2017-07-14

    We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances by her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the group of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human and machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study where we applied CHISSL to organize a collection of handwritten digits.

  19. Machine learning in autistic spectrum disorder behavioral research: A review and ways forward.

    Science.gov (United States)

    Thabtah, Fadi

    2018-02-13

    Autistic Spectrum Disorder (ASD) is a mental disorder that retards acquisition of linguistic, communication, cognitive, and social skills and abilities. Despite being diagnosed with ASD, some individuals exhibit outstanding scholastic, non-academic, and artistic capabilities, in such cases posing a challenging task for scientists to provide answers. In the last few years, ASD has been investigated by social and computational intelligence scientists utilizing advanced technologies such as machine learning to improve diagnostic timing, precision, and quality. Machine learning is a multidisciplinary research topic that employs intelligent techniques to discover useful concealed patterns, which are utilized in prediction to improve decision making. Machine learning techniques such as support vector machines, decision trees, logistic regressions, and others, have been applied to datasets related to autism in order to construct predictive models. These models claim to enhance the ability of clinicians to provide robust diagnoses and prognoses of ASD. However, studies concerning the use of machine learning in ASD diagnosis and treatment suffer from conceptual, implementation, and data issues such as the way diagnostic codes are used, the type of feature selection employed, the evaluation measures chosen, and class imbalances in data among others. A more serious claim in recent studies is the development of a new method for ASD diagnoses based on machine learning. This article critically analyses these recent investigative studies on autism, not only articulating the aforementioned issues in these studies but also recommending paths forward that enhance machine learning use in ASD with respect to conceptualization, implementation, and data. Future studies concerning machine learning in autism research are greatly benefitted by such proposals.

  20. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography.

    Science.gov (United States)

    Narula, Sukrit; Shameer, Khader; Salem Omar, Alaa Mabrouk; Dudley, Joel T; Sengupta, Partho P

    2016-11-29

    Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH). Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation. Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM ands. ATH (IG = 0.24) followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p 13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e', and strain. Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience. Copyright © 2016 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  1. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    Science.gov (United States)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  2. Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

    Directory of Open Access Journals (Sweden)

    Fisnik Dalipi

    2016-01-01

    Full Text Available We present our data-driven supervised machine-learning (ML model to predict heat load for buildings in a district heating system (DHS. Even though ML has been used as an approach to heat load prediction in literature, it is hard to select an approach that will qualify as a solution for our case as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR, Partial Least Square (PLS, and random forest (RF. We use the data collected from buildings at several locations for a period of 29 weeks. Concerning the accuracy of predicting the heat load, we evaluate the performance of the proposed algorithms using mean absolute error (MAE, mean absolute percentage error (MAPE, and correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted performance comparison among these ML algorithms. The comparison of the algorithms indicates that, for DH heat load prediction, SVR method presented in this paper is the most efficient one out of the three also compared to other methods found in the literature.

  3. A survey on Barrett's esophagus analysis using machine learning.

    Science.gov (United States)

    de Souza, Luis A; Palm, Christoph; Mendel, Robert; Hook, Christian; Ebigbo, Alanna; Probst, Andreas; Messmann, Helmut; Weber, Silke; Papa, João P

    2018-05-01

    This work presents a systematic review concerning recent studies and technologies of machine learning for Barrett's esophagus (BE) diagnosis and treatment. The use of artificial intelligence is a brand new and promising way to evaluate such disease. We compile some works published at some well-established databases, such as Science Direct, IEEEXplore, PubMed, Plos One, Multidisciplinary Digital Publishing Institute (MDPI), Association for Computing Machinery (ACM), Springer, and Hindawi Publishing Corporation. Each selected work has been analyzed to present its objective, methodology, and results. The BE progression to dysplasia or adenocarcinoma shows a complex pattern to be detected during endoscopic surveillance. Therefore, it is valuable to assist its diagnosis and automatic identification using computer analysis. The evaluation of the BE dysplasia can be performed through manual or automated segmentation through machine learning techniques. Finally, in this survey, we reviewed recent studies focused on the automatic detection of the neoplastic region for classification purposes using machine learning methods. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    Directory of Open Access Journals (Sweden)

    Saerom Park

    Full Text Available Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.

  5. Machine learning applied to crime prediction

    OpenAIRE

    Vaquero Barnadas, Miquel

    2016-01-01

    Machine Learning is a cornerstone when it comes to artificial intelligence and big data analysis. It provides powerful algorithms that are capable of recognizing patterns, classifying data, and, basically, learn by themselves to perform a specific task. This field has incredibly grown in popularity these days, however, it still remains unknown for the majority of people, and even for most professionals. This project intends to provide an understandable explanation of what is it, what types ar...

  6. Evaluation of Machine Learning Methods for LHC Optics Measurements and Corrections Software

    CERN Document Server

    AUTHOR|(CDS)2206853; Henning, Peter

    The field of artificial intelligence is driven by the goal to provide machines with human-like intelligence. However modern science is currently facing problems with high complexity that cannot be solved by humans in the same timescale as by machines. Therefore there is a demand on automation of complex tasks. To identify the category of tasks which can be performed by machines in the domain of optics measurements and correction on the Large Hadron Collider (LHC) is one of the central research subjects of this thesis. The application of machine learning methods and concepts of artificial intelligence can be found in various industry and scientific branches. In High Energy Physics these concepts are mostly used in offline analysis of experiments data and to perform regression tasks. In Accelerator Physics the machine learning approach has not found a wide application yet. Therefore potential tasks for machine learning solutions can be specified in this domain. The appropriate methods and their suitability for...

  7. Learning Activity Packets for Grinding Machines. Unit I--Grinding Machines.

    Science.gov (United States)

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the first unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  8. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology.

    Science.gov (United States)

    Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi; Salleh, Abdul Hakim Mohamed

    2013-01-01

    Recently, the greatest statistical computational challenge in genetic epidemiology is to identify and characterize the genes that interact with other genes and environment factors that bring the effect on complex multifactorial disease. These gene-gene interactions are also denoted as epitasis in which this phenomenon cannot be solved by traditional statistical method due to the high dimensionality of the data and the occurrence of multiple polymorphism. Hence, there are several machine learning methods to solve such problems by identifying such susceptibility gene which are neural networks (NNs), support vector machine (SVM), and random forests (RFs) in such common and multifactorial disease. This paper gives an overview on machine learning methods, describing the methodology of each machine learning methods and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discussed each machine learning method and presents the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease.

  9. A Review for Detecting Gene-Gene Interactions Using Machine Learning Methods in Genetic Epidemiology

    Directory of Open Access Journals (Sweden)

    Ching Lee Koo

    2013-01-01

    Full Text Available Recently, the greatest statistical computational challenge in genetic epidemiology is to identify and characterize the genes that interact with other genes and environment factors that bring the effect on complex multifactorial disease. These gene-gene interactions are also denoted as epitasis in which this phenomenon cannot be solved by traditional statistical method due to the high dimensionality of the data and the occurrence of multiple polymorphism. Hence, there are several machine learning methods to solve such problems by identifying such susceptibility gene which are neural networks (NNs, support vector machine (SVM, and random forests (RFs in such common and multifactorial disease. This paper gives an overview on machine learning methods, describing the methodology of each machine learning methods and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discussed each machine learning method and presents the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease.

  10. In vitro molecular machine learning algorithm via symmetric internal loops of DNA.

    Science.gov (United States)

    Lee, Ji-Hoon; Lee, Seung Hwan; Baek, Christina; Chun, Hyosun; Ryu, Je-Hwan; Kim, Jin-Woo; Deaton, Russell; Zhang, Byoung-Tak

    2017-08-01

    Programmable biomolecules, such as DNA strands, deoxyribozymes, and restriction enzymes, have been used to solve computational problems, construct large-scale logic circuits, and program simple molecular games. Although studies have shown the potential of molecular computing, the capability of computational learning with DNA molecules, i.e., molecular machine learning, has yet to be experimentally verified. Here, we present a novel molecular learning in vitro model in which symmetric internal loops of double-stranded DNA are exploited to measure the differences between training instances, thus enabling the molecules to learn from small errors. The model was evaluated on a data set of twenty dialogue sentences obtained from the television shows Friends and Prison Break. The wet DNA-computing experiments confirmed that the molecular learning machine was able to generalize the dialogue patterns of each show and successfully identify the show from which the sentences originated. The molecular machine learning model described here opens the way for solving machine learning problems in computer science and biology using in vitro molecular computing with the data encoded in DNA molecules. Copyright © 2017. Published by Elsevier B.V.

  11. The influence of the negative-positive ratio and screening database size on the performance of machine learning-based virtual screening.

    Science.gov (United States)

    Kurczab, Rafał; Bojarski, Andrzej J

    2017-01-01

    The machine learning-based virtual screening of molecular databases is a commonly used approach to identify hits. However, many aspects associated with training predictive models can influence the final performance and, consequently, the number of hits found. Thus, we performed a systematic study of the simultaneous influence of the proportion of negatives to positives in the testing set, the size of screening databases and the type of molecular representations on the effectiveness of classification. The results obtained for eight protein targets, five machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest), two types of molecular fingerprints (MACCS and CDK FP) and eight screening databases with different numbers of molecules confirmed our previous findings that increases in the ratio of negative to positive training instances greatly influenced most of the investigated parameters of the ML methods in simulated virtual screening experiments. However, the performance of screening was shown to also be highly dependent on the molecular library dimension. Generally, with the increasing size of the screened database, the optimal training ratio also increased, and this ratio can be rationalized using the proposed cost-effectiveness threshold approach. To increase the performance of machine learning-based virtual screening, the training set should be constructed in a way that considers the size of the screening database.

  12. Machine learning of the reactor core loading pattern critical parameters

    International Nuclear Information System (INIS)

    Trontl, K.; Pevec, D.; Smuc, T.

    2007-01-01

    The usual approach to loading pattern optimization involves high degree of engineering judgment, a set of heuristic rules, an optimization algorithm and a computer code used for evaluating proposed loading patterns. The speed of the optimization process is highly dependent on the computer code used for the evaluation. In this paper we investigate the applicability of a machine learning model which could be used for fast loading pattern evaluation. We employed a recently introduced machine learning technique, Support Vector Regression (SVR), which has a strong theoretical background in statistical learning theory. Superior empirical performance of the method has been reported on difficult regression problems in different fields of science and technology. SVR is a data driven, kernel based, nonlinear modelling paradigm, in which model parameters are automatically determined by solving a quadratic optimization problem. The main objective of the work reported in this paper was to evaluate the possibility of applying SVR method for reactor core loading pattern modelling. The starting set of experimental data for training and testing of the machine learning algorithm was obtained using a two-dimensional diffusion theory reactor physics computer code. We illustrate the performance of the solution and discuss its applicability, i.e., complexity, speed and accuracy, with a projection to a more realistic scenario involving machine learning from the results of more accurate and time consuming three-dimensional core modelling code. (author)

  13. Improved Extreme Learning Machine and Its Application in Image Quality Assessment

    OpenAIRE

    Mao, Li; Zhang, Lidong; Liu, Xingyang; Li, Chaofeng; Yang, Hong

    2014-01-01

    Extreme learning machine (ELM) is a new class of single-hidden layer feedforward neural network (SLFN), which is simple in theory and fast in implementation. Zong et al. propose a weighted extreme learning machine for learning data with imbalanced class distribution, which maintains the advantages from original ELM. However, the current reported ELM and its improved version are only based on the empirical risk minimization principle, which may suffer from overfitting. To solve the overfitting...

  14. Sample-Based Extreme Learning Machine with Missing Data

    Directory of Open Access Journals (Sweden)

    Hang Gao

    2015-01-01

    Full Text Available Extreme learning machine (ELM has been extensively studied in machine learning community during the last few decades due to its high efficiency and the unification of classification, regression, and so forth. Though bearing such merits, existing ELM algorithms cannot efficiently handle the issue of missing data, which is relatively common in practical applications. The problem of missing data is commonly handled by imputation (i.e., replacing missing values with substituted values according to available information. However, imputation methods are not always effective. In this paper, we propose a sample-based learning framework to address this issue. Based on this framework, we develop two sample-based ELM algorithms for classification and regression, respectively. Comprehensive experiments have been conducted in synthetic data sets, UCI benchmark data sets, and a real world fingerprint image data set. As indicated, without introducing extra computational complexity, the proposed algorithms do more accurate and stable learning than other state-of-the-art ones, especially in the case of higher missing ratio.

  15. Advances in machine learning and data mining for astronomy

    CERN Document Server

    Way, Michael J

    2012-01-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health

  16. Water quality of Danube Delta systems: ecological status and prediction using machine-learning algorithms.

    Science.gov (United States)

    Stoica, C; Camejo, J; Banciu, A; Nita-Lazar, M; Paun, I; Cristofor, S; Pacheco, O R; Guevara, M

    2016-01-01

    Environmental issues have a worldwide impact on water bodies, including the Danube Delta, the largest European wetland. The Water Framework Directive (2000/60/EC) implementation operates toward solving environmental issues from European and national level. As a consequence, the water quality and the biocenosis structure was altered, especially the composition of the macro invertebrate community which is closely related to habitat and substrate heterogeneity. This study aims to assess the ecological status of Southern Branch of the Danube Delta, Saint Gheorghe, using benthic fauna and a computational method as an alternative for monitoring the water quality in real time. The analysis of spatial and temporal variability of unicriterial and multicriterial indices were used to assess the current status of aquatic systems. In addition, chemical status was characterized. Coliform bacteria and several chemical parameters were used to feed machine-learning (ML) algorithms to simulate a real-time classification method. Overall, the assessment of the water bodies indicated a moderate ecological status based on the biological quality elements or a good ecological status based on chemical and ML algorithms criteria.

  17. Using Machine Learning for Land Suitability Classification

    African Journals Online (AJOL)

    User

    West African Journal of Applied Ecology, vol. ... evidence for the utility of machine learning methods in land suitability classification especially MCS methods. ... Artificial intelligence tools. ..... Numerical values of index for the various classes.

  18. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of

  19. Machine learning on geospatial big data

    CSIR Research Space (South Africa)

    Van Zyl, T

    2014-02-01

    Full Text Available When trying to understand the difference between machine learning and statistics, it is important to note that it is not so much the set of techniques and theory that are used but more importantly the intended use of the results. In fact, many...

  20. Health Informatics via Machine Learning for the Clinical Management of Patients.

    Science.gov (United States)

    Clifton, D A; Niehaus, K E; Charlton, P; Colopy, G W

    2015-08-13

    To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. We reviewed literature from 2010-2015 from databases such as Pubmed, IEEE xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale.

  1. Machine Learning in Computer-Aided Synthesis Planning.

    Science.gov (United States)

    Coley, Connor W; Green, William H; Jensen, Klavs F

    2018-05-15

    Computer-aided synthesis planning (CASP) is focused on the goal of accelerating the process by which chemists decide how to synthesize small molecule compounds. The ideal CASP program would take a molecular structure as input and output a sorted list of detailed reaction schemes that each connect that target to purchasable starting materials via a series of chemically feasible reaction steps. Early work in this field relied on expert-crafted reaction rules and heuristics to describe possible retrosynthetic disconnections and selectivity rules but suffered from incompleteness, infeasible suggestions, and human bias. With the relatively recent availability of large reaction corpora (such as the United States Patent and Trademark Office (USPTO), Reaxys, and SciFinder databases), consisting of millions of tabulated reaction examples, it is now possible to construct and validate purely data-driven approaches to synthesis planning. As a result, synthesis planning has been opened to machine learning techniques, and the field is advancing rapidly. In this Account, we focus on two critical aspects of CASP and recent machine learning approaches to both challenges. First, we discuss the problem of retrosynthetic planning, which requires a recommender system to propose synthetic disconnections starting from a target molecule. We describe how the search strategy, necessary to overcome the exponential growth of the search space with increasing number of reaction steps, can be assisted through a learned synthetic complexity metric. We also describe how the recursive expansion can be performed by a straightforward nearest neighbor model that makes clever use of reaction data to generate high quality retrosynthetic disconnections. Second, we discuss the problem of anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan (i.e., reduce false positives) to increase the likelihood of experimental success

  2. Predictive ability of machine learning methods for massive crop yield prediction

    Directory of Open Access Journals (Sweden)

    Alberto Gonzalez-Sanchez

    2014-04-01

    Full Text Available An important issue for agricultural planning purposes is the accurate yield estimation for the numerous crops involved in the planning. Machine learning (ML is an essential approach for achieving practical and effective solutions for this problem. Many comparisons of ML methods for yield prediction have been made, seeking for the most accurate technique. Generally, the number of evaluated crops and techniques is too low and does not provide enough information for agricultural planning purposes. This paper compares the predictive accuracy of ML and linear regression techniques for crop yield prediction in ten crop datasets. Multiple linear regression, M5-Prime regression trees, perceptron multilayer neural networks, support vector regression and k-nearest neighbor methods were ranked. Four accuracy metrics were used to validate the models: the root mean square error (RMS, root relative square error (RRSE, normalized mean absolute error (MAE, and correlation factor (R. Real data of an irrigation zone of Mexico were used for building the models. Models were tested with samples of two consecutive years. The results show that M5-Prime and k-nearest neighbor techniques obtain the lowest average RMSE errors (5.14 and 4.91, the lowest RRSE errors (79.46% and 79.78%, the lowest average MAE errors (18.12% and 19.42%, and the highest average correlation factors (0.41 and 0.42. Since M5-Prime achieves the largest number of crop yield models with the lowest errors, it is a very suitable tool for massive crop yield prediction in agricultural planning.

  3. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  4. Comparison of Machine Learning methods for incipient motion in gravel bed rivers

    Science.gov (United States)

    Valyrakis, Manousos

    2013-04-01

    Soil erosion and sediment transport of natural gravel bed streams are important processes which affect both the morphology as well as the ecology of earth's surface. For gravel bed rivers at near incipient flow conditions, particle entrainment dynamics are highly intermittent. This contribution reviews the use of modern Machine Learning (ML) methods implemented for short term prediction of entrainment instances of individual grains exposed in fully developed near boundary turbulent flows. Results obtained by network architectures of variable complexity based on two different ML methods namely the Artificial Neural Network (ANN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) are compared in terms of different error and performance indices, computational efficiency and complexity as well as predictive accuracy and forecast ability. Different model architectures are trained and tested with experimental time series obtained from mobile particle flume experiments. The experimental setup consists of a Laser Doppler Velocimeter (LDV) and a laser optics system, which acquire data for the instantaneous flow and particle response respectively, synchronously. The first is used to record the flow velocity components directly upstream of the test particle, while the later tracks the particle's displacements. The lengthy experimental data sets (millions of data points) are split into the training and validation subsets used to perform the corresponding learning and testing of the models. It is demonstrated that the ANFIS hybrid model, which is based on neural learning and fuzzy inference principles, better predicts the critical flow conditions above which sediment transport is initiated. In addition, it is illustrated that empirical knowledge can be extracted, validating the theoretical assumption that particle ejections occur due to energetic turbulent flow events. Such a tool may find application in management and regulation of stream flows downstream of dams for stream

  5. Making Individual Prognoses in Psychiatry Using Neuroimaging and Machine Learning.

    Science.gov (United States)

    Janssen, Ronald J; Mourão-Miranda, Janaina; Schnack, Hugo G

    2018-04-22

    Psychiatric prognosis is a difficult problem. Making a prognosis requires looking far into the future, as opposed to making a diagnosis, which is concerned with the current state. During the follow-up period, many factors will influence the course of the disease. Combined with the usually scarcer longitudinal data and the variability in the definition of outcomes/transition, this makes prognostic predictions a challenging endeavor. Employing neuroimaging data in this endeavor introduces the additional hurdle of high dimensionality. Machine-learning techniques are especially suited to tackle this challenging problem. This review starts with a brief introduction to machine learning in the context of its application to clinical neuroimaging data. We highlight a few issues that are especially relevant for prediction of outcome and transition using neuroimaging. We then review the literature that discusses the application of machine learning for this purpose. Critical examination of the studies and their results with respect to the relevant issues revealed the following: 1) there is growing evidence for the prognostic capability of machine-learning-based models using neuroimaging; and 2) reported accuracies may be too optimistic owing to small sample sizes and the lack of independent test samples. Finally, we discuss options to improve the reliability of (prognostic) prediction models. These include new methodologies and multimodal modeling. Paramount, however, is our conclusion that future work will need to provide properly (cross-)validated accuracy estimates of models trained on sufficiently large datasets. Nevertheless, with the technological advances enabling acquisition of large databases of patients and healthy subjects, machine learning represents a powerful tool in the search for psychiatric biomarkers. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  6. Potentials of Industrie 4.0 and Machine Learning for Mechanical Joining

    OpenAIRE

    Jäckel, Mathias

    2017-01-01

    -Sensitivity analysis of the influence of component properties and joining parameters on the joining result for self-pierce riveting -Possibilities to link mechanical joining technologies with the automotive process chain for quality and flexibility improvements -Potential of using machine learning to reduce automotive product development cycles in relation to mechanical joining -Datamining for machine learning at mechanical joining

  7. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research. Specially in Particle Physics when there is a classification problem like discrimination of Signals from Backgrounds originating from the collisions of particles. On the other hand, Monte Carlo simulations can be used in order to generate a known data set of Signals and Backgrounds based on theoretical physics. The aim of Machine Learning is to train some algorithms on known data set and then apply these trained algorithms to the unknown data sets. However, the most common framework for data analysis in Particle Physics is ROOT. In order to use Machine Learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, specially Cross-Validation and BDT.

  8. Proceedings of IEEE Machine Learning for Signal Processing Workshop XV

    DEFF Research Database (Denmark)

    Larsen, Jan

    These proceedings contains refereed papers presented at the Fifteenth IEEE Workshop on Machine Learning for Signal Processing (MLSP’2005), held in Mystic, Connecticut, USA, September 28-30, 2005. This is a continuation of the IEEE Workshops on Neural Networks for Signal Processing (NNSP) organized...... by the NNSP Technical Committee of the IEEE Signal Processing Society. The name of the Technical Committee, hence of the Workshop, was changed to Machine Learning for Signal Processing in September 2003 to better reflect the areas represented by the Technical Committee. The conference is organized...... by the Machine Learning for Signal Processing Technical Committee with sponsorship of the IEEE Signal Processing Society. Following the practice started two years ago, the bound volume of the proceedings is going to be published by IEEE following the Workshop, and we are pleased to offer to conference attendees...

  9. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  10. Applying machine learning techniques for forecasting flexibility of virtual power plants

    DEFF Research Database (Denmark)

    MacDougall, Pamela; Kosek, Anna Magdalena; Bindner, Henrik W.

    2016-01-01

    network as well as the multi-variant linear regression. It is found that it is possible to estimate the longevity of flexibility with machine learning. The linear regression algorithm is, on average, able to estimate the longevity with a 15% error. However, there was a significant improvement with the ANN...... approach to investigating the longevity of aggregated response of a virtual power plant using historic bidding and aggregated behaviour with machine learning techniques. The two supervised machine learning techniques investigated and compared in this paper are, multivariate linear regression and single...... algorithm achieving, on average, a 5.3% error. This is lowered 2.4% when learning for the same virtual power plant. With this information it would be possible to accurately offer residential VPP flexibility for market operations to safely avoid causing further imbalances and financial penalties....

  11. Machine learning and computer vision approaches for phenotypic profiling.

    Science.gov (United States)

    Grys, Ben T; Lo, Dara S; Sahin, Nil; Kraus, Oren Z; Morris, Quaid; Boone, Charles; Andrews, Brenda J

    2017-01-02

    With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. © 2017 Grys et al.

  12. Manifold learning in machine vision and robotics

    Science.gov (United States)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool to automate extraction patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensors data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of images content, navigation of mobile autonomous robot in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and other. Usually such data have high dimensionality, however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part in high dimensional "observation space" with smaller intrinsic dimensionality. Generally accepted model of such data is manifold model in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of Manifold learning technique in Machine vision and Robotics, which discovers a low-dimensional structure of high dimensional data and results in effective algorithms for solving of a large number of various subject-oriented tasks, is the content of the conference plenary speech some topics of which are in the paper.

  13. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics

    OpenAIRE

    HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE

    2017-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better ...

  14. A Comparative Analysis of Machine Learning Techniques for Credit Scoring

    OpenAIRE

    Nwulu, Nnamdi; Oroja, Shola; İlkan, Mustafa

    2012-01-01

    Abstract Credit Scoring has become an oft researched topic in light of the increasing volatility of the global economy and the recent world financial crisis. Amidst the many methods used for credit scoring, machine learning techniques are becoming increasingly popular due to their efficient and accurate nature and relative simplicity. Furthermore machine learning techniques minimize the risk of human bias and error and maximize speed as they are able to perform computation...

  15. Evaluation of a Machine-Learning Algorithm for Treatment Planning in Prostate Low-Dose-Rate Brachytherapy

    Energy Technology Data Exchange (ETDEWEB)

    Nicolae, Alexandru [Department of Physics, Ryerson University, Toronto, Ontario (Canada); Department of Medical Physics, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Morton, Gerard; Chung, Hans; Loblaw, Andrew [Department of Radiation Oncology, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Jain, Suneil; Mitchell, Darren [Department of Clinical Oncology, The Northern Ireland Cancer Centre, Belfast City Hospital, Antrim, Northern Ireland (United Kingdom); Lu, Lin [Department of Radiation Therapy, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Helou, Joelle; Al-Hanaqta, Motasem [Department of Radiation Oncology, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Heath, Emily [Department of Physics, Carleton University, Ottawa, Ontario (Canada); Ravi, Ananth, E-mail: ananth.ravi@sunnybrook.ca [Department of Medical Physics, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada)

    2017-03-15

    Purpose: This work presents the application of a machine learning (ML) algorithm to automatically generate high-quality, prostate low-dose-rate (LDR) brachytherapy treatment plans. The ML algorithm can mimic characteristics of preoperative treatment plans deemed clinically acceptable by brachytherapists. The planning efficiency, dosimetry, and quality (as assessed by experts) of preoperative plans generated with an ML planning approach was retrospectively evaluated in this study. Methods and Materials: Preimplantation and postimplantation treatment plans were extracted from 100 high-quality LDR treatments and stored within a training database. The ML training algorithm matches similar features from a new LDR case to those within the training database to rapidly obtain an initial seed distribution; plans were then further fine-tuned using stochastic optimization. Preimplantation treatment plans generated by the ML algorithm were compared with brachytherapist (BT) treatment plans in terms of planning time (Wilcoxon rank sum, α = 0.05) and dosimetry (1-way analysis of variance, α = 0.05). Qualitative preimplantation plan quality was evaluated by expert LDR radiation oncologists using a Likert scale questionnaire. Results: The average planning time for the ML approach was 0.84 ± 0.57 minutes, compared with 17.88 ± 8.76 minutes for the expert planner (P=.020). Preimplantation plans were dosimetrically equivalent to the BT plans; the average prostate V150% was 4% lower for ML plans (P=.002), although the difference was not clinically significant. Respondents ranked the ML-generated plans as equivalent to expert BT treatment plans in terms of target coverage, normal tissue avoidance, implant confidence, and the need for plan modifications. Respondents had difficulty differentiating between plans generated by a human or those generated by the ML algorithm. Conclusions: Prostate LDR preimplantation treatment plans that have equivalent quality to plans created

  16. Evaluation of a Machine-Learning Algorithm for Treatment Planning in Prostate Low-Dose-Rate Brachytherapy

    International Nuclear Information System (INIS)

    Nicolae, Alexandru; Morton, Gerard; Chung, Hans; Loblaw, Andrew; Jain, Suneil; Mitchell, Darren; Lu, Lin; Helou, Joelle; Al-Hanaqta, Motasem; Heath, Emily; Ravi, Ananth

    2017-01-01

    Purpose: This work presents the application of a machine learning (ML) algorithm to automatically generate high-quality, prostate low-dose-rate (LDR) brachytherapy treatment plans. The ML algorithm can mimic characteristics of preoperative treatment plans deemed clinically acceptable by brachytherapists. The planning efficiency, dosimetry, and quality (as assessed by experts) of preoperative plans generated with an ML planning approach was retrospectively evaluated in this study. Methods and Materials: Preimplantation and postimplantation treatment plans were extracted from 100 high-quality LDR treatments and stored within a training database. The ML training algorithm matches similar features from a new LDR case to those within the training database to rapidly obtain an initial seed distribution; plans were then further fine-tuned using stochastic optimization. Preimplantation treatment plans generated by the ML algorithm were compared with brachytherapist (BT) treatment plans in terms of planning time (Wilcoxon rank sum, α = 0.05) and dosimetry (1-way analysis of variance, α = 0.05). Qualitative preimplantation plan quality was evaluated by expert LDR radiation oncologists using a Likert scale questionnaire. Results: The average planning time for the ML approach was 0.84 ± 0.57 minutes, compared with 17.88 ± 8.76 minutes for the expert planner (P=.020). Preimplantation plans were dosimetrically equivalent to the BT plans; the average prostate V150% was 4% lower for ML plans (P=.002), although the difference was not clinically significant. Respondents ranked the ML-generated plans as equivalent to expert BT treatment plans in terms of target coverage, normal tissue avoidance, implant confidence, and the need for plan modifications. Respondents had difficulty differentiating between plans generated by a human or those generated by the ML algorithm. Conclusions: Prostate LDR preimplantation treatment plans that have equivalent quality to plans created

  17. Prototype-based models in machine learning.

    Science.gov (United States)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We discuss basic schemes of competitive vector quantization as well as the so-called neural gas approach and Kohonen's topology-preserving self-organizing map. Supervised learning in prototype systems is exemplified in terms of learning vector quantization. Most frequently, the familiar Euclidean distance serves as a dissimilarity measure. We present extensions of the framework to nonstandard measures and give an introduction to the use of adaptive distances in relevance learning. © 2016 Wiley Periodicals, Inc.

  18. TensorFlow: A system for large-scale machine learning

    OpenAIRE

    Abadi, Martín; Barham, Paul; Chen, Jianmin; Chen, Zhifeng; Davis, Andy; Dean, Jeffrey; Devin, Matthieu; Ghemawat, Sanjay; Irving, Geoffrey; Isard, Michael; Kudlur, Manjunath; Levenberg, Josh; Monga, Rajat; Moore, Sherry; Murray, Derek G.

    2016-01-01

    TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexib...

  19. Less is more: regularization perspectives on large scale machine learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Deep learning based techniques provide a possible solution at the expanse of theoretical guidance and, especially, of computational requirements. It is then a key challenge for large scale machine learning to devise approaches guaranteed to be accurate and yet computationally efficient. In this talk, we will consider a regularization perspectives on machine learning appealing to classical ideas in linear algebra and inverse problems to scale-up dramatically nonparametric methods such as kernel methods, often dismissed because of prohibitive costs. Our analysis derives optimal theoretical guarantees while providing experimental results at par or out-performing state of the art approaches.

  20. A Machine Learning Concept for DTN Routing

    Science.gov (United States)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.

  1. Intelligent sensor networks the integration of sensor networks, signal processing and machine learning

    CERN Document Server

    Hu, Fei

    2012-01-01

    Although governments worldwide have invested significantly in intelligent sensor network research and applications, few books cover intelligent sensor networks from a machine learning and signal processing perspective. Filling this void, Intelligent Sensor Networks: The Integration of Sensor Networks, Signal Processing and Machine Learning focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on the world-class research of award-winning authors, the book provides a firm grounding in the fundamentals of intelligent sensor networks, incl

  2. Machine learning-enabled discovery and design of membrane-active peptides.

    Science.gov (United States)

    Lee, Ernest Y; Wong, Gerard C L; Ferguson, Andrew L

    2017-07-08

    Antimicrobial peptides are a class of membrane-active peptides that form a critical component of innate host immunity and possess a diversity of sequence and structure. Machine learning approaches have been profitably employed to efficiently screen sequence space and guide experiment towards promising candidates with high putative activity. In this mini-review, we provide an introduction to antimicrobial peptides and summarize recent advances in machine learning-enabled antimicrobial peptide discovery and design with a focus on a recent work Lee et al. Proc. Natl. Acad. Sci. USA 2016;113(48):13588-13593. This study reports the development of a support vector machine classifier to aid in the design of membrane active peptides. We use this model to discover membrane activity as a multiplexed function in diverse peptide families and provide interpretable understanding of the physicochemical properties and mechanisms governing membrane activity. Experimental validation of the classifier reveals it to have learned membrane activity as a unifying signature of antimicrobial peptides with diverse modes of action. Some of the discriminating rules by which it performs classification are in line with existing "human learned" understanding, but it also unveils new previously unknown determinants and multidimensional couplings governing membrane activity. Integrating machine learning with targeted experimentation can guide both antimicrobial peptide discovery and design and new understanding of the properties and mechanisms underpinning their modes of action. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    Science.gov (United States)

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  4. Improved detection of chemical substances from colorimetric sensor data using probabilistic machine learning

    DEFF Research Database (Denmark)

    Mølgaard, Lasse Lohilahti; Buus, Ole Thomsen; Larsen, Jan

    2017-01-01

    We present a data-driven machine learning approach to detect drug- and explosives-precursors using colorimetric sensor technology for air-sampling. The sensing technology has been developed in the context of the CRIM-TRACK project. At present a fully- integrated portable prototype for air sampling...... of the highly multi-variate data produced from the colorimetric chip a number of machine learning techniques are employed to provide reliable classification of target analytes from confounders found in the air streams. We demonstrate that a data-driven machine learning method using dimensionality reduction...... in combination with a probabilistic classifier makes it possible to produce informative features and a high detection rate of analytes. Furthermore, the probabilistic machine learning approach provides a means of automatically identifying unreliable measurements that could produce false predictions...

  5. Machine-learning-assisted materials discovery using failed experiments

    Science.gov (United States)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted

  6. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality.

    Science.gov (United States)

    Braithwaite, Scott R; Giraud-Carrier, Christophe; West, Josh; Barnes, Michael D; Hanson, Carl Lee

    2016-05-16

    One of the leading causes of death in the United States (US) is suicide and new methods of assessment are needed to track its risk in real time. Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Our findings show that people who are at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identify the clinically significant suicidal rate in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Machine learning algorithms are efficient in differentiating people who are at a suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data.

  7. Document Classification Using Distributed Machine Learning

    OpenAIRE

    Aydin, Galip; Hallac, Ibrahim Riza

    2018-01-01

    In this paper, we investigate the performance and success rates of Na\\"ive Bayes Classification Algorithm for automatic classification of Turkish news into predetermined categories like economy, life, health etc. We use Apache Big Data technologies such as Hadoop, HDFS, Spark and Mahout, and apply these distributed technologies to Machine Learning.

  8. Survey of Machine Learning Methods for Database Security

    Science.gov (United States)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  9. Distribution Learning in Evolutionary Strategies and Restricted Boltzmann Machines

    DEFF Research Database (Denmark)

    Krause, Oswin

    The thesis is concerned with learning distributions in the two settings of Evolutionary Strategies (ESs) and Restricted Boltzmann Machines (RBMs). In both cases, the distributions are learned from samples, albeit with different goals. Evolutionary Strategies are concerned with finding an optimum ...

  10. Data Mining and Machine Learning in Astronomy

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  11. Machine learning of network metrics in ATLAS Distributed Data Management

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00218873; The ATLAS collaboration; Toler, Wesley; Vamosi, Ralf; Bogado Garcia, Joaquin Ignacio

    2017-01-01

    The increasing volume of physics data poses a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from one of our ongoing automation efforts that focuses on network metrics. First, we describe our machine learning framework built atop the ATLAS Analytics Platform. This framework can automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our m...

  12. Machine learning of network metrics in ATLAS Distributed Data Management

    Science.gov (United States)

    Lassnig, Mario; Toler, Wesley; Vamosi, Ralf; Bogado, Joaquin; ATLAS Collaboration

    2017-10-01

    The increasing volume of physics data poses a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from one of our ongoing automation efforts that focuses on network metrics. First, we describe our machine learning framework built atop the ATLAS Analytics Platform. This framework can automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for networkaware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  13. Optimizing Distributed Machine Learning for Large Scale EEG Data Set

    Directory of Open Access Journals (Sweden)

    M Bilal Shaikh

    2017-06-01

    Full Text Available Distributed Machine Learning (DML has gained its importance more than ever in this era of Big Data. There are a lot of challenges to scale machine learning techniques on distributed platforms. When it comes to scalability, improving the processor technology for high level computation of data is at its limit, however increasing machine nodes and distributing data along with computation looks as a viable solution. Different frameworks   and platforms are available to solve DML problems. These platforms provide automated random data distribution of datasets which miss the power of user defined intelligent data partitioning based on domain knowledge. We have conducted an empirical study which uses an EEG Data Set collected through P300 Speller component of an ERP (Event Related Potential which is widely used in BCI problems; it helps in translating the intention of subject w h i l e performing any cognitive task. EEG data contains noise due to waves generated by other activities in the brain which contaminates true P300Speller. Use of Machine Learning techniques could help in detecting errors made by P300 Speller. We are solving this classification problem by partitioning data into different chunks and preparing distributed models using Elastic CV Classifier. To present a case of optimizing distributed machine learning, we propose an intelligent user defined data partitioning approach that could impact on the accuracy of distributed machine learners on average. Our results show better average AUC as compared to average AUC obtained after applying random data partitioning which gives no control to user over data partitioning. It improves the average accuracy of distributed learner due to the domain specific intelligent partitioning by the user. Our customized approach achieves 0.66 AUC on individual sessions and 0.75 AUC on mixed sessions, whereas random / uncontrolled data distribution records 0.63 AUC.

  14. A machine learning approach using EEG data to predict response to SSRI treatment for major depressive disorder.

    Science.gov (United States)

    Khodayari-Rostamabad, Ahmad; Reilly, James P; Hasey, Gary M; de Bruin, Hubert; Maccrimmon, Duncan J

    2013-10-01

    The problem of identifying, in advance, the most effective treatment agent for various psychiatric conditions remains an elusive goal. To address this challenge, we investigate the performance of the proposed machine learning (ML) methodology (based on the pre-treatment electroencephalogram (EEG)) for prediction of response to treatment with a selective serotonin reuptake inhibitor (SSRI) medication in subjects suffering from major depressive disorder (MDD). A relatively small number of most discriminating features are selected from a large group of candidate features extracted from the subject's pre-treatment EEG, using a machine learning procedure for feature selection. The selected features are fed into a classifier, which was realized as a mixture of factor analysis (MFA) model, whose output is the predicted response in the form of a likelihood value. This likelihood indicates the extent to which the subject belongs to the responder vs. non-responder classes. The overall method was evaluated using a "leave-n-out" randomized permutation cross-validation procedure. A list of discriminating EEG biomarkers (features) was found. The specificity of the proposed method is 80.9% while sensitivity is 94.9%, for an overall prediction accuracy of 87.9%. There is a 98.76% confidence that the estimated prediction rate is within the interval [75%, 100%]. These results indicate that the proposed ML method holds considerable promise in predicting the efficacy of SSRI antidepressant therapy for MDD, based on a simple and cost-effective pre-treatment EEG. The proposed approach offers the potential to improve the treatment of major depression and to reduce health care costs. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  15. Machine Learning for ATLAS DDM Network Metrics

    CERN Document Server

    Lassnig, Mario; The ATLAS collaboration; Vamosi, Ralf

    2016-01-01

    The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from our ongoing automation efforts. First, we describe our framework for distributed data management and network metrics, automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  16. Machine learning approach for single molecule localisation microscopy.

    Science.gov (United States)

    Colabrese, Silvia; Castello, Marco; Vicidomini, Giuseppe; Del Bue, Alessio

    2018-04-01

    Single molecule localisation (SML) microscopy is a fundamental tool for biological discoveries; it provides sub-diffraction spatial resolution images by detecting and localizing "all" the fluorescent molecules labeling the structure of interest. For this reason, the effective resolution of SML microscopy strictly depends on the algorithm used to detect and localize the single molecules from the series of microscopy frames. To adapt to the different imaging conditions that can occur in a SML experiment, all current localisation algorithms request, from the microscopy users, the choice of different parameters. This choice is not always easy and their wrong selection can lead to poor performance. Here we overcome this weakness with the use of machine learning. We propose a parameter-free pipeline for SML learning based on support vector machine (SVM). This strategy requires a short supervised training that consists in selecting by the user few fluorescent molecules (∼ 10-20) from the frames under analysis. The algorithm has been extensively tested on both synthetic and real acquisitions. Results are qualitatively and quantitatively consistent with the state of the art in SML microscopy and demonstrate that the introduction of machine learning can lead to a new class of algorithms competitive and conceived from the user point of view.

  17. Development of a machine learning potential for graphene

    Science.gov (United States)

    Rowe, Patrick; Csányi, Gábor; Alfè, Dario; Michaelides, Angelos

    2018-02-01

    We present an accurate interatomic potential for graphene, constructed using the Gaussian approximation potential (GAP) machine learning methodology. This GAP model obtains a faithful representation of a density functional theory (DFT) potential energy surface, facilitating highly accurate (approaching the accuracy of ab initio methods) molecular dynamics simulations. This is achieved at a computational cost which is orders of magnitude lower than that of comparable calculations which directly invoke electronic structure methods. We evaluate the accuracy of our machine learning model alongside that of a number of popular empirical and bond-order potentials, using both experimental and ab initio data as references. We find that whilst significant discrepancies exist between the empirical interatomic potentials and the reference data—and amongst the empirical potentials themselves—the machine learning model introduced here provides exemplary performance in all of the tested areas. The calculated properties include: graphene phonon dispersion curves at 0 K (which we predict with sub-meV accuracy), phonon spectra at finite temperature, in-plane thermal expansion up to 2500 K as compared to NPT ab initio molecular dynamics simulations and a comparison of the thermally induced dispersion of graphene Raman bands to experimental observations. We have made our potential freely available online at [http://www.libatoms.org].

  18. Automatic microseismic event picking via unsupervised machine learning

    Science.gov (United States)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.

  19. MACHINE LEARNING FOR THE SELF-ORGANIZATION OF DISTRIBUTED SYSTEMS IN ECONOMIC APPLICATIONS

    OpenAIRE

    Jerzy Balicki; Waldemar Korłub

    2017-01-01

    In this paper, an application of machine learning to the problem of self-organization of distributed systems has been discussed with regard to economic applications, with particular emphasis on supervised neural network learning to predict stock investments and some ratings of companies. In addition, genetic programming can play an important role in the preparation and testing of several financial information systems. For this reason, machine learning applications have been discussed because ...

  20. Multiple-Machine Scheduling with Learning Effects and Cooperative Games

    Directory of Open Access Journals (Sweden)

    Yiyuan Zhou

    2015-01-01

    Full Text Available Multiple-machine scheduling problems with position-based learning effects are studied in this paper. There is an initial schedule in this scheduling problem. The optimal schedule minimizes the sum of the weighted completion times; the difference between the initial total weighted completion time and the minimal total weighted completion time is the cost savings. A multiple-machine sequencing game is introduced to allocate the cost savings. The game is balanced if the normal processing times of jobs that are on the same machine are equal and an equal number of jobs are scheduled on each machine initially.