WorldWideScience

Sample records for machine learning methods

  1. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning. Organized into 15 chapters, the book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credit

  2. Tracking by Machine Learning Methods

    CERN Document Server

    Jofrehei, Arash

    2015-01-01

    Current track reconstruction methods start with two points and then, for each layer, loop through all possible hits to find the proper hits to add to that track. Another idea would be to use the large number of already reconstructed events and/or simulated data and train a machine on these data to find tracks given hit pixels. Training time could be long, but real-time tracking is really fast. Simulation might not be as realistic as real data, but tracking on simulated data has been done with 100 percent efficiency, whereas by using real data we would probably be limited to the current efficiency.

  3. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today, computation is an inseparable part of scientific research, especially in particle physics, where a typical task is a classification problem such as discriminating signals from backgrounds originating from the collisions of particles. Monte Carlo simulations can be used to generate a known data set of signals and backgrounds based on theoretical physics. The aim of machine learning is to train algorithms on a known data set and then apply the trained algorithms to unknown data sets. The most common framework for data analysis in particle physics is ROOT; in order to use machine learning methods within it, the Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The main focus of this report is the parallelization of some TMVA methods, in particular Cross-Validation and BDT.

  4. Machine Learning Methods for Production Cases Analysis

    Science.gov (United States)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    An approach to the analysis of events occurring during the production process is proposed. The described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of internal production network data were used for training and testing the applied models. k-Nearest Neighbors and random forest methods were used to illustrate and analyze the proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics such as precision, recall, and accuracy.
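
    As a rough illustration only (no code accompanies the abstract), the sketch below shows the kind of comparison described: a k-Nearest Neighbors and a random forest classifier scored with precision, recall, and accuracy in scikit-learn. The synthetic data stand in for the production-network descriptors, which are not available here.

    ```python
    # Hypothetical sketch: compare k-NN and random forest on stand-in data,
    # scoring with the metrics named in the abstract.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, recall_score, accuracy_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for name, clf in [("k-NN", KNeighborsClassifier(n_neighbors=5)),
                      ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
        y_hat = clf.fit(X_tr, y_tr).predict(X_te)
        print(name,
              "precision=%.3f" % precision_score(y_te, y_hat),
              "recall=%.3f" % recall_score(y_te, y_hat),
              "accuracy=%.3f" % accuracy_score(y_te, y_hat))
    ```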

  5. Application of machine learning methods in bioinformatics

    Science.gov (United States)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technologies have enabled biology to enter the era of big data [1]. Bioinformatics is an interdisciplinary field encompassing the acquisition, management, analysis, interpretation, and application of biological information, among other areas; it derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise for enabling computers to assist humans in the analysis of large, complex data sets [2]. This paper analyzes and compares various machine learning algorithms and their applications in bioinformatics.

  6. Machine learning methods for metabolic pathway prediction

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2010-01-01

    Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
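
    The paper's gold-standard data and features are not reproduced here, but the sketch below illustrates the evaluation pattern the abstract describes: fitting naive Bayes, decision-tree, and logistic-regression classifiers and reporting accuracy and F-measure. The random matrix merely stands in for the 123 pathway features.

    ```python
    # Illustrative stand-in for the ML comparison described in the abstract.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5610, 123))      # 5,610 pathway instances, 123 features (synthetic)
    y = rng.integers(0, 2, size=5610)     # present / absent labels (synthetic)

    for name, clf in [("naive Bayes", GaussianNB()),
                      ("decision tree", DecisionTreeClassifier(max_depth=5)),
                      ("logistic regression", LogisticRegression(max_iter=1000))]:
        acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
        f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
        print(f"{name}: accuracy={acc:.3f}, F-measure={f1:.3f}")
    ```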

  7. Machine learning methods for metabolic pathway prediction

    Science.gov (United States)

    2010-01-01

    Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations. PMID:20064214

  8. Ensemble Machine Learning Methods and Applications

    CERN Document Server

    Ma, Yunqian

    2012-01-01

    It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed “ensemble learning” by researchers in computational intelligence and machine learning, it is known to improve a decision system’s robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as “boosting” and “random forest” facilitate solutions to key computational issues such as face detection and are now being applied in areas as diverse as object tracking and bioinformatics. Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs. At once a solid theoretical study and a practical guide, the volume is a windfall for r...

  9. Machine Learning Methods to Predict Diabetes Complications.

    Science.gov (United States)

    Dagliati, Arianna; Marini, Simone; Sacchi, Lucia; Cogni, Giulia; Teliti, Marsida; Tibollo, Valentina; De Cata, Pasquale; Chiovato, Luca; Bellazzi, Riccardo

    2018-03-01

    One of the areas where Artificial Intelligence is having more impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies, to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. This pipeline comprises clinical center profiling, predictive model targeting, predictive model construction and model validation. After having dealt with missing data by means of random forest (RF) and having applied suitable strategies to handle class imbalance, we have used Logistic Regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy, at different time scenarios, at 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). Considered variables are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. Final models, tailored in accordance with the complications, provided an accuracy of up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models that are easy to translate into clinical practice.
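
    The MOSAIC pipeline itself is not available here; as a hedged sketch of the modeling step described above, the snippet below fits a logistic regression with forward feature selection (a stand-in for stepwise selection) and uses class weighting as one plausible way to handle class imbalance. Data, feature count, and parameters are assumptions for illustration.

    ```python
    # Rough sketch, not the authors' pipeline: logistic regression with
    # forward feature selection and class weighting for imbalance.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=900, n_features=7, n_informative=4,
                               weights=[0.85, 0.15], random_state=0)  # imbalanced stand-in

    base = LogisticRegression(class_weight="balanced", max_iter=1000)
    model = make_pipeline(
        SequentialFeatureSelector(base, n_features_to_select=4, direction="forward"),
        base,
    )
    print("mean CV accuracy: %.3f" % cross_val_score(model, X, y, cv=5).mean())
    ```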

  10. Studying depression using imaging and machine learning methods

    Directory of Open Access Journals (Sweden)

    Meenal J. Patel

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  11. Studying depression using imaging and machine learning methods.

    Science.gov (United States)

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  12. Finding protein sites using machine learning methods

    Directory of Open Access Journals (Sweden)

    Jaime Leonardo Bobadilla Molina

    2003-07-01

    The increasing number of protein three-dimensional (3D) structures determined by X-ray and NMR technologies, as well as structures predicted by computational methods, results in the need for automated methods to provide initial annotations. We have developed a new method for recognizing sites in three-dimensional protein structures. Our method is based on a previously reported algorithm for creating descriptions of protein microenvironments using physical and chemical properties at multiple levels of detail. The recognition method takes three inputs: 1. a set of control sites that share some structural or functional role; 2. a set of control nonsites that lack this role; 3. a single query site. A support vector machine classifier is built using feature vectors in which each component represents a property in a given volume. Validation against an independent test set shows that this recognition approach has high sensitivity and specificity. We also describe the results of scanning four calcium binding proteins (with the calcium removed) using a three-dimensional grid of probe points at 1.25 angstrom spacing. The system finds the sites in the proteins, giving points at or near the binding sites. Our results show that property-based descriptions along with support vector machines can be used for recognizing protein sites in unannotated structures.

  13. In silico machine learning methods in drug development.

    Science.gov (United States)

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures, "noisy" and high-dimensional data to complement QSAR methods, and in cases of unavailable receptor 3D structure to complement structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also regard a number of applications of the machine learning algorithms based on common classes of diseases.

  14. Machine learning methods without tears: a primer for ecologists.

    Science.gov (United States)

    Olden, Julian D; Lawler, Joshua J; Poff, N LeRoy

    2008-06-01

    Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machine learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

  15. Unsupervised process monitoring and fault diagnosis with machine learning methods

    CERN Document Server

    Aldrich, Chris

    2013-01-01

    This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data

  16. Deep learning versus traditional machine learning methods for aggregated energy demand prediction

    NARCIS (Netherlands)

    Paterakis, N.G.; Mocanu, E.; Gibescu, M.; Stappers, B.; van Alst, W.

    2018-01-01

    In this paper, deep learning methods, which are more advanced than traditional machine learning approaches, are explored with the purpose of accurately predicting the aggregated energy consumption. Despite the fact that a wide range of machine learning methods have been applied to

  17. Kernel Methods for Machine Learning with Life Science Applications

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie

    Kernel methods refer to a family of widely used nonlinear algorithms for machine learning tasks like classification, regression, and feature extraction. By exploiting the so-called kernel trick, straightforward extensions of classical linear algorithms are enabled as long as the data only appear a...

  18. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    Directory of Open Access Journals (Sweden)

    Vladimir S. Kublanov

    2017-01-01

    The paper presents an analysis of the accuracy of machine learning approaches applied to cardiac activity data. The study evaluates the possibility of diagnosing arterial hypertension by means of short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from arterial hypertension of degree II-III. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with a radial basis function kernel, decision trees, and the naive Bayes classifier. Moreover, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. The results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of searching for an uncorrelated feature set achieved better results than a data set based on the principal components.
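
    For readers who want to reproduce this style of comparison, the sketch below runs the classifiers named above on synthetic data standing in for the 53 heart-rate-variability features; it is not the study's data or code.

    ```python
    # Sketch of the classifier comparison on stand-in HRV features.
    # Warnings from LDA/QDA are expected with so few samples and many features.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=70, n_features=53, n_informative=10,
                               random_state=1)   # roughly 30 vs 40 subjects

    classifiers = {
        "LDA": LinearDiscriminantAnalysis(),
        "QDA": QuadraticDiscriminantAnalysis(),
        "k-NN": KNeighborsClassifier(),
        "SVM (RBF)": SVC(kernel="rbf"),
        "decision tree": DecisionTreeClassifier(),
        "naive Bayes": GaussianNB(),
    }
    for name, clf in classifiers.items():
        print(name, "CV accuracy: %.2f" % cross_val_score(clf, X, y, cv=5).mean())
    ```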

  19. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and d) Health Care and Management, with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The applications in the selected articles demonstrate the usefulness of extracting valuable knowledge, leading to new hypotheses targeting deeper understanding and further investigation in DM.

  20. Studying depression using imaging and machine learning methods

    OpenAIRE

    Patel, Meenal J.; Khalaf, Alexander; Aizenstein, Howard J.

    2015-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presen...

  1. BEBP: An Poisoning Method Against Machine Learning Based IDSs

    OpenAIRE

    Li, Pan; Liu, Qiang; Zhao, Wentao; Wang, Dongxu; Wang, Siqi

    2018-01-01

    In the big data era, machine learning is one of the fundamental techniques in intrusion detection systems (IDSs). However, practical IDSs generally update their decision module by feeding in new data and then periodically retraining the learning models. Hence, attacks that compromise the data used for training or testing classifiers significantly challenge the detection capability of machine learning-based IDSs. Poisoning attack, which is one of the most recognized security threats towards machine learning...

  2. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' < or = 18 mag), with 4500 K < or = Teff < or = 7000 K, corresponding to those with the most reliable SSPP estimates, I find that the model predicts [Fe/H] values with a root-mean-squared error (RMSE) of approx. 0.27 dex. The RMSE from this machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra.
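
    As a hedged illustration of this kind of photometric regression, the sketch below trains a random-forest regressor (one plausible model choice; the abstract does not state the exact algorithm) on synthetic "colors" and reports the RMSE, the metric quoted above.

    ```python
    # Stand-in sketch: regress [Fe/H] on photometric colors and report RMSE.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(42)
    colors = rng.normal(size=(20_000, 4))   # synthetic stand-ins for SDSS colors
    feh = -1.0 + colors @ rng.normal(size=4) + rng.normal(scale=0.3, size=20_000)

    X_tr, X_te, y_tr, y_te = train_test_split(colors, feh, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0).fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"RMSE = {rmse:.2f} dex")
    ```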

  3. Machine Learning.

    Science.gov (United States)

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  4. Machine Learning Methods for Attack Detection in the Smart Grid.

    Science.gov (United States)

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  5. Housing Value Forecasting Based on Machine Learning Methods

    OpenAIRE

    Mu, Jingyi; Wu, Fang; Zhang, Aihua

    2014-01-01

    In the era of big data, many urgent issues in all walks of life can be solved via big data techniques. Compared with the Internet, economy, industry, and aerospace fields, applications of big data in the area of architecture are relatively few. In this paper, on the basis of actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can make decisions about whether developing...

  6. Machine Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  7. Data Mining and Machine Learning Methods for Dementia Research.

    Science.gov (United States)

    Li, Rui

    2018-01-01

    Patient data in clinical research often includes large amounts of structured information, such as neuroimaging data, neuropsychological test results, and demographic variables. Given the various sources of information, we can develop computerized methods that can be a great help to clinicians to discover hidden patterns in the data. The computerized methods often employ data mining and machine learning algorithms, lending themselves as the computer-aided diagnosis (CAD) tool that assists clinicians in making diagnostic decisions. In this chapter, we review state-of-the-art methods used in dementia research, and briefly introduce some recently proposed algorithms.

  8. MACHINE LEARNING METHODS IN DIGITAL AGRICULTURE: ALGORITHMS AND CASES

    Directory of Open Access Journals (Sweden)

    Aleksandr Vasilyevich Koshkarov

    2018-05-01

    Ensuring food security is a major challenge in many countries. With a growing global population, the issues of improving the efficiency of agriculture have become most relevant. Farmers are looking for new ways to increase yields, and governments of different countries are developing new programs to support agriculture. This contributes to a more active implementation of digital technologies in agriculture, helping farmers to make better decisions, increase yields and take care of the environment. The central point is the collection and analysis of data. In the agricultural industry, data can be collected from different sources and may contain useful patterns that identify potential problems or opportunities. Data should be analyzed using machine learning algorithms to extract useful insights. Such methods of precision farming allow the farmer to monitor individual parts of the field, optimize the consumption of water and chemicals, and identify problems quickly. Purpose: to make an overview of the machine learning algorithms used for data analysis in agriculture. Methodology: an overview of the relevant literature; a survey of farmers. Results: relevant algorithms of machine learning for the analysis of data in agriculture at various levels were identified: soil analysis (soil assessment, soil classification, soil fertility prediction), weather forecasting (simulation of climate change, temperature and precipitation prediction), and analysis of vegetation (weed identification, vegetation classification, plant disease identification, crop forecasting). Practical implications: agriculture, crop production.

  9. Newton Methods for Large Scale Problems in Machine Learning

    Science.gov (United States)

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  10. Machine learning methods for clinical forms analysis in mental health.

    Science.gov (United States)

    Strauss, John; Peguero, Arturo Martinez; Hirst, Graeme

    2013-01-01

    In preparation for a clinical information system implementation, the Centre for Addiction and Mental Health (CAMH) Clinical Information Transformation project completed multiple preparation steps. An automated process was desired to supplement the onerous task of manual analysis of clinical forms. We used natural language processing (NLP) and machine learning (ML) methods for a series of 266 separate clinical forms. For the investigation, documents were represented by feature vectors. We used four ML algorithms for our examination of the forms: cluster analysis, k-nearest neighbours (kNN), decision trees and support vector machines (SVM). Parameters for each algorithm were optimized. SVM had the best performance with a precision of 64.6%. Though we did not find any method sufficiently accurate for practical use, to our knowledge this approach to forms has not been used previously in mental health.
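
    A minimal stand-in for the approach described above is sketched below: documents become TF-IDF feature vectors (one common representation; the paper's exact features are not given here) and a linear SVM is scored by precision. The toy documents and labels are invented for illustration.

    ```python
    # Toy sketch: vectorize form text and classify with a linear SVM.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score

    docs = ["intake assessment form", "medication order form",
            "risk assessment checklist", "discharge summary form"] * 50
    labels = [0, 1, 0, 1] * 50   # invented form categories

    X_tr, X_te, y_tr, y_te = train_test_split(docs, labels, test_size=0.25, random_state=0)
    clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(X_tr, y_tr)
    print("precision: %.3f" % precision_score(y_te, clf.predict(X_te)))
    ```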

  11. Kernel methods for interpretable machine learning of order parameters

    Science.gov (United States)

    Ponte, Pedro; Melko, Roger G.

    2017-11-01

    Machine learning is capable of discriminating phases of matter, and finding associated phase transitions, directly from large data sets of raw state configurations. In the context of condensed matter physics, most progress in the field of supervised learning has come from employing neural networks as classifiers. Although very powerful, such algorithms suffer from a lack of interpretability, which is usually desired in scientific applications in order to associate learned features with physical phenomena. In this paper, we explore support vector machines (SVMs), which are a class of supervised kernel methods that provide interpretable decision functions. We find that SVMs can learn the mathematical form of physical discriminators, such as order parameters and Hamiltonian constraints, for a set of two-dimensional spin models: the ferromagnetic Ising model, a conserved-order-parameter Ising model, and the Ising gauge theory. The ability of SVMs to provide interpretable classification highlights their potential for automating feature detection in both synthetic and experimental data sets for condensed matter and other many-body systems.
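
    A small toy illustration of the interpretability point, under assumptions that simplify the paper's setting considerably: a linear-kernel SVM trained to separate configurations by the sign of their magnetization recovers roughly uniform weights over the spins, i.e. a decision function proportional to the magnetization. The random configurations below are stand-ins, not Monte Carlo samples of the models studied in the paper.

    ```python
    # Toy sketch: a linear SVM's weights recover the magnetization.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    L = 8                                      # 8x8 lattice, 64 spins per sample
    spins = rng.choice([-1, 1], size=(2000, L * L))
    m = spins.mean(axis=1)                     # magnetization of each configuration
    labels = (m > 0).astype(int)               # classify by sign of magnetization

    clf = LinearSVC(C=0.1, max_iter=5000).fit(spins, labels)
    w = clf.coef_.ravel()
    # Weights come out nearly uniform across spins, i.e. the classifier has
    # "learned" that the discriminating quantity is the mean spin.
    print("weight mean: %.4f, weight std: %.4f" % (w.mean(), w.std()))
    ```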

  12. Machine Learning-Empowered Biometric Methods for Biomedicine Applications

    Directory of Open Access Journals (Sweden)

    Qingxue Zhang

    2017-07-01

    Nowadays, pervasive computing technologies are paving a promising way for advanced smart health applications. However, a key impediment faced by the wide deployment of these assistive smart devices is the increasing privacy and security issue, such as how to protect access to sensitive patient data in the health record. Focusing on this challenge, biometrics are attracting intense attention in terms of effective user identification to enable confidential health applications. In this paper, we take special interest in two bio-potential-based biometric modalities, electrocardiogram (ECG) and electroencephalogram (EEG), considering that they are both unique to individuals and more reliable than token-based (identity card) and knowledge-based (username/password) methods. After extracting effective features in multiple domains from ECG/EEG signals, several advanced machine learning algorithms are introduced to perform the user identification task, including Neural Network, K-nearest Neighbor, Bagging, Random Forest and AdaBoost. Experimental results on two public ECG and EEG datasets show that ECG is a more robust biometric modality compared to EEG, leveraging a higher signal to noise ratio and also more distinguishable morphological patterns. Among the different machine learning classifiers, the random forest greatly outperforms the others and achieves an identification rate as high as 98%. This study is expected to demonstrate that a properly selected biometric empowered by an effective machine learner has great potential to enable confidential biomedicine applications in the era of smart digital health.

  13. Extremely Randomized Machine Learning Methods for Compound Activity Prediction

    Directory of Open Access Journals (Sweden)

    Wojciech M. Czarnecki

    2015-11-01

    Speed, a relatively low requirement for computational resources and the high effectiveness of the evaluation of the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, due to the growth of the amount of data also in cheminformatics and related fields, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of previously existing methods to obtain results more quickly. In the study, we tested two approaches belonging to the group of so-called ‘extremely randomized methods’—Extreme Entropy Machine and Extremely Randomized Trees—for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their ‘non-extreme’ competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches were not only found to improve the efficiency of the classification of bioactive compounds, but they were also proved to be less computationally complex, requiring fewer steps to perform an optimization procedure.
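
    The snippet below sketches this 'extreme vs. non-extreme' comparison using scikit-learn's ExtraTrees and RandomForest classifiers (the Extreme Entropy Machine has no scikit-learn implementation, so it is omitted); the synthetic matrix stands in for compound fingerprints, and timings are only indicative.

    ```python
    # Sketch: extremely randomized trees vs. random forest on stand-in data.
    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=256, n_informative=40,
                               random_state=0)   # stand-in compound fingerprints

    for name, clf in [("Extremely Randomized Trees", ExtraTreesClassifier(n_estimators=100, random_state=0)),
                      ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0))]:
        t0 = time.time()
        acc = cross_val_score(clf, X, y, cv=3, n_jobs=-1).mean()
        print(f"{name}: accuracy={acc:.3f}, time={time.time() - t0:.1f}s")
    ```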

  14. Housing Value Forecasting Based on Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Jingyi Mu

    2014-01-01

    In the era of big data, many urgent issues in all walks of life can be solved via big data techniques. Compared with the Internet, economy, industry, and aerospace fields, applications of big data in the area of architecture are relatively few. In this paper, on the basis of actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can make decisions about whether or not to develop the real estate in the corresponding regions. In this paper, support vector machine (SVM), least squares support vector machine (LSSVM), and partial least squares (PLS) methods are used to forecast the home values, and these algorithms are compared according to the predicted results. Experiments show that, although the data set exhibits serious nonlinearity, the SVM and LSSVM methods are superior to PLS in dealing with the problem of nonlinearity. The global optimal solution can be found and the best forecasting performance can be achieved by SVM because it solves a quadratic programming problem. In this paper, the computational efficiencies of the algorithms are also compared according to their computing times.
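
    Since the article's data are not reproduced here, the snippet below only sketches the comparison with synthetic regression data standing in for the Boston housing records; scikit-learn ships SVR and PLS but no LS-SVM, so that model is omitted.

    ```python
    # Sketch: SVR vs. PLS regression on a synthetic stand-in for housing data.
    from sklearn.datasets import make_regression
    from sklearn.svm import SVR
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    # 13 features echo the classic Boston housing attributes; values are synthetic.
    X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

    for name, model in [("SVR (RBF kernel)", make_pipeline(StandardScaler(), SVR())),
                        ("PLS regression", PLSRegression(n_components=5))]:
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(f"{name}: mean R^2 = {r2:.3f}")
    ```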

  15. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' < or = 18 mag), with 4500 K < or = Teff < or = 7000 K, corresponding to those with the most reliable SSPP estimates, I find that the model predicts [Fe/H] values with a root-mean-squared error (RMSE) of approx. 0.27 dex. The RMSE from this machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra.

  16. Estimating building energy consumption using extreme learning machine method

    International Nuclear Information System (INIS)

    Naji, Sareh; Keivani, Afram; Shamshirband, Shahaboddin; Alengaram, U. Johnson; Jumaat, Mohd Zamin; Mansor, Zulkefli; Lee, Malrey

    2016-01-01

    The current energy requirements of buildings comprise a large percentage of the total energy consumed around the world. The demand of energy, as well as the construction materials used in buildings, are becoming increasingly problematic for the earth's sustainable future, and thus have led to alarming concern. The energy efficiency of buildings can be improved, and in order to do so, their operational energy usage should be estimated early in the design phase, so that buildings are as sustainable as possible. An early energy estimate can greatly help architects and engineers create sustainable structures. This study proposes a novel method to estimate building energy consumption based on the ELM (Extreme Learning Machine) method. This method is applied to building material thicknesses and their thermal insulation capability (K-value). For this purpose up to 180 simulations are carried out for different material thicknesses and insulation properties, using the EnergyPlus software application. The estimation and prediction obtained by the ELM model are compared with GP (genetic programming) and ANNs (artificial neural network) models for accuracy. The simulation results indicate that an improvement in predictive accuracy is achievable with the ELM approach in comparison with GP and ANN. - Highlights: • Buildings consume huge amounts of energy for operation. • Envelope materials and insulation influence building energy consumption. • Extreme learning machine is used to estimate energy usage of a sample building. • The key effective factors in this study are insulation thickness and K-value.
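
    The core of an Extreme Learning Machine is simple enough to sketch directly: a random, untrained hidden layer followed by output weights solved by least squares. The snippet below is a bare-bones illustration with synthetic stand-ins for the thickness and K-value inputs, not the authors' implementation or data.

    ```python
    # Bare-bones ELM sketch: random hidden layer + least-squares output weights.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(180, 2))            # stand-ins for [thickness, K-value]
    y = 50 + 30 * X[:, 0] - 20 * X[:, 1] + rng.normal(scale=2.0, size=180)  # fake energy use

    n_hidden = 40
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights by least squares

    rmse = np.sqrt(np.mean((y - H @ beta) ** 2))
    print(f"training RMSE: {rmse:.2f}")
    ```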

  17. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.

  18. Sparse Machine Learning Methods for Understanding Large Text Corpora

    Data.gov (United States)

    National Aeronautics and Space Administration — Sparse machine learning has recently emerged as powerful tool to obtain models of high-dimensional data with high degree of interpretability, at low computational...

  19. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' < or = 18 mag), with 4500 K < or = Teff < or = 7000 K, corresponding to those with the most reliable SSPP estimates, I find that the model predicts [Fe/H] values with a root-mean-squared error (RMSE) of approx. 0.27 dex. The RMSE from this machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra.

  20. Comparisons of likelihood and machine learning methods of individual classification

    Science.gov (United States)

    Guinand, B.; Topchy, A.; Page, K.S.; Burnham-Curtis, M. K.; Punch, W.F.; Scribner, K.T.

    2002-01-01

    Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin (“assignment tests”). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high FST), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0–2.8% lower error rates). The relative performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to “learn” and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks. In recent years, characterization of highly polymorphic molecular markers such as mini- and microsatellites and development of novel methods of analysis have enabled researchers to extend investigations of ecological and evolutionary processes below the population level to the level of

  1. Advanced methods in NDE using machine learning approaches

    Science.gov (United States)

    Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

    2018-04-01

    Machine learning (ML) methods and algorithms have recently been applied with great success in quality control and predictive maintenance. Their goal, to build new algorithms and/or leverage existing ones that learn from training data and give accurate predictions or find patterns, particularly with new and unseen similar data, fits Non-Destructive Evaluation perfectly. The advantages of ML in NDE are obvious in such tasks as pattern recognition in acoustic signals or automated processing of images from X-ray, ultrasonic or optical methods. Fraunhofer IKTS is using machine learning algorithms in acoustic signal analysis, and the approach has been applied to a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and a secondary analysis step, followed by a cognitive system to create model data. Already in the second analysis step, unsupervised learning algorithms such as principal component analysis are used to simplify data structures. In the cognitive part of the software, further unsupervised and supervised learning algorithms are trained. The sensor signals from unknown samples can later be recognized and classified automatically by the previously trained algorithms. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). The algorithms are still trained on an ordinary PC; however, the trained algorithms run on the digital signal processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however: a sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components are required. The automated testing can then be done by the machine. By integrating the test data of many components along the value chain, further optimization including lifetime and durability

  2. Employing Machine-Learning Methods to Study Young Stellar Objects

    Science.gov (United States)

    Moore, Nicholas

    2018-01-01

    Vast amounts of data exist in the astronomical data archives, and yet a large number of sources remain unclassified. We developed a multi-wavelength pipeline to classify infrared sources. The pipeline uses supervised machine learning methods to classify objects into the appropriate categories. The program is fed data that is already classified to train it, and is then applied to unknown catalogues. The primary use for such a pipeline is the rapid classification and cataloging of data that would take a much longer time to classify otherwise. While our primary goal is to study young stellar objects (YSOs), the applications extend beyond the scope of this project. We present preliminary results from our analysis and discuss future applications.

  3. Predicting Solar Activity Using Machine-Learning Methods

    Science.gov (United States)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.
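
    As a hedged sketch of just one of the approaches named above, the snippet below fits a sparse linear model (Lasso) to synthetic stand-ins for active-region features; the real SDO-derived features, labels, and skill scores used in the cited studies are not reproduced.

    ```python
    # Toy sketch: Lasso picks out a few informative "active-region" features.
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 25))            # e.g. magnetic-field summary features (synthetic)
    true_w = np.zeros(25)
    true_w[:3] = [2.0, -1.5, 0.5]              # only three features actually matter
    y = X @ true_w + rng.normal(scale=0.5, size=2000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = Lasso(alpha=0.1).fit(X_tr, y_tr)
    print("nonzero coefficients:", np.flatnonzero(model.coef_))
    print("test R^2: %.3f" % model.score(X_te, y_te))
    ```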

  4. Modeling Music Emotion Judgments Using Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Naresh N. Vempala

    2018-01-01

    Emotion judgments and five channels of physiological data were obtained from 60 participants listening to 60 music excerpts. Various machine learning (ML) methods were used to model the emotion judgments, including neural networks, linear regression, and random forests. Input for models of perceived emotion consisted of audio features extracted from the music recordings. Input for models of felt emotion consisted of physiological features extracted from the physiological recordings. Models were trained and interpreted with consideration of the classic debate in music emotion between cognitivists and emotivists. Our models supported a hybrid position wherein emotion judgments were influenced by a combination of perceived and felt emotions. In comparing the different ML approaches that were used for modeling, we conclude that neural networks were optimal, yielding models that were flexible as well as interpretable. Inspection of a committee machine, encompassing an ensemble of networks, revealed that arousal judgments were predominantly influenced by felt emotion, whereas valence judgments were predominantly influenced by perceived emotion.

  5. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology.

    Science.gov (United States)

    Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi; Salleh, Abdul Hakim Mohamed

    2013-01-01

    Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and environmental factors to bring about an effect on complex multifactorial diseases. These gene-gene interactions are also denoted as epistasis, a phenomenon that cannot be handled by traditional statistical methods due to the high dimensionality of the data and the occurrence of multiple polymorphisms. Hence, several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have been used to solve such problems by identifying susceptibility genes in common multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discusses each machine learning method and presents its strengths and weaknesses in detecting gene-gene interactions in complex human disease.

  6. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. It covers the basics of various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  7. Introduction to machine learning

    OpenAIRE

    Baştanlar, Yalın; Özuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning app...

  8. Sensor Data Air Pollution Prediction by Machine Learning Methods

    Czech Academy of Sciences Publication Activity Database

    Vidnerová, Petra; Neruda, Roman

    submitted 25. 1. (2018) ISSN 1530-437X R&D Projects: GA ČR GA15-18108S Grant - others:GA MŠk(CZ) LM2015042 Institutional support: RVO:67985807 Keywords : machine learning * sensors * air pollution * deep neural networks * regularization networks Subject RIV: IN - Informatics, Computer Science Impact factor: 2.512, year: 2016

  9. Classification of carcinogenic and mutagenic properties using machine learning method

    DEFF Research Database (Denmark)

    Moorthy, N. S.Hari Narayana; Kumar, Surendra; Poongavanam, Vasanthanathan

    2017-01-01

    An accurate calculation of the carcinogenicity of chemicals has become a serious challenge for health assessment authorities around the globe, not only because of the increased cost of experiments but also because of the various ethical issues involved in using animal models. In this study, we provide machine learning...

  10. Statistical and Machine Learning forecasting methods: Concerns and ways forward.

    Science.gov (United States)

    Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
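
    A toy version of this kind of comparison is sketched below: simple exponential smoothing versus a small neural network trained on lagged values of one synthetic monthly series, both scored by sMAPE over a 12-month holdout. It only illustrates the experimental pattern, not the M3 study itself.

    ```python
    # Toy comparison: exponential smoothing vs. an MLP on lagged values (sMAPE).
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def smape(actual, forecast):
        return 100 * np.mean(2 * np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

    rng = np.random.default_rng(0)
    t = np.arange(120)
    series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=3, size=120)
    train, test = series[:108], series[108:]          # hold out the last 12 months

    # Statistical benchmark: simple exponential smoothing, flat 12-step forecast.
    alpha, level = 0.3, train[0]
    for x in train[1:]:
        level = alpha * x + (1 - alpha) * level
    stat_fc = np.repeat(level, 12)

    # ML benchmark: MLP on windows of the previous 12 observations, iterated forward.
    lags = 12
    X = np.array([train[i:i + lags] for i in range(len(train) - lags)])
    y = train[lags:]
    mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
    window, ml_fc = list(train[-lags:]), []
    for _ in range(12):
        nxt = mlp.predict(np.array(window[-lags:]).reshape(1, -1))[0]
        ml_fc.append(nxt)
        window.append(nxt)

    print("SES sMAPE: %.1f%%" % smape(test, stat_fc))
    print("MLP sMAPE: %.1f%%" % smape(test, np.array(ml_fc)))
    ```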

  11. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    Science.gov (United States)

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784
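
    To make the kind of comparison described above concrete, the following minimal sketch (not the authors' code) pits a seasonal-naive statistical benchmark against a random-forest regressor trained on lagged values of a single synthetic monthly series, scored with the symmetric MAPE used in the M-competitions; the series, horizon and model settings are illustrative assumptions only.

        # Illustrative sketch: statistical benchmark vs. ML regressor on one
        # synthetic monthly series, scored by symmetric MAPE (sMAPE).
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        t = np.arange(120)
        series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)

        def smape(actual, forecast):
            return 100 * np.mean(2 * np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

        h = 18                                   # forecasting horizon
        train, test = series[:-h], series[-h:]

        # Statistical benchmark: seasonal naive (repeat the last observed year).
        naive_fc = np.tile(train[-12:], 2)[:h]

        # ML benchmark: regress each value on its previous 12 lags.
        lags = 12
        X = np.array([train[i - lags:i] for i in range(lags, len(train))])
        y = train[lags:]
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

        history = list(train)
        ml_fc = []
        for _ in range(h):                       # recursive multi-step forecast
            pred = model.predict(np.array(history[-lags:]).reshape(1, -1))[0]
            ml_fc.append(pred)
            history.append(pred)

        print("sMAPE seasonal naive:", round(smape(test, naive_fc), 2))
        print("sMAPE random forest :", round(smape(test, np.array(ml_fc)), 2))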

  12. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  13. Machine-Learning Research

    OpenAIRE

    Dietterich, Thomas G.

    1997-01-01

    Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

  14. Machine Learning Methods for Prediction of CDK-Inhibitors

    Science.gov (United States)

    Ramana, Jayashree; Gupta, Dinesh

    2010-01-01

    Progression through the cell cycle involves the coordinated activities of a suite of cyclin/cyclin-dependent kinase (CDK) complexes. The activities of the complexes are regulated by CDK inhibitors (CDKIs). Apart from their role as cell cycle regulators, CDKIs are involved in apoptosis, transcriptional regulation, cell fate determination, cell migration and cytoskeletal dynamics. As the complexes perform crucial and diverse functions, these are important drug targets for tumour and stem cell therapeutic interventions. However, CDKIs are represented by proteins with considerable sequence heterogeneity and may fail to be identified by simple similarity search methods. In this work we have evaluated and developed machine learning methods for identification of CDKIs. We used different compositional features and evolutionary information in the form of PSSMs, from CDKIs and non-CDKIs for generating SVM and ANN classifiers. In the first stage, both the ANN and SVM models were evaluated using Leave-One-Out Cross-Validation and in the second stage these were tested on independent data sets. The PSSM-based SVM model emerged as the best classifier in both the stages and is publicly available through a user-friendly web interface at http://bioinfo.icgeb.res.in/cdkipred. PMID:20967128
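
    As an illustration of the evaluation protocol mentioned above (an SVM scored by leave-one-out cross-validation), here is a minimal scikit-learn sketch; the random feature matrix merely stands in for PSSM- and composition-derived features and is not the study's data.

        # Hedged sketch: SVM classifier evaluated with leave-one-out CV.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import LeaveOneOut, cross_val_score

        rng = np.random.default_rng(42)
        X = rng.normal(size=(60, 400))       # stand-in for PSSM-derived features
        y = rng.integers(0, 2, size=60)      # 1 = CDKI, 0 = non-CDKI (placeholder)

        clf = SVC(kernel="rbf", C=1.0, gamma="scale")
        scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
        print("LOOCV accuracy:", scores.mean())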

  15. Machine-learning methods in the classification of water bodies

    Directory of Open Access Journals (Sweden)

    Sołtysiak Marek

    2016-06-01

    Full Text Available Amphibian species have been considered as useful ecological indicators. They are used as indicators of environmental contamination, ecosystem health and habitat quality. Amphibian species are sensitive to changes in the aquatic environment and therefore may form the basis for the classification of water bodies. Water bodies in which there are a large number of amphibian species are especially valuable, even if they are located in urban areas. The automation of the classification process allows for a faster evaluation of the presence of amphibian species in the water bodies. Three machine-learning methods (artificial neural networks, decision trees and the k-nearest neighbours algorithm) have been used to classify water bodies in Chorzów – one of 19 cities in the Upper Silesia Agglomeration. In this case, classification is a supervised data mining method consisting of several stages such as building the model, the testing phase and the prediction. Seven natural and anthropogenic features of water bodies (e.g. the type of water body, aquatic plants, the purpose of the water body (destination), position of the water body in relation to any possible buildings, condition of the water body, the degree of littering, the shore type and fishing activities) have been taken into account in the classification. The data set used in this study involved information about 71 different water bodies and 9 amphibian species living in them. The results showed that the best average classification accuracy was obtained with the multilayer perceptron neural network.

  16. Recent Advances in Conotoxin Classification by Using Machine Learning Methods.

    Science.gov (United States)

    Dao, Fu-Ying; Yang, Hui; Su, Zhen-Dong; Yang, Wuritu; Wu, Yun; Hui, Ding; Chen, Wei; Tang, Hua; Lin, Hao

    2017-06-25

    Conotoxins are disulfide-rich small peptides, which are invaluable peptides that target ion channels and neuronal receptors. Conotoxins have been demonstrated as potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are also ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research as well. Thus, the accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire the structure and function information by using biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we reviewed the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark dataset; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides the basis for in-depth study of conotoxins and drug therapy research.

  17. Improved Saturated Hydraulic Conductivity Pedotransfer Functions Using Machine Learning Methods

    Science.gov (United States)

    Araya, S. N.; Ghezzehei, T. A.

    2017-12-01

    Saturated hydraulic conductivity (Ks) is one of the fundamental hydraulic properties of soils. Its measurement, however, is cumbersome and instead pedotransfer functions (PTFs) are often used to estimate it. Despite a lot of progress over the years, generic PTFs that estimate hydraulic conductivity generally do not perform well. We develop significantly improved PTFs by applying state-of-the-art machine learning techniques coupled with high-performance computing on a large database of over 20,000 soils—USKSAT and the Florida Soil Characterization databases. We compared the performance of four machine learning algorithms (k-nearest neighbors, gradient boosted model, support vector machine, and relevance vector machine) and evaluated the relative importance of several soil properties in explaining Ks. An attempt is also made to better account for soil structural properties; we evaluated the importance of variables derived from transformations of soil water retention characteristics and other soil properties. The gradient boosted models gave the best performance with root mean square errors less than 0.7 and mean errors in the order of 0.01 on a log scale of Ks [cm/h]. The effective particle size, D10, was found to be the single most important predictor. Other important predictors included percent clay, bulk density, organic carbon percent, coefficient of uniformity and values derived from water retention characteristics. Model performances were consistently better for Ks values greater than 10 cm/h. This study maximizes the extraction of information from a large database to develop generic machine learning based PTFs to estimate Ks. The study also evaluates the importance of various soil properties and their transformations in explaining Ks.
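
    A minimal sketch of the modelling approach described, assuming scikit-learn: a gradient-boosted regressor fitted to log10(Ks) with feature importances inspected afterwards. The predictor names and the synthetic generating relation are assumptions for illustration, not the USKSAT data.

        # Hedged sketch: gradient boosting on a log-transformed Ks target.
        import numpy as np
        import pandas as pd
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import mean_squared_error

        rng = np.random.default_rng(1)
        n = 2000
        soil = pd.DataFrame({
            "d10_mm": rng.lognormal(-3, 1, n),        # effective particle size
            "clay_pct": rng.uniform(0, 60, n),
            "bulk_density": rng.uniform(1.0, 1.8, n),
            "org_carbon_pct": rng.uniform(0, 5, n),
        })
        # Hypothetical generating relation, only to make the target learnable.
        log_ks = (2 + 1.5 * np.log10(soil["d10_mm"]) - 0.02 * soil["clay_pct"]
                  - 0.5 * soil["bulk_density"] + rng.normal(0, 0.3, n))

        X_tr, X_te, y_tr, y_te = train_test_split(soil, log_ks, random_state=0)
        gbm = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                        max_depth=3).fit(X_tr, y_tr)
        rmse = mean_squared_error(y_te, gbm.predict(X_te)) ** 0.5
        print("RMSE (log10 Ks):", round(rmse, 3))
        print(dict(zip(soil.columns, gbm.feature_importances_.round(3))))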

  18. Computerization of Hungarian reforestation manual with machine learning methods

    Science.gov (United States)

    Czimber, Kornél; Gálos, Borbála; Mátyás, Csaba; Bidló, András; Gribovszki, Zoltán

    2017-04-01

    Hungarian forests are highly sensitive to the changing climate, especially to the available precipitation amount. Over the past two decades, drought damage was observed several times for tree species at the lower xeric limit of their distribution. From year to year these affected forest stands become more difficult to reforest with the same native species, because they are not able to adapt to the increasing probability of droughts. The climate-related parameter set of the Hungarian forest stand database needs updates. Air humidity, which was formerly used to define the forest climate zones, is no longer measured, and its value based on climate model outputs is highly uncertain. The aim was to develop a novel computerized and objective method to describe the species-specific climate conditions that are essential for survival, growth and optimal production of the forest ecosystems. The method is expected to project the species' spatial distribution until 2100 on the basis of regional climate model simulations. Until now, Hungarian forest managers have been using a carefully edited spreadsheet for reforestation purposes. Applying binding regulations, this spreadsheet prescribes the stand-forming and admixed tree species and their expected growth rate for each forest site type. We are going to present a new machine learning based method to replace the former spreadsheet. We gave careful consideration to various methods, such as maximum likelihood, Bayesian networks and fuzzy logic. The method calculates distributions and sets up a classification, which can be validated and modified by experts if necessary. Projected climate change conditions make it necessary to include in this system an additional climate zone that does not currently exist in our region, as well as new options for potential tree species. In addition to or instead of the existing ones, the influence of further limiting parameters (climatic extremes, soil water retention) is also investigated. Results will be

  19. Machine learning methods in predicting the student academic motivation

    Directory of Open Access Journals (Sweden)

    Ivana Đurđević Babić

    2017-01-01

    Full Text Available Academic motivation is closely related to academic performance. For educators, it is equally important to detect early students with a lack of academic motivation as it is to detect those with a high level of academic motivation. In endeavouring to develop a classification model for predicting student academic motivation based on their behaviour in learning management system (LMS) courses, this paper intends to establish links between the predicted student academic motivation and their behaviour in the LMS course. Students from all years at the Faculty of Education in Osijek participated in this research. Three machine learning classifiers (neural networks, decision trees, and support vector machines) were used. To establish whether a significant difference in the performance of models exists, a t-test of the difference in proportions was used. Although all classifiers were successful, the neural network model was shown to be the most successful in detecting the student academic motivation based on their behaviour in the LMS course.

  20. Survey of Machine Learning Methods for Database Security

    Science.gov (United States)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  1. Learning Algorithm of Boltzmann Machine Based on Spatial Monte Carlo Integration Method

    Directory of Open Access Journals (Sweden)

    Muneki Yasuda

    2018-04-01

    Full Text Available The machine learning techniques for Markov random fields are fundamental in various fields involving pattern recognition, image processing, sparse modeling, and earth science, and a Boltzmann machine is one of the most important models in Markov random fields. However, the inference and learning problems in the Boltzmann machine are NP-hard. The investigation of an effective learning algorithm for the Boltzmann machine is one of the most important challenges in the field of statistical machine learning. In this paper, we study Boltzmann machine learning based on the (first-order) spatial Monte Carlo integration method, referred to as the 1-SMCI learning method, which was proposed in the author’s previous paper. In the first part of this paper, we compare the method with the maximum pseudo-likelihood estimation (MPLE) method using theoretical and numerical approaches, and show that the 1-SMCI learning method is more effective than MPLE. In the latter part, we compare the 1-SMCI learning method with other effective methods, ratio matching and minimum probability flow, using a numerical experiment, and show that the 1-SMCI learning method outperforms them.

  2. Different protein-protein interface patterns predicted by different machine learning methods.

    Science.gov (United States)

    Wang, Wei; Yang, Yongxiao; Yin, Jianxin; Gong, Xinqi

    2017-11-22

    Different types of protein-protein interactions make different protein-protein interface patterns, and different machine learning methods are suited to different types of data. Does it follow that different interface patterns are better predicted by different machine learning methods? Here, four different machine learning methods were employed to predict protein-protein interface residue pairs on different interface patterns. The performances of the methods for different types of proteins differ, which suggests that different machine learning methods tend to predict different protein-protein interface patterns. We used ANOVA and variable selection to support this result. Our proposed method, which combines the advantages of the individual methods, also achieved good prediction results compared with the single methods. In addition to the prediction of protein-protein interactions, this idea can be extended to other research areas such as protein structure prediction and design.

  3. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.

    Science.gov (United States)

    Cuperlovic-Culf, Miroslava

    2018-01-11

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.

  4. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    Science.gov (United States)

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  5. Evaluation of Machine Learning Methods for LHC Optics Measurements and Corrections Software

    CERN Document Server

    AUTHOR|(CDS)2206853; Henning, Peter

    The field of artificial intelligence is driven by the goal of providing machines with human-like intelligence. However, modern science currently faces problems of such high complexity that humans cannot solve them on the same timescale as machines, so there is a demand for the automation of complex tasks. Identifying the category of tasks that can be performed by machines in the domain of optics measurements and corrections at the Large Hadron Collider (LHC) is one of the central research subjects of this thesis. The application of machine learning methods and concepts of artificial intelligence can be found in various industrial and scientific branches. In High Energy Physics these concepts are mostly used in offline analysis of experimental data and to perform regression tasks. In Accelerator Physics, the machine learning approach has not yet found wide application, so potential tasks for machine learning solutions can be specified in this domain. The appropriate methods and their suitability for...

  6. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Science.gov (United States)

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were evaluated using real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. The Random Forest and Support Vector Machine were more accurate than the other machine learning methods. The TotalBoost performed slightly worse than the other methods. The running times differed between methods. The ELM was always the fastest algorithm. As the sample size increases, the RBF requires a long imputation time. The tested methods in this research can be an alternative for imputation of un-typed SNPs at low rates of missing data. However, it is recommended that other machine learning methods also be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.
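
    The comparison described above can be pictured as a per-SNP classification task: predict the genotype (0/1/2) of an un-typed SNP from neighbouring typed SNPs. The sketch below, using scikit-learn and synthetic genotypes rather than real trio data, compares two of the listed methods in this way.

        # Hedged sketch: genotype imputation framed as 3-class classification.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(7)
        n_ind, n_flank = 500, 20
        flanking = rng.integers(0, 3, size=(n_ind, n_flank))   # typed SNPs (0/1/2)
        target = (flanking[:, 9] + flanking[:, 10]) // 2        # correlated target SNP

        models = [("Random Forest", RandomForestClassifier(n_estimators=300, random_state=0)),
                  ("SVM", SVC(kernel="rbf", gamma="scale"))]
        for name, clf in models:
            acc = cross_val_score(clf, flanking, target, cv=5).mean()
            print(f"{name}: imputation accuracy ~ {acc:.3f}")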

  7. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    Science.gov (United States)

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progress in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility of improving the performance of machine learning methods in screening large libraries is discussed.

  8. Building Customer Churn Prediction Models in Fitness Industry with Machine Learning Methods

    OpenAIRE

    Shan, Min

    2017-01-01

    With the rapid growth of digital systems, churn management has become a major focus within customer relationship management in many industries. Ample research has been conducted for churn prediction in different industries with various machine learning methods. This thesis aims to combine feature selection and supervised machine learning methods for defining models of churn prediction and apply them to the fitness industry. Forward selection is chosen as the feature selection method. Support Vector ...

  9. Machine Learning Methods for Identifying Composition of Uranium Deposits in Kazakhstan

    Directory of Open Access Journals (Sweden)

    Kuchin Yan

    2017-12-01

    Full Text Available The paper explores geophysical methods of well surveying, as well as their role in the development of Kazakhstan’s uranium deposit mining efforts. An analysis of the existing methods for solving the problem of interpreting geophysical data using machine learning in petroleum geophysics is made. The requirements and possible applications of machine learning methods with regard to the uranium deposits of Kazakhstan are formulated in the paper.

  10. Machine Learning Method Applied in Readout System of Superheated Droplet Detector

    Science.gov (United States)

    Liu, Yi; Sullivan, Clair Julia; d'Errico, Francesco

    2017-07-01

    Direct readability is one advantage of superheated droplet detectors in neutron dosimetry. Utilizing such a distinct characteristic, an imaging readout system analyzes images of the detector for neutron dose readout. To improve the accuracy and precision of algorithms in the imaging readout system, machine learning algorithms were developed. Deep learning neural network and support vector machine algorithms are applied and compared with the generally used Hough transform and curvature analysis methods. The machine learning methods showed a much higher accuracy and better precision in recognizing circular gas bubbles.
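
    For context, the conventional baseline mentioned above (circle detection via the Hough transform) can be sketched in a few lines of OpenCV; the image file name and parameter values are assumptions, not taken from the paper.

        # Hedged sketch: Hough-transform baseline for counting bubble outlines.
        import cv2

        img = cv2.imread("droplet_detector.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
        img = cv2.medianBlur(img, 5)                                    # suppress speckle noise

        circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1.2, minDist=15,
                                   param1=100, param2=30, minRadius=5, maxRadius=40)
        n_bubbles = 0 if circles is None else circles.shape[1]
        print("Detected bubbles:", n_bubbles)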

  11. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides

    Directory of Open Access Journals (Sweden)

    Stanislawski Jerzy

    2013-01-01

    Full Text Available Background: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. Results: We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). Conclusions: We showed that the simplified profile generation method does not introduce an error with regard to the original method, while

  12. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.

    Science.gov (United States)

    Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

    2013-01-17

    Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset
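
    The machine-learning stage described above can be illustrated with a small sketch: hexapeptides are one-hot encoded into 6 x 20 amino-acid features and classified with a multilayer perceptron, scored by ROC AUC. The peptides and labels below are synthetic placeholders, not ZipperDB entries.

        # Hedged sketch: one-hot encoded hexapeptides classified with an MLP.
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        AA = "ACDEFGHIKLMNPQRSTVWY"
        rng = np.random.default_rng(3)

        def encode(peptide):
            """One-hot encode a 6-residue peptide into a 120-dimensional vector."""
            v = np.zeros((6, 20))
            for i, aa in enumerate(peptide):
                v[i, AA.index(aa)] = 1.0
            return v.ravel()

        peptides = ["".join(rng.choice(list(AA), 6)) for _ in range(800)]
        X = np.array([encode(p) for p in peptides])
        y = rng.integers(0, 2, size=len(peptides))   # 1 = amyloidogenic (placeholder)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
        mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                            random_state=0).fit(X_tr, y_tr)
        print("ROC AUC:", roc_auc_score(y_te, mlp.predict_proba(X_te)[:, 1]))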

  13. Machine learning for medical ultrasound: status, methods, and future opportunities.

    Science.gov (United States)

    Brattain, Laura J; Telfer, Brian A; Dhyani, Manish; Grajo, Joseph R; Samir, Anthony E

    2018-04-01

    Ultrasound (US) imaging is the most commonly performed cross-sectional diagnostic imaging modality in the practice of medicine. It is low-cost, non-ionizing, portable, and capable of real-time image acquisition and display. US is a rapidly evolving technology with significant challenges and opportunities. Challenges include high inter- and intra-operator variability and limited image quality control. Tremendous opportunities have arisen in the last decade as a result of exponential growth in available computational power coupled with progressive miniaturization of US devices. As US devices become smaller, enhanced computational capability can contribute significantly to decreasing variability through advanced image processing. In this paper, we review leading machine learning (ML) approaches and research directions in US, with an emphasis on recent ML advances. We also present our outlook on future opportunities for ML techniques to further improve clinical workflow and US-based disease diagnosis and characterization.

  14. Quantum Machine Learning

    OpenAIRE

    Romero García, Cristian

    2017-01-01

    [EN] In a world in which accessible information grows exponentially, the selection of the appropriate information turns out to be an extremely relevant problem. In this context, the idea of Machine Learning (ML), a subfield of Artificial Intelligence, emerged to face problems in data mining, pattern recognition, automatic prediction, among others. Quantum Machine Learning is an interdisciplinary research area combining quantum mechanics with methods of ML, in which quantum properties allow fo...

  15. A Review for Detecting Gene-Gene Interactions Using Machine Learning Methods in Genetic Epidemiology

    Directory of Open Access Journals (Sweden)

    Ching Lee Koo

    2013-01-01

    Full Text Available Recently, the greatest statistical computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and environmental factors to influence complex multifactorial diseases. These gene-gene interactions are also denoted as epistasis, a phenomenon that cannot be handled by traditional statistical methods due to the high dimensionality of the data and the occurrence of multiple polymorphisms. Hence, several machine learning methods have been used to identify such susceptibility genes in common and multifactorial diseases, namely neural networks (NNs), support vector machines (SVM), and random forests (RFs). This paper gives an overview of machine learning methods, describing the methodology of each machine learning method and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discusses each machine learning method and presents the strengths and weaknesses of each in detecting gene-gene interactions in complex human disease.

  16. Floor-Fractured Craters through Machine Learning Methods

    Science.gov (United States)

    Thorey, C.

    2015-12-01

    Floor-fractured craters are impact craters that have undergone post-impact deformations. They are characterized by shallow floors with a plate-like or convex appearance, wide floor moats, and radial, concentric, and polygonal floor-fractures. While the origin of these deformations has long been debated, it is now generally accepted that they are the result of the emplacement of shallow magmatic intrusions below their floor. These craters thus constitute an efficient tool to probe the importance of intrusive magmatism from the lunar surface. The most recent catalog of lunar floor-fractured craters references about 200 of them, mainly located around the lunar maria. Herein, we will discuss the possibility of using machine learning algorithms to try to detect new floor-fractured craters on the Moon among the 60000 craters referenced in the most recent catalogs. In particular, we will use the gravity field provided by the Gravity Recovery and Interior Laboratory (GRAIL) mission, and the topographic dataset obtained from the Lunar Orbiter Laser Altimeter (LOLA) instrument to design a set of representative features for each crater. We will then discuss the possibility of designing a binary supervised classifier, based on these features, to discriminate between the presence or absence of a crater-centered intrusion below a specific crater. First predictions from the different classifiers, in terms of their accuracy and uncertainty, will be presented.

  17. How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach.

    Science.gov (United States)

    Ichikawa, Daisuke; Saito, Toki; Ujita, Waka; Oyama, Hiroshi

    2016-12-01

    Our purpose was to develop a new machine-learning approach (a virtual health check-up) toward identification of those at high risk of hyperuricemia. Applying the system to general health check-ups is expected to reduce medical costs compared with administering an additional test. Data were collected during annual health check-ups performed in Japan between 2011 and 2013 (inclusive). We prepared training and test datasets from the health check-up data to build prediction models; these were composed of 43,524 and 17,789 persons, respectively. Gradient-boosting decision tree (GBDT), random forest (RF), and logistic regression (LR) approaches were trained using the training dataset and were then used to predict hyperuricemia in the test dataset. Undersampling was applied to build the prediction models to deal with the imbalanced class dataset. The results showed that the RF and GBDT approaches afforded the best performances in terms of sensitivity and specificity, respectively. The area under the curve (AUC) values of the models, which reflected the total discriminative ability of the classification, were 0.796 [95% confidence interval (CI): 0.766-0.825] for the GBDT, 0.784 [95% CI: 0.752-0.815] for the RF, and 0.785 [95% CI: 0.752-0.819] for the LR approaches. No significant differences were observed between pairs of each approach. Small changes occurred in the AUCs after applying undersampling to build the models. We developed a virtual health check-up that predicted the development of hyperuricemia using machine-learning methods. The GBDT, RF, and LR methods had similar predictive capability. Undersampling did not remarkably improve predictive power. Copyright © 2016 Elsevier Inc. All rights reserved.
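
    A minimal sketch of the setup described above, assuming scikit-learn: the majority class of the training set is randomly undersampled to a 1:1 ratio, and GBDT, RF and logistic-regression models are compared by ROC AUC. The synthetic data stand in for the check-up records.

        # Hedged sketch: undersampling plus three classifiers scored by AUC.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        X, y = make_classification(n_samples=20000, n_features=20, weights=[0.9, 0.1],
                                   random_state=0)          # imbalanced classes
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Undersample the majority class of the training set to a 1:1 ratio.
        rng = np.random.default_rng(0)
        pos = np.where(y_tr == 1)[0]
        neg = rng.choice(np.where(y_tr == 0)[0], size=len(pos), replace=False)
        idx = np.concatenate([pos, neg])
        X_bal, y_bal = X_tr[idx], y_tr[idx]

        models = {"GBDT": GradientBoostingClassifier(random_state=0),
                  "RF": RandomForestClassifier(n_estimators=300, random_state=0),
                  "LR": LogisticRegression(max_iter=1000)}
        for name, model in models.items():
            model.fit(X_bal, y_bal)
            auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
            print(f"{name}: AUC = {auc:.3f}")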

  18. Application of machine learning methods for traffic signs recognition

    Science.gov (United States)

    Filatov, D. V.; Ignatev, K. V.; Deviatkin, A. V.; Serykh, E. V.

    2018-02-01

    This paper focuses on solving a relevant and pressing safety issue on intercity roads. Two approaches were considered for solving the problem of traffic sign recognition; the approaches involved neural networks to analyze images obtained from a camera in real-time mode. The first approach is based on sequential image processing. At the initial stage, with the help of color filters and morphological operations (dilation and erosion), the area containing the traffic sign is located on the image, then the selected and scaled fragment of the image is analyzed using a feedforward neural network to determine the meaning of the found traffic sign. Learning of the neural network in this approach is carried out using a backpropagation method. The second approach involves convolutional neural networks at both stages, i.e. when searching and selecting the area of the image containing the traffic sign, and when determining its meaning. Learning of the neural network in the second approach is carried out using the intersection over union function and a loss function. For neural networks to learn and the proposed algorithms to be tested, a series of videos from a dash cam were used that were shot under various weather and illumination conditions. As a result, the proposed approaches for traffic sign recognition were analyzed and compared by key indicators such as recognition rate percentage and the complexity of neural networks’ learning process.
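
    The first stage of the first approach (colour filtering plus dilation and erosion to locate sign candidates) can be sketched with OpenCV as below; the frame file name, HSV thresholds and size cut are illustrative assumptions rather than the authors' settings.

        # Hedged sketch: locate red-sign candidates with a colour filter and
        # morphological operations, then crop them for a downstream classifier.
        import cv2
        import numpy as np

        frame = cv2.imread("dashcam_frame.jpg")              # hypothetical frame
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Red wraps around the hue axis, so two ranges are combined.
        mask = cv2.inRange(hsv, (0, 90, 60), (10, 255, 255)) | \
               cv2.inRange(hsv, (170, 90, 60), (180, 255, 255))

        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.dilate(mask, kernel, iterations=1)         # close small gaps
        mask = cv2.erode(mask, kernel, iterations=1)          # remove speckle

        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        candidates = []
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h > 400:                                   # discard tiny blobs
                candidates.append(cv2.resize(frame[y:y + h, x:x + w], (32, 32)))
        print("Candidate sign regions:", len(candidates))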

  19. Predicting Coronal Mass Ejections Using Machine Learning Methods

    Science.gov (United States)

    Bobra, M. G.; Ilonidis, S.

    2016-04-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections (CMEs). Usually, solar active regions that produce large flares will also produce a CME, but this is not always true. Despite advances in numerical modeling, it is still unclear which circumstances will produce a CME. Therefore, it is worthwhile to empirically determine which features distinguish flares associated with CMEs from flares that are not. At this time, no extensive study has used physically meaningful features of active regions to distinguish between these two populations. As such, we attempt to do so by using features derived from (1) photospheric vector magnetic field data taken by the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager instrument and (2) X-ray flux data from the Geostationary Operational Environmental Satellite’s X-ray Flux instrument. We build a catalog of active regions that either produced both a flare and a CME (the positive class) or simply a flare (the negative class). We then use machine-learning algorithms to (1) determine which features distinguish these two populations, and (2) forecast whether an active region that produces an M- or X-class flare will also produce a CME. We compute the True Skill Statistic, a forecast verification metric, and find that it is a relatively high value of ∼0.8 ± 0.2. We conclude that a combination of six parameters, which are all intensive in nature, will capture most of the relevant information contained in the photospheric magnetic field.

  20. PREDICTING CORONAL MASS EJECTIONS USING MACHINE LEARNING METHODS

    Energy Technology Data Exchange (ETDEWEB)

    Bobra, M. G.; Ilonidis, S. [W.W. Hansen Experimental Physics Laboratory, Stanford University, Stanford, CA 94305 (United States)

    2016-04-20

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections (CMEs). Usually, solar active regions that produce large flares will also produce a CME, but this is not always true. Despite advances in numerical modeling, it is still unclear which circumstances will produce a CME. Therefore, it is worthwhile to empirically determine which features distinguish flares associated with CMEs from flares that are not. At this time, no extensive study has used physically meaningful features of active regions to distinguish between these two populations. As such, we attempt to do so by using features derived from (1) photospheric vector magnetic field data taken by the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager instrument and (2) X-ray flux data from the Geostationary Operational Environmental Satellite’s X-ray Flux instrument. We build a catalog of active regions that either produced both a flare and a CME (the positive class) or simply a flare (the negative class). We then use machine-learning algorithms to (1) determine which features distinguish these two populations, and (2) forecast whether an active region that produces an M- or X-class flare will also produce a CME. We compute the True Skill Statistic, a forecast verification metric, and find that it is a relatively high value of ∼0.8 ± 0.2. We conclude that a combination of six parameters, which are all intensive in nature, will capture most of the relevant information contained in the photospheric magnetic field.
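
    The forecast-verification metric quoted above, the True Skill Statistic, is simply the hit rate minus the false-alarm rate of a binary confusion matrix; the counts in the following sketch are invented for illustration.

        # Hedged sketch: True Skill Statistic from a binary confusion matrix.
        def true_skill_statistic(tp, fn, fp, tn):
            """TSS = TP/(TP+FN) - FP/(FP+TN)."""
            return tp / (tp + fn) - fp / (fp + tn)

        # Hypothetical CME forecast outcomes.
        tp, fn, fp, tn = 40, 8, 12, 140
        print("TSS =", round(true_skill_statistic(tp, fn, fp, tn), 2))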

  1. Research progress in machine learning methods for gene-gene interaction detection.

    Science.gov (United States)

    Peng, Zhe-Ye; Tang, Zi-Jun; Xie, Min-Zhu

    2018-03-20

    Complex diseases are the results of gene-gene and gene-environment interactions. However, the detection of high-dimensional gene-gene interactions is computationally challenging. In the last two decades, machine-learning approaches have been developed to detect gene-gene interactions with some success. In this review, we summarize the progress in research on machine learning methods, as applied to gene-gene interaction detection. It systematically examines the principles and limitations of the current machine learning methods used in genome-wide association studies (GWAS) to detect gene-gene interactions, such as neural networks (NN), random forest (RF), support vector machines (SVM) and multifactor dimensionality reduction (MDR), and provides some insights into future research directions in the field.

  2. Fault Diagnosis of Batch Reactor Using Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Sujatha Subramanian

    2014-01-01

    Full Text Available Fault diagnosis of a batch reactor gives early detection of faults and minimizes the risk of thermal runaway. It provides superior performance and helps to improve safety and consistency. It has become more vital in this technical era. In this paper, a support vector machine (SVM) is used to estimate the heat release (Qr) of the batch reactor under both normal and faulty conditions. The signature of the residual, which is obtained from the difference between nominal and estimated faulty Qr values, characterizes the different natures of faults occurring in the batch reactor. Appropriate statistical and geometric features are extracted from the residual signature, and the total number of features is reduced using an SVM attribute selection filter and principal component analysis (PCA) techniques. Artificial neural network (ANN) classifiers such as the multilayer perceptron (MLP), radial basis function (RBF), and Bayes net are used to classify the different types of faults from the reduced features. The comparative study shows that the proposed method for fault diagnosis, which uses a limited number of features extracted from only one estimated parameter (Qr), is more efficient and faster at diagnosing the typical faults.

  3. Classification of older adults with/without a fall history using machine learning methods.

    Science.gov (United States)

    Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.

  4. Prediction of Student Dropout in E-Learning Program Through the Use of Machine Learning Method

    Directory of Open Access Journals (Sweden)

    Mingjie Tan

    2015-02-01

    Full Text Available The high rate of dropout is a serious problem in E-learning programs, and it has therefore received considerable attention from education administrators and researchers. Predicting the potential dropout students is a workable solution to prevent dropout. Based on the analysis of related literature, this study selected students’ personal characteristics and academic performance as input attributes. Prediction models were developed using Artificial Neural Network (ANN), Decision Tree (DT) and Bayesian Networks (BNs). A large sample of 62375 students was utilized in the procedures of model training and testing. The results of each model were presented in a confusion matrix and analyzed by calculating the rates of accuracy, precision, recall, and F-measure. The results suggested that all three machine learning methods were effective in student dropout prediction, with DT presenting the best performance. Finally, some suggestions were made for future research.
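
    The evaluation described above (accuracy, precision, recall and F-measure read off a confusion matrix) can be illustrated with the short sketch below; the matrix values are invented for illustration, not the study's results.

        # Hedged sketch: metrics derived from a dropout-prediction confusion matrix.
        import numpy as np

        # Hypothetical confusion matrix: rows = actual (dropout, retained),
        # columns = predicted (dropout, retained).
        cm = np.array([[820, 180],
                       [230, 1770]])
        tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]

        accuracy = (tp + tn) / cm.sum()
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_measure = 2 * precision * recall / (precision + recall)
        print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
              f"recall={recall:.3f} F={f_measure:.3f}")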

  5. Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches

    DEFF Research Database (Denmark)

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna

    2013-01-01

    machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods...

  6. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    Directory of Open Access Journals (Sweden)

    Zekić-Sušac Marijana

    2014-09-01

    Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour on the same dataset in order to compare their efficiency in terms of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess the sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
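
    The comparison protocol described above can be reproduced in outline with scikit-learn: the same 10-fold cross-validation applied to each classifier on one dataset. The synthetic high-dimensional data below are a stand-in for the survey data used in the paper.

        # Hedged sketch: four classifiers compared under 10-fold cross-validation.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import StratifiedKFold, cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.svm import SVC
        from sklearn.neighbors import KNeighborsClassifier

        X, y = make_classification(n_samples=500, n_features=300, n_informative=30,
                                   random_state=0)
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

        models = {"MLP": MLPClassifier(max_iter=1000, random_state=0),
                  "CART": DecisionTreeClassifier(random_state=0),
                  "SVM": SVC(),
                  "kNN": KNeighborsClassifier(n_neighbors=5)}
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=cv)
            print(f"{name}: mean accuracy {scores.mean():.3f} +/- {scores.std():.3f}")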

  7. Machine Learning and Applied Linguistics

    OpenAIRE

    Vajjala, Sowmya

    2018-01-01

    This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.

  8. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  9. Machine learning and medical imaging

    CERN Document Server

    Shen, Dinggang; Sabuncu, Mert

    2016-01-01

    Machine Learning and Medical Imaging presents state-of- the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, a...

  10. Performance of machine learning methods for ligand-based virtual screening.

    Science.gov (United States)

    Plewczynski, Dariusz; Spieser, Stéphane A H; Koch, Uwe

    2009-05-01

    Computational screening of compound databases has become increasingly popular in pharmaceutical research. This review focuses on the evaluation of ligand-based virtual screening using active compounds as templates in the context of drug discovery. Ligand-based screening techniques are based on comparative molecular similarity analysis of compounds with known and unknown activity. We provide an overview of publications that have evaluated different machine learning methods, such as support vector machines, decision trees, ensemble methods such as boosting, bagging and random forests, clustering methods, neural networks, naïve Bayesian, data fusion methods and others.

  11. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  12. Machine learning methods for the classification of gliomas: Initial results using features extracted from MR spectroscopy.

    Science.gov (United States)

    Ranjith, G; Parvathy, R; Vikas, V; Chandrasekharan, Kesavadas; Nair, Suresh

    2015-04-01

    With the advent of new imaging modalities, radiologists are faced with handling increasing volumes of data for diagnosis and treatment planning. The use of automated and intelligent systems is becoming essential in such a scenario. Machine learning, a branch of artificial intelligence, is increasingly being used in medical image analysis applications such as image segmentation, registration and computer-aided diagnosis and detection. Histopathological analysis is currently the gold standard for classification of brain tumors. The use of machine learning algorithms along with extraction of relevant features from magnetic resonance imaging (MRI) holds promise of replacing conventional invasive methods of tumor classification. The aim of the study is to classify gliomas into benign and malignant types using MRI data. Retrospective data from 28 patients who were diagnosed with glioma were used for the analysis. WHO Grade II (low-grade astrocytoma) was classified as benign while Grade III (anaplastic astrocytoma) and Grade IV (glioblastoma multiforme) were classified as malignant. Features were extracted from MR spectroscopy. The classification was done using four machine learning algorithms: multilayer perceptrons, support vector machine, random forest and locally weighted learning. Three of the four machine learning algorithms gave an area under ROC curve in excess of 0.80. Random forest gave the best performance in terms of AUC (0.911) while sensitivity was best for locally weighted learning (86.1%). The performance of different machine learning algorithms in the classification of gliomas is promising. An even better performance may be expected by integrating features extracted from other MR sequences. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  13. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets.

    Science.gov (United States)

    Korotcov, Alexandru; Tkachenko, Valery; Russo, Daniel P; Ekins, Sean

    2017-12-04

    Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, physicochemical properties as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or data sets, Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar type plots indicates when models are inferior or perhaps over trained. These results also suggest the need for assessing deep learning further
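
    The metric panel used in the comparison above (AUC, F1 score, Cohen's kappa and the Matthews correlation coefficient) is available directly in scikit-learn; the sketch below computes it for a single classifier on random bit vectors standing in for FCFP6 fingerprints.

        # Hedged sketch: the metric panel evaluated for one classifier.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import (roc_auc_score, f1_score, cohen_kappa_score,
                                     matthews_corrcoef)

        rng = np.random.default_rng(5)
        X = rng.integers(0, 2, size=(1000, 1024))    # fingerprint-like bit vectors
        y = rng.integers(0, 2, size=1000)            # active / inactive (placeholder)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        proba = clf.predict_proba(X_te)[:, 1]
        pred = clf.predict(X_te)

        print("AUC  :", round(roc_auc_score(y_te, proba), 3))
        print("F1   :", round(f1_score(y_te, pred), 3))
        print("kappa:", round(cohen_kappa_score(y_te, pred), 3))
        print("MCC  :", round(matthews_corrcoef(y_te, pred), 3))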

  14. Improved method for SNR prediction in machine-learning-based test

    NARCIS (Netherlands)

    Sheng, Xiaoqin; Kerkhoff, Hans G.

    2010-01-01

    This paper applies an improved method for testing the signal-to-noise ratio (SNR) of Analogue-to-Digital Converters (ADC). In previous work, a noisy and nonlinear pulse signal is exploited as the input stimulus to obtain the signature results of ADC. By applying a machine-learning-based approach,

  15. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    OpenAIRE

    Zekić-Sušac, Marijana; Pfeifer, Sanja; Šarlija, Nataša

    2014-01-01

    Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART ...

  16. Feasibility of Machine Learning Methods for Separating Wood and Leaf Points from Terrestrial Laser Scanning Data

    Science.gov (United States)

    Wang, D.; Hollaus, M.; Pfeifer, N.

    2017-09-01

    Classification of wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI) and woody-to-total area. Laser scanning has emerged as a promising solution for this task. Intensity-based approaches are widely proposed, as different components of a tree can feature discriminatory optical properties at the operating wavelengths of a sensor system. For geometry-based methods, machine learning algorithms are often used to separate wood and leaf points, by providing proper training samples. However, it remains unclear how the chosen machine learning classifier and the features used influence classification results. To this end, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), and Gaussian Mixture Model (GMM), for separating wood and leaf points from terrestrial laser scanning (TLS) data. Two trees, an Erythrophleum fordii and a Betula pendula (silver birch), are used to test the impacts of classifier, feature set, and training samples. Our results showed that RF is the best model in terms of accuracy, and that local-density-related features are important. Experimental results confirmed the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. It is also noted that our studies are based on isolated trees. Further tests should be performed on more tree species and data from more complex environments.
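
    A hedged sketch of comparing several of the classifiers named above on per-point geometric features; the feature names (local density, planarity, etc.) and data are placeholder assumptions, not the authors' TLS pipeline, and the GMM step is omitted for brevity.

    ```python
    # Sketch only: mock per-point features and a synthetic wood/leaf label.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(2000, 5))                 # e.g., local density, planarity, ... (mock)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 0 = leaf, 1 = wood (synthetic)

    for name, clf in [("SVM", SVC()),
                      ("NaiveBayes", GaussianNB()),
                      ("RandomForest", RandomForestClassifier(n_estimators=100))]:
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name}: mean CV accuracy = {acc:.3f}")
    ```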

  17. FEASIBILITY OF MACHINE LEARNING METHODS FOR SEPARATING WOOD AND LEAF POINTS FROM TERRESTRIAL LASER SCANNING DATA

    Directory of Open Access Journals (Sweden)

    D. Wang

    2017-09-01

    Full Text Available Classification of wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI) and woody-to-total area. Laser scanning has emerged as a promising solution for this task. Intensity-based approaches are widely proposed, as different components of a tree can feature discriminatory optical properties at the operating wavelengths of a sensor system. For geometry-based methods, machine learning algorithms are often used to separate wood and leaf points, by providing proper training samples. However, it remains unclear how the chosen machine learning classifier and the features used influence classification results. To this end, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), and Gaussian Mixture Model (GMM), for separating wood and leaf points from terrestrial laser scanning (TLS) data. Two trees, an Erythrophleum fordii and a Betula pendula (silver birch), are used to test the impacts of classifier, feature set, and training samples. Our results showed that RF is the best model in terms of accuracy, and that local-density-related features are important. Experimental results confirmed the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. It is also noted that our studies are based on isolated trees. Further tests should be performed on more tree species and data from more complex environments.

  18. A Review of Current Machine Learning Methods Used for Cancer Recurrence Modeling and Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Hemphill, Geralyn M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-27

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type has become a necessity in cancer research. A major challenge in cancer management is the classification of patients into appropriate risk groups for better treatment and follow-up. Such risk assessment is critically important in order to optimize the patient’s health and the use of medical resources, as well as to avoid cancer recurrence. This paper focuses on the application of machine learning methods for predicting the likelihood of a recurrence of cancer. It is not meant to be an extensive review of the literature on the subject of machine learning techniques for cancer recurrence modeling. Other recent papers have performed such a review, and I will rely heavily on the results and outcomes from these papers. The electronic databases that were used for this review include PubMed, Google, and Google Scholar. Query terms used include “cancer recurrence modeling”, “cancer recurrence and machine learning”, “cancer recurrence modeling and machine learning”, and “machine learning for cancer recurrence and prediction”. The most recent and most applicable papers to the topic of this review have been included in the references. It also includes a list of modeling and classification methods to predict cancer recurrence.

  19. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide covers all of the need-to-know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks. Intended for those who want to learn how to use R's machine learning capabilities and gain insight from their data. Perhaps you already know a bit about machine learning, but have never used R; or

  20. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    Science.gov (United States)

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcomes using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs, we review the classification problem and then dichotomous probability estimation. Next, we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
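
    A minimal sketch of the probability-estimation idea: class probabilities from k-NN and random forest compared with logistic regression under a deliberately misspecified linear model, scored by the Brier score. Data and settings are illustrative assumptions, not the paper's simulations.

    ```python
    # Sketch only: synthetic binary outcome whose true probability is nonlinear in X.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1000, 4))
    p_true = 1.0 / (1.0 + np.exp(-(X[:, 0] + X[:, 1] ** 2 - 1)))  # misspecified for a linear model
    y = rng.binomial(1, p_true)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
    models = {"logistic": LogisticRegression(max_iter=1000),
              "k-NN": KNeighborsClassifier(n_neighbors=25),
              "random forest": RandomForestClassifier(n_estimators=200, random_state=3)}
    for name, m in models.items():
        m.fit(X_tr, y_tr)
        p_hat = m.predict_proba(X_te)[:, 1]
        print(f"{name}: Brier score = {brier_score_loss(y_te, p_hat):.3f}")
    ```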

  1. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  2. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    Science.gov (United States)

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-02-01

    For a drug, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in the prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, there are still challenges and limitations to be addressed for drug safety assessment in the future.

  3. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    Directory of Open Access Journals (Sweden)

    Hongbin Yang

    2018-02-01

    Full Text Available During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in the prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, there are still challenges and limitations to be addressed for drug safety assessment in the future.

  4. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts.

    Science.gov (United States)

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-01-01

    During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in the prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, there are still challenges and limitations to be addressed for drug safety assessment in the future.

  5. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    Science.gov (United States)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company manufactures parts for airplanes; one of the machines categorized as a key facility in the company is the Millac 5H6P. As a key facility, the machine must be kept working well and in peak condition, so a maintenance process is needed periodically. Data gathering revealed a lack of competency among maintenance staff in maintaining machine types not assigned to them by their supervisor, indicating that the knowledge possessed by maintenance staff is uneven. The purpose of this research is to create a knowledge-based e-learning application as a realization of the externalization step in the knowledge transfer process for machine maintenance. The application features are tailored to maintenance purposes using an e-learning framework for the maintenance process, and the application content supports multimedia for learning. QFD is used in this research to understand user needs. The application is built using Moodle, with the iterative method as the software development cycle and UML diagrams. The result of this research is an e-learning application that serves as a knowledge-sharing medium for maintenance staff in the company. Testing showed that the application makes it easy for maintenance staff to understand the required competencies.

  6. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Science.gov (United States)

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
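
    A brief sketch of the comparison pattern described above: two of the imputation methods (mean and k-NN) feeding a downstream classifier inside a cross-validated pipeline. The data, missingness mechanism and classifier are placeholder assumptions, not the cardiac imaging dataset or the BRL models.

    ```python
    # Sketch only: synthetic data with 20% values missing completely at random.
    import numpy as np
    from sklearn.impute import SimpleImputer, KNNImputer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 8))
    y = (X[:, 0] - X[:, 3] > 0).astype(int)
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < 0.2] = np.nan

    for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                          ("k-NN", KNNImputer(n_neighbors=5))]:
        pipe = make_pipeline(imputer, RandomForestClassifier(n_estimators=100, random_state=4))
        acc = cross_val_score(pipe, X_missing, y, cv=5).mean()
        print(f"{name} imputation: mean CV accuracy = {acc:.3f}")
    ```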

  7. Prediction of Human Drug Targets and Their Interactions Using Machine Learning Methods: Current and Future Perspectives.

    Science.gov (United States)

    Nath, Abhigyan; Kumari, Priyanka; Chaube, Radha

    2018-01-01

    Identification of drug targets and drug target interactions are important steps in the drug-discovery pipeline. Successful computational prediction methods can reduce the cost and time demanded by the experimental methods. Knowledge of putative drug targets and their interactions can be very useful for drug repurposing. Supervised machine learning methods have been very useful in drug target prediction and in prediction of drug target interactions. Here, we describe the details for developing prediction models using supervised learning techniques for human drug target prediction and their interactions.

  8. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  9. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles.

    Science.gov (United States)

    Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua

    2018-04-25

    Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. By comparison, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus-level identification, but SVM showed slightly better accuracy for most species. Highly accurate species-level identification remains a challenge, especially in distinguishing between species from the same genus, which may require improvements in both imaging and machine learning techniques. In summary, our work illustrates a new SVM-based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave a better way forward for the application of machine learning to species identification and food safety.

  10. Identification of Village Building via Google Earth Images and Supervised Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Zhiling Guo

    2016-03-01

    Full Text Available In this study, a method based on supervised machine learning is proposed to identify village buildings from open high-resolution remote sensing images. We select Google Earth (GE) RGB images to perform the classification in order to examine its suitability for village mapping, and investigate the feasibility of using machine learning methods to provide automatic classification in such fields. By analyzing the characteristics of GE images, we design different features on the basis of two kinds of supervised machine learning methods for classification: adaptive boosting (AdaBoost) and convolutional neural networks (CNN). To recognize village buildings via their color and texture information, the RGB color features and a large number of Haar-like features in a local window are utilized in the AdaBoost method; with multilayer networks trained using gradient descent and backpropagation, CNNs perform the identification by mining deeper information from buildings and their neighborhood. Experimental results from the testing area at Savannakhet province in Laos show that our proposed AdaBoost method achieves an overall accuracy of 96.22% and the CNN method is also competitive with an overall accuracy of 96.30%.
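
    As a rough illustration of the AdaBoost branch described above, the sketch below boosts decision stumps over mock per-window color/texture feature vectors; the features and labels are synthetic assumptions, not GE imagery or Haar-like features.

    ```python
    # Sketch only: placeholder per-window features and a synthetic building label.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(5)
    X = rng.uniform(size=(1500, 20))              # mock color/texture features per window
    y = (X[:, 0] + X[:, 5] > 1.0).astype(int)     # 1 = building, 0 = background (synthetic)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)
    clf = AdaBoostClassifier(n_estimators=200, random_state=5).fit(X_tr, y_tr)
    print("overall accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    ```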

  11. Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

    Science.gov (United States)

    Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

    2010-10-01

    Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organisation maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained from the listwise deletion (LD) imputation method. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracies of predictions on early cancer relapse were measured using artificial neural networks (ANNs), in which different ANNs were estimated using the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed imputation statistical methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.

  12. Integrating Heuristic and Machine-Learning Methods for Efficient Virtual Machine Allocation in Data Centers

    OpenAIRE

    Pahlevan, Ali; Qu, Xiaoyu; Zapater Sancho, Marina; Atienza Alonso, David

    2017-01-01

    Modern cloud data centers (DCs) need to tackle efficiently the increasing demand for computing resources and address the energy efficiency challenge. Therefore, it is essential to develop resource provisioning policies that are aware of virtual machine (VM) characteristics, such as CPU utilization and data communication, and applicable in dynamic scenarios. Traditional approaches fall short in terms of flexibility and applicability for large-scale DC scenarios. In this paper we propose a heur...

  13. Fast learning method for convolutional neural networks using extreme learning machine and its application to lane detection.

    Science.gov (United States)

    Kim, Jihun; Kim, Jonghong; Jang, Gil-Jin; Lee, Minho

    2017-03-01

    Deep learning has received significant attention recently as a promising solution to many problems in the area of artificial intelligence. Among several deep learning architectures, convolutional neural networks (CNNs) demonstrate superior performance when compared to other machine learning methods in the applications of object detection and recognition. We use a CNN for image enhancement and the detection of driving lanes on motorways. In general, the process of lane detection consists of edge extraction and line detection. A CNN can be used to enhance the input images before lane detection by excluding noise and obstacles that are irrelevant to the edge detection result. However, training conventional CNNs requires considerable computation and a big dataset. Therefore, we suggest a new learning algorithm for CNNs using an extreme learning machine (ELM). The ELM is a fast learning method used to calculate network weights between output and hidden layers in a single iteration and thus, can dramatically reduce learning time while producing accurate results with minimal training data. A conventional ELM can be applied to networks with a single hidden layer; as such, we propose a stacked ELM architecture in the CNN framework. Further, we modify the backpropagation algorithm to find the targets of hidden layers and effectively learn network weights while maintaining performance. Experimental results confirm that the proposed method is effective in reducing learning time and improving performance. Copyright © 2016 Elsevier Ltd. All rights reserved.
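
    A minimal sketch of the basic extreme-learning-machine step referred to above: hidden-layer weights are drawn at random and kept fixed, and only the output weights are obtained in a single least-squares solve. This is a generic single-hidden-layer ELM on synthetic data, not the authors' stacked ELM or its CNN integration.

    ```python
    # Sketch only: random fixed hidden weights, output weights solved in one step.
    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(500, 20))                              # input features
    Y = (X[:, 0] - X[:, 1] > 0).astype(float).reshape(-1, 1)    # targets

    n_hidden = 100
    W = rng.normal(size=(20, n_hidden))               # random input-to-hidden weights (fixed)
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)      # output weights in a single solve

    pred = (H @ beta > 0.5).astype(int)
    print("training accuracy:", (pred.ravel() == Y.ravel()).mean())
    ```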

  14. Machine learning-based methods for prediction of linear B-cell epitopes.

    Science.gov (United States)

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction helps immunologists design peptide-based vaccines, diagnostic tests, disease prevention and treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable-length B-cell epitope prediction is still unsatisfactory. Fortunately, due to increasingly available verified epitope databases, bioinformaticians can apply machine learning-based algorithms to all curated data to design improved prediction tools for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noted that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools has become a general way to construct linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of the support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, in addition to reviewing recently published papers, we introduce the fundamentals of B-cell epitopes and SVM techniques. Finally, an example of a linear B-cell epitope prediction system based on physicochemical features and amino acid combinations is illustrated in detail.

  15. Application of machine-learning methods to solid-state chemistry: ferromagnetism in transition metal alloys

    International Nuclear Information System (INIS)

    Landrum, Gregory A.; Genin, Hugh

    2003-01-01

    Machine-learning methods are a collection of techniques for building predictive models from experimental data. The algorithms are problem-independent: the chemistry and physics of the problem being studied are contained in the descriptors used to represent the known data. The application of a variety of machine-learning methods to the prediction of ferromagnetism in ordered and disordered transition metal alloys is presented. Applying a decision tree algorithm to build a predictive model for ordered phases results in a model that is 100% accurate. The same algorithm achieves 99% accuracy when trained on a data set containing both ordered and disordered phases. Details of the descriptor sets for both applications are also presented

  16. A Novel Application of Machine Learning Methods to Model Microcontroller Upset Due to Intentional Electromagnetic Interference

    Science.gov (United States)

    Bilalic, Rusmir

    A novel application of support vector machines (SVMs), artificial neural networks (ANNs), and Gaussian processes (GPs) for machine learning (GPML) to model microcontroller unit (MCU) upset due to intentional electromagnetic interference (IEMI) is presented. In this approach, an MCU performs a counting operation (0-7) while electromagnetic interference in the form of a radio frequency (RF) pulse is direct-injected into the MCU clock line. Injection times with respect to the clock signal are the clock low, clock rising edge, clock high, and the clock falling edge periods in the clock window during which the MCU is performing initialization and executing the counting procedure. The intent is to cause disruption in the counting operation and model the probability of effect (PoE) using machine learning tools. Five experiments were executed as part of this research, each of which contained a set of 38,300 training points and 38,300 test points, for a total of 383,000 total points with the following experiment variables: injection times with respect to the clock signal, injected RF power, injected RF pulse width, and injected RF frequency. For the 191,500 training points, the average training error was 12.47%, while for the 191,500 test points the average test error was 14.85%, meaning that on average, the machine was able to predict MCU upset with an 85.15% accuracy. Leaving out the results for the worst-performing model (SVM with a linear kernel), the test prediction accuracy for the remaining machines is almost 89%. All three machine learning methods (ANNs, SVMs, and GPML) showed excellent and consistent results in their ability to model and predict the PoE on an MCU due to IEMI. The GP approach performed best during training with a 7.43% average training error, while the ANN technique was most accurate during the test with a 10.80% error.

  17. Use of machine learning methods to classify Universities based on the income structure

    Science.gov (United States)

    Terlyga, Alexandra; Balk, Igor

    2017-10-01

    In this paper we discuss the use of machine learning methods such as self-organizing maps, k-means and Ward's clustering to classify universities based on their income. This classification allows us to quantify the categorization of universities as teaching, research, entrepreneurial, etc., which is an important tool for governments, corporations and the general public alike in setting expectations and selecting universities to achieve different goals.
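
    A hedged sketch of the clustering idea: k-means over income-structure shares (for example, fractions of income from teaching, research and industry contracts). The features and data are assumptions for illustration, not the authors' variables.

    ```python
    # Sketch only: synthetic income shares for 200 hypothetical universities.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    shares = rng.dirichlet(alpha=[2, 2, 1], size=200)   # teaching / research / industry shares
    X = StandardScaler().fit_transform(shares)

    labels = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)
    for k in range(3):
        print(f"cluster {k}: mean shares =", shares[labels == k].mean(axis=0).round(2))
    ```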

  18. A Hierarchical Approach Using Machine Learning Methods in Solar Photovoltaic Energy Production Forecasting

    OpenAIRE

    Zhaoxuan Li; SM Mahbobur Rahman; Rolando Vega; Bing Dong

    2016-01-01

    We evaluate and compare two common methods, artificial neural networks (ANN) and support vector regression (SVR), for predicting energy productions from a solar photovoltaic (PV) system in Florida 15 min, 1 h and 24 h ahead of time. A hierarchical approach is proposed based on the machine learning algorithms tested. The production data used in this work corresponds to 15 min averaged power measurements collected from 2014. The accuracy of the model is determined using computing error statisti...

  19. Gamma/hadron segregation for a ground based imaging atmospheric Cherenkov telescope using machine learning methods: Random Forest leads

    International Nuclear Information System (INIS)

    Sharma Mradul; Koul Maharaj Krishna; Mitra Abhas; Nayak Jitadeepa; Bose Smarajit

    2014-01-01

    A detailed case study of γ-hadron segregation for a ground based atmospheric Cherenkov telescope is presented. We have evaluated and compared various supervised machine learning methods such as the Random Forest method, Artificial Neural Network, Linear Discriminant method, Naive Bayes Classifiers, Support Vector Machines as well as the conventional dynamic supercut method by simulating triggering events with the Monte Carlo method and applied the results to a Cherenkov telescope. It is demonstrated that the Random Forest method is the most sensitive machine learning method for γ-hadron segregation. (research papers)

  20. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  1. Machine Learning methods in fitting first-principles total energies for substitutionally disordered solid

    Science.gov (United States)

    Gao, Qin; Yao, Sanxi; Widom, Michael

    2015-03-01

    Density functional theory (DFT) provides an accurate and first-principles description of solid structures and total energies. However, it is highly time-consuming to calculate structures with hundreds of atoms in the unit cell and almost impossible to calculate thousands of atoms. We apply and adapt machine learning algorithms, including compressive sensing, support vector regression and artificial neural networks, to fit the DFT total energies of substitutionally disordered boron carbide. The nonparametric kernel method is also included in our models. Our fitted total energy model reproduces the DFT energies with a prediction error of around 1 meV/atom. The assumptions of these machine learning models and applications of the fitted total energies will also be discussed. Financial support from McWilliams Fellowship and the ONR-MURI under the Grant No. N00014-11-1-0678 is gratefully acknowledged.
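
    An illustrative sketch of one of the fitting approaches mentioned above (kernel support vector regression) applied to mock composition descriptors and synthetic "total energies"; the descriptors, data and hyperparameters are placeholders, not the boron carbide training set.

    ```python
    # Sketch only: SVR fit to synthetic energies, scored by cross-validated MAE.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    X = rng.uniform(size=(400, 6))                          # mock site-occupation descriptors
    E = 0.3 * X[:, 0] - 0.7 * X[:, 1] * X[:, 2] + 0.01 * rng.normal(size=400)  # synthetic eV/atom

    model = SVR(kernel="rbf", C=10.0, epsilon=0.001)
    mae = -cross_val_score(model, X, E, cv=5, scoring="neg_mean_absolute_error").mean()
    print(f"cross-validated MAE: {mae * 1000:.1f} meV/atom")
    ```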

  2. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Science.gov (United States)

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  3. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Directory of Open Access Journals (Sweden)

    Ickwon Choi

    2015-04-01

    Full Text Available The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  4. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning-a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  5. Improved machine learning method for analysis of gas phase chemistry of peptides

    Directory of Open Access Journals (Sweden)

    Ahn Natalie

    2008-12-01

    Full Text Available Abstract Background Accurate peptide identification is important to high-throughput proteomics analyses that use mass spectrometry. Search programs compare fragmentation spectra (MS/MS) of peptides from complex digests with theoretically derived spectra from a database of protein sequences. Improved discrimination is achieved with theoretical spectra that are based on simulating gas phase chemistry of the peptides, but the limited understanding of those processes affects the accuracy of predictions from theoretical spectra. Results We employed a robust data mining strategy using new feature annotation functions of MAE software, which revealed under-prediction of the frequency of occurrence in fragmentation of the second peptide bond. We applied methods of exploratory data analysis to pre-process the information in the MS/MS spectra, including data normalization and attribute selection, to reduce the attributes to a smaller, less correlated set for machine learning studies. We then compared our rule building machine learning program, DataSqueezer, with commonly used association rules and decision tree algorithms. All used machine learning algorithms produced similar results that were consistent with expected properties for a second gas phase mechanism at the second peptide bond. Conclusion The results provide compelling evidence that we have identified underlying chemical properties in the data that suggest the existence of an additional gas phase mechanism for the second peptide bond. Thus, the methods described in this study provide a valuable approach for analyses of this kind in the future.

  6. Android Used in The Learning Innovation Atwood Machines on Lagrange Mechanics Methods

    Directory of Open Access Journals (Sweden)

    Shabrina Shabrina

    2017-12-01

    Full Text Available Android is a smartphone operating system platform on which learning media are now widely developed. Android allows the learning process to be more flexible and to be student-centered rather than teacher-centered. The Atwood machine is an experimental tool often used to observe the laws of mechanics in uniformly accelerated motion, which can also be described by the Lagrangian mechanics method. As an innovative and alternative learning activity, the Android-based Atwood machine learning app was run for two experimental variations: variation of the load mass in the cart and variation of the hanging load mass. The experiment on the load mass in the cart found that the larger the load mass in the cart, the smaller the acceleration experienced by the system. Meanwhile, the experiment on the variation of the hanging mass found that the larger the hanging mass, the greater the acceleration experienced by the system.
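
    For reference, a textbook Lagrangian treatment of the idealized cart-and-hanging-mass (modified Atwood) setup, assuming a massless, inextensible string, an ideal pulley and a frictionless track, reproduces the trends reported above; this derivation is standard material, not taken from the paper.

    ```latex
    % x = common displacement, m_c = cart plus load mass, m_h = hanging mass
    \begin{align*}
      L &= T - V = \tfrac{1}{2}\,(m_c + m_h)\,\dot{x}^{2} + m_h\, g\, x, \\
      \frac{d}{dt}\,\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x}
        &= (m_c + m_h)\,\ddot{x} - m_h\, g = 0, \\
      \ddot{x} &= \frac{m_h}{m_c + m_h}\, g
      \quad\Rightarrow\quad
      \text{larger } m_c \text{ lowers } \ddot{x}, \text{ larger } m_h \text{ raises it.}
    \end{align*}
    ```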

  7. Machine Learning an algorithmic perspective

    CERN Document Server

    Marsland, Stephen

    2009-01-01

    Traditional books on machine learning can be divided into two groups - those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.Theory Backed up by Practical ExamplesThe book covers neural networks, graphical models, reinforcement le

  8. Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

    Science.gov (United States)

    Karim, Mohammad Ehsanul; Pang, Menglan; Platt, Robert W

    2018-03-01

    The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. Utilizing patient's health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of postmyocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator, and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of machine learning and high-dimensional propensity score algorithms generally perform slightly better than both in terms of mean squared error, when a bias-based analysis is used.
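
    A minimal sketch (not the published analysis) of one of the machine learning approaches named above: an L1-penalized (lasso-type) logistic model selecting among many claims-derived proxy covariates and returning estimated propensity scores. All variables and settings below are hypothetical.

    ```python
    # Sketch only: mock high-dimensional binary proxies and a synthetic treatment indicator.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(9)
    n, p = 2000, 200
    X = rng.binomial(1, 0.1, size=(n, p)).astype(float)     # placeholder claims-derived covariates
    treat = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] - 0.5))))

    lasso_ps = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    lasso_ps.fit(X, treat)
    selected = np.flatnonzero(lasso_ps.coef_[0])            # covariates retained by the penalty
    ps = lasso_ps.predict_proba(X)[:, 1]                    # estimated propensity scores
    print("covariates retained:", selected.size, "of", p)
    ```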

  9. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  10. Machine learning and radiology.

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  11. Machine Learning Methods to Extract Documentation of Breast Cancer Symptoms From Electronic Health Records.

    Science.gov (United States)

    Forsyth, Alexander W; Barzilay, Regina; Hughes, Kevin S; Lui, Dickson; Lorenz, Karl A; Enzinger, Andrea; Tulsky, James A; Lindvall, Charlotta

    2018-02-27

    Clinicians document cancer patients' symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped. To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes. The data set included 103,564 sentences obtained from the electronic clinical notes of 2695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to predict words indicating an active symptom (positive label), absence of a symptom (negative label), or no symptom at all (neutral label). Sentences labeled by human coders were divided into training, validation, and test data sets. Final model performance was determined on 20% test data unused in model development or tuning. The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes. We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires further optimization to improve the performance, further model building may yield machine learning methods suitable for deployment in routine clinical care, quality improvement, and research applications. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  12. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The multidisciplinary nature of machine learning makes it a very fascinating and popular area for research. The book is aimed at students, practitioners and researchers and captures the diversity and richness of the field of machine learning and intelligent systems. Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets. An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems and spam classification, including an extremely well-written survey on Bayesian networks, sheds light on the strengths and weaknesses of the methods. Practical studies yielding insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and on-line mining of non-stationary ...

  13. Creativity in Machine Learning

    OpenAIRE

    Thoma, Martin

    2016-01-01

    Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; they are not a trivial combination of the data that was fed into the machine learning system. The obtained results come in multiple forms: as images, as text and as audio. This paper gives a high-level overview of how they are created and gives some examples. It is meant to be a summary of the current work and to give people who are new to machine learning some starting points.

  14. rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning

    Directory of Open Access Journals (Sweden)

    Miron B. Kursa

    2014-11-01

    Full Text Available Random ferns is a very simple yet powerful classification method originally introduced for specific computer vision tasks. In this paper, I show that this algorithm may be considered as a constrained decision tree ensemble and use this interpretation to introduce a series of modifications which enable the use of random ferns in general machine learning problems. Moreover, I extend the method with an internal error approximation and an attribute importance measure based on corresponding features of the random forest algorithm. I also present the R package rFerns containing an efficient implementation of this modified version of random ferns.

  15. Identifying Structural Flow Defects in Disordered Solids Using Machine-Learning Methods

    Science.gov (United States)

    Cubuk, E. D.; Schoenholz, S. S.; Rieser, J. M.; Malone, B. D.; Rottler, J.; Durian, D. J.; Kaxiras, E.; Liu, A. J.

    2015-03-01

    We use machine-learning methods on local structure to identify flow defects—or particles susceptible to rearrangement—in jammed and glassy systems. We apply this method successfully to two very different systems: a two-dimensional experimental realization of a granular pillar under compression and a Lennard-Jones glass in both two and three dimensions above and below its glass transition temperature. We also identify characteristics of flow defects that differentiate them from the rest of the sample. Our results show it is possible to discern subtle structural features responsible for heterogeneous dynamics observed across a broad range of disordered materials.

  16. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances

    Directory of Open Access Journals (Sweden)

    Abut F

    2015-08-01

    Full Text Available Fatih Abut, Mehmet Fatih Akay, Department of Computer Engineering, Çukurova University, Adana, Turkey. Abstract: Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute in a state of intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers and cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance
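
    A hedged sketch of how one of the surveyed model types (support vector regression) might be evaluated with the two metrics used in the survey, the multiple correlation coefficient (R) and the standard error of estimate (computed here simply as the root-mean-square error); predictors and data are synthetic assumptions, not any of the surveyed datasets.

    ```python
    # Sketch only: synthetic predictors (age, maximal heart rate, weight) and VO2max values.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(10)
    age = rng.uniform(15, 60, 300)
    hr_max = 220 - age + rng.normal(0, 5, 300)
    weight = rng.uniform(50, 100, 300)
    vo2max = 60 - 0.4 * age - 0.1 * weight + rng.normal(0, 3, 300)   # synthetic ml/kg/min

    X = np.column_stack([age, hr_max, weight])
    X_tr, X_te, y_tr, y_te = train_test_split(X, vo2max, random_state=10)
    pred = SVR(C=50.0).fit(X_tr, y_tr).predict(X_te)

    R = np.corrcoef(y_te, pred)[0, 1]
    SEE = np.sqrt(np.mean((y_te - pred) ** 2))   # simple RMSE stand-in for SEE
    print(f"R = {R:.2f}, SEE = {SEE:.2f} ml/kg/min")
    ```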

  17. Model-based machine learning.

    Science.gov (United States)

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

  18. Energy-efficient algorithm for classification of states of wireless sensor network using machine learning methods

    Science.gov (United States)

    Yuldashev, M. N.; Vlasov, A. I.; Novikov, A. N.

    2018-05-01

    This paper focuses on the development of an energy-efficient algorithm for classification of states of a wireless sensor network using machine learning methods. The proposed algorithm reduces energy consumption by: 1) elimination of monitoring of parameters that do not affect the state of the sensor network, 2) reduction of communication sessions over the network (the data are transmitted only if their values can affect the state of the sensor network). The studies of the proposed algorithm have shown that at classification accuracy close to 100%, the number of communication sessions can be reduced by 80%.

  19. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...

  20. MU-LOC: A Machine-Learning Method for Predicting Mitochondrially Localized Proteins in Plants

    DEFF Research Database (Denmark)

    Zhang, Ning; Rao, R Shyama Prasad; Salvato, Fernanda

    2018-01-01

    -sequence or a multitude of internal signals. Compared with experimental approaches, computational predictions provide an efficient way to infer subcellular localization of a protein. However, it is still challenging to predict plant mitochondrially localized proteins accurately due to various limitations. Consequently......, the performance of current tools can be improved with new data and new machine-learning methods. We present MU-LOC, a novel computational approach for large-scale prediction of plant mitochondrial proteins. We collected a comprehensive dataset of plant subcellular localization, extracted features including amino...

  1. Machine learning methods for credibility assessment of interviewees based on posturographic data.

    Science.gov (United States)

    Saripalle, Sashi K; Vemulapalli, Spandana; King, Gregory W; Burgoon, Judee K; Derakhshani, Reza

    2015-01-01

    This paper discusses the advantages of using posturographic signals from force plates for non-invasive credibility assessment. The contributions of our work are twofold: first, the proposed method is highly efficient and non-invasive; second, the feasibility of creating an autonomous credibility assessment system using machine-learning algorithms is studied. This study employs an interview paradigm in which subjects respond with truthful and deceptive intent while their center of pressure (COP) signal is being recorded. Classification models utilizing sets of COP features for deceptive responses are derived, and a best accuracy of 93.5% on the test interval is reported.
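
    A minimal sketch of the overall pipeline, with synthetic center-of-pressure traces, a few simple sway statistics standing in for the paper's COP feature sets, and an SVM classifier:

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)

    def cop_features(trace):
        """Simple sway statistics from a (time, 2) COP trace: mean speed, range, std."""
        speed = np.linalg.norm(np.diff(trace, axis=0), axis=1).mean()
        extent = trace.max(axis=0) - trace.min(axis=0)
        return np.r_[speed, extent, trace.std(axis=0)]

    # Synthetic data: deceptive responses get slightly larger sway (illustrative only).
    X, y = [], []
    for label in (0, 1):
        for _ in range(60):
            trace = np.cumsum(rng.normal(scale=1.0 + 0.3 * label, size=(500, 2)), axis=0)
            X.append(cop_features(trace))
            y.append(label)
    X, y = np.array(X), np.array(y)

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    ```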

  2. Method of Automatic Ontology Mapping through Machine Learning and Logic Mining

    Institute of Scientific and Technical Information of China (English)

    王英林

    2004-01-01

    Ontology mapping is the bottleneck in handling conflicts among heterogeneous ontologies and in implementing reconfiguration or interoperability of legacy systems. We proposed an ontology mapping method using machine learning, type constraints and logic mining techniques. This method is able to find concept correspondences through instances, and the result is optimized by using an error function; it is able to find attribute correspondences between two equivalent concepts, and the mapping accuracy is enhanced by combining instance learning, type constraints and the logic relations that are embedded in instances; moreover, it solves the most common kind of categorization conflict. We then proposed a merging algorithm to generate the shared ontology and a reconfigurable architecture for interoperation based on multiple agents. The legacy systems are encapsulated as information agents to participate in the integration system. Finally, we give a simplified case study.

  3. An Evaluation of Machine Learning Methods to Detect Malicious SCADA Communications

    Energy Technology Data Exchange (ETDEWEB)

    Beaver, Justin M [ORNL; Borges, Raymond Charles [ORNL; Buckner, Mark A [ORNL

    2013-01-01

    Critical infrastructure Supervisory Control and Data Acquisition (SCADA) systems were designed to operate on closed, proprietary networks where a malicious insider posed the greatest threat potential. The centralization of control and the movement towards open systems and standards has improved the efficiency of industrial control, but has also exposed legacy SCADA systems to security threats that they were not designed to mitigate. This work explores the viability of machine learning methods in detecting the new threat scenarios of command and data injection. Similar to network intrusion detection systems in the cyber security domain, the command and control communications in a critical infrastructure setting are monitored, and vetted against examples of benign and malicious command traffic, in order to identify potential attack events. Multiple learning methods are evaluated using a dataset of Remote Terminal Unit communications, which included both normal operations and instances of command and data injection attack scenarios.
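
    The evaluation loop described here (train several learners on labeled command traffic, score them on held-out data) can be sketched as follows; the feature vectors are synthetic stand-ins for RTU message features, and the particular learners are illustrative rather than the ones evaluated in the report.

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(3)

    # Synthetic stand-in for per-message features (e.g. function code, payload length, timing).
    X = rng.normal(size=(1000, 8))
    y = (X[:, 0] * X[:, 3] + 0.5 * X[:, 5] > 0.8).astype(int)   # 1 = injected command

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

    # Compare several off-the-shelf learners on the same benign/malicious labels.
    for model in (RandomForestClassifier(random_state=0), GaussianNB(), SVC()):
        model.fit(X_tr, y_tr)
        print(type(model).__name__)
        print(classification_report(y_te, model.predict(X_te), digits=3))
    ```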

  4. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    Science.gov (United States)

    Burlina, Philippe; Billings, Seth; Joshi, Neil; Albayda, Jemima

    2017-01-01

    To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  5. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    Directory of Open Access Journals (Sweden)

    Philippe Burlina

    Full Text Available To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  6. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn. If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the book for you. No previous experience with machine-learning algorithms is required.
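
    In the spirit of the book's tutorial approach, a minimal scikit-learn workflow looks like this (bundled iris data, train/test split, fit, evaluate):

    ```python
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    # Load a small bundled dataset and hold out a quarter of it for testing.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Fit a simple classifier and evaluate it on the held-out data.
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    ```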

  7. Can Machines Learn Respiratory Virus Epidemiology?: A Comparative Study of Likelihood-Free Methods for the Estimation of Epidemiological Dynamics

    Directory of Open Access Journals (Sweden)

    Heidi L. Tessmer

    2018-03-01

    Full Text Available To estimate and predict the transmission dynamics of respiratory viruses, the estimation of the basic reproduction number, R0, is essential. Recently, approximate Bayesian computation methods have been used as likelihood-free methods to estimate epidemiological model parameters, particularly R0. In this paper, we explore various machine learning approaches, the multi-layer perceptron, convolutional neural network, and long short-term memory, to learn and estimate the parameters. Further, we compare the accuracy of the estimates and time requirements for machine learning and the approximate Bayesian computation methods on both simulated and real-world epidemiological data from outbreaks of influenza A(H1N1)pdm09, mumps, and measles. We find that the machine learning approaches can be verified and tested faster than the approximate Bayesian computation method, but that the approximate Bayesian computation method is more robust across different datasets.
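
    A minimal sketch of the surrogate idea behind such likelihood-free estimation: simulate epidemic curves for known R0 values, then train a regressor to map a curve back to R0. The SIR simulator, noise model, and network size below are illustrative assumptions, not the paper's setup.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    def sir_curve(r0, gamma=0.5, n_days=60, n=1.0, i0=1e-3):
        """Daily incidence-like curve from a simple discrete SIR model."""
        beta = r0 * gamma
        s, i = n - i0, i0
        out = []
        for _ in range(n_days):
            new_inf = beta * s * i / n
            s, i = s - new_inf, i + new_inf - gamma * i
            out.append(new_inf)
        return np.array(out)

    rng = np.random.default_rng(4)
    r0s = rng.uniform(1.1, 3.0, size=2000)
    # Curves with mild multiplicative observation noise.
    X = np.array([sir_curve(r) * rng.lognormal(sigma=0.05, size=60) for r in r0s])
    y = r0s

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    reg = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0).fit(X_tr, y_tr)
    err = np.abs(reg.predict(X_te) - y_te).mean()
    print(f"Mean absolute error on held-out R0: {err:.3f}")
    ```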

  8. In Silico Prediction of Chemicals Binding to Aromatase with Machine Learning Methods.

    Science.gov (United States)

    Du, Hanwen; Cai, Yingchun; Yang, Hongbin; Zhang, Hongxiao; Xue, Yuhan; Liu, Guixia; Tang, Yun; Li, Weihua

    2017-05-15

    Environmental chemicals may affect endocrine systems through multiple mechanisms, one of which is via effects on aromatase (also known as CYP19A1), an enzyme critical for maintaining the normal balance of estrogens and androgens in the body. Therefore, rapid and efficient identification of aromatase-related endocrine disrupting chemicals (EDCs) is important for toxicology and environment risk assessment. In this study, on the basis of the Tox21 10K compound library, in silico classification models for predicting aromatase binders/nonbinders were constructed by machine learning methods. To improve the prediction ability of the models, a combined classifier (CC) strategy that combines different independent machine learning methods was adopted. Performances of the models were measured by test and external validation sets containing 1336 and 216 chemicals, respectively. The best model was obtained with the MACCS (Molecular Access System) fingerprint and CC method, which exhibited an accuracy of 0.84 for the test set and 0.91 for the external validation set. Additionally, several representative substructures for characterizing aromatase binders, such as ketone, lactone, and nitrogen-containing derivatives, were identified using information gain and substructure frequency analysis. Our study provided a systematic assessment of chemicals binding to aromatase. The built models can be helpful to rapidly identify potential EDCs targeting aromatase.
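
    A minimal sketch of a combined-classifier strategy over fingerprint bit vectors using scikit-learn's soft-voting ensemble; random bits stand in for the 166-bit MACCS keys (which in practice would come from a cheminformatics toolkit such as RDKit), and the base learners are illustrative choices rather than the ones combined in the study.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)

    # Stand-in for 166-bit MACCS fingerprints and binder/nonbinder labels.
    X = rng.integers(0, 2, size=(600, 166))
    y = (X[:, :20].sum(axis=1) > 10).astype(int)

    # Combined classifier: average predicted probabilities across independent learners.
    combined = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", BernoulliNB()),
        ],
        voting="soft",
    )
    print("CV accuracy:", cross_val_score(combined, X, y, cv=5).mean())
    ```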

  9. Machine learning systems

    Energy Technology Data Exchange (ETDEWEB)

    Forsyth, R

    1984-05-01

    With the dramatic rise of expert systems has come a renewed interest in the fuel that drives them-knowledge. For it is specialist knowledge which gives expert systems their power. But extracting knowledge from human experts in symbolic form has proved arduous and labour-intensive. So the idea of machine learning is enjoying a renaissance. Machine learning is any automatic improvement in the performance of a computer system over time, as a result of experience. Thus a learning algorithm seeks to do one or more of the following: cover a wider range of problems, deliver more accurate solutions, obtain answers more cheaply, and simplify codified knowledge. 6 references.

  10. A Machine Learning Method for the Prediction of Receptor Activation in the Simulation of Synapses

    Science.gov (United States)

    Montes, Jesus; Gomez, Elena; Merchán-Pérez, Angel; DeFelipe, Javier; Peña, Jose-Maria

    2013-01-01

    Chemical synaptic transmission involves the release of a neurotransmitter that diffuses in the extracellular space and interacts with specific receptors located on the postsynaptic membrane. Computer simulation approaches provide fundamental tools for exploring various aspects of the synaptic transmission under different conditions. In particular, Monte Carlo methods can track the stochastic movements of neurotransmitter molecules and their interactions with other discrete molecules, the receptors. However, these methods are computationally expensive, even when used with simplified models, preventing their use in large-scale and multi-scale simulations of complex neuronal systems that may involve large numbers of synaptic connections. We have developed a machine-learning based method that can accurately predict relevant aspects of the behavior of synapses, such as the percentage of open synaptic receptors as a function of time since the release of the neurotransmitter, with considerably lower computational cost compared with the conventional Monte Carlo alternative. The method is designed to learn patterns and general principles from a corpus of previously generated Monte Carlo simulations of synapses covering a wide range of structural and functional characteristics. These patterns are later used as a predictive model of the behavior of synapses under different conditions without the need for additional computationally expensive Monte Carlo simulations. This is performed in five stages: data sampling, fold creation, machine learning, validation and curve fitting. The resulting procedure is accurate, automatic, and it is general enough to predict synapse behavior under experimental conditions that are different to the ones it has been trained on. Since our method efficiently reproduces the results that can be obtained with Monte Carlo simulations at a considerably lower computational cost, it is suitable for the simulation of high numbers of synapses and it is

  11. A machine learning method for the prediction of receptor activation in the simulation of synapses.

    Directory of Open Access Journals (Sweden)

    Jesus Montes

    Full Text Available Chemical synaptic transmission involves the release of a neurotransmitter that diffuses in the extracellular space and interacts with specific receptors located on the postsynaptic membrane. Computer simulation approaches provide fundamental tools for exploring various aspects of the synaptic transmission under different conditions. In particular, Monte Carlo methods can track the stochastic movements of neurotransmitter molecules and their interactions with other discrete molecules, the receptors. However, these methods are computationally expensive, even when used with simplified models, preventing their use in large-scale and multi-scale simulations of complex neuronal systems that may involve large numbers of synaptic connections. We have developed a machine-learning based method that can accurately predict relevant aspects of the behavior of synapses, such as the percentage of open synaptic receptors as a function of time since the release of the neurotransmitter, with considerably lower computational cost compared with the conventional Monte Carlo alternative. The method is designed to learn patterns and general principles from a corpus of previously generated Monte Carlo simulations of synapses covering a wide range of structural and functional characteristics. These patterns are later used as a predictive model of the behavior of synapses under different conditions without the need for additional computationally expensive Monte Carlo simulations. This is performed in five stages: data sampling, fold creation, machine learning, validation and curve fitting. The resulting procedure is accurate, automatic, and it is general enough to predict synapse behavior under experimental conditions that are different to the ones it has been trained on. Since our method efficiently reproduces the results that can be obtained with Monte Carlo simulations at a considerably lower computational cost, it is suitable for the simulation of high numbers of

  12. NetiNeti: discovery of scientific names from text using machine learning methods

    Directory of Open Access Journals (Sweden)

    Akella Lakshmi

    2012-08-01

    Full Text Available Abstract Background A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. Results We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central’s full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Conclusions We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.

  13. Glucose Oxidase Biosensor Modeling and Predictors Optimization by Machine Learning Methods.

    Science.gov (United States)

    Gonzalez-Navarro, Felix F; Stilianova-Stoytcheva, Margarita; Renteria-Gutierrez, Livier; Belanche-Muñoz, Lluís A; Flores-Rios, Brenda L; Ibarra-Esquer, Jorge E

    2016-10-26

    Biosensors are small analytical devices incorporating a biological recognition element and a physico-chemical transducer to convert a biological signal into an electrical reading. Nowadays, their technological appeal resides in their fast performance, high sensitivity and continuous measuring capabilities; however, a full understanding is still under research. This paper aims to contribute to this growing field of biotechnology, with a focus on Glucose-Oxidase Biosensor (GOB) modeling through statistical learning methods from a regression perspective. We model the amperometric response of a GOB with dependent variables under different conditions, such as temperature, benzoquinone, pH and glucose concentrations, by means of several machine learning algorithms. Since the sensitivity of a GOB response is strongly related to these dependent variables, their interactions should be optimized to maximize the output signal, for which a genetic algorithm and simulated annealing are used. We report a model that shows a good generalization error and is consistent with the optimization.
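
    A minimal sketch of the model-then-optimize workflow: fit a regression surrogate of the amperometric response to the four operating variables, then search the fitted surface for signal-maximizing conditions. SciPy's dual annealing stands in for the paper's genetic algorithm and simulated annealing, and the response function and variable bounds are invented for illustration.

    ```python
    import numpy as np
    from scipy.optimize import dual_annealing
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(6)

    # Synthetic experiments: columns are temperature (C), pH, benzoquinone (mM), glucose (mM).
    X = rng.uniform([20, 5.0, 0.1, 1.0], [45, 8.0, 2.0, 20.0], size=(300, 4))
    # Illustrative response peaking at mid-range temperature and pH.
    y = (np.exp(-((X[:, 0] - 35) / 8) ** 2) * np.exp(-((X[:, 1] - 7) / 1.2) ** 2)
         * X[:, 2] * np.log1p(X[:, 3]) + rng.normal(scale=0.02, size=300))

    model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

    # Maximize the predicted output (minimize its negative) with simulated annealing.
    bounds = [(20, 45), (5.0, 8.0), (0.1, 2.0), (1.0, 20.0)]
    res = dual_annealing(lambda v: -model.predict(v.reshape(1, -1))[0], bounds,
                         maxiter=100, seed=0)
    print("Optimal conditions (T, pH, BQ, glucose):", np.round(res.x, 2))
    ```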

  14. Glucose Oxidase Biosensor Modeling and Predictors Optimization by Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Felix F. Gonzalez-Navarro

    2016-10-01

    Full Text Available Biosensors are small analytical devices incorporating a biological recognition element and a physico-chemical transducer to convert a biological signal into an electrical reading. Nowadays, their technological appeal resides in their fast performance, high sensitivity and continuous measuring capabilities; however, a full understanding is still under research. This paper aims to contribute to this growing field of biotechnology, with a focus on Glucose-Oxidase Biosensor (GOB) modeling through statistical learning methods from a regression perspective. We model the amperometric response of a GOB with dependent variables under different conditions, such as temperature, benzoquinone, pH and glucose concentrations, by means of several machine learning algorithms. Since the sensitivity of a GOB response is strongly related to these dependent variables, their interactions should be optimized to maximize the output signal, for which a genetic algorithm and simulated annealing are used. We report a model that shows a good generalization error and is consistent with the optimization.

  15. Automatic Detection of Acromegaly From Facial Photographs Using Machine Learning Methods.

    Science.gov (United States)

    Kong, Xiangyi; Gong, Shun; Su, Lijuan; Howard, Newton; Kong, Yanguo

    2018-01-01

    Automatic early detection of acromegaly from facial photographs is theoretically possible, which could lessen the prevalence and increase the probability of cure. In this study, several popular machine learning algorithms were used to train a retrospective development dataset consisting of 527 acromegaly patients and 596 normal subjects. We first used OpenCV to detect the face bounding rectangle box, and then cropped and resized it to the same pixel dimensions. From the detected faces, locations of facial landmarks, which were the potential clinical indicators, were extracted. Frontalization was then adopted to synthesize frontal facing views to improve the performance. Several popular machine learning methods including LM, KNN, SVM, RT, CNN, and EM were used to automatically identify acromegaly from the detected facial photographs, extracted facial landmarks, and synthesized frontal faces. The trained models were evaluated using a separate dataset, half of which were diagnosed as acromegaly by a growth hormone suppression test. The best result of our proposed methods showed a PPV of 96%, a NPV of 95%, a sensitivity of 96% and a specificity of 96%. Artificial intelligence can automatically detect acromegaly at an early stage with a high sensitivity and specificity. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
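
    A minimal sketch of the face-detection and normalization step with OpenCV's bundled Haar cascade, followed by a placeholder classifier on flattened pixels; the file lists, image size, and classifier are illustrative assumptions, and the landmark extraction and frontalization steps of the paper are omitted.

    ```python
    import cv2
    import numpy as np
    from sklearn.svm import SVC

    # Face detection with the Haar cascade bundled with OpenCV.
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def face_crop(path, size=64):
        """Return the largest detected face, grayscale, resized to size x size; None if no face."""
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            return None
        faces = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
        return cv2.resize(img[y:y + h, x:x + w], (size, size))

    # Hypothetical file lists; in practice these would point to the photo datasets.
    acromegaly_paths, control_paths = [], []
    X, y = [], []
    for label, paths in ((1, acromegaly_paths), (0, control_paths)):
        for p in paths:
            crop = face_crop(p)
            if crop is not None:
                X.append(crop.ravel() / 255.0)
                y.append(label)

    if X:  # train only if images were provided
        clf = SVC(kernel="rbf").fit(np.array(X), np.array(y))
    ```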

  16. Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems.

    Science.gov (United States)

    Tulsyan, Aditya; Garvin, Christopher; Ündey, Cenk

    2018-04-06

    Biopharmaceutical manufacturing comprises multiple distinct processing steps that require effective and efficient monitoring of many variables simultaneously in real-time. State-of-the-art real-time multivariate statistical batch process monitoring (BPM) platforms have been in use in recent years to ensure comprehensive monitoring is in place as a complementary tool for continued process verification to detect weak signals. This article addresses a longstanding, industry-wide problem in BPM, referred to as the "Low-N" problem, wherein a product has a limited production history. The current best industrial practice to address the Low-N problem is to switch from a multivariate to a univariate BPM until sufficient product history is available to build and deploy a multivariate BPM platform. Every batch run without a robust multivariate BPM platform poses a risk of not detecting potential weak signals developing in the process that might have an impact on process and product performance. In this article, we propose an approach to solve the Low-N problem by generating an arbitrarily large number of in silico batches through a combination of hardware exploitation and machine-learning methods. To the best of the authors' knowledge, this is the first article to provide a solution to the Low-N problem in biopharmaceutical manufacturing using machine-learning methods. Several industrial case studies from bulk drug substance manufacturing are presented to demonstrate the efficacy of the proposed approach for BPM under various Low-N scenarios. © 2018 Wiley Periodicals, Inc.

  17. Comparison of four machine learning methods for object-oriented change detection in high-resolution satellite imagery

    Science.gov (United States)

    Bai, Ting; Sun, Kaimin; Deng, Shiquan; Chen, Yan

    2018-03-01

    High-resolution image change detection is one of the key technologies of remote sensing application, and is of great significance for resource surveys, environmental monitoring, precision agriculture, military mapping and battlefield environment detection. In this paper, for high-resolution satellite imagery, Random Forest (RF), Support Vector Machine (SVM), Deep Belief Network (DBN), and Adaboost models were established to verify the possibility of different machine learning applications in change detection. To compare the detection accuracy of the four machine learning methods, we applied them to two high-resolution images. The results show that with small samples SVM has higher overall accuracy than RF, Adaboost, and DBN for binary and from-to change detection. As the number of samples increases, RF achieves higher overall accuracy than Adaboost, SVM and DBN.

  18. Peak detection method evaluation for ion mobility spectrometry by using machine learning approaches.

    Science.gov (United States)

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna; Baumbach, Jörg Ingo; Rahmann, Sven; Baumbach, Jan

    2013-04-16

    Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, different steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the power of the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.
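
    Local maxima search, the simplest of the four evaluated peak callers, can be sketched on a synthetic one-dimensional trace with SciPy; real MCC/IMS data are two-dimensional heat maps, and the thresholds below are arbitrary illustrative values.

    ```python
    import numpy as np
    from scipy.signal import find_peaks

    rng = np.random.default_rng(7)

    # Synthetic 1-D spectrum: three Gaussian peaks on a noisy baseline.
    x = np.linspace(0, 10, 2000)
    signal = sum(a * np.exp(-((x - c) / w) ** 2)
                 for a, c, w in [(1.0, 2.0, 0.1), (0.6, 5.0, 0.15), (0.8, 7.5, 0.08)])
    signal += rng.normal(scale=0.03, size=x.size)

    # Local maxima search with simple height and prominence thresholds.
    peaks, props = find_peaks(signal, height=0.2, prominence=0.1)
    print("Detected peak positions:", np.round(x[peaks], 2))
    ```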

  19. Estimating the complexity of 3D structural models using machine learning methods

    Science.gov (United States)

    Mejía-Herrera, Pablo; Kakurina, Maria; Royer, Jean-Jacques

    2016-04-01

    Quantifying the complexity of 3D geological structural models can play a major role in natural resources exploration surveys, in predicting environmental hazards and in forecasting fossil resources. This paper proposes a structural complexity index which can be used to help define the degree of effort necessary to build a 3D model for a given degree of confidence, and also to identify locations where additional effort is required to meet a given acceptable risk of uncertainty. In this work, it is considered that the structural complexity index can be estimated using machine learning methods on raw geo-data. More precisely, the metrics for measuring the complexity can be approximated as the degree of difficulty associated with predicting the distribution of geological objects, calculated from partial information on the actual structural distribution of materials. The proposed methodology is tested on a set of 3D synthetic structural models for which the degree of effort during their building is assessed using various parameters (such as the number of faults, the number of parts in a surface object, the number of borders, ...), the rank of geological elements contained in each model, and, finally, their level of deformation (folding and faulting). The results show how the estimated complexity of a 3D model can be approximated by the quantity of partial data necessary to simulate the actual 3D model at a given precision without error using machine learning algorithms.

  20. BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

    Directory of Open Access Journals (Sweden)

    F. Pirotti

    2016-06-01

    Full Text Available Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the
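
    A minimal sketch of this style of k-fold benchmarking with scikit-learn, comparing a few of the listed classifiers on a common feature matrix; the synthetic "pixels" below merely stand in for Sentinel-2 band values and land-cover labels.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score, StratifiedKFold
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(8)

    # Synthetic "pixels": 10 spectral bands, 5 land-cover classes.
    n_per_class, n_bands, n_classes = 300, 10, 5
    centers = rng.normal(scale=2.0, size=(n_classes, n_bands))
    X = np.vstack([c + rng.normal(size=(n_per_class, n_bands)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)

    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "kNN": KNeighborsClassifier(),
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
        "SVM": SVC(),
    }
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv)
        print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
    ```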

  1. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.
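
    A minimal sketch of the (1+1)-ES around which the book's experiments are organized, using Gaussian mutation and the classic 1/5th success rule for step-size control on a sphere function; the constants and objective are the usual textbook choices, not the book's exact settings.

    ```python
    import numpy as np

    def sphere(x):
        return float(np.dot(x, x))

    def one_plus_one_es(dim=10, sigma=1.0, iters=2000, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.normal(size=dim)
        fx = sphere(x)
        for _ in range(iters):
            child = x + sigma * rng.normal(size=dim)      # Gaussian mutation
            fc = sphere(child)
            success = fc < fx
            if success:
                x, fx = child, fc
            # 1/5th success rule: grow sigma on success, shrink slightly otherwise,
            # so that sigma is stable when about one in five mutations succeeds.
            sigma *= 1.22 if success else 1.22 ** -0.25
        return x, fx

    _, best = one_plus_one_es()
    print(f"Best sphere value after 2000 iterations: {best:.3e}")
    ```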

  2. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  3. Machine Learning in Medicine

    Science.gov (United States)

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  4. Clojure for machine learning

    CERN Document Server

    Wali, Akhil

    2014-01-01

    A book that brings out the strengths of Clojure programming that help facilitate machine learning. Each topic is described in substantial detail, and examples and libraries in Clojure are also demonstrated. This book is intended for Clojure developers who want to explore the area of machine learning. A basic understanding of the Clojure programming language is required, but thorough acquaintance with the standard Clojure library or any other libraries is not required. Familiarity with theoretical concepts and the notation of mathematics and statistics would be an added advantage.

  5. Logic Learning Machine and standard supervised methods for Hodgkin's lymphoma prognosis using gene expression data and clinical variables.

    Science.gov (United States)

    Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco

    2018-03-01

    This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms (k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene (XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.

  6. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances.

    Science.gov (United States)

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance.

  7. Gaussian processes for machine learning.

    Science.gov (United States)

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.
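
    A minimal Gaussian-process regression example in scikit-learn, illustrating the non-parametric fit and the predictive uncertainty the abstract emphasizes; the kernel and noise level are illustrative choices.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(9)

    # Noisy observations of a smooth function.
    X = rng.uniform(0, 10, size=(40, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    # Predictive mean and standard deviation on a test grid.
    X_test = np.linspace(0, 10, 5).reshape(-1, 1)
    mean, std = gp.predict(X_test, return_std=True)
    for x, m, s in zip(X_test.ravel(), mean, std):
        print(f"f({x:.1f}) is roughly {m:.2f} +/- {2 * s:.2f}")
    ```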

  8. Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods.

    Science.gov (United States)

    Luo, Gang; Stone, Bryan L; Johnson, Michael D; Tarczy-Hornoch, Peter; Wilcox, Adam B; Mooney, Sean D; Sheng, Xiaoming; Haug, Peter J; Nkoy, Flory L

    2017-08-29

    To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets. This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care
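
    One ingredient of such automation, searching hyper-parameter values by cross-validation, can be sketched with scikit-learn's randomized search; the estimator and search space below are illustrative assumptions and not the Auto-ML software described in the proposal.

    ```python
    from scipy.stats import randint, uniform
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = load_breast_cancer(return_X_y=True)

    # Randomized search samples configurations instead of manually iterating over them.
    param_distributions = {
        "n_estimators": randint(50, 400),
        "max_depth": randint(2, 6),
        "learning_rate": uniform(0.01, 0.3),
    }
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_distributions,
        n_iter=20,          # 20 sampled configurations instead of an exhaustive grid
        cv=5,
        random_state=0,
        n_jobs=-1,
    )
    search.fit(X, y)
    print("Best CV accuracy:", round(search.best_score_, 3))
    print("Best hyper-parameters:", search.best_params_)
    ```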

  9. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  10. Effect of abiotic and biotic stress factors analysis using machine learning methods in zebrafish.

    Science.gov (United States)

    Gutha, Rajasekar; Yarrappagaari, Suresh; Thopireddy, Lavanya; Reddy, Kesireddy Sathyavelu; Saddala, Rajeswara Reddy

    2018-03-01

    In order to understand the mechanisms underlying stress responses, meta-analysis of transcriptome is made to identify differentially expressed genes (DEGs) and their biological, molecular and cellular mechanisms in response to stressors. The present study is aimed at identifying the effect of abiotic and biotic stress factors, and it is found that several stress responsive genes are common for both abiotic and biotic stress factors in zebrafish. The meta-analysis of micro-array studies revealed that almost 4.7% i.e., 108 common DEGs are differentially regulated between abiotic and biotic stresses. This shows that there is a global coordination and fine-tuning of gene regulation in response to these two types of challenges. We also performed dimension reduction methods, principal component analysis, and partial least squares discriminant analysis which are able to segregate abiotic and biotic stresses into separate entities. The supervised machine learning model, recursive-support vector machine, could classify abiotic and biotic stresses with 100% accuracy using a subset of DEGs. Beside these methods, the random forests decision tree model classified five out of 8 stress conditions with high accuracy. Finally, Functional enrichment analysis revealed the different gene ontology terms, transcription factors and miRNAs factors in the regulation of stress responses. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Using multivariate machine learning methods and structural MRI to classify childhood onset schizophrenia and healthy controls

    Directory of Open Access Journals (Sweden)

    Deanna eGreenstein

    2012-06-01

    Full Text Available Introduction: Multivariate machine learning methods can be used to classify groups of schizophrenia patients and controls using structural magnetic resonance imaging (MRI). However, machine learning methods to date have not been extended beyond classification and contemporaneously applied in a meaningful way to clinical measures. We hypothesized that brain measures would classify groups, and that increased likelihood of being classified as a patient using regional brain measures would be positively related to illness severity, developmental delays and genetic risk. Methods: Using 74 anatomic brain MRI subregions and Random Forest, we classified 98 COS patients and 99 age, sex, and ethnicity-matched healthy controls. We also used Random Forest to determine the likelihood of being classified as a schizophrenia patient based on MRI measures. We then explored relationships between brain-based probability of illness and symptoms, premorbid development, and presence of copy number variation associated with schizophrenia. Results: Brain regions jointly classified COS and control groups with 73.7% accuracy. Greater brain-based probability of illness was associated with worse functioning (p = 0.0004) and fewer developmental delays (p = 0.02). Presence of copy number variation (CNV) was associated with lower probability of being classified as schizophrenia (p = 0.001). The regions that were most important in classifying groups included left temporal lobes, bilateral dorsolateral prefrontal regions, and left medial parietal lobes. Conclusions: Schizophrenia and control groups can be well classified using Random Forest and anatomic brain measures, and brain-based probability of illness has a positive relationship with illness severity and a negative relationship with developmental delays/problems and CNV-based risk.
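
    The "brain-based probability of illness" corresponds to the class probability a Random Forest assigns to the patient class. A minimal sketch with out-of-fold probabilities and a correlation against an external severity score, using synthetic stand-ins for the regional brain measures:

    ```python
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(10)

    # Synthetic stand-ins: 74 regional brain measures for patients (1) and controls (0).
    n = 200
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 74)) + 0.4 * y[:, None] * rng.normal(size=(1, 74))

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    # Out-of-fold probability of belonging to the patient class.
    proba_ill = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

    # Relate the brain-based probability of illness to an external severity score (synthetic).
    severity = 2.0 * proba_ill + rng.normal(scale=0.5, size=n)
    r, p = pearsonr(proba_ill[y == 1], severity[y == 1])
    print(f"Correlation with severity among patients: r = {r:.2f}, p = {p:.3g}")
    ```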

  12. Machine Learning for Security

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques identifying failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  13. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focussed on building models. We process nearly all of the information we perceive into a model. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called Machine Learning. Much research is devoted to develop

  14. A Hierarchical Approach Using Machine Learning Methods in Solar Photovoltaic Energy Production Forecasting

    Directory of Open Access Journals (Sweden)

    Zhaoxuan Li

    2016-01-01

    Full Text Available We evaluate and compare two common methods, artificial neural networks (ANN) and support vector regression (SVR), for predicting energy productions from a solar photovoltaic (PV) system in Florida 15 min, 1 h and 24 h ahead of time. A hierarchical approach is proposed based on the machine learning algorithms tested. The production data used in this work corresponds to 15 min averaged power measurements collected from 2014. The accuracy of the model is determined using computing error statistics such as mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), relative MBE (rMBE), mean percentage error (MPE) and relative RMSE (rRMSE). This work provides findings on how forecasts from individual inverters will improve the total solar power generation forecast of the PV system.
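
    A minimal sketch of the error statistics listed above as plain NumPy functions; the normalization by mean measured power used for the relative metrics is one common convention and may differ from the paper's exact definitions.

    ```python
    import numpy as np

    def forecast_errors(y_true, y_pred):
        """Return MBE, MAE, RMSE and their relative counterparts (in % of mean measurement)."""
        y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
        err = y_pred - y_true
        mbe = err.mean()
        mae = np.abs(err).mean()
        rmse = np.sqrt((err ** 2).mean())
        mpe = (err / y_true).mean() * 100.0          # assumes no zero measurements
        scale = y_true.mean()
        return {"MBE": mbe, "MAE": mae, "RMSE": rmse,
                "rMBE_%": 100 * mbe / scale, "MPE_%": mpe, "rRMSE_%": 100 * rmse / scale}

    # Example with made-up 15-min averaged power values (kW).
    measured = np.array([12.0, 15.5, 18.2, 20.1, 17.3, 9.8])
    predicted = np.array([11.2, 16.0, 17.5, 21.0, 16.8, 10.5])
    print(forecast_errors(measured, predicted))
    ```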

  15. Comparison of four statistical and machine learning methods for crash severity prediction.

    Science.gov (United States)

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF) in predicting traffic crash severity; developing a crash-costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods, comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset, and the correct prediction rates for each crash severity level, the overall correct prediction rate and a proposed crash-costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed that NNC had the best prediction performance overall and in more severe crashes. RF and SVM had the next best performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost the exact opposite results compared to the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Using Machine Learning to Predict Student Performance

    OpenAIRE

    Pojon, Murat

    2017-01-01

    This thesis examines the application of machine learning algorithms to predict whether a student will be successful or not. The specific focus of the thesis is the comparison of machine learning methods and feature engineering techniques in terms of how much they improve the prediction performance. Three different machine learning methods were used in this thesis. They are linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modification ...

  17. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection

  18. MU-LOC: A Machine-Learning Method for Predicting Mitochondrially Localized Proteins in Plants

    Directory of Open Access Journals (Sweden)

    Ning Zhang

    2018-05-01

    Full Text Available Targeting and translocation of proteins to the appropriate subcellular compartments are crucial for cell organization and function. Newly synthesized proteins are transported to mitochondria with the assistance of complex targeting sequences containing either an N-terminal pre-sequence or a multitude of internal signals. Compared with experimental approaches, computational predictions provide an efficient way to infer subcellular localization of a protein. However, it is still challenging to predict plant mitochondrially localized proteins accurately due to various limitations. Consequently, the performance of current tools can be improved with new data and new machine-learning methods. We present MU-LOC, a novel computational approach for large-scale prediction of plant mitochondrial proteins. We collected a comprehensive dataset of plant subcellular localization, extracted features including amino acid composition, protein position weight matrix, and gene co-expression information, and trained predictors using deep neural network and support vector machine. Benchmarked on two independent datasets, MU-LOC achieved substantial improvements over six state-of-the-art tools for plant mitochondrial targeting prediction. In addition, MU-LOC has the advantage of predicting plant mitochondrial proteins either possessing or lacking N-terminal pre-sequences. We applied MU-LOC to predict candidate mitochondrial proteins for the whole proteome of Arabidopsis and potato. MU-LOC is publicly available at http://mu-loc.org.

  19. Process signal selection method to improve the impact mitigation of sensor broken for diagnosis using machine learning

    International Nuclear Information System (INIS)

    Minowa, Hirotsugu; Gofuku, Akio

    2014-01-01

    Accidents at industrial plants cause large human, economic and social losses. Recently, diagnostic methods using machine learning techniques have been expected to detect abnormalities occurring in a plant early and correctly. However, diagnostic machines are generally built to require all process signals (hereafter, signals) for plant diagnosis; if trouble occurs, such as a process sensor breaking, the diagnostic machine cannot diagnose or its diagnostic performance may decrease. Therefore, we propose an important-process-signal selection method that mitigates the impact of broken sensors without reducing diagnostic performance, by reducing the adverse effect of noise on a multi-agent diagnostic system. The advantage of our method is its general-purpose property: it can be applied to various supervised machine learning methods, and various parameters can be set to decide when to terminate the search. The experimental evaluation revealed that diagnostic machines generated by our method using SVM improved impact mitigation and did not reduce performance in terms of diagnostic accuracy, speed of diagnosis, and prediction of the plant state near accident occurrence, compared with a basic diagnostic machine that diagnoses using all signals. This paper reports our proposed method and the results obtained when it was applied to simulated abnormal events of the fast-breeder reactor Monju. (author)

  20. Reverse hypothesis machine learning a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge-acquisition-based learning is constrained by large volumes of data and is time consuming, hence knowledge-innovation-based learning is the need of the hour. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is a need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change, the paradigm remains the same: the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  1. Machine learning methods to predict child posttraumatic stress: a proof of concept study.

    Science.gov (United States)

    Saxe, Glenn N; Ma, Sisi; Ren, Jiwen; Aliferis, Constantin

    2017-07-10

    The care of traumatized children would benefit significantly from accurate predictive models for Posttraumatic Stress Disorder (PTSD), using information available around the time of trauma. Machine Learning (ML) computational methods have yielded strong results in recent applications across many diseases and data types, yet they have not been previously applied to childhood PTSD. Since these methods have not been applied to this complex and debilitating disorder, there is a great deal that remains to be learned about their application. The first step is to prove the concept: Can ML methods - as applied in other fields - produce predictive classification models for childhood PTSD? Additionally, we seek to determine if specific variables can be identified - from the aforementioned predictive classification models - with putative causal relations to PTSD. ML predictive classification methods - with causal discovery feature selection - were applied to a data set of 163 children hospitalized with an injury and PTSD was determined three months after hospital discharge. At the time of hospitalization, 105 risk factor variables were collected spanning a range of biopsychosocial domains. Seven percent of subjects had a high level of PTSD symptoms. A predictive classification model was discovered with significant predictive accuracy. A predictive model constructed based on subsets of potentially causally relevant features achieves similar predictivity compared to the best predictive model constructed with all variables. Causal Discovery feature selection methods identified 58 variables of which 10 were identified as most stable. In this first proof-of-concept application of ML methods to predict childhood Posttraumatic Stress we were able to determine both predictive classification models for childhood PTSD and identify several causal variables. This set of techniques has great potential for enhancing the methodological toolkit in the field and future studies should seek to

  2. Can machine learning explain human learning?

    NARCIS (Netherlands)

    Vahdat, M.; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

    2016-01-01

    Learning Analytics (LA) has a major interest in exploring and understanding the learning process of humans and, for this purpose, benefits from both Cognitive Science, which studies how humans learn, and Machine Learning, which studies how algorithms learn from data. Usually, Machine Learning is

  3. Machine learning a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  4. A machine learning method for fast and accurate characterization of depth-of-interaction gamma cameras

    DEFF Research Database (Denmark)

    Pedemonte, Stefano; Pierce, Larry; Van Leemput, Koen

    2017-01-01

    to impose the depth-of-interaction in an experimental set-up. In this article we introduce a machine learning approach for extracting accurate forward models of gamma imaging devices from simple pencil-beam measurements, using a nonlinear dimensionality reduction technique in combination with a finite...

  5. Machine Learning Methods for Knowledge Discovery in Medical Data on Atherosclerosis

    Czech Academy of Sciences Publication Activity Database

    Serrano, J.I.; Tomečková, Marie; Zvárová, Jana

    2006-01-01

    Roč. 1, - (2006), s. 6-33 ISSN 1801-5603 Institutional research plan: CEZ:AV0Z10300504 Keywords : knowledge discovery * supervised machine learning * biomedical data mining * risk factors of atherosclerosis Subject RIV: BB - Applied Statistics, Operational Research

  6. Distinguishing butchery cut marks from crocodile bite marks through machine learning methods.

    Science.gov (United States)

    Domínguez-Rodrigo, Manuel; Baquedano, Enrique

    2018-04-10

    All models of evolution of human behaviour depend on the correct identification and interpretation of bone surface modifications (BSM) on archaeofaunal assemblages. Crucial evolutionary features, such as the origin of stone tool use, meat-eating, food-sharing, cooperation and sociality can only be addressed through confident identification and interpretation of BSM, and more specifically, cut marks. Recently, it has been argued that linear marks with the same properties as cut marks can be created by crocodiles, thereby questioning whether secure cut mark identifications can be made in the Early Pleistocene fossil record. Powerful classification methods based on multivariate statistics and machine learning (ML) algorithms have previously successfully discriminated cut marks from most other potentially confounding BSM. However, crocodile-made marks were marginal to or played no role in these comparative analyses. Here, for the first time, we apply state-of-the-art ML methods on crocodile linear BSM and experimental butchery cut marks, showing that the combination of multivariate taphonomy and ML methods provides accurate identification of BSM, including cut and crocodile bite marks. This enables empirically-supported hominin behavioural modelling, provided that these methods are applied to fossil assemblages.

  7. Machine learning plus optical flow: a simple and sensitive method to detect cardioactive drugs

    Science.gov (United States)

    Lee, Eugene K.; Kurokawa, Yosuke K.; Tu, Robin; George, Steven C.; Khine, Michelle

    2015-07-01

    Current preclinical screening methods do not adequately detect cardiotoxicity. Using human induced pluripotent stem cell-derived cardiomyocytes (iPS-CMs), more physiologically relevant preclinical or patient-specific screening to detect potential cardiotoxic effects of drug candidates may be possible. However, one of the persistent challenges for developing a high-throughput drug screening platform using iPS-CMs is the need to develop a simple and reliable method to measure key electrophysiological and contractile parameters. To address this need, we have developed a platform that combines machine learning paired with brightfield optical flow as a simple and robust tool that can automate the detection of cardiomyocyte drug effects. Using three cardioactive drugs of different mechanisms, including those with primarily electrophysiological effects, we demonstrate the general applicability of this screening method to detect subtle changes in cardiomyocyte contraction. Requiring only brightfield images of cardiomyocyte contractions, we detect changes in cardiomyocyte contraction comparable to - and even superior to - fluorescence readouts. This automated method serves as a widely applicable screening tool to characterize the effects of drugs on cardiomyocyte function.

  8. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-Learning Methods.

    Science.gov (United States)

    Zheng, Suqing; Jiang, Mengying; Zhao, Chengwei; Zhu, Rui; Hu, Zhicheng; Xu, Yong; Lin, Fu

    2018-01-01

    In-silico bitterant prediction has received considerable attention because experimental screening of bitterants is expensive and laborious. In this work, we collect a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harness consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which are further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models achieves an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For automatic bitterant prediction, a graphical program "e-Bitter" is developed so that users can obtain predictions with a simple mouse click. To the best of our knowledge, this is the first time a consensus model has been adopted for bitterant prediction and the first free stand-alone software developed for experimental food scientists.

  9. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-Learning Methods

    Directory of Open Access Journals (Sweden)

    Suqing Zheng

    2018-03-01

    Full Text Available In-silico bitterant prediction has received considerable attention because experimental screening of bitterants is expensive and laborious. In this work, we collect a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harness consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which are further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models achieves an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For automatic bitterant prediction, a graphical program “e-Bitter” is developed so that users can obtain predictions with a simple mouse click. To the best of our knowledge, this is the first time a consensus model has been adopted for bitterant prediction and the first free stand-alone software developed for experimental food scientists.

  10. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-learning Methods

    Science.gov (United States)

    Zheng, Suqing; Jiang, Mengying; Zhao, Chengwei; Zhu, Rui; Hu, Zhicheng; Xu, Yong; Lin, Fu

    2018-03-01

    In-silico bitterant prediction has received considerable attention because experimental screening of bitterants is expensive and laborious. In this work, we collect a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harness consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which are further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models achieves an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For automatic bitterant prediction, a graphical program “e-Bitter” is developed so that users can obtain predictions with a simple mouse click. To the best of our knowledge, this is the first time a consensus model has been adopted for bitterant prediction and the first free stand-alone software developed for experimental food scientists.

  11. Comparison of Machine Learning methods for incipient motion in gravel bed rivers

    Science.gov (United States)

    Valyrakis, Manousos

    2013-04-01

    Soil erosion and sediment transport of natural gravel bed streams are important processes which affect both the morphology and the ecology of earth's surface. For gravel bed rivers at near incipient flow conditions, particle entrainment dynamics are highly intermittent. This contribution reviews the use of modern Machine Learning (ML) methods implemented for short-term prediction of entrainment instances of individual grains exposed in fully developed near boundary turbulent flows. Results obtained by network architectures of variable complexity based on two different ML methods, namely the Artificial Neural Network (ANN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS), are compared in terms of different error and performance indices, computational efficiency and complexity as well as predictive accuracy and forecast ability. Different model architectures are trained and tested with experimental time series obtained from mobile particle flume experiments. The experimental setup consists of a Laser Doppler Velocimeter (LDV) and a laser optics system, which acquire data for the instantaneous flow and particle response respectively, synchronously. The first is used to record the flow velocity components directly upstream of the test particle, while the latter tracks the particle's displacements. The lengthy experimental data sets (millions of data points) are split into the training and validation subsets used to perform the corresponding learning and testing of the models. It is demonstrated that the ANFIS hybrid model, which is based on neural learning and fuzzy inference principles, better predicts the critical flow conditions above which sediment transport is initiated. In addition, it is illustrated that empirical knowledge can be extracted, validating the theoretical assumption that particle ejections occur due to energetic turbulent flow events. Such a tool may find application in management and regulation of stream flows downstream of dams for stream

  12. A multilevel-ROI-features-based machine learning method for detection of morphometric biomarkers in Parkinson's disease.

    Science.gov (United States)

    Peng, Bo; Wang, Suhong; Zhou, Zhiyong; Liu, Yan; Tong, Baotong; Zhang, Tao; Dai, Yakang

    2017-06-09

    Machine learning methods have been widely used in recent years for detection of neuroimaging biomarkers in regions of interest (ROIs) and assisting diagnosis of neurodegenerative diseases. The innovation of this study is to use a multilevel-ROI-features-based machine learning method to detect sensitive morphometric biomarkers in Parkinson's disease (PD). Specifically, the low-level ROI features (gray matter volume, cortical thickness, etc.) and high-level correlative features (connectivity between ROIs) are integrated to construct the multilevel ROI features. Filter- and wrapper-based feature selection methods and a multi-kernel support vector machine (SVM) are used in the classification algorithm. T1-weighted brain magnetic resonance (MR) images of 69 PD patients and 103 normal controls from the Parkinson's Progression Markers Initiative (PPMI) dataset are included in the study. The machine learning method performs well in classification between PD patients and normal controls with an accuracy of 85.78%, a specificity of 87.79%, and a sensitivity of 87.64%. The most sensitive biomarkers between PD patients and normal controls are mainly distributed in the frontal lobe, parietal lobe, limbic lobe, temporal lobe, and central region. The classification performance of our method with multilevel ROI features is significantly improved compared with other classification methods using single-level features. The proposed method shows promising identification ability for detecting morphometric biomarkers in PD, thus confirming the potentiality of our method in assisting diagnosis of the disease. Copyright © 2017 Elsevier B.V. All rights reserved.
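
    The abstract names filter- and wrapper-based feature selection feeding an SVM. A minimal sketch of that two-stage idea follows, assuming a univariate filter followed by recursive feature elimination with a linear SVM; the feature matrix and labels are synthetic placeholders, not the PPMI data or the authors' multilevel ROI features.

```python
# Sketch of filter-then-wrapper feature selection for a two-class problem,
# under assumed step choices (SelectKBest filter, RFE wrapper, linear SVM).
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
roi_features = rng.normal(size=(172, 300))        # placeholder for 69 PD + 103 controls
labels = np.r_[np.ones(69), np.zeros(103)].astype(int)
roi_features[:69, :10] += 0.8                     # inject a weak group difference

pipe = make_pipeline(
    SelectKBest(f_classif, k=100),                           # filter step
    RFE(SVC(kernel="linear"), n_features_to_select=20),      # wrapper step
    SVC(kernel="linear"),
)
print(cross_val_score(pipe, roi_features, labels, cv=5).mean())
```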

  13. Introduction to Machine Learning: Class Notes 67577

    OpenAIRE

    Shashua, Amnon

    2009-01-01

    Introduction to Machine learning covering Statistical Inference (Bayes, EM, ML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering), and PAC learning (the Formal model, VC dimension, Double Sampling theorem).

  14. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules.

    Science.gov (United States)

    Vyas, Renu; Bapat, Sanket; Jain, Esha; Tambe, Sanjeev S; Karthikeyan, Muthukumarasamy; Kulkarni, Bhaskar D

    2015-01-01

    The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods for the diversity-oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built using three types of inputs, namely structure-based descriptors, molecular fingerprints and therapeutic category, for performing virtual screening. The activity and affinity descriptors of a set of inhibitors of four target classes (DHFR, COX, LOX and NMDA) have been utilized to train a total of six classifiers, viz. Artificial Neural Network (ANN), k nearest neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT) and Random Forest (RF). Among these classifiers, the ANN was found to be the best classifier with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC) were used to build the ANN models for each dataset. A good accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors compared to pharmacophoric (67.82%) and toxicophoric (70.64%). The methodology was validated on the classical Ames mutagenicity dataset of 4337 molecules. To evaluate it further, selectivity and promiscuity of molecules from five drug classes, viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic, were studied. The PTC fingerprints computed for each category were able to capture the drug-class-specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.

  15. Student Modeling and Machine Learning

    OpenAIRE

    Sison , Raymund; Shimura , Masamichi

    1998-01-01

    After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine le...

  16. Quantum Machine Learning

    Science.gov (United States)

    Biswas, Rupak

    2018-01-01

    Quantum computing promises an unprecedented ability to solve intractable problems by harnessing quantum mechanical effects such as tunneling, superposition, and entanglement. The Quantum Artificial Intelligence Laboratory (QuAIL) at NASA Ames Research Center is the space agency's primary facility for conducting research and development in quantum information sciences. QuAIL conducts fundamental research in quantum physics but also explores how best to exploit and apply this disruptive technology to enable NASA missions in aeronautics, Earth and space sciences, and space exploration. At the same time, machine learning has become a major focus in computer science and captured the imagination of the public as a panacea to myriad big data problems. In this talk, we will discuss how classical machine learning can take advantage of quantum computing to significantly improve its effectiveness. Although we illustrate this concept on a quantum annealer, other quantum platforms could be used as well. If explored fully and implemented efficiently, quantum machine learning could greatly accelerate a wide range of tasks leading to new technologies and discoveries that will significantly change the way we solve real-world problems.

  17. On Plant Detection of Intact Tomato Fruits Using Image Analysis and Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Kyosuke Yamamoto

    2014-07-01

    Full Text Available Fully automated yield estimation of intact fruits prior to harvesting provides various benefits to farmers. Until now, several studies have been conducted to estimate fruit yield using image-processing technologies. However, most of these techniques require thresholds for features such as color, shape and size. In addition, their performance strongly depends on the thresholds used, although optimal thresholds tend to vary with images. Furthermore, most of these techniques have attempted to detect only mature and immature fruits, although the number of young fruits is more important for the prediction of long-term fluctuations in yield. In this study, we aimed to develop a method to accurately detect individual intact tomato fruits including mature, immature and young fruits on a plant using a conventional RGB digital camera in conjunction with machine learning approaches. The developed method did not require an adjustment of threshold values for fruit detection from each image because image segmentation was conducted based on classification models generated in accordance with the color, shape, texture and size of the images. The results of fruit detection in the test images showed that the developed method achieved a recall of 0.80, while the precision was 0.88. The recall values of mature, immature and young fruits were 1.00, 0.80 and 0.78, respectively.

  18. Spectral methods in machine learning and new strategies for very large datasets

    Science.gov (United States)

    Belabbas, Mohamed-Ali; Wolfe, Patrick J.

    2009-01-01

    Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here two new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods. PMID:19129490
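
    A minimal sketch of the Nyström-type low-rank kernel approximation referred to above, assuming uniform landmark sampling; the paper's specific sampling and sorting strategies and its error bounds are not reproduced here.

```python
# Nystrom low-rank approximation of a PSD kernel matrix with uniformly
# sampled landmark columns (a simplification of the strategies in the text).
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m, gamma=0.5, rng=None):
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=m, replace=False)   # landmark points
    C = rbf_kernel(X, X[idx], gamma)                   # n x m block of K
    W = C[idx]                                         # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T                 # rank-m approximation of K

X = np.random.default_rng(0).normal(size=(200, 5))
K = rbf_kernel(X, X)
K_hat = nystrom(X, m=40, rng=0)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))   # relative Frobenius error
```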

  19. Comparison of Machine Learning Methods for the Purpose Of Human Fall Detection

    Directory of Open Access Journals (Sweden)

    Strémy Maximilián

    2014-12-01

    Full Text Available According to several studies, the European population has been aging rapidly over the last years. It is therefore important to ensure that the aging population is able to live independently without the support of the working-age population. According to these studies, a fall is the most dangerous and frequent accident in the everyday life of the aging population. In our paper, we present a system to detect human falls visually, i.e. using no wearable equipment. For this purpose, we used a Kinect sensor, which provides the human body position in Cartesian coordinates. It is possible to directly capture a human body because the Kinect sensor has a depth camera and also an infrared camera. The first step in our research was to detect postures and classify fall accidents. We experimented with and compared selected machine learning methods, including Naive Bayes, decision trees and SVM, in terms of their performance in recognizing human postures (standing, sitting and lying). The highest classification accuracy of over 93.3% was achieved by the decision tree method.

  20. Teamwork: improved eQTL mapping using combinations of machine learning methods.

    Directory of Open Access Journals (Sweden)

    Marit Ackermann

    Full Text Available Expression quantitative trait loci (eQTL) mapping is a widely used technique to uncover regulatory relationships between genes. A range of methodologies have been developed to map links between expression traits and genotypes. The DREAM (Dialogue on Reverse Engineering Assessments and Methods) initiative is a community project to objectively assess the relative performance of different computational approaches for solving specific systems biology problems. The goal of one of the DREAM5 challenges was to reverse-engineer genetic interaction networks from synthetic genetic variation and gene expression data, which simulates the problem of eQTL mapping. In this framework, we proposed an approach whose originality resides in the use of a combination of existing machine learning algorithms (a committee). Although it was not the best performer, this method was by far the most precise on average. After the competition, we continued in this direction by evaluating other committees using the DREAM5 data and developed a method that relies on Random Forests and LASSO. It achieved a much higher average precision than the DREAM best performer at the cost of slightly lower average sensitivity.
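
    The committee idea described above can be sketched as follows: fit a Random Forest and a LASSO model on genotype predictors for one expression trait and average their normalised feature scores to rank candidate regulators. The data are synthetic and the aggregation rule is an assumption, not the authors' exact procedure.

```python
# Two-member "committee" sketch: Random Forest importances + LASSO coefficients,
# each rescaled to [0, 1] and averaged to rank candidate eQTL markers.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(300, 50)).astype(float)   # 300 samples, 50 markers
expression = 2.0 * genotypes[:, 3] - 1.5 * genotypes[:, 17] + rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(genotypes, expression)
lasso = LassoCV(cv=5).fit(genotypes, expression)

rf_score = rf.feature_importances_ / rf.feature_importances_.max()
lasso_score = np.abs(lasso.coef_) / (np.abs(lasso.coef_).max() or 1.0)
committee = (rf_score + lasso_score) / 2

print(np.argsort(committee)[::-1][:5])   # top-ranked candidate markers
```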

  1. Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours.

    Directory of Open Access Journals (Sweden)

    Monique A Ladds

    Full Text Available Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which, when applied to different data sets, produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions). We conducted controlled experiments with 12 seals, where their behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming) representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM), random forests, a support vector machine using four different kernels, and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal), testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%). Adding feature statistics improved accuracies across all models tested. Most categories of behaviour - resting, grooming and feeding - were predicted with reasonable accuracy (52-81%) by the SVM, while travelling was poorly categorised (31-41%). These results show that model selection is important when classifying behaviour and that by using

  2. Extraction of Plant Physiological Status from Hyperspectral Signatures Using Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Daniel Doktor

    2014-12-01

    Full Text Available The machine learning method, random forest (RF), is applied in order to derive biophysical and structural vegetation parameters from hyperspectral signatures. Hyperspectral data are, among other things, characterized by their high dimensionality and autocorrelation. Common multivariate regression approaches, which usually include only a limited number of spectral indices as predictors, do not make full use of the available information. In contrast, machine learning methods, such as RF, are supposed to be better suited to extract information on vegetation status. First, vegetation parameters are extracted from hyperspectral signatures simulated with the radiative transfer model, PROSAIL. Second, the transferability of these results with respect to laboratory and field measurements is investigated. In situ observations of plant physiological parameters and corresponding spectra are gathered in the laboratory for summer barley (Hordeum vulgare). Field in situ measurements focus on winter crops over several growing seasons. Chlorophyll content, Leaf Area Index and phenological growth stages are derived from simulated and measured spectra. RF performs very robustly and with a very high accuracy on PROSAIL simulated data. Furthermore, it is almost unaffected by introduced noise and bias in the data. When applied to laboratory data, the prediction accuracy is still good (C_ab: R^2 = 0.94 / LAI: R^2 = 0.80 / BBCH (growth stages of mono- and dicotyledonous plants): R^2 = 0.91), but not as high as for simulated spectra. Transferability to field measurements is given with prediction levels as high as for laboratory data (C_ab: R^2 = 0.89 / LAI: R^2 = 0.89 / BBCH: R^2 ≈ 0.8). Wavelengths for deriving plant physiological status based on simulated and measured hyperspectral signatures are mostly selected from appropriate spectral regions (both field and laboratory: 700–800 nm regressing on C_ab and 800–1300

  3. Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

    Science.gov (United States)

    Janik, M; Bossew, P; Kurihara, O

    2018-07-15

    Machine learning is a class of statistical techniques which has proven to be a powerful tool for modelling the behaviour of complex systems, in which response quantities depend on assumed controls or predictors in a complicated way. In this paper, as our first purpose, we propose the application of machine learning to reconstruct incomplete or irregularly sampled data of time series indoor radon (222Rn). The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of multivariate series, derive a dependence model and apply it to sections where the controls are available, but not the response (Rn), and in this way complete the Rn series. Three machine learning techniques are applied in this study, namely random forest, its extension called the gradient boosting machine and deep learning. For a comparison, we apply the classical multiple regression in a generalized linear model version. Performance of the models is evaluated through different metrics. The performance of the gradient boosting machine is found to be superior to that of the other techniques. By applying learning machines, we show, as our second purpose, that missing data or periods of Rn series data can be reconstructed and resampled on a regular grid reasonably, if data of appropriate physical controls are available. The techniques also identify to which degree the assumed controls contribute to imputing missing Rn values. Our third purpose, though no less important from the viewpoint of physics, is identifying to which degree physical, in this case environmental variables, are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that variables which contribute most to the Rn series reconstruction, are temperature, relative humidity and day of the year. The first two are physical
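
    A hedged sketch of the reconstruction idea: learn Rn from environmental controls on sections where Rn is observed, then predict it where it is missing. Scikit-learn's gradient boosting stands in for the authors' GBM, and the column names and synthetic data are assumptions.

```python
# Impute missing radon values from environmental controls with a
# gradient-boosted regressor trained on the complete sections.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def impute_radon(df, controls=("temperature", "pressure", "humidity", "doy")):
    known = df["radon"].notna()
    model = GradientBoostingRegressor(random_state=0)
    model.fit(df.loc[known, list(controls)], df.loc[known, "radon"])
    out = df.copy()
    out.loc[~known, "radon"] = model.predict(df.loc[~known, list(controls)])
    return out

# Tiny synthetic example with 100 missing values
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "temperature": rng.normal(20, 5, 500),
    "pressure": rng.normal(1013, 8, 500),
    "humidity": rng.uniform(30, 90, 500),
    "doy": rng.integers(1, 366, 500),
})
df["radon"] = 40 + 2 * df["humidity"] - 0.5 * df["temperature"] + rng.normal(0, 5, 500)
df.loc[rng.choice(500, 100, replace=False), "radon"] = np.nan
print(impute_radon(df)["radon"].isna().sum())   # 0 missing after imputation
```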

  4. Organizational Learning Supported by Machine Learning Models Coupled with General Explanation Methods: A Case of B2B Sales Forecasting

    Directory of Open Access Journals (Sweden)

    Bohanec Marko

    2017-08-01

    Full Text Available Background and Purpose: The process of business-to-business (B2B) sales forecasting is a complex decision-making process. There are many approaches to support this process, but mainly it is still based on the subjective judgment of a decision-maker. The problem of B2B sales forecasting can be modeled as a classification problem. However, top-performing machine learning (ML) models are black boxes and do not support transparent reasoning. The purpose of this research is to develop an organizational model using an ML model coupled with general explanation methods. The goal is to support the decision-maker in the process of B2B sales forecasting.
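
    As one possible reading of coupling a black-box classifier with a general explanation method, the sketch below fits a gradient boosting classifier on invented B2B opportunity attributes and uses permutation importance as a model-agnostic explanation; it is not the authors' organizational model, and the attribute names are made up.

```python
# Black-box classifier + model-agnostic explanation via permutation importance.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.integers(1, 6, n),     # seller's competence rating (assumed attribute)
    rng.uniform(0, 1, n),      # discount offered (assumed attribute)
    rng.integers(0, 2, n),     # existing customer flag (assumed attribute)
])
won = (0.8 * X[:, 0] + 2 * X[:, 2] + rng.normal(size=n) > 3).astype(int)

clf = GradientBoostingClassifier(random_state=0).fit(X, won)
imp = permutation_importance(clf, X, won, n_repeats=20, random_state=0)
for name, score in zip(["competence", "discount", "existing_customer"],
                       imp.importances_mean):
    print(f"{name:18s} {score:.3f}")
```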

  5. A multi-label learning based kernel automatic recommendation method for support vector machine.

    Science.gov (United States)

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is very important and critical when classifying a new problem with a Support Vector Machine. So far, more attention has been paid to constructing new kernels and choosing suitable parameter values for a specific kernel function, but less to kernel selection. Furthermore, most current kernel selection methods focus on seeking the best kernel with the highest classification accuracy via cross-validation; they are time-consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.
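
    A minimal sketch of the trade-off the abstract describes: score several standard SVM kernels on one data set by cross-validated accuracy, support-vector count and fit time rather than accuracy alone. The recommendation model itself is not shown; the data set is a scikit-learn sample used purely for illustration.

```python
# Compare SVM kernels on accuracy, number of support vectors and elapsed time.
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    t0 = time.time()
    acc = cross_val_score(clf, X, y, cv=5).mean()
    elapsed = time.time() - t0
    n_sv = clf.fit(X, y)[-1].n_support_.sum()   # support vectors of the fitted SVC
    print(f"{kernel:8s} acc={acc:.3f} support_vectors={n_sv} time={elapsed:.2f}s")
```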

  6. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  7. Improving the accuracy of myocardial perfusion scintigraphy results by machine learning method

    International Nuclear Information System (INIS)

    Groselj, C.; Kukar, M.

    2002-01-01

    Full text: Machine learning (ML), a rapidly growing subfield of artificial intelligence, has over the last decade proven to be a useful tool in many fields of decision making, including some fields of medicine. Its decision accuracy usually exceeds the human one. The aim was to assess the applicability of ML in interpreting the results of stress myocardial perfusion scintigraphy for CAD diagnosis. Data from 327 patients who underwent planar stress myocardial perfusion scintigraphy were re-evaluated in the usual way. By comparing them with the results of coronary angiography, the sensitivity, specificity and accuracy of the investigation were computed. The data were then digitized and the decision procedure repeated with the ML program 'Naive Bayesian classifier'. As ML can handle any number of variables simultaneously, all available disease-related data (regarding history, habitus, risk factors and stress results) were added, and the sensitivity, specificity and accuracy of scintigraphy were expressed in this way as well. The results of both decision procedures were compared. With the ML method, 19 more patients out of 327 (5.8%) were correctly diagnosed by stress myocardial perfusion scintigraphy. ML could be an important tool for decision making in myocardial perfusion scintigraphy. (author)
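
    Purely for illustration, the sketch below applies a Gaussian naive Bayes classifier to made-up scintigraphy and risk-factor variables with an angiography-style label; the study's actual attributes and data are not reproduced.

```python
# Naive Bayes on combined imaging and clinical risk-factor variables (synthetic).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n = 327
features = np.column_stack([
    rng.normal(size=n),        # scintigraphy perfusion score (assumed variable)
    rng.integers(0, 2, n),     # hypertension (assumed variable)
    rng.integers(0, 2, n),     # smoking (assumed variable)
    rng.normal(55, 10, n),     # age (assumed variable)
])
cad = (features[:, 0] + 0.5 * features[:, 1] + rng.normal(size=n) > 0.5).astype(int)

print(cross_val_score(GaussianNB(), features, cad, cv=5).mean())
```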

  8. Using Machine Learning Methods Jointly to Find Better Set of Rules in Data Mining

    Directory of Open Access Journals (Sweden)

    SUG Hyontai

    2017-01-01

    Full Text Available Rough set-based data mining algorithms are among the widely accepted machine learning technologies because of their strong mathematical background and their ability to find optimal rules based on the given data sets alone, leaving no room for prejudiced views to be imposed on the data. But because the algorithms find rules very precisely, we may confront the overfitting problem. On the other hand, association rule algorithms find rules of association between sets of items in a database. These algorithms find itemsets that occur more often than a given minimum support, so that by choosing the minimum support appropriately they can find the itemsets in reasonable time even for very large databases. In order to overcome the overfitting problem in rough set-based algorithms, we first find large itemsets and then select the attributes that cover those large itemsets. By using only the selected attributes, we may find a better set of rules based on rough set theory. Results from experiments support our suggested method.

  9. Integrating Symbolic and Statistical Methods for Testing Intelligent Systems Applications to Machine Learning and Computer Vision

    Energy Technology Data Exchange (ETDEWEB)

    Jha, Sumit Kumar [University of Central Florida, Orlando; Pullum, Laura L [ORNL; Ramanathan, Arvind [ORNL

    2016-01-01

    Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.

  10. Improving Hip-Worn Accelerometer Estimates of Sitting Using Machine Learning Methods.

    Science.gov (United States)

    Kerr, Jacqueline; Carlson, Jordan; Godbole, Suneeta; Cadmus-Bertram, Lisa; Bellettiere, John; Hartman, Sheri

    2018-02-13

    To improve estimates of sitting time from hip-worn accelerometers used in large cohort studies by employing machine learning methods developed on free-living activPAL data. Thirty breast cancer survivors concurrently wore a hip-worn accelerometer and a thigh-worn activPAL for 7 days. A random forest classifier, trained on the activPAL data, was employed to detect sitting, standing and sit-stand transitions in 5-second windows in the hip-worn accelerometer data. The classifier estimates were compared to the standard accelerometer cut point, and significant differences across different bout lengths were investigated using mixed effect models. Overall, the algorithm predicted the postures with moderate accuracy (stepping 77%, standing 63%, sitting 67%, sit-to-stand 52% and stand-to-sit 51%). Daily-level analyses indicated that errors in transition estimates only occurred during sitting bouts of 2 minutes or less. The standard cut point was significantly different from the activPAL across all bout lengths, overestimating short bouts and underestimating long bouts. This is among the first algorithms for detecting sitting and standing from hip-worn accelerometer data to be trained entirely on free-living activPAL data. The new algorithm detected prolonged sitting, which has been shown to be most detrimental to health. Further validation and training in larger cohorts is warranted.
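
    A sketch of the windowed-classification idea, under assumed feature choices: summarise each 5-second window of tri-axial hip accelerometer data with simple statistics and classify it with a random forest; the activPAL-derived labels here are random placeholders, not the study's data.

```python
# Windowed accelerometer features classified with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc):            # acc: (n_windows, samples, 3 axes)
    return np.column_stack([
        acc.mean(axis=1),            # per-axis mean
        acc.std(axis=1),             # per-axis standard deviation
        np.linalg.norm(acc, axis=2).mean(axis=1, keepdims=True),  # mean magnitude
    ])

rng = np.random.default_rng(0)
acc = rng.normal(size=(1000, 150, 3))                        # 1000 windows at 30 Hz
labels = rng.choice(["sit", "stand", "step"], size=1000)     # placeholder labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(window_features(acc), labels)
print(clf.predict(window_features(acc[:3])))
```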

  11. Classification of follicular lymphoma images: a holistic approach with symbol-based machine learning methods.

    Science.gov (United States)

    Zorman, Milan; Sánchez de la Rosa, José Luis; Dinevski, Dejan

    2011-12-01

    It is not very often that a symbol-based machine learning approach is used for image classification and recognition. In this paper we will present such an approach, which we first used on follicular lymphoma images. Lymphoma is a broad term encompassing a variety of cancers of the lymphatic system. Lymphoma is differentiated by the type of cell that multiplies and how the cancer presents itself. It is very important to get an exact diagnosis regarding lymphoma and to determine the treatments that will be most effective for the patient's condition. Our work was focused on the identification of lymphomas by finding follicles in microscopy images provided by the Laboratory of Pathology in the University Hospital of Tenerife, Spain. We divided our work into two stages: in the first stage we did image pre-processing and feature extraction, and in the second stage we used different symbolic machine learning approaches for pixel classification. Symbolic machine learning approaches are often neglected when looking for image analysis tools. While they are known for very appropriate knowledge representation, they are also claimed to lack computational power. The results we got are very promising and show that symbolic approaches can be successful in image analysis applications.

  12. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users and consumers now demand smarter devices, intelligent systems are being revolutionized by machine learning. Machine learning, as part of intelligent systems, is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. Machines can be trained to perform certain tasks, so that they can automatically detect, diagnose, and solve a variety of problems. Intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, intelligent systems are widely applied in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines and to improve decision making models. And this book specializes in the developments of advanced intelligent systems through machine learning. It...

  13. Prediction of Student Dropout in E-Learning Program Through the Use of Machine Learning Method

    OpenAIRE

    Mingjie Tan; Peiji Shao

    2015-01-01

    The high rate of dropout is a serious problem in E-learning programs. Thus it has received extensive concern from education administrators and researchers. Predicting the potential dropout students is a workable solution to prevent dropout. Based on the analysis of related literature, this study selected students' personal characteristics and academic performance as input attributes. Prediction models were developed using Artificial Neural Network (ANN), Decision Tree (DT) and Bayesian Ne...

  14. Neuroanatomical heterogeneity of schizophrenia revealed by semi-supervised machine learning methods.

    Science.gov (United States)

    Honnorat, Nicolas; Dong, Aoyan; Meisenzahl-Lechner, Eva; Koutsouleris, Nikolaos; Davatzikos, Christos

    2017-12-20

    Schizophrenia is associated with heterogeneous clinical symptoms and neuroanatomical alterations. In this work, we aim to disentangle the patterns of neuroanatomical alterations underlying a heterogeneous population of patients using a semi-supervised clustering method. We apply this strategy to a cohort of patients with schizophrenia of varying extents of disease duration, and we describe the neuroanatomical, demographic and clinical characteristics of the subtypes discovered. We analyze the neuroanatomical heterogeneity of 157 patients diagnosed with Schizophrenia, relative to a control population of 169 subjects, using a machine learning method called CHIMERA. CHIMERA clusters the differences between patients and a demographically-matched population of healthy subjects, rather than clustering patients themselves, thereby specifically assessing disease-related neuroanatomical alterations. Voxel-Based Morphometry was conducted to visualize the neuroanatomical patterns associated with each group. The clinical presentation and the demographics of the groups were then investigated. Three subgroups were identified. The first two differed substantially, in that one involved predominantly temporal-thalamic-peri-Sylvian regions, whereas the other involved predominantly frontal regions and the thalamus. Both subtypes included primarily male patients. The third pattern was a mix of these two and presented milder neuroanatomic alterations and comprised a comparable number of men and women. VBM and statistical analyses suggest that these groups could correspond to different neuroanatomical dimensions of schizophrenia. Our analysis suggests that schizophrenia presents distinct neuroanatomical variants. This variability points to the need for a dimensional neuroanatomical approach using data-driven, mathematically principled multivariate pattern analysis methods, and should be taken into account in clinical studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Considerations upon the Machine Learning Technologies

    OpenAIRE

    Alin Munteanu; Cristina Ofelia Sofran

    2006-01-01

    Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of human intelligence, while others adopt a man-machine collaborative approach.

  16. Considerations upon the Machine Learning Technologies

    Directory of Open Access Journals (Sweden)

    Alin Munteanu

    2006-01-01

    Full Text Available Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of human intelligence, while others adopt a man-machine collaborative approach.

  17. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence and has attracted considerable attention. An extreme learning machine (ELM) is a learning system resembling a single-layer feed-forward neural network, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of the extreme learning machine (ELM) are its fast learning speed, minimal human intervention and high scalability.   This book contains some selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing, China, October 15-17, 2013. This conference aims to bring together the researchers and practitioners of extreme learning machine from a variety of fields including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...
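
    A minimal sketch of the ELM architecture as described above, assuming a tanh hidden layer: the input-to-hidden weights are random and fixed, and only the hidden-to-output weights are solved by linear least squares.

```python
# Extreme learning machine regressor: random hidden layer + linear output layer.
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Random, fixed input-to-hidden weights and biases
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Hidden-to-output weights from a linear least-squares solve
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

X = np.random.default_rng(1).uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(2).normal(size=500)
model = ELMRegressor(n_hidden=50).fit(X, y)
print(np.abs(model.predict(X) - y).mean())   # mean absolute training error
```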

  18. Entropy method combined with extreme learning machine method for the short-term photovoltaic power generation forecasting

    International Nuclear Information System (INIS)

    Tang, Pingzhou; Chen, Di; Hou, Yushuo

    2016-01-01

    As the world’s energy problem becomes more severe day by day, photovoltaic power generation has without doubt opened a new door. If it can be applied in real life, it will provide an effective solution to this severe energy problem and meet human needs for energy. Similar to wind power generation, however, photovoltaic power generation is uncertain. Therefore, forecasting photovoltaic power generation is crucial. In this paper, the entropy method and the extreme learning machine (ELM) method were combined to forecast short-term photovoltaic power generation. First, the entropy method is used to process the initial data; the network is then trained on the unified data and used to forecast electricity generation. Finally, the results obtained with the entropy method combined with ELM were compared with those generated by the generalized regression neural network (GRNN) and radial basis function neural network (RBF) methods. We found that the entropy method combined with ELM achieves higher accuracy and faster calculation.

  19. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. The book provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity and the depth and breadth of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  20. Using Machine Learning for Land Suitability Classification

    African Journals Online (AJOL)

    User

    West African Journal of Applied Ecology, vol. ... evidence for the utility of machine learning methods in land suitability classification, especially MCS methods. ... Artificial intelligence tools. ... Numerical values of the index for the various classes.

  1. A MACHINE-LEARNING METHOD TO INFER FUNDAMENTAL STELLAR PARAMETERS FROM PHOTOMETRIC LIGHT CURVES

    International Nuclear Information System (INIS)

    Miller, A. A.; Bloom, J. S.; Richards, J. W.; Starr, D. L.; Lee, Y. S.; Butler, N. R.; Tokarz, S.; Smith, N.; Eisner, J. A.

    2015-01-01

    A fundamental challenge for wide-field imaging surveys is obtaining follow-up spectroscopic observations: there are >10^9 photometrically cataloged sources, yet modern spectroscopic surveys are limited to ∼ few × 10^6 targets. As we approach the Large Synoptic Survey Telescope era, new algorithmic solutions are required to cope with the data deluge. Here we report the development of a machine-learning framework capable of inferring fundamental stellar parameters (T_eff, log g, and [Fe/H]) using photometric-brightness variations and color alone. A training set is constructed from a systematic spectroscopic survey of variables with Hectospec/Multi-Mirror Telescope. In sum, the training set includes ∼9000 spectra, for which stellar parameters are measured using the SEGUE Stellar Parameters Pipeline (SSPP). We employed the random forest algorithm to perform a non-parametric regression that predicts T_eff, log g, and [Fe/H] from photometric time-domain observations. Our final optimized model produces a cross-validated rms error (RMSE) of 165 K, 0.39 dex, and 0.33 dex for T_eff, log g, and [Fe/H], respectively. Examining the subset of sources for which the SSPP measurements are most reliable, the RMSE reduces to 125 K, 0.37 dex, and 0.27 dex, respectively, comparable to what is achievable via low-resolution spectroscopy. For variable stars this represents a ≈12%-20% improvement in RMSE relative to models trained with single-epoch photometric colors. As an application of our method, we estimate stellar parameters for ∼54,000 known variables. We argue that this method may convert photometric time-domain surveys into pseudo-spectrographic engines, enabling the construction of extremely detailed maps of the Milky Way, its structure, and history.
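
    Illustrative only: a random forest regressor mapping photometric features to T_eff, log g and [Fe/H] and reporting a held-out RMSE per parameter, in the spirit of the non-parametric regression described above. The feature and target arrays are synthetic placeholders, not the Hectospec/SSPP training set.

```python
# Multi-output random forest regression with per-parameter RMSE on a held-out split.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 12))          # light-curve features (assumed)
targets = np.column_stack([                     # synthetic T_eff, log g, [Fe/H]
    5500 + 400 * features[:, 0] + rng.normal(0, 150, 2000),
    3.5 + 0.5 * features[:, 1] + rng.normal(0, 0.3, 2000),
    -0.5 + 0.4 * features[:, 2] + rng.normal(0, 0.25, 2000),
])

X_tr, X_te, y_tr, y_te = train_test_split(features, targets, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
rmse = np.sqrt(((model.predict(X_te) - y_te) ** 2).mean(axis=0))
print(dict(zip(["T_eff", "log_g", "Fe_H"], rmse.round(3))))
```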

  2. Recognition of Time Stamps on Full-Disk Hα Images Using Machine Learning Methods

    Science.gov (United States)

    Xu, Y.; Huang, N.; Jing, J.; Liu, C.; Wang, H.; Fu, G.

    2016-12-01

    Observation and understanding of the physics of the 11-year solar activity cycle and the 22-year magnetic cycle are among the most important research topics in solar physics. The solar cycle is responsible for magnetic field and particle fluctuations in the near-Earth environment that have been found increasingly important to human life in the modern era. A systematic study of large-scale solar activities, as made possible by our rich data archive, will further help us to understand the global-scale magnetic fields that are closely related to solar cycles. The long-time-span data archive includes both full-disk and high-resolution Hα images. Prior to the widespread use of CCD cameras in the 1990s, 35-mm film was the major medium for storing images. The research group at NJIT recently finished the digitization of film data obtained by the National Solar Observatory (NSO) and Big Bear Solar Observatory (BBSO) covering the period from 1953 to 2000. The total volume of data exceeds 60 TB. To make this huge database scientifically valuable, some processing and calibration are required. One of the most important steps is to read the time stamps on all of the 14 million images, which would be almost impossible to do manually. We implemented three different methods to recognize the time stamps automatically: Optical Character Recognition (OCR), Classification Tree and TensorFlow. The latter two are machine learning approaches that are now very popular in pattern recognition. We will present some sample images and the results of clock recognition from all three methods.
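    The classification-tree route can be sketched with scikit-learn; here the library's built-in 8x8 digit images stand in for the scanned clock digits, so the data and the reported accuracy are purely illustrative.

    ```python
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Stand-in data: scikit-learn's 8x8 digit images instead of digits cropped from film scans.
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.3, random_state=0)

    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("digit accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    ```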

  3. Predictive ability of machine learning methods for massive crop yield prediction

    Directory of Open Access Journals (Sweden)

    Alberto Gonzalez-Sanchez

    2014-04-01

    Full Text Available An important issue for agricultural planning purposes is accurate yield estimation for the numerous crops involved in the planning. Machine learning (ML) is an essential approach for achieving practical and effective solutions for this problem. Many comparisons of ML methods for yield prediction have been made, seeking the most accurate technique. Generally, the number of evaluated crops and techniques is too low and does not provide enough information for agricultural planning purposes. This paper compares the predictive accuracy of ML and linear regression techniques for crop yield prediction in ten crop datasets. Multiple linear regression, M5-Prime regression trees, perceptron multilayer neural networks, support vector regression and k-nearest neighbor methods were ranked. Four accuracy metrics were used to validate the models: the root mean square error (RMSE), root relative square error (RRSE), normalized mean absolute error (MAE), and correlation factor (R). Real data from an irrigation zone of Mexico were used for building the models. Models were tested with samples from two consecutive years. The results show that the M5-Prime and k-nearest neighbor techniques obtain the lowest average RMSE errors (5.14 and 4.91), the lowest RRSE errors (79.46% and 79.78%), the lowest average MAE errors (18.12% and 19.42%), and the highest average correlation factors (0.41 and 0.42). Since M5-Prime achieves the largest number of crop yield models with the lowest errors, it is a very suitable tool for massive crop yield prediction in agricultural planning.
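    A minimal sketch of such a comparison with scikit-learn (M5-Prime is not available there, so only linear regression, k-NN and SVR are shown; the data are synthetic stand-ins for the irrigation-zone records):

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import mean_squared_error, mean_absolute_error

    # Hypothetical yield data: X agronomic/climatic predictors, y crop yield per plot.
    X, y = np.random.rand(300, 6), np.random.rand(300) * 10

    for name, model in [("linear", LinearRegression()),
                        ("kNN", KNeighborsRegressor(n_neighbors=5)),
                        ("SVR", SVR(C=10.0))]:
        pred = cross_val_predict(model, X, y, cv=5)
        rmse = np.sqrt(mean_squared_error(y, pred))
        mae = mean_absolute_error(y, pred)
        r = np.corrcoef(y, pred)[0, 1]
        print(f"{name}: RMSE={rmse:.2f} MAE={mae:.2f} R={r:.2f}")
    ```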

  4. Prediction of Aerosol Optical Depth in West Asia: Machine Learning Methods versus Numerical Models

    Science.gov (United States)

    Omid Nabavi, Seyed; Haimberger, Leopold; Abbasi, Reyhaneh; Samimi, Cyrus

    2017-04-01

    Dust-prone areas of West Asia are releasing increasingly large amounts of dust particles during warm months. Because of the lack of ground-based observations in the region, this phenomenon is mainly monitored through remotely sensed aerosol products. The recent development of mesoscale Numerical Models (NMs) has offered an unprecedented opportunity to predict dust emission and, subsequently, Aerosol Optical Depth (AOD), at finer spatial and temporal resolutions. Nevertheless, significant uncertainties in input data and in simulations of dust activation and transport limit the performance of numerical models in dust prediction. The presented study aims to evaluate whether machine-learning algorithms (MLAs), which require much less computational expense, can yield the same or even better performance than NMs. Deep blue (DB) AOD, which is observed by satellites but also predicted by MLAs and NMs, is used for validation. We concentrate our evaluations on the dry plains of Iraq, known as the main origin of the recently intensified dust storms in West Asia. Here we examine the performance of four MLAs: the Linear regression Model (LM), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Multivariate Adaptive Regression Splines (MARS). The Weather Research and Forecasting model coupled to Chemistry (WRF-Chem) and the Dust REgional Atmosphere Model (DREAM) are included as NMs. The MACC aerosol re-analysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) is also included, although it has assimilated satellite-based AOD data. Using the Recursive Feature Elimination (RFE) method, nine environmental features, including soil moisture and temperature, NDVI, dust source function, albedo, dust uplift potential, vertical velocity, precipitation and the 9-month SPEI drought index, are selected for dust (AOD) modelling by the MLAs. During the feature selection process, we noticed that NDVI and SPEI are of the highest importance in the MLAs' predictions. The data set was divided
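    The Recursive Feature Elimination step can be sketched with scikit-learn; the feature names below mirror those listed in the record, but the data and the base estimator are random placeholders, not the study's setup.

    ```python
    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LinearRegression

    # Hypothetical predictor matrix: one column per environmental feature.
    feature_names = ["soil_moisture", "soil_temp", "NDVI", "source_func", "albedo",
                     "dust_uplift", "w_vertical", "precip", "SPEI_9m"]
    X = np.random.rand(400, len(feature_names))
    y = np.random.rand(400)            # stand-in for satellite deep-blue AOD

    selector = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)
    selected = [n for n, keep in zip(feature_names, selector.support_) if keep]
    print("retained features:", selected)
    ```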

  5. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    The present paper evaluates the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machines, deep learning neural networks, etc. Inner ring defect and misalignment anomaly sound data measured on a rotating machine mockup test facility are used to verify the various algorithms. Although we did not find a remarkable difference in anomaly discrimination performance, some methods yield very interesting eigenpatterns corresponding to normal and abnormal states. These results will be useful for future, more sensitive and robust anomaly monitoring technology.

  6. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    International Nuclear Information System (INIS)

    Kanemoto, Shigeru; Watanabe, Masaya; Yusa, Noritaka

    2014-01-01

    The present paper evaluates the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machines, deep learning neural networks, etc. Inner ring defect and misalignment anomaly sound data measured on a rotating machine mockup test facility are used to verify the various algorithms. Although we did not find a remarkable difference in anomaly discrimination performance, some methods yield very interesting eigenpatterns corresponding to normal and abnormal states. These results will be useful for future, more sensitive and robust anomaly monitoring technology.

  7. What is the machine learning?

    Science.gov (United States)

    Chang, Spencer; Cohen, Timothy; Ostdiek, Bryan

    2018-03-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables—aided by physical intuition—that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus nonlinear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

  8. Machine learning topological states

    Science.gov (United States)

    Deng, Dong-Ling; Li, Xiaopeng; Das Sarma, S.

    2017-11-01

    Artificial neural networks and machine learning have now reached a new era after several decades of improvement where applications are to explode in many fields of science, industry, and technology. Here, we use artificial neural networks to study an intriguing phenomenon in quantum physics—the topological phases of matter. We find that certain topological states, either symmetry-protected or with intrinsic topological order, can be represented with classical artificial neural networks. This is demonstrated by using three concrete spin systems, the one-dimensional (1D) symmetry-protected topological cluster state and the 2D and 3D toric code states with intrinsic topological orders. For all three cases, we show rigorously that the topological ground states can be represented by short-range neural networks in an exact and efficient fashion—the required number of hidden neurons is as small as the number of physical spins and the number of parameters scales only linearly with the system size. For the 2D toric-code model, we find that the proposed short-range neural networks can describe the excited states with Abelian anyons and their nontrivial mutual statistics as well. In addition, by using reinforcement learning we show that neural networks are capable of finding the topological ground states of nonintegrable Hamiltonians with strong interactions and studying their topological phase transitions. Our results demonstrate explicitly the exceptional power of neural networks in describing topological quantum states, and at the same time provide valuable guidance to machine learning of topological phases in generic lattice models.

  9. Prediction of interactions between viral and host proteins using supervised machine learning methods.

    Directory of Open Access Journals (Sweden)

    Ranjan Kumar Barman

    Full Text Available BACKGROUND: Viral-host protein-protein interaction plays a vital role in pathogenesis, since it defines viral infection of the host and regulation of the host proteins. Identification of key viral-host protein-protein interactions (PPIs) has great implications for therapeutics. METHODS: In this study, a systematic attempt has been made to predict viral-host PPIs by integrating different features, including domain-domain association, network topology and sequence information, using viral-host PPIs from VirusMINT. Three well-known supervised machine learning methods, SVM, Naïve Bayes and Random Forest, which are commonly used in the prediction of PPIs, were employed and their performance evaluated by five-fold cross-validation. RESULTS: Out of 44 descriptors, the best features were found to be domain-domain association and the methionine, serine and valine amino acid composition of viral proteins. In this study, the SVM-based method achieved a better sensitivity of 67% over Naïve Bayes (37.49%) and Random Forest (55.66%). However, the specificity of Naïve Bayes was the highest (99.52%) compared with SVM (74%) and Random Forest (89.08%). Overall, the SVM and Random Forest achieved accuracies of 71% and 72.41%, respectively. The proposed SVM-based method was evaluated on a blind dataset and attained a sensitivity of 64%, specificity of 83%, and accuracy of 74%. In addition, unknown potential targets of hepatitis B virus-human and hepatitis E virus-human PPIs have been predicted through the proposed SVM model and validated by gene ontology enrichment analysis. Our proposed model shows that hepatitis B virus "C protein" binds to a membrane docking protein, while "X protein" and "P protein" interact with cell-killing and metabolic process proteins, respectively. CONCLUSION: The proposed method can predict large-scale interspecies viral-human PPIs. The nature and function of unknown viral proteins (HBV and HEV), interacting partners of host
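    A minimal sketch of the SVM evaluation under five-fold cross-validation, reporting sensitivity and specificity; the 44-column feature matrix and labels are synthetic placeholders for the domain-association and composition descriptors used in the study.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    # Hypothetical data: one row per candidate viral-host protein pair; y = 1 means interacting.
    X = np.random.rand(600, 44)
    y = np.random.randint(0, 2, 600)

    pred = cross_val_predict(SVC(kernel="rbf", C=1.0), X, y, cv=5)
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
    ```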

  10. Identifying essential genes in bacterial metabolic networks with machine learning methods

    Science.gov (United States)

    2010-01-01

    Background Identifying essential genes in bacteria supports the identification of potential drug targets and an understanding of the minimal requirements for a synthetic cell. However, experimentally assaying the essentiality of their coding genes is resource intensive and not feasible for all bacterial organisms, in particular if they are infective. Results We developed a machine learning technique to identify essential genes, using the experimental data of genome-wide knock-out screens from one bacterial organism to infer essential genes of another, related bacterial organism. We used a broad variety of topological features, sequence characteristics and co-expression properties potentially associated with essentiality, such as flux deviations, centrality, codon frequencies of the sequences, co-regulation and phyletic retention. An organism-wise cross-validation on bacterial species yielded reliable results with good accuracies (area under the receiver-operator curve of 75% - 81%). Finally, it was applied to drug target predictions for Salmonella typhimurium. We compared our predictions to the viability of experimental knock-outs of S. typhimurium and identified 35 enzymes, which are highly relevant to be considered as potential drug targets. Specifically, we detected promising drug targets in the non-mevalonate pathway. Conclusions Using elaborate features characterizing network topology, sequence information and microarray data enables the prediction of essential genes from a bacterial reference organism to a related query organism without any knowledge about the essentiality of genes of the query organism. In general, such a method is beneficial for inferring drug targets when experimental data about genome-wide knockout screens are not available for the investigated organism. PMID:20438628

  11. Extensions and applications of ensemble-of-trees methods in machine learning

    Science.gov (United States)

    Bleich, Justin

    Ensemble-of-trees algorithms have risen to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  12. Adaptive Machine Aids to Learning.

    Science.gov (United States)

    Starkweather, John A.

    With emphasis on man-machine relationships and on machine evolution, computer-assisted instruction (CAI) is examined in this paper. The discussion includes the background of machine assistance to learning, the current status of CAI, directions of development, the development of criteria for successful instruction, meeting the needs of users,…

  13. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal events and a majority of non-Higgs-boson “background” events. No knowledge of particle physics was required for the challenge, but skills in machine learning – the training of computers to recognise patterns in data – were essential. The Challenge, hosted by Ka...

  14. Use of a Machine-learning Method for Predicting Highly Cited Articles Within General Radiology Journals.

    Science.gov (United States)

    Rosenkrantz, Andrew B; Doshi, Ankur M; Ginocchio, Luke A; Aphinyanaphongs, Yindalon

    2016-12-01

    This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features. We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict the top 10% of included articles in terms of the number of citations to the article in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model having the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list maintaining predictive power. Overall themes were qualitatively assigned to the core features. The regularized logistic regression (Bayesian binary regression) model had highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintain predictivity. These features corresponded with topics relating to various imaging techniques (eg, diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer; thyroid nodules; hepatic adenoma, hepatocellular carcinoma, non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, statistics). Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature. Copyright © 2016 The Association of University
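    A hedged sketch of the text-classification pipeline with scikit-learn, substituting plain L2-regularized logistic regression for the Bayesian binary regression used in the study; the texts and labels below are invented placeholders for the PubMed titles/abstracts and citation-based labels.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Hypothetical corpus: article text; labels mark the most highly cited papers.
    texts = ["dual-energy computed tomography of ...", "tomosynthesis screening study ...",
             "case report of a rare lesion ...", "diffusion-weighted MRI of prostate ..."] * 50
    labels = [1, 1, 0, 1] * 50

    model = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
    print("AUC:", cross_val_score(model, texts, labels, cv=5, scoring="roc_auc").mean())
    ```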

  15. An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms.

    Science.gov (United States)

    Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L

    2013-12-01

    The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
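    The classifier-tuning step can be sketched with scikit-learn, substituting a grid search for the genetic algorithm used in the study and scoring by 10-fold cross-validated AUC; the FOT feature matrix is a random placeholder matching the reported sample size.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    # Hypothetical FOT parameters per subject (rows), with healthy = 0 and smoker = 1 labels.
    X = np.random.rand(56, 6)
    y = np.array([0] * 28 + [1] * 28)

    search = GridSearchCV(SVC(),
                          param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]},
                          scoring="roc_auc", cv=10)
    search.fit(X, y)
    print("best AUC:", search.best_score_, "best params:", search.best_params_)
    ```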

  16. Using machine-learning methods to analyze economic loss function of quality management processes

    Science.gov (United States)

    Dzedik, V. A.; Lontsikh, P. A.

    2018-05-01

    During the analysis of quality management systems, the economic component is often analyzed insufficiently. To overcome this issue, it is necessary to take the concept of the economic loss function beyond tolerance thinking and address it directly. Input data about economic losses in processes have a complex form; thus, solving this problem with standard tools is complicated. The use of machine learning techniques allows one to obtain precise models of the economic loss function based on even the most complex input data. The results of such an analysis contain data about the true efficiency of a process and can be used to make investment decisions.

  17. Multipolar electrostatics based on the Kriging machine learning method: an application to serine.

    Science.gov (United States)

    Yuan, Yongna; Mills, Matthew J L; Popelier, Paul L A

    2014-04-01

    A multipolar, polarizable electrostatic method for future use in a novel force field is described. Quantum Chemical Topology (QCT) is used to partition the electron density of a chemical system into atoms, then the machine learning method Kriging is used to build models that relate the multipole moments of the atoms to the positions of their surrounding nuclei. The pilot system serine is used to study both the influence of the level of theory and the set of data generator methods used. The latter consists of: (i) sampling of protein structures deposited in the Protein Data Bank (PDB), or (ii) normal mode distortion along either (a) Cartesian coordinates, or (b) redundant internal coordinates. Wavefunctions for the sampled geometries were obtained at the HF/6-31G(d,p), B3LYP/apc-1, and MP2/cc-pVDZ levels of theory, prior to calculation of the atomic multipole moments by volume integration. The average absolute error (over an independent test set of conformations) in the total atom-atom electrostatic interaction energy of serine, using Kriging models built with the three data generator methods is 11.3 kJ mol⁻¹ (PDB), 8.2 kJ mol⁻¹ (Cartesian distortion), and 10.1 kJ mol⁻¹ (redundant internal distortion) at the HF/6-31G(d,p) level. At the B3LYP/apc-1 level, the respective errors are 7.7 kJ mol⁻¹, 6.7 kJ mol⁻¹, and 4.9 kJ mol⁻¹, while at the MP2/cc-pVDZ level they are 6.5 kJ mol⁻¹, 5.3 kJ mol⁻¹, and 4.0 kJ mol⁻¹. The ranges of geometries generated by the redundant internal coordinate distortion and by extraction from the PDB are much wider than the range generated by Cartesian distortion. The atomic multipole moment and electrostatic interaction energy predictions for the B3LYP/apc-1 and MP2/cc-pVDZ levels are similar, and both are better than the corresponding predictions at the HF/6-31G(d,p) level.
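    Kriging is essentially Gaussian process regression, so the mapping from nuclear geometry to an atomic multipole moment can be sketched with scikit-learn; the coordinates, kernel choice and target below are illustrative assumptions, not the authors' QCT setup.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Hypothetical data: X = internal coordinates of sampled conformers,
    # y = one atomic multipole moment from topological volume integration.
    X = np.random.rand(200, 12)
    y = np.sin(X[:, 0]) + 0.01 * np.random.randn(200)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-4),
                                  normalize_y=True).fit(X[:150], y[:150])
    pred, std = gp.predict(X[150:], return_std=True)
    print("mean |error|:", np.abs(pred - y[150:]).mean())
    ```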

  18. A SEMI-AUTOMATIC RULE SET BUILDING METHOD FOR URBAN LAND COVER CLASSIFICATION BASED ON MACHINE LEARNING AND HUMAN KNOWLEDGE

    Directory of Open Access Journals (Sweden)

    H. Y. Gu

    2017-09-01

    Full Text Available The classification rule set, which consists of features and decision rules, is important for land cover classification. The selection of features and decision rules is usually based on an iterative trial-and-error approach in GEOBIA; however, this is time-consuming and has poor versatility. This study puts forward a rule set building method for land cover classification based on human knowledge and machine learning. Machine learning is used to build rule sets effectively, which overcomes the iterative trial-and-error approach. Human knowledge is used to address the shortcoming of existing machine learning methods, namely insufficient use of prior knowledge, and to improve the versatility of the rule sets. A two-step workflow is introduced: first, an initial rule set is built based on Random Forest and a CART decision tree; second, the initial rule set is analyzed and validated based on human knowledge, where a statistical confidence interval is used to determine its thresholds. The test site is located in Potsdam. We utilised the TOP, DSM and ground truth data. The results show that the method can determine a rule set for land cover classification semi-automatically, and that there are static features for the different land cover classes.
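    The first, machine-learning step (Random Forest for feature ranking, CART for an initial rule set) can be sketched with scikit-learn; the iris data stand in for the per-object features derived from the TOP and DSM layers.

    ```python
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Stand-in data: iris instead of spectral-band / height features per image object.
    X, y = load_iris(return_X_y=True)

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    print("feature importances:", rf.feature_importances_)   # candidate features for rules

    cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(cart))                                  # initial thresholds to review by hand
    ```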

  19. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    Science.gov (United States)

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS) to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADS categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84, from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with an AUC of 0.83. Lesion margin and orientation were optimum features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. All rights reserved.

  20. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  1. Machine Learning of Fault Friction

    Science.gov (United States)

    Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.

    2017-12-01

    We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic data (the AE) in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus composed of fault blocks surrounding fault gouge made of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References: Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017), https://arxiv.org/abs/1702.05774. Rouet-Leduc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025

  2. A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling.

    Science.gov (United States)

    Leger, Stefan; Zwanenburg, Alex; Pilz, Karoline; Lohaus, Fabian; Linge, Annett; Zöphel, Klaus; Kotzerke, Jörg; Schreiber, Andreas; Tinhofer, Inge; Budach, Volker; Sak, Ali; Stuschke, Martin; Balermpas, Panagiotis; Rödel, Claus; Ganswindt, Ute; Belka, Claus; Pigorsch, Steffi; Combs, Stephanie E; Mönnich, David; Zips, Daniel; Krause, Mechthild; Baumann, Michael; Troost, Esther G C; Löck, Steffen; Richter, Christian

    2017-10-16

    Radiomics applies machine learning algorithms to quantitative imaging data to characterise the tumour phenotype and predict clinical outcome. For the development of radiomics risk models, a variety of different algorithms is available and it is not clear which one gives optimal results. Therefore, we assessed the performance of 11 machine learning algorithms combined with 12 feature selection methods by the concordance index (C-Index), to predict loco-regional tumour control (LRC) and overall survival for patients with head and neck squamous cell carcinoma. The considered algorithms are able to deal with continuous time-to-event survival data. Feature selection and model building were performed on a multicentre cohort (213 patients) and validated using an independent cohort (80 patients). We found several combinations of machine learning algorithms and feature selection methods which achieve similar results, e.g. C-Index = 0.71 and BT-COX: C-Index = 0.70 in combination with Spearman feature selection. Using the best performing models, patients were stratified into groups of low and high risk of recurrence. Significant differences in LRC were obtained between both groups on the validation cohort. Based on the presented analysis, we identified a subset of algorithms which should be considered in future radiomics studies to develop stable and clinically relevant predictive models for time-to-event endpoints.
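    As a hedged illustration of one such time-to-event model, the following fits a Cox proportional hazards model with the lifelines package and reports the concordance index; the radiomics features, follow-up times and event indicators are synthetic placeholders, and the penalizer value is an assumption.

    ```python
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    # Hypothetical radiomics table: a few selected image features plus follow-up time
    # (months) and an event indicator (1 = loco-regional recurrence observed).
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(200, 3)),
                      columns=["shape_f1", "texture_f2", "intensity_f3"])
    df["time"] = rng.exponential(scale=24, size=200)
    df["event"] = rng.integers(0, 2, size=200)

    cph = CoxPHFitter(penalizer=0.1).fit(df, duration_col="time", event_col="event")
    print("training C-index:", cph.concordance_index_)
    ```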

  3. Designing Focused Chemical Libraries Enriched in Protein-Protein Interaction Inhibitors using Machine-Learning Methods

    Science.gov (United States)

    Reynès, Christelle; Host, Hélène; Camproux, Anne-Claude; Laconde, Guillaume; Leroux, Florence; Mazars, Anne; Deprez, Benoit; Fahraeus, Robin; Villoutreix, Bruno O.; Sperandio, Olivier

    2010-01-01

    Protein-protein interactions (PPIs) may represent one of the next major classes of therapeutic targets. So far, only a minute fraction of the estimated 650,000 PPIs that comprise the human interactome are known with a tiny number of complexes being drugged. Such intricate biological systems cannot be cost-efficiently tackled using conventional high-throughput screening methods. Rather, time has come for designing new strategies that will maximize the chance for hit identification through a rationalization of the PPI inhibitor chemical space and the design of PPI-focused compound libraries (global or target-specific). Here, we train machine-learning-based models, mainly decision trees, using a dataset of known PPI inhibitors and of regular drugs in order to determine a global physico-chemical profile for putative PPI inhibitors. This statistical analysis unravels two important molecular descriptors for PPI inhibitors characterizing specific molecular shapes and the presence of a privileged number of aromatic bonds. The best model has been transposed into a computer program, PPI-HitProfiler, that can output from any drug-like compound collection a focused chemical library enriched in putative PPI inhibitors. Our PPI inhibitor profiler is challenged on the experimental screening results of 11 different PPIs among which the p53/MDM2 interaction screened within our own CDithem platform, that in addition to the validation of our concept led to the identification of 4 novel p53/MDM2 inhibitors. Collectively, our tool shows a robust behavior on the 11 experimental datasets by correctly profiling 70% of the experimentally identified hits while removing 52% of the inactive compounds from the initial compound collections. We strongly believe that this new tool can be used as a global PPI inhibitor profiler prior to screening assays to reduce the size of the compound collections to be experimentally screened while keeping most of the true PPI inhibitors. PPI-HitProfiler is

  4. Designing focused chemical libraries enriched in protein-protein interaction inhibitors using machine-learning methods.

    Directory of Open Access Journals (Sweden)

    Christelle Reynès

    2010-03-01

    Full Text Available Protein-protein interactions (PPIs) may represent one of the next major classes of therapeutic targets. So far, only a minute fraction of the estimated 650,000 PPIs that comprise the human interactome are known with a tiny number of complexes being drugged. Such intricate biological systems cannot be cost-efficiently tackled using conventional high-throughput screening methods. Rather, time has come for designing new strategies that will maximize the chance for hit identification through a rationalization of the PPI inhibitor chemical space and the design of PPI-focused compound libraries (global or target-specific). Here, we train machine-learning-based models, mainly decision trees, using a dataset of known PPI inhibitors and of regular drugs in order to determine a global physico-chemical profile for putative PPI inhibitors. This statistical analysis unravels two important molecular descriptors for PPI inhibitors characterizing specific molecular shapes and the presence of a privileged number of aromatic bonds. The best model has been transposed into a computer program, PPI-HitProfiler, that can output from any drug-like compound collection a focused chemical library enriched in putative PPI inhibitors. Our PPI inhibitor profiler is challenged on the experimental screening results of 11 different PPIs among which the p53/MDM2 interaction screened within our own CDithem platform, that in addition to the validation of our concept led to the identification of 4 novel p53/MDM2 inhibitors. Collectively, our tool shows a robust behavior on the 11 experimental datasets by correctly profiling 70% of the experimentally identified hits while removing 52% of the inactive compounds from the initial compound collections. We strongly believe that this new tool can be used as a global PPI inhibitor profiler prior to screening assays to reduce the size of the compound collections to be experimentally screened while keeping most of the true PPI inhibitors. PPI

  5. Designing focused chemical libraries enriched in protein-protein interaction inhibitors using machine-learning methods.

    Science.gov (United States)

    Reynès, Christelle; Host, Hélène; Camproux, Anne-Claude; Laconde, Guillaume; Leroux, Florence; Mazars, Anne; Deprez, Benoit; Fahraeus, Robin; Villoutreix, Bruno O; Sperandio, Olivier

    2010-03-05

    Protein-protein interactions (PPIs) may represent one of the next major classes of therapeutic targets. So far, only a minute fraction of the estimated 650,000 PPIs that comprise the human interactome are known with a tiny number of complexes being drugged. Such intricate biological systems cannot be cost-efficiently tackled using conventional high-throughput screening methods. Rather, time has come for designing new strategies that will maximize the chance for hit identification through a rationalization of the PPI inhibitor chemical space and the design of PPI-focused compound libraries (global or target-specific). Here, we train machine-learning-based models, mainly decision trees, using a dataset of known PPI inhibitors and of regular drugs in order to determine a global physico-chemical profile for putative PPI inhibitors. This statistical analysis unravels two important molecular descriptors for PPI inhibitors characterizing specific molecular shapes and the presence of a privileged number of aromatic bonds. The best model has been transposed into a computer program, PPI-HitProfiler, that can output from any drug-like compound collection a focused chemical library enriched in putative PPI inhibitors. Our PPI inhibitor profiler is challenged on the experimental screening results of 11 different PPIs among which the p53/MDM2 interaction screened within our own CDithem platform, that in addition to the validation of our concept led to the identification of 4 novel p53/MDM2 inhibitors. Collectively, our tool shows a robust behavior on the 11 experimental datasets by correctly profiling 70% of the experimentally identified hits while removing 52% of the inactive compounds from the initial compound collections. We strongly believe that this new tool can be used as a global PPI inhibitor profiler prior to screening assays to reduce the size of the compound collections to be experimentally screened while keeping most of the true PPI inhibitors. PPI-HitProfiler is

  6. Online transfer learning with extreme learning machine

    Science.gov (United States)

    Yin, Haibo; Yang, Yun-an

    2017-05-01

    In this paper, we propose a new transfer learning algorithm for online training. The proposed algorithm, which is called Online Transfer Extreme Learning Machine (OTELM), is based on Online Sequential Extreme Learning Machine (OSELM) while it introduces Semi-Supervised Extreme Learning Machine (SSELM) to transfer knowledge from the source to the target domain. With the manifold regularization, SSELM picks out instances from the source domain that are less relevant to those in the target domain to initialize the online training, so as to improve the classification performance. Experimental results demonstrate that the proposed OTELM can effectively use instances in the source domain to enhance the learning performance.

  7. Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

    Science.gov (United States)

    Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

    2011-01-01

    Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433
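    The original article provides R code; a hedged Python equivalent of the core idea is shown below, where a regression forest fitted to the 0/1 response estimates individual probabilities and is compared with the usual classification-forest probabilities (all data are synthetic).

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical patient data: X clinical features, y binary diagnosis (0/1).
    X = np.random.rand(500, 8)
    y = (X[:, 0] + 0.3 * np.random.randn(500) > 0.5).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Regression forest on the 0/1 response estimates P(y=1|x) directly ("probability machine").
    p_reg = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr).predict(X_te)
    # Classification forest offers predict_proba as the more familiar alternative.
    p_clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print("mean absolute difference between the two estimates:", np.abs(p_reg - p_clf).mean())
    ```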

  8. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods

    Directory of Open Access Journals (Sweden)

    Pontil Massimiliano

    2009-10-01

    Full Text Available Abstract Background Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and the changes in free energy of binding (ΔΔG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and, if mutated, they can disrupt the interaction. As mutagenesis studies require significant experimental effort, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. Results We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall respectively of 56% and 65%. Conclusion We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been

  9. Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning.

    Science.gov (United States)

    Sung, Yao-Ting; Chen, Ju-Ling; Cha, Ji-Her; Tseng, Hou-Chiang; Chang, Tao-Hsing; Chang, Kuo-En

    2015-06-01

    Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formulae construction without pruning the outliers in advance. The use of such readability formulae tends to produce a low text classification accuracy, while using a support vector machine (SVM) in machine learning can enhance the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.

  10. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  11. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

  12. Machine learning applications in genetics and genomics.

    Science.gov (United States)

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  13. Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

    Science.gov (United States)

    Beggrow, Elizabeth P.; Ha, Minsu; Nehm, Ross H.; Pearl, Dennis; Boone, William J.

    2014-02-01

    The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations, and (3) more validly detect understanding

  14. BELM: Bayesian extreme learning machine.

    Science.gov (United States)

    Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J

    2011-03-01

    The theory of extreme learning machine (ELM) has become very popular over the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (such as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a Bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need to apply computationally intensive methods, e.g., the bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM on several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. The achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of the probability of model overfitting, and use of a priori knowledge.

  15. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional methods of data analysis have difficulty identifying outliers and patterns in big data and in data with multiple exposure/outcome variables, and analysis rules for surveys and questionnaires, currently common methods of data collection, are essentially missing. Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning, and the current 100-page cookbook should be helpful to that aim. It covers in a condensed form the subjects reviewed in the 750-page, three-volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (ed. by Springer, Heidelberg, Germany, 2013), and was written as a hand-holding presentation and must-read publication. It was written not only for investigators and students in the field, but also for jaded clinicians new to the methods and lacking the time to read the entire textbooks. General purposes ...

  16. A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli.

    Science.gov (United States)

    Habibi, Narjeskhatoon; Mohd Hashim, Siti Z; Norouzi, Alireza; Samian, Mohammed Razip

    2014-05-08

    Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low levels of production and frequent failures for opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental circumstances, including temperature, expression host, etc., protein solubility is a feature ultimately defined by the protein's sequence. Until now, numerous methods based on machine learning have been proposed to predict the solubility of a protein merely from its amino acid sequence. In spite of 20 years of research on the matter, no comprehensive review of the published methods is available. This paper presents an extensive review of the existing models for predicting protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion of the models is provided at the end. This study aims to investigate extensively the machine-learning-based methods for predicting recombinant protein solubility, so as to offer a general as well as a detailed understanding for researchers in the field. Some of the models present acceptable prediction performances and convenient user interfaces. These models can be considered valuable tools for predicting recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.

  17. Machine learning for healthcare technologies

    CERN Document Server

    Clifton, David A

    2016-01-01

    This book brings together chapters on the state-of-the-art in machine learning (ML) as it applies to the development of patient-centred technologies, with a special emphasis on 'big data' and mobile data.

  18. Machine Learning via Mathematical Programming

    National Research Council Canada - National Science Library

    Mangasarian, Olvi

    1999-01-01

    Mathematical programming approaches were applied to a variety of problems in machine learning in order to gain deeper understanding of the problems and to come up with new and more efficient computational algorithms...

  19. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that INSPIRE is developing and integrating in order to automate, as much as possible, content selection and curation in a subject-based repository.

  20. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  1. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Louppe, Gilles; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu

    2012-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings....
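
    As a minimal illustration of the estimator interface the two scikit-learn records above refer to (objects exposing fit and predict methods), the sketch below trains and scores a classifier on one of the library's bundled toy datasets; the dataset and classifier choices are illustrative and not taken from the papers.

    # Minimal scikit-learn sketch: the fit/predict interface on a bundled toy dataset.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)                              # every estimator exposes fit()
    print(round(accuracy_score(y_test, clf.predict(X_test)), 3))   # and predict() for new inputs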

  2. Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

    Science.gov (United States)

    Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

    2017-05-01

    Four accurate, sensitive and reliable stability-indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM) whether in pure form or in pharmaceutical formulations. Two supervised learning machines' methods, linear artificial neural networks preceded by principal component analysis (PC-linANN) and linear support vector regression (linSVR), were compared with two principal component based methods, principal component regression (PCR) and partial least squares (PLS), for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits behind using linear learning machines' methods and the inherent merits of their algorithms in handling overlapped noisy spectral data, especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). The relative mean squared error of prediction (RMSEP) for the proposed models in the determination of AGM was 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, linSVR and PC-linANN, respectively. The results showed the superiority of supervised learning machines' methods over principal component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and a narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.
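
    As a rough, hedged illustration of the kind of comparison reported above (a latent-variable regression against a support vector regressor, scored by RMSEP), the sketch below fits PLS and linear SVR models to synthetic overlapped spectra; the data generation and settings are assumptions, not the authors' spectrofluorimetric data.

    # Hedged sketch: compare PLS regression and linear SVR on synthetic "spectra".
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    n, n_wavelengths = 120, 200
    concentration = rng.uniform(0.1, 1.0, n)                        # analyte level (toy)
    peaks = np.exp(-0.5 * ((np.arange(n_wavelengths) - 90) / 15) ** 2)
    X = np.outer(concentration, peaks) + 0.02 * rng.standard_normal((n, n_wavelengths))

    X_tr, X_te, y_tr, y_te = train_test_split(X, concentration, random_state=0)
    for name, model in [("PLS", PLSRegression(n_components=3)), ("linSVR", SVR(kernel="linear"))]:
        model.fit(X_tr, y_tr)
        rmsep = np.sqrt(mean_squared_error(y_te, np.ravel(model.predict(X_te))))
        print(name, "RMSEP:", round(float(rmsep), 4))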

  3. Machine Learning of Musical Gestures

    OpenAIRE

    Caramiaux, Baptiste; Tanaka, Atau

    2013-01-01

    We present an overview of machine learning (ML) techniques and their application in interactive music and new digital instruments design. We first give to the non-specialist reader an introduction to two ML tasks, classification and regression, that are particularly relevant for gestural interaction. We then present a review of the literature in current NIME research that uses ML in musical gesture analysis and gestural sound control. We describe the ways in which machine learning is useful for cre...

  4. Diagnosis of Dementia by Machine learning methods in Epidemiological studies: a pilot exploratory study from south India.

    Science.gov (United States)

    Bhagyashree, Sheshadri Iyengar Raghavan; Nagaraj, Kiran; Prince, Martin; Fall, Caroline H D; Krishna, Murali

    2018-01-01

    There are limited data on the use of artificial intelligence methods for the diagnosis of dementia in epidemiological studies in low- and middle-income country (LMIC) settings. A culture and education fair battery of cognitive tests was developed and validated for population based studies in low- and middle-income countries including India by the 10/66 Dementia Research Group. We explored machine learning methods based on the 10/66 battery of cognitive tests for the diagnosis of dementia in a birth cohort study in South India. The data sets for 466 men and women for this study were obtained from the on-going Mysore Studies of Natal effect of Health and Ageing (MYNAH), in south India. The data sets included: demographics, performance on the 10/66 cognitive function tests, the 10/66 diagnosis of mental disorders and population based normative data for the 10/66 battery of cognitive function tests. Diagnosis of dementia from the rule based approach was compared against the 10/66 diagnosis of dementia. We have applied machine learning techniques to identify the minimal number of the 10/66 cognitive function tests required for diagnosing dementia and derived an algorithm to improve the accuracy of dementia diagnosis. Of 466 subjects, 27 had a 10/66 diagnosis of dementia, 19 of whom were correctly identified as having dementia by JRip classification with 100% accuracy. This pilot exploratory study indicates that machine learning methods can help identify community dwelling older adults with 10/66 criterion diagnosis of dementia with good accuracy in a LMIC setting such as India. This should reduce the duration of the diagnostic assessment and make the process easier and quicker for clinicians and patients, and will be useful for 'case' ascertainment in population based epidemiological studies.

  5. Performance Evaluation of Machine Learning Methods for Leaf Area Index Retrieval from Time-Series MODIS Reflectance Data

    Science.gov (United States)

    Wang, Tongtong; Xiao, Zhiqiang; Liu, Zhigang

    2017-01-01

    Leaf area index (LAI) is an important biophysical parameter and the retrieval of LAI from remote sensing data is the only feasible method for generating LAI products at regional and global scales. However, most LAI retrieval methods use satellite observations at a specific time to retrieve LAI. Because of the impacts of clouds and aerosols, the LAI products generated by these methods are spatially incomplete and temporally discontinuous, and thus they cannot meet the needs of practical applications. To generate high-quality LAI products, four machine learning algorithms, including back-propagation neural network (BPNN), radial basis function networks (RBFNs), general regression neural networks (GRNNs), and multi-output support vector regression (MSVR) are proposed to retrieve LAI from time-series Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance data in this study and the performance of these machine learning algorithms is evaluated. The results demonstrated that GRNNs, RBFNs, and MSVR exhibited low sensitivity to training sample size, whereas BPNN had high sensitivity. The four algorithms performed slightly better with red, near infrared (NIR), and short wave infrared (SWIR) bands than with red and NIR bands, and the results were significantly better than those obtained using single band reflectance data (red or NIR). Regardless of band composition, GRNNs performed better than the other three methods. Among the four algorithms, BPNN required the least training time, whereas MSVR needed the most for any sample size. PMID:28045443
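
    None of the four algorithms compared above (BPNN, RBFNs, GRNNs, MSVR) has a drop-in scikit-learn implementation, but the general workflow of regressing LAI on multi-band reflectance can be sketched with a generic feed-forward network as a stand-in; the synthetic reflectance data below are assumptions used only for illustration.

    # Hedged sketch: neural-network regression of LAI from red/NIR/SWIR reflectance.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(1)
    n = 500
    lai = rng.uniform(0.0, 6.0, n)                                   # "true" LAI (toy)
    red  = 0.30 * np.exp(-0.4 * lai) + 0.01 * rng.standard_normal(n)
    nir  = 0.20 + 0.08 * lai + 0.01 * rng.standard_normal(n)
    swir = 0.25 * np.exp(-0.2 * lai) + 0.01 * rng.standard_normal(n)
    X = np.column_stack([red, nir, swir])

    X_tr, X_te, y_tr, y_te = train_test_split(X, lai, random_state=0)
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    print("R2 on held-out samples:", round(r2_score(y_te, net.predict(X_te)), 3))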

  6. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vector Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such a

  7. A machine learning approach for efficient uncertainty quantification using multiscale methods

    Science.gov (United States)

    Chan, Shing; Elsheikh, Ahmed H.

    2018-02-01

    Several multiscale methods account for sub-grid scale features using coarse scale basis functions. For example, in the Multiscale Finite Volume method the coarse scale basis functions are obtained by solving a set of local problems over dual-grid cells. We introduce a data-driven approach for the estimation of these coarse scale basis functions. Specifically, we employ a neural network predictor fitted using a set of solution samples from which it learns to generate subsequent basis functions at a lower computational cost than solving the local problems. The computational advantage of this approach is realized for uncertainty quantification tasks where a large number of realizations has to be evaluated. We attribute the ability to learn these basis functions to the modularity of the local problems and the redundancy of the permeability patches between samples. The proposed method is evaluated on elliptic problems yielding very promising results.
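
    A minimal sketch of the idea described above, learning a map from a local permeability patch to the corresponding coarse-scale basis-function values instead of solving each local problem, is given below; the patch size, the toy stand-in for the local solver, and the network are illustrative assumptions, not the authors' Multiscale Finite Volume setup.

    # Hedged sketch: regress coarse-scale basis-function values on local permeability patches.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    n_samples, patch = 400, 8                      # assumed 8x8 permeability patches
    K = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, patch * patch))

    # Toy stand-in for the local solver: in the actual method each target row would come
    # from solving the local problem on a dual-grid cell for that permeability patch.
    def local_solver(perm_patch):
        return np.tanh(perm_patch[:16] / (1.0 + perm_patch[:16]))   # 16 "basis values"

    Y = np.array([local_solver(k) for k in K])

    predictor = MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000, random_state=0)
    predictor.fit(K[:300], Y[:300])                # fit on precomputed local solutions
    approx_basis = predictor.predict(K[300:])      # cheap surrogate for new realizations
    print(approx_basis.shape)                      # (100, 16)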

  8. Materials Screening for the Discovery of New Half-Heuslers: Machine Learning versus ab Initio Methods.

    Science.gov (United States)

    Legrain, Fleur; Carrete, Jesús; van Roekeghem, Ambroise; Madsen, Georg K H; Mingo, Natalio

    2018-01-18

    Machine learning (ML) is increasingly becoming a helpful tool in the search for novel functional compounds. Here we use classification via random forests to predict the stability of half-Heusler (HH) compounds, using only experimentally reported compounds as a training set. Cross-validation yields an excellent agreement between the fraction of compounds classified as stable and the actual fraction of truly stable compounds in the ICSD. The ML model is then employed to screen 71 178 different 1:1:1 compositions, yielding 481 likely stable candidates. The predicted stability of HH compounds from three previous high-throughput ab initio studies is critically analyzed from the perspective of the alternative ML approach. The incomplete consistency among the three separate ab initio studies and between them and the ML predictions suggests that additional factors beyond those considered by ab initio phase stability calculations might be determinant to the stability of the compounds. Such factors can include configurational entropies and quasiharmonic contributions.
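
    A schematic of the classification-via-random-forests workflow described above (train on known compounds, then screen a large composition list) might look like the following; the descriptors and labels are random placeholders, not the authors' features or the ICSD data.

    # Hedged sketch: random-forest stability classifier used to screen candidate compositions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X_known = rng.standard_normal((800, 12))        # 12 compositional descriptors (assumed)
    y_known = (X_known[:, 0] + 0.5 * X_known[:, 3] > 0).astype(int)   # toy "stable" label

    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    print("CV accuracy:", round(cross_val_score(clf, X_known, y_known, cv=5).mean(), 3))

    clf.fit(X_known, y_known)
    X_screen = rng.standard_normal((71178, 12))     # descriptors for the screening list
    p_stable = clf.predict_proba(X_screen)[:, 1]    # probability of stability per composition
    print("likely stable candidates:", int((p_stable > 0.9).sum()))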

  9. Data Processing And Machine Learning Methods For Multi-Modal Operator State Classification Systems

    Science.gov (United States)

    Hearn, Tristan A.

    2015-01-01

    This document is intended as an introduction to a set of common signal processing learning methods that may be used in the software portion of a functional crew state monitoring system. This includes overviews of both the theory of the methods involved, as well as examples of implementation. Practical considerations are discussed for implementing modular, flexible, and scalable processing and classification software for a multi-modal, multi-channel monitoring system. Example source code is also given for all of the discussed processing and classification methods.

  10. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  11. Comparative analysis of expert and machine-learning methods for classification of body cavity effusions in companion animals.

    Science.gov (United States)

    Hotz, Christine S; Templeton, Steven J; Christopher, Mary M

    2005-03-01

    A rule-based expert system using CLIPS programming language was created to classify body cavity effusions as transudates, modified transudates, exudates, chylous, and hemorrhagic effusions. The diagnostic accuracy of the rule-based system was compared with that produced by 2 machine-learning methods: Rosetta, a rough sets algorithm and RIPPER, a rule-induction method. Results of 508 body cavity fluid analyses (canine, feline, equine) obtained from the University of California-Davis Veterinary Medical Teaching Hospital computerized patient database were used to test CLIPS and to test and train RIPPER and Rosetta. The CLIPS system, using 17 rules, achieved an accuracy of 93.5% compared with pathologist consensus diagnoses. Rosetta accurately classified 91% of effusions by using 5,479 rules. RIPPER achieved the greatest accuracy (95.5%) using only 10 rules. When the original rules of the CLIPS application were replaced with those of RIPPER, the accuracy rates were identical. These results suggest that both rule-based expert systems and machine-learning methods hold promise for the preliminary classification of body fluids in the clinical laboratory.

  12. Detection of needle to nerve contact based on electric bioimpedance and machine learning methods.

    Science.gov (United States)

    Kalvoy, Havard; Tronstad, Christian; Ullensvang, Kyrre; Steinfeldt, Thorsten; Sauter, Axel R

    2017-07-01

    In an ongoing project for electrical impedance-based needle guidance we have previously shown in an animal model that intraneural needle positions can be detected with bioimpedance measurement. To enhance the power of this method, we investigated in this study whether early detection of the needle merely touching the nerve is also feasible. Measurement of complex impedance during needle to nerve contact was compared with needle positions in surrounding tissues in a volunteer study on 32 subjects. Classification analysis using Support Vector Machines demonstrated that discrimination is possible, but that the sensitivity and specificity of the nerve-touch algorithm are not at the same level of performance as for intraneural detection.

  13. An introduction to quantum machine learning

    Science.gov (United States)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  14. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    Science.gov (United States)

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the models with the features selected by each feature selection method are tested on the proposed classifiers. Four types of classifiers are chosen, namely ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate their potential as a significant prognostic signature in oral cancer studies. PMID:23725313
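
    The two-stage design described above (select features, then test classifiers under k-fold cross-validation because of the small sample size) can be sketched generically as below; ReliefF, GA and ANFIS have no scikit-learn equivalents, so a univariate filter and an SVM stand in for them, and the data are synthetic.

    # Hedged sketch: feature selection + classifier evaluated with k-fold cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score, StratifiedKFold

    X, y = make_classification(n_samples=60, n_features=20, n_informative=3, random_state=0)

    model = Pipeline([
        ("select", SelectKBest(f_classif, k=3)),    # stand-in for ReliefF/GA feature selection
        ("clf", SVC(kernel="rbf", C=1.0)),          # stand-in for the compared classifiers
    ])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)    # selection is re-fitted inside each fold
    print("mean accuracy:", round(scores.mean(), 3))

    Keeping the selection step inside the pipeline ensures it is re-fitted within each fold, which avoids the selection bias that arises when features are chosen on the full data set before cross-validation.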

  15. Surface electromyography based muscle fatigue detection using high-resolution time-frequency methods and machine learning algorithms.

    Science.gov (United States)

    Karthick, P A; Ghosh, Diptasree Maitra; Ramakrishnan, S

    2018-02-01

    Surface electromyography (sEMG) based muscle fatigue research is widely preferred in sports science and occupational/rehabilitation studies due to its noninvasiveness. However, these signals are complex, multicomponent and highly nonstationary with large inter-subject variations, particularly during dynamic contractions. Hence, time-frequency based machine learning methodologies can improve the design of automated system for these signals. In this work, the analysis based on high-resolution time-frequency methods, namely, Stockwell transform (S-transform), B-distribution (BD) and extended modified B-distribution (EMBD) are proposed to differentiate the dynamic muscle nonfatigue and fatigue conditions. The nonfatigue and fatigue segments of sEMG signals recorded from the biceps brachii of 52 healthy volunteers are preprocessed and subjected to S-transform, BD and EMBD. Twelve features are extracted from each method and prominent features are selected using genetic algorithm (GA) and binary particle swarm optimization (BPSO). Five machine learning algorithms, namely, naïve Bayes, support vector machine (SVM) of polynomial and radial basis kernel, random forest and rotation forests are used for the classification. The results show that all the proposed time-frequency distributions (TFDs) are able to show the nonstationary variations of sEMG signals. Most of the features exhibit statistically significant difference in the muscle fatigue and nonfatigue conditions. The maximum number of features (66%) is reduced by GA and BPSO for EMBD and BD-TFD respectively. The combination of EMBD- polynomial kernel based SVM is found to be most accurate (91% accuracy) in classifying the conditions with the features selected using GA. The proposed methods are found to be capable of handling the nonstationary and multicomponent variations of sEMG signals recorded in dynamic fatiguing contractions. Particularly, the combination of EMBD- polynomial kernel based SVM could be used to

  16. Bayesian and Classical Machine Learning Methods: A Comparison for Tree Species Classification with LiDAR Waveform Signatures

    Directory of Open Access Journals (Sweden)

    Tan Zhou

    2017-12-01

    Full Text Available A plethora of information contained in full-waveform (FW) Light Detection and Ranging (LiDAR) data offers prospects for characterizing vegetation structures. This study aims to investigate the capacity of FW LiDAR data alone for tree species identification through the integration of waveform metrics with machine learning methods and Bayesian inference. Specifically, we first conducted automatic tree segmentation based on the waveform-based canopy height model (CHM) using three approaches including TreeVaW, watershed algorithms and the combination of TreeVaW and watershed (TW) algorithms. Subsequently, the Random forests (RF) and Conditional inference forests (CF) models were employed to identify important tree-level waveform metrics derived from three distinct sources, such as raw waveforms, composite waveforms, the waveform-based point cloud and the combined variables from these three sources. Further, we discriminated tree (gray pine, blue oak, interior live oak) and shrub species through the RF, CF and Bayesian multinomial logistic regression (BMLR) using important waveform metrics identified in this study. Results of the tree segmentation demonstrated that the TW algorithms outperformed other algorithms for delineating individual tree crowns. The CF model overcomes waveform metrics selection bias caused by the RF model which favors correlated metrics and enhances the accuracy of subsequent classification. We also found that composite waveforms are more informative than raw waveforms and waveform-based point cloud for characterizing tree species in our study area. Both classical machine learning methods (the RF and CF) and the BMLR generated satisfactory average overall accuracy (74% for the RF, 77% for the CF and 81% for the BMLR) and the BMLR slightly outperformed the other two methods. However, these three methods suffered from low individual classification accuracy for the blue oak which is prone to being misclassified as the interior live oak due

  17. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  18. Recent Advances in Predictive (Machine) Learning

    Energy Technology Data Exchange (ETDEWEB)

    Friedman, J

    2004-01-24

    Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originated many years ago at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field. These are support vector machines and boosted decision trees. This paper provides an introduction to these two new methods tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.

  19. Improving Hyperspectral Image Classification Method for Fine Land Use Assessment Application Using Semisupervised Machine Learning

    Directory of Open Access Journals (Sweden)

    Chunyang Wang

    2015-01-01

    Full Text Available Study on land use/cover can reflect changing rules of population, economy, agricultural structure adjustment, policy, and traffic and provide better service for the regional economic development and urban evolution. The study on fine land use/cover assessment using hyperspectral image classification is a focal growing area in many fields. Semisupervised learning, which uses a large number of unlabeled samples together with a minority of labeled samples to improve classification and prediction accuracy effectively, has become a new research direction. In this paper, we proposed improving fine land use/cover assessment based on a semisupervised hyperspectral classification method. The test analysis of the study area showed that the semisupervised classification method could improve the overall classification precision and the objective assessment of land use/cover results.
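
    The semisupervised idea summarized above, exploiting many unlabeled pixels alongside a few labeled ones, can be illustrated with a graph-based label-spreading model; the toy data below are not hyperspectral imagery and serve only to show the mechanics.

    # Hedged sketch: semi-supervised classification with few labels and many unlabeled samples.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.semi_supervised import LabelSpreading
    from sklearn.metrics import accuracy_score

    X, y_true = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)

    y_train = y_true.copy()
    rng = np.random.default_rng(0)
    unlabeled = rng.random(len(y_train)) < 0.95     # keep only ~5% of the labels
    y_train[unlabeled] = -1                         # -1 marks unlabeled samples

    model = LabelSpreading(kernel="knn", n_neighbors=7)
    model.fit(X, y_train)                           # uses both labeled and unlabeled points
    acc = accuracy_score(y_true[unlabeled], model.transduction_[unlabeled])
    print("accuracy on the unlabeled part:", round(float(acc), 3))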

  20. A Hybrid Machine Learning Method for Fusing fMRI and Genetic Data: Combining both Improves Classification of Schizophrenia

    Directory of Open Access Journals (Sweden)

    Honghui Yang

    2010-10-01

    Full Text Available We demonstrate a hybrid machine learning method to classify schizophrenia patients and healthy controls, using functional magnetic resonance imaging (fMRI) and single nucleotide polymorphism (SNP) data. The method consists of four stages: (1) SNPs with the most discriminating information between the healthy controls and schizophrenia patients are selected to construct a support vector machine ensemble (SNP-SVME). (2) Voxels in the fMRI map contributing to classification are selected to build another SVME (Voxel-SVME). (3) Components of fMRI activation obtained with independent component analysis (ICA) are used to construct a single SVM classifier (ICA-SVMC). (4) The above three models are combined into a single module using a majority voting approach to make a final decision (Combined SNP-fMRI). The method was evaluated by a fully-validated leave-one-out method using 40 subjects (20 patients and 20 controls). The classification accuracy was: 0.74 for SNP-SVME, 0.82 for Voxel-SVME, 0.83 for ICA-SVMC, and 0.87 for Combined SNP-fMRI. Experimental results show that better classification accuracy was achieved by combining genetic and fMRI data than using either alone, indicating that genetic and brain function represent different, but partially complementary, aspects of schizophrenia etiopathology. This study suggests an effective way to reassess biological classification of individuals with schizophrenia, which is also potentially useful for identifying diagnostically important markers for the disorder.
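
    The final fusion stage described above, combining classifiers trained on different feature blocks by majority vote, corresponds to a hard-voting ensemble; the sketch below uses toy features with an assumed column layout (not the study's fMRI/SNP data) and generic SVMs in place of the three stage-specific models.

    # Hedged sketch: majority-vote fusion of SVMs trained on different feature blocks.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.ensemble import VotingClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer
    from sklearn.model_selection import cross_val_score, LeaveOneOut

    rng = np.random.default_rng(0)
    n = 40                                            # 20 patients + 20 controls, as in the study
    X = rng.standard_normal((n, 30))                  # assumed layout: cols 0-9 "SNP", 10-24 "voxel", 25-29 "ICA"
    y = rng.integers(0, 2, n)                         # toy diagnosis labels

    blocks = {"snp": slice(0, 10), "voxel": slice(10, 25), "ica": slice(25, 30)}
    members = [(name, make_pipeline(FunctionTransformer(lambda A, s=cols: A[:, s]),
                                    SVC(kernel="linear")))
               for name, cols in blocks.items()]
    fused = VotingClassifier(members, voting="hard")  # majority vote over the three classifiers

    # Leave-one-out evaluation mirrors the fully validated scheme described above.
    print("LOO accuracy:", round(cross_val_score(fused, X, y, cv=LeaveOneOut()).mean(), 3))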

  1. Application of machine learning methods in big data analytics at management of contracts in the construction industry

    Directory of Open Access Journals (Sweden)

    Valpeters Marina

    2018-01-01

    Full Text Available The number of experts who realize the importance of big data continues to increase in various fields of the economy. Experts begin to use big data more frequently for the solution of their specific objectives. One of the probable big data tasks in the construction industry is the determination of the probability of contract execution at the stage of its establishment. The contract holder cannot guarantee execution of the contract, and this leads to significant risks for the customer. This article is devoted to the applicability of machine learning methods to the task of determining the probability of successful contract execution. The authors try to reveal the factors influencing the possibility of contract default and then to define corrective actions for the customer. In the problem analysis, the authors used linear and non-linear algorithms, feature extraction, feature transformation and feature selection. The results of the investigation include prognostic models with predictive force based on machine learning algorithms such as logistic regression, decision tree and random forest. The authors validated the models on available historical data. The developed models have the potential for practical use in construction organizations when making new contracts.

  2. Prediction of Backbreak in Open-Pit Blasting Operations Using the Machine Learning Method

    Science.gov (United States)

    Khandelwal, Manoj; Monjezi, M.

    2013-03-01

    Backbreak is an undesirable phenomenon in blasting operations. It can cause instability of mine walls, falling down of machinery, improper fragmentation, reduced efficiency of drilling, etc. The existence of various effective parameters and their unknown relationships are the main reasons for inaccuracy of the empirical models. Presently, the application of new approaches such as artificial intelligence is highly recommended. In this paper, an attempt has been made to predict backbreak in blasting operations of Soungun iron mine, Iran, incorporating rock properties and blast design parameters using the support vector machine (SVM) method. To investigate the suitability of this approach, the predictions by SVM have been compared with multivariate regression analysis (MVRA). The coefficient of determination (CoD) and the mean absolute error (MAE) were taken as performance measures. It was found that the CoD between measured and predicted backbreak was 0.987 and 0.89 by SVM and MVRA, respectively, whereas the MAE was 0.29 and 1.07 by SVM and MVRA, respectively.
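
    A schematic of the comparison above, an SVM regressor against multivariate regression analysis scored by the coefficient of determination and the mean absolute error, is sketched below on synthetic blast-design-like features; the data and feature meanings are assumptions.

    # Hedged sketch: SVR versus multivariate linear regression for backbreak prediction.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score, mean_absolute_error

    rng = np.random.default_rng(4)
    n = 200
    X = rng.uniform(0, 1, (n, 5))                    # e.g. burden, spacing, stemming, charge, rock index (toy)
    backbreak = 2.0 * X[:, 0] ** 2 + X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, backbreak, random_state=0)
    for name, model in [("SVM", SVR(kernel="rbf", C=10.0)), ("MVRA", LinearRegression())]:
        pred = model.fit(X_tr, y_tr).predict(X_te)
        print(name, "CoD:", round(r2_score(y_te, pred), 3),
              "MAE:", round(mean_absolute_error(y_te, pred), 3))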

  3. Deep learning: Using machine learning to study biological vision

    OpenAIRE

    Majaj, Najib; Pelli, Denis

    2017-01-01

    Today most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a reference of attainable performance based on learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

  4. Higgs Machine Learning Challenge 2014

    CERN Document Server

    Olivier, A-P; Bourdarios, C ; LAL / Orsay; Goldfarb, S ; University of Michigan

    2014-01-01

    High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees (paper) and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics. We are now exploring new ways to improve the cross-fertilization of the two fields by setting up a data challenge, following the footsteps of, among others, the astrophysics community (dark matter and galaxy zoo challenges) and neurobiology (connectomics and decoding the human brain). The organization committee consists of ATLAS physicists and machine learning researchers. The Challenge will run from Monday 12th to September 2014.

  5. Neuromorphic Deep Learning Machines

    OpenAIRE

    Neftci, E; Augustine, C; Paul, S; Detorakis, G

    2017-01-01

    An ongoing challenge in neuromorphic computing is to devise general and computationally efficient models of inference and learning which are compatible with the spatial and temporal constraints of the brain. One increasingly popular and successful approach is to take inspiration from inference and learning algorithms used in deep neural networks. However, the workhorse of deep learning, the gradient descent Back Propagation (BP) rule, often relies on the immediate availability of network-wide...

  6. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  7. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  8. Teraflop-scale Incremental Machine Learning

    OpenAIRE

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-u...

  9. In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods.

    Science.gov (United States)

    Cheng, Feixiong; Shen, Jie; Yu, Yue; Li, Weihua; Liu, Guixia; Lee, Philip W; Tang, Yun

    2011-03-01

    There is an increasing need for the rapid safety assessment of chemicals by both industries and regulatory agencies throughout the world. In silico techniques are practical alternatives in the environmental hazard assessment. It is especially true to address the persistence, bioaccumulative and toxicity potentials of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxic endpoint. In this study, 1571 diverse unique chemicals were collected from the literature, composing the largest diverse data set for T. pyriformis toxicity. Classification predictive models of T. pyriformis toxicity were developed by substructure pattern recognition and different machine learning methods, including support vector machine (SVM), C4.5 decision tree, k-nearest neighbors and random forest. The results of a 5-fold cross-validation showed that the SVM method performed better than other algorithms. The overall predictive accuracies of the SVM classification model with a radial basis function kernel were 92.2% for the 5-fold cross-validation and 92.6% for the external validation set, respectively. Furthermore, several representative substructure patterns for characterizing T. pyriformis toxicity were also identified via the information gain analysis methods. Copyright © 2010 Elsevier Ltd. All rights reserved.
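
    The core modelling step reported above, an RBF-kernel SVM evaluated with 5-fold cross-validation on substructure patterns, can be sketched as follows; the binary fingerprint matrix and labels here are random stand-ins for the 1571 chemicals.

    # Hedged sketch: RBF-kernel SVM with 5-fold cross-validation on binary fingerprints.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    fingerprints = rng.integers(0, 2, (1571, 1024))              # toy substructure patterns
    toxic = (fingerprints[:, :50].sum(axis=1) > 25).astype(int)  # toy toxic / non-toxic labels

    svm = SVC(kernel="rbf", C=1.0, gamma="scale")
    scores = cross_val_score(svm, fingerprints, toxic, cv=5)
    print("5-fold accuracies:", np.round(scores, 3), "mean:", round(scores.mean(), 3))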

  10. Remotely controlling of mobile robots using gesture captured by the Kinect and recognized by machine learning method

    Science.gov (United States)

    Hsu, Roy CHaoming; Jian, Jhih-Wei; Lin, Chih-Chuan; Lai, Chien-Hung; Liu, Cheng-Ting

    2013-01-01

    The main purpose of this paper is to use a machine learning method together with the Kinect and its body sensation technology to design a simple, convenient, yet effective robot remote control system. In this study, a Kinect sensor is used to capture the human body skeleton with depth information, and a gesture training and identification method is designed using a back-propagation neural network to remotely command a mobile robot to perform certain actions via Bluetooth. The experimental results show that the designed mobile robot remote control system achieves, on average, more than 96% accurate identification of 7 types of gestures and can effectively control a real e-puck robot with the designed commands.

  11. Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors.

    Science.gov (United States)

    Rivera-Borroto, Oscar Miguel; Marrero-Ponce, Yovani; García-de la Vega, José Manuel; Grau-Ábalo, Ricardo del Corazón

    2011-12-27

    Cluster algorithms play an important role in diversity related tasks of modern chemoinformatics, with the widest applications being in pharmaceutical industry drug discovery programs. The performance of these grouping strategies depends on various factors such as molecular representation, mathematical method, algorithmical technique, and statistical distribution of data. For this reason, introduction and comparison of new methods are necessary in order to find the model that best fits the problem at hand. Earlier comparative studies report on Ward's algorithm using fingerprints for molecular description as generally superior in this field. However, problems still remain, i.e., other types of numerical descriptions have been little exploited, the current descriptor selection strategy is trial and error-driven, and no previous comparative studies considering a broader domain of the combinatorial methods in grouping chemoinformatic data sets have been conducted. In this work, a comparison between combinatorial methods is performed, with five of them being novel in cheminformatics. The experiments are carried out using eight data sets that are well established and validated in the medical chemistry literature. Each drug data set was represented by real molecular descriptors selected by machine learning techniques, which are consistent with the neighborhood principle. Statistical analysis of the results demonstrates that pharmacological activities of the eight data sets can be modeled with a few families of 2D and 3D molecular descriptors, avoiding classification problems associated with the presence of nonrelevant features. Three out of five of the proposed cluster algorithms show superior performance over most classical algorithms and are similar (or slightly superior in the most optimistic sense) to Ward's algorithm. The usefulness of these algorithms is also assessed in a comparative experiment to potent QSAR and machine learning classifiers, where they perform

  12. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    Due to the promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn special attention from the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as the means of inferring botnet-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool for identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview of the field by summarizing current scientific efforts. The contribution of the paper is three-fold. First, the paper provides detailed insight into the existing detection methods by investigating which bot-related heuristics were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  13. Application of Machine Learning Techniques in Aquaculture

    OpenAIRE

    Rahman, Akhlaqur; Tasnim, Sumaira

    2014-01-01

    In this paper we present applications of different machine learning algorithms in aquaculture. Machine learning algorithms learn models from historical data. In aquaculture historical data are obtained from farm practices, yields, and environmental data sources. Associations between these different variables can be obtained by applying machine learning algorithms to historical data. In this paper we present applications of different machine learning algorithms in aquaculture applications.

  14. Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM).

    Science.gov (United States)

    Dipnall, J F; Pasco, J A; Berk, M; Williams, L J; Dodd, S; Jacka, F N; Meyer, D

    2017-01-01

    Key lifestyle-environ risk factors are operative for depression, but it is unclear how risk factors cluster. Machine-learning (ML) algorithms exist that learn, extract, identify and map underlying patterns to identify groupings of depressed individuals without constraints. The aim of this research was to use a large epidemiological study to identify and characterise depression clusters through "Graphing lifestyle-environs using machine-learning methods" (GLUMM). Two ML algorithms were implemented: unsupervised self-organised mapping (SOM) to create GLUMM clusters and a supervised boosted regression algorithm to describe clusters. Ninety-six "lifestyle-environ" variables were used from the National health and nutrition examination study (2009-2010). Multivariate logistic regression validated clusters and controlled for possible sociodemographic confounders. The SOM identified two GLUMM cluster solutions. These solutions contained one dominant depressed cluster (GLUMM5-1, GLUMM7-1). Equal proportions of members in each cluster rated as highly depressed (17%). Alcohol consumption and demographics validated clusters. Boosted regression identified GLUMM5-1 as more informative than GLUMM7-1. Members were more likely to: have problems sleeping; eat unhealthily; have spent ≤2 years in their home; live in an old home; perceive themselves as underweight; be exposed to work fumes; have experienced sex at ≤14 years; and not perform moderate recreational activities. A positive relationship between GLUMM5-1 and depression was found (OR: 7.50), with significant interactions for those married/living with a partner (P=0.001). Using ML-based GLUMM to form ordered depressive clusters from multitudinous lifestyle-environ variables enabled a deeper exploration of the heterogeneous data to uncover better understanding of the relationships between the complex mental health factors. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  15. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    Science.gov (United States)

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
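
    One of the schemes proposed above, reducing the many input variables with PCA and feeding the result to a Gaussian-process regressor to estimate LAeq, can be sketched generically as follows; the urban-noise features are simulated and only the pipeline structure is meant to illustrate the approach.

    # Hedged sketch: PCA data reduction followed by Gaussian-process regression of LAeq.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    X = rng.standard_normal((250, 30))               # 30 urban / traffic descriptors (assumed)
    laeq = 55 + 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(250)   # dB(A), toy

    model = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=5)),                # data reduction step
        ("gpr", GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)),
    ])
    print("CV R2:", round(cross_val_score(model, X, laeq, cv=5).mean(), 3))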

  16. New Learning Method of a Lecture of ‘Machine Fabrication’ by Self-study with Investigation and Presentation Incorporated

    Science.gov (United States)

    Kasuga, Yukio

    A new teaching method was developed for learning ‘machine fabrication’ for undergraduate students. It consists of a few lectures, grouping, selection of the industrial products that each group wants to investigate, investigation work using library books and the internet, arrangement of data covering the characteristics of the products, the materials employed and the processing methods, presentation, discussion and revision followed by another presentation. This new method is derived from one of Finland's ways of primary school education, which is believed to have boosted the country to the top ranking in the OECD PISA tests. After starting the new way of learning, students have fresh impressions of this lesson, especially regarding self-study, the way of investigation, collaborative work and presentation. Also, after four years of implementation, some improvements have been made, including less use of the internet and determination in advance of the products and fabrication methods to be investigated. The students' lecture assessments show further encouraging results.

  17. Comparison between stochastic and machine learning methods for hydrological multi-step ahead forecasting: All forecasts are wrong!

    Science.gov (United States)

    Papacharalampous, Georgia; Tyralis, Hristos; Koutsoyiannis, Demetris

    2017-04-01

    Machine learning (ML) is considered to be a promising approach to hydrological processes forecasting. We conduct a comparison between several stochastic and ML point estimation methods by performing large-scale computational experiments based on simulations. The purpose is to provide generalized results, while the respective comparisons in the literature are usually based on case studies. The stochastic methods used include simple methods and models from the frequently used families of Autoregressive Moving Average (ARMA), Autoregressive Fractionally Integrated Moving Average (ARFIMA) and Exponential Smoothing models. The ML methods used are Random Forests (RF), Support Vector Machines (SVM) and Neural Networks (NN). The comparison refers to the multi-step ahead forecasting properties of the methods. A total of 20 methods are used, among which 9 are ML methods. 12 simulation experiments are performed, each using 2 000 simulated time series of 310 observations. The time series are simulated using stochastic processes from the families of ARMA and ARFIMA models. Each time series is split into a fitting set (first 300 observations) and a testing set (last 10 observations). The comparative assessment of the methods is based on 18 metrics that quantify the methods' performance according to several criteria related to the accurate forecasting of the testing set, the capturing of its variation and the correlation between the testing and forecasted values. The most important outcome of this study is that there is not a uniformly better or worse method. However, there are methods that are regularly better or worse than others with respect to specific metrics. It appears that, although a general ranking of the methods is not possible, their classification based on their similar or contrasting performance in the various metrics is possible to some extent. Another important conclusion is that more sophisticated methods do not necessarily provide better forecasts
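
    The multi-step-ahead setting examined above (fit on the first 300 values, forecast the last 10) can be sketched for one ML method with a recursive strategy: a regressor is trained on lagged values and fed its own predictions back in. The simulated AR(1) series and the lag order below are assumptions.

    # Hedged sketch: recursive multi-step forecasting with a random forest on lagged values.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    series = np.zeros(310)
    for t in range(1, 310):                           # simulate an AR(1) process
        series[t] = 0.7 * series[t - 1] + rng.standard_normal()
    fit_part, test_part = series[:300], series[300:]

    p = 5                                             # number of lags used as features
    X = np.array([fit_part[t - p:t] for t in range(p, 300)])
    y = fit_part[p:]
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    history = list(fit_part[-p:])
    forecasts = []
    for _ in range(10):                               # 10-step-ahead recursive forecast
        nxt = model.predict(np.array(history[-p:]).reshape(1, -1))[0]
        forecasts.append(nxt)
        history.append(nxt)
    rmse = float(np.sqrt(np.mean((np.array(forecasts) - test_part) ** 2)))
    print("RMSE over the 10-step horizon:", round(rmse, 3))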

  18. Palate Shape and Depth: A Shape-Matching and Machine Learning Method for Estimating Ancestry from Human Skeletal Remains.

    Science.gov (United States)

    Maier, Christopher A; Zhang, Kang; Manhein, Mary H; Li, Xin

    2015-09-01

    In the past, assessing ancestry relied on the naked eye and observer experience; however, replicability has become an important aspect of such analysis through the application of metric techniques. This study examines palate shape and assesses ancestry quantitatively using a 3D digitizer and shape-matching and machine learning methods. Palate curves and depths were recorded, processed, and tested for 376 individuals. Palate shape was an accurate indicator of ancestry in 58% of cases. Cluster analysis revealed that the parabolic, hyperbolic, and elliptical shapes are discrete from one another. Preliminary results indicate that palate depth in Hispanic individuals is greatest. Palate shape appears to be a useful indicator of ancestry, particularly when assessed by a computer. However, these data suggest that palate shape is not useful for assessing ancestry in Hispanic individuals. Although ancestry may be determined from palate shape, the use of multiple features is recommended and more reliable. © 2015 American Academy of Forensic Sciences.

  19. Machine Learning applications in CMS

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine Learning is used in many aspects of CMS data taking, monitoring, processing and analysis. We review a few of these use cases and the most recent developments, with an outlook to future applications in the LHC Run III and for the High-Luminosity phase.

  20. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down task driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse grained global features and a larger set of low...

  1. Visible Machine Learning for Biomedicine.

    Science.gov (United States)

    Yu, Michael K; Ma, Jianzhu; Fisher, Jasmin; Kreisberg, Jason F; Raphael, Benjamin J; Ideker, Trey

    2018-06-14

    A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology. Copyright © 2018. Published by Elsevier Inc.

  2. Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals.

    Science.gov (United States)

    Pereira, Florbela; Xiao, Kaixia; Latino, Diogo A R S; Wu, Chengcheng; Zhang, Qingyou; Aires-de-Sousa, Joao

    2017-01-23

    Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds achieving mean absolute error (MAE) up to 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Inclusion of the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of estimations (reducing the MAE by >30%).
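
    A schematic of the estimation task above, random forests regressing orbital energies on connectivity-based descriptors and scored by mean absolute error, is sketched below; the descriptor matrix and energies are random placeholders, not the >111 000-structure database.

    # Hedged sketch: random-forest regression of orbital energies, scored by MAE.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(8)
    X = rng.standard_normal((5000, 50))               # connectivity-based descriptors (toy)
    homo = -6.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * rng.standard_normal(5000)  # eV, toy

    X_tr, X_te, y_tr, y_te = train_test_split(X, homo, test_size=0.2, random_state=0)
    rf = RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=0).fit(X_tr, y_tr)
    print("MAE (eV):", round(mean_absolute_error(y_te, rf.predict(X_te)), 3))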

  3. Design and Selection of Machine Learning Methods Using Radiomics and Dosiomics for Normal Tissue Complication Probability Modeling of Xerostomia

    Directory of Open Access Journals (Sweden)

    Hubert S. Gabryś

    2018-03-01

    Full Text Available Purpose: The purpose of this study is to investigate whether machine learning with dosiomic, radiomic, and demographic features allows for xerostomia risk assessment more precise than normal tissue complication probability (NTCP) models based on the mean radiation dose to parotid glands. Material and methods: A cohort of 153 head-and-neck cancer patients was used to model xerostomia at 0–6 months (early), 6–15 months (late), 15–24 months (long-term), and at any time (a longitudinal model) after radiotherapy. Predictive power of the features was evaluated by the area under the receiver operating characteristic curve (AUC) of univariate logistic regression models. The multivariate NTCP models were tuned and tested with single and nested cross-validation, respectively. We compared predictive performance of seven classification algorithms, six feature selection methods, and ten data cleaning/class balancing techniques using the Friedman test and the Nemenyi post hoc analysis. Results: NTCP models based on the parotid mean dose failed to predict xerostomia (AUCs < 0.60). The most informative predictors were found for late and long-term xerostomia. Late xerostomia correlated with the contralateral dose gradient in the anterior–posterior (AUC = 0.72) and the right–left (AUC = 0.68) direction, whereas long-term xerostomia was associated with parotid volumes (AUCs > 0.85), dose gradients in the right–left (AUCs > 0.78), and the anterior–posterior (AUCs > 0.72) direction. Multivariate models of long-term xerostomia were typically based on the parotid volume, the parotid eccentricity, and the dose–volume histogram (DVH) spread with the generalization AUCs ranging from 0.74 to 0.88. On average, support vector machines and extra-trees were the top performing classifiers, whereas the algorithms based on logistic regression were the best choice for feature selection. We found no advantage in using data cleaning or class balancing
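
    Two methodological points in the abstract above, ranking predictors by the AUC of univariate logistic models and tuning/testing multivariate models with nested cross-validation, can be sketched generically as follows; the feature matrix is simulated and the classifier and its grid are assumptions.

    # Hedged sketch: univariate AUC screening plus nested cross-validation of a tuned model.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=153, n_features=20, n_informative=4, random_state=0)

    # Univariate screening: AUC of a logistic model fitted on each single feature.
    aucs = []
    for j in range(X.shape[1]):
        p = LogisticRegression().fit(X[:, [j]], y).predict_proba(X[:, [j]])[:, 1]
        aucs.append(roc_auc_score(y, p))
    print("best univariate AUC:", round(max(aucs), 3))

    # Nested CV: the inner loop tunes hyperparameters, the outer loop estimates generalization AUC.
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=StratifiedKFold(5), scoring="roc_auc")
    outer_auc = cross_val_score(inner, X, y, cv=StratifiedKFold(5), scoring="roc_auc")
    print("generalization AUC:", round(outer_auc.mean(), 3))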

  4. A heuristic method for simulating open-data of arbitrary complexity that can be used to compare and evaluate machine learning methods.

    Science.gov (United States)

    Moore, Jason H; Shestov, Maksim; Schmitt, Peter; Olson, Randal S

    2018-01-01

    A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to access real data for this purpose. Genomics has in some examples taken a leading role in the open data effort, starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods, it is not sufficient, because it is not possible to know what the ground truth is in real data. Real data must be accompanied by simulated data where the balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data that specifically allows discrimination between different machine learning methods.
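
    HIBACHI itself is not reproduced here; the underlying idea of benchmarking learners on simulated data with a known signal/noise structure can, however, be illustrated with a generic scikit-learn data generator (all settings below are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Simulated benchmark with a known signal/noise structure: 5 informative
# features, 45 noise features, 10% label noise.  Because the generating model
# is known, ground truth is known exactly, unlike in real observational data.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           n_redundant=0, flip_y=0.1, random_state=0)

methods = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(),
}
for name, clf in methods.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
    print("%s: %.3f" % (name, scores.mean()))
```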

  5. Classification of forest development stages from national low-density lidar datasets: a comparison of machine learning methods

    Directory of Open Access Journals (Sweden)

    R. Valbuena

    2016-02-01

    Full Text Available The area-based method has become a widespread approach in airborne laser scanning (ALS), being mainly employed for the estimation of continuous variables describing forest attributes: biomass, volume, density, etc. However, to date, classification methods based on machine learning, which are fairly common in other remote sensing fields, such as land use / land cover classification using multispectral sensors, have been largely overlooked in forestry applications of ALS. In this article, we wish to draw attention to statistical methods predicting discrete responses, for supervised classification of ALS datasets. A wide spectrum of approaches is reviewed: discriminant analysis (DA) using various classifiers –maximum likelihood, minimum volume ellipsoid, naïve Bayes–, support vector machine (SVM), artificial neural networks (ANN), random forest (RF) and nearest neighbour (NN) methods. They are compared in the context of a classification of forest areas into development classes (DC) used in practical silvicultural management in Finland, using their low-density national ALS dataset. We observed that RF and NN had the most balanced error matrices, with cross-validated predictions which were mainly unbiased for all DCs. Although overall accuracies were higher for SVM and ANN, their results were very dissimilar across DCs, and they can therefore be only advantageous if certain DCs are targeted. DA methods underperformed in comparison to other alternatives, and were only advantageous for the detection of seedling stands. These results show that, besides the well-demonstrated capacity of ALS for quantifying forest stocks, there is a great deal of potential for predicting categorical variables in general, and forest types in particular. In conclusion, we consider that the presented methodology shall also be adapted to the type of forest classes that can be relevant to Mediterranean ecosystems, opening a range of possibilities for future research, in which

  6. An active role for machine learning in drug development

    Science.gov (United States)

    Murphy, Robert F.

    2014-01-01

    Due to the complexity of biological systems, cutting-edge machine-learning methods will be critical for future drug development. In particular, machine-vision methods to extract detailed information from imaging assays and active-learning methods to guide experimentation will be required to overcome the dimensionality problem in drug development. PMID:21587249

  7. Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information.

    Science.gov (United States)

    An, Ji-Yong; Zhang, Lei; Zhou, Yong; Zhao, Yu-Jun; Wang, Da-Fu

    2017-08-18

    Self-interacting proteins (SIPs) are important for their biological activity owing to the inherent interaction amongst their secondary structures or domains. However, due to the limitations of experimental self-interaction detection, one major challenge in the study of SIP prediction is how to exploit computational approaches for SIP detection based on the evolutionary information contained in protein sequences. In this work, we presented a novel computational approach named WELM-LAG, which combined the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs based on protein sequence. The major improvement of our method lies in presenting an effective feature extraction method used to represent candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position-specific scoring matrix (PSSM), and then employing a reliable and robust WELM classifier to carry out classification. In addition, the Principal Component Analysis (PCA) approach is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94 and 96.74% on yeast and human datasets, respectively. Meanwhile, we compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on human and yeast datasets, respectively. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/ .

  8. Design and Selection of Machine Learning Methods Using Radiomics and Dosiomics for Normal Tissue Complication Probability Modeling of Xerostomia.

    Science.gov (United States)

    Gabryś, Hubert S; Buettner, Florian; Sterzing, Florian; Hauswald, Henrik; Bangert, Mark

    2018-01-01

    The purpose of this study is to investigate whether machine learning with dosiomic, radiomic, and demographic features allows for xerostomia risk assessment more precise than normal tissue complication probability (NTCP) models based on the mean radiation dose to parotid glands. A cohort of 153 head-and-neck cancer patients was used to model xerostomia at 0-6 months (early), 6-15 months (late), 15-24 months (long-term), and at any time (a longitudinal model) after radiotherapy. Predictive power of the features was evaluated by the area under the receiver operating characteristic curve (AUC) of univariate logistic regression models. The multivariate NTCP models were tuned and tested with single and nested cross-validation, respectively. We compared predictive performance of seven classification algorithms, six feature selection methods, and ten data cleaning/class balancing techniques using the Friedman test and the Nemenyi post hoc analysis. NTCP models based on the parotid mean dose failed to predict xerostomia (AUCs < 0.60). The most informative predictors were found for late and long-term xerostomia. Late xerostomia correlated with the contralateral dose gradient in the anterior-posterior (AUC = 0.72) and the right-left (AUC = 0.68) direction, whereas long-term xerostomia was associated with parotid volumes (AUCs > 0.85), dose gradients in the right-left (AUCs > 0.78), and the anterior-posterior (AUCs > 0.72) direction. Multivariate models of long-term xerostomia were typically based on the parotid volume, the parotid eccentricity, and the dose-volume histogram (DVH) spread with the generalization AUCs ranging from 0.74 to 0.88. On average, support vector machines and extra-trees were the top performing classifiers, whereas the algorithms based on logistic regression were the best choice for feature selection. We found no advantage in using data cleaning or class balancing methods. We demonstrated that incorporation of organ- and dose-shape descriptors is beneficial for xerostomia prediction in highly conformal radiotherapy treatments. Due to strong reliance on patient-specific, dose-independent factors, our results underscore the need for development of personalized data-driven risk profiles for NTCP models of xerostomia. The facilitated

  9. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  10. Machine learning methods for locating re-entrant drivers from electrograms in a model of atrial fibrillation

    Science.gov (United States)

    McGillivray, Max Falkenberg; Cheng, William; Peters, Nicholas S.; Christensen, Kim

    2018-04-01

    Mapping resolution has recently been identified as a key limitation in successfully locating the drivers of atrial fibrillation (AF). Using a simple cellular automata model of AF, we demonstrate a method by which re-entrant drivers can be located quickly and accurately using a collection of indirect electrogram measurements. The method proposed employs simple, out-of-the-box machine learning algorithms to correlate characteristic electrogram gradients with the displacement of an electrogram recording from a re-entrant driver. Such a method is less sensitive to local fluctuations in electrical activity. As a result, the method successfully locates 95.4% of drivers in tissues containing a single driver, and 95.1% (92.6%) for the first (second) driver in tissues containing two drivers of AF. Additionally, we demonstrate how the technique can be applied to tissues with an arbitrary number of drivers. In its current form, the techniques presented are not refined enough for a clinical setting. However, the methods proposed offer a promising path for future investigations aimed at improving targeted ablation for AF.

  11. Book review: A first course in Machine Learning

    DEFF Research Database (Denmark)

    Ortiz-Arroyo, Daniel

    2016-01-01

    "The new edition of A First Course in Machine Learning by Rogers and Girolami is an excellent introduction to the use of statistical methods in machine learning. The book introduces concepts such as mathematical modeling, inference, and prediction, providing ‘just in time’ the essential background...... to change models and parameter values to make [it] easier to understand and apply these models in real applications. The authors [also] introduce more advanced, state-of-the-art machine learning methods, such as Gaussian process models and advanced mixture models, which are used across machine learning....... This makes the book interesting not only to students with little or no background in machine learning but also to more advanced graduate students interested in statistical approaches to machine learning." —Daniel Ortiz-Arroyo, Associate Professor, Aalborg University Esbjerg, Denmark...

  12. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    Science.gov (United States)

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  13. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  14. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and disease in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated by observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with a classification accuracy of up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time, by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights
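
    A minimal sketch of the kind of comparison described, assuming synthetic stand-ins for the situational features and urge labels (not the study's data or tuned models), computing sensitivity, specificity, accuracy, and precision from the confusion matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Placeholder situational features (e.g. encoded mood, location, time of day).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 12))
y = rng.integers(0, 2, size=300)            # 1 = high-urge state (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for clf in (GaussianNB(), DecisionTreeClassifier(max_depth=5, random_state=0)):
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print("%-25s sensitivity=%.2f specificity=%.2f accuracy=%.2f precision=%.2f"
          % (type(clf).__name__, tp / (tp + fn), tn / (tn + fp),
             (tp + tn) / len(y_te), tp / (tp + fp)))
```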

  15. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre eAbraham

    2014-02-01

    Full Text Available Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  16. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
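
    A minimal decoding sketch along the lines described, using plain NumPy arrays as stand-ins for masked fMRI data (a real analysis would use nilearn or similar for image loading and masking):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder decoding setup: each row is a brain volume flattened to a voxel
# vector (after masking), each label an experimental condition.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))            # 200 volumes x 5000 voxels
y = rng.integers(0, 2, size=200)            # two behavioral conditions

decoder = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
scores = cross_val_score(decoder, X, y, cv=5)
print("Decoding accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```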

  17. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study.

    Science.gov (United States)

    Lyles, Courtney Rees; Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-06-10

    It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals' opinions and the decision-making process. We conducted a sequential mixed methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month time period surrounding a change in Pap smear screening guidelines. We applied statistical methodologies, known as sparse machine learning algorithms, to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms "cervical cancer," "Pap smear," and "Pap test" were analyzed during: (1) January 1-March 13, 2012, and (2) March 14-June 30, 2012. Topic modeling was used to discern the most common topics from each time period, and determine the singular value criterion for each topic. The results were then qualitatively coded from the top 10 relevant topics to determine the efficiency of the clustering method in grouping distinct ideas, and how the discussion differed before vs. after the change in guidelines. This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, "New Screening Guidelines for Cervical Cancer"), and many topics in both time periods were addressing basic screening promotion and education (eg, "It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test.") It was demonstrated that machine learning
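
    The topic-modeling step can be approximated with scikit-learn; the snippet below uses NMF on a TF-IDF matrix as a generic stand-in for the specific sparse algorithms used in the study, and the example tweets are invented placeholders:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for tweets matching the search terms; a real analysis would use
# the full pre- and post-guideline corpora and a larger number of topics.
tweets = [
    "New screening guidelines for cervical cancer released today",
    "It is Cervical Cancer Awareness Month, get a free or low cost Pap test",
    "Pap smear every three years under the new USPSTF recommendation",
    "Where can I get a low cost Pap test near me",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(tweets)

# Factorize the term-document matrix into topics and list the top terms of each.
nmf = NMF(n_components=2, random_state=0)
nmf.fit(X)
terms = tfidf.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print("topic %d: %s" % (k, ", ".join(top)))
```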

  18. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  19. Combining Formal Logic and Machine Learning for Sentiment Analysis

    DEFF Research Database (Denmark)

    Petersen, Niklas Christoffer; Villadsen, Jørgen

    2014-01-01

    This paper presents a formal logical method for deep structural analysis of the syntactical properties of texts using machine learning techniques for efficient syntactical tagging. To evaluate the method it is used for entity level sentiment analysis as an alternative to pure machine learning...

  20. A method to combine target volume data from 3D and 4D planned thoracic radiotherapy patient cohorts for machine learning applications

    NARCIS (Netherlands)

    Johnson, Corinne; Price, Gareth; Khalifa, Jonathan; Faivre-Finn, Corinne; Dekker, Andre; Moore, Christopher; van Herk, Marcel

    2017-01-01

    The gross tumour volume (GTV) is predictive of clinical outcome and consequently features in many machine-learned models. 4D-planning, however, has prompted substitution of the GTV with the internal gross target volume (iGTV). We present and validate a method to synthesise GTV data from the iGTV,

  1. Galaxy Classification using Machine Learning

    Science.gov (United States)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  2. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  3. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM...

  4. Stochastic weather inputs for improved urban water demand forecasting: application of nonlinear input variable selection and machine learning methods

    Science.gov (United States)

    Quilty, J.; Adamowski, J. F.

    2015-12-01

    Urban water supply systems are often stressed during seasonal outdoor water use, as water demands related to the climate are variable in nature, making it difficult to optimize the operation of the water supply system. Urban water demand (UWD) forecasts failing to include meteorological conditions as inputs to the forecast model may produce poor forecasts, as they cannot account for the increase/decrease in demand related to meteorological conditions. Meteorological records stochastically simulated into the future can be used as inputs to data-driven UWD forecasts, generally resulting in improved forecast accuracy. This study aims to produce data-driven UWD forecasts for two different Canadian water utilities (Montreal and Victoria) using machine learning methods, by first selecting historical UWD and meteorological records derived from a stochastic weather generator using nonlinear input variable selection. The nonlinear input variable selection methods considered in this work are derived from the concept of conditional mutual information, a nonlinear dependency measure based on (multivariate) probability density functions that accounts for relevancy, conditional relevancy, and redundancy across a potential set of input variables. The results of our study indicate that stochastic weather inputs can improve UWD forecast accuracy for the two sites considered in this work. Nonlinear input variable selection is suggested as a means to identify which meteorological conditions should be utilized in the forecast.
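
    A simplified sketch of mutual-information-based input variable selection followed by a data-driven forecast model (the study uses conditional mutual information and different learners; all data below are synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import cross_val_score

# Placeholder candidate inputs: lagged demand plus simulated weather series.
rng = np.random.default_rng(0)
n = 730                                      # two years of daily records
candidates = rng.normal(size=(n, 8))         # e.g. lagged demand, temperature, rain, ...
demand = 2 * candidates[:, 0] + candidates[:, 3] + rng.normal(scale=0.5, size=n)

# Rank candidate inputs by mutual information with demand and keep the top three.
# The study uses *conditional* MI to also handle redundancy between inputs;
# scikit-learn only provides the simpler marginal estimate used here.
mi = mutual_info_regression(candidates, demand, random_state=0)
keep = np.argsort(mi)[::-1][:3]
print("selected input columns:", keep)

model = RandomForestRegressor(n_estimators=300, random_state=0)
print("cross-validated R^2: %.2f" % cross_val_score(model, candidates[:, keep], demand, cv=5).mean())
```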

  5. A review of machine learning in obesity.

    Science.gov (United States)

    DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M

    2018-05-01

    Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify, such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data, such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to the National Health and Nutrition Examination Survey to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.

  6. ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

    Science.gov (United States)

    2013-01-01

    Background Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case–control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification. Results We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual’s continental and sub-continental ancestry. To predict an individual’s continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control’s λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of

  7. Novel computational methods to predict drug–target interactions using graph mining and machine learning approaches

    KAUST Repository

    Olayan, Rawan S.

    2017-12-01

    Computational drug repurposing aims at finding new medical uses for existing drugs. The identification of novel drug-target interactions (DTIs) can be a useful part of such a task. Computational determination of DTIs is a convenient strategy for systematic screening of a large number of drugs in the attempt to identify new DTIs at low cost and with reasonable accuracy. This necessitates the development of accurate computational methods that can help focus follow-up experimental validation on a smaller number of highly likely targets for a drug. Although many methods have been proposed for computational DTI prediction, they suffer from a high false-positive prediction rate or do not predict the effect that drugs exert on targets in DTIs. In this report, first, we present a comprehensive review of the recent progress in the field of DTI prediction from data-centric and algorithm-centric perspectives. The aim is to provide a comprehensive review of computational methods for identifying DTIs, which could help in constructing more reliable methods. Then, we present DDR, an efficient method to predict the existence of DTIs. DDR achieves significantly more accurate results compared to the other state-of-the-art methods. As supported by independent evidence, we verified as correct 22 out of the top 25 DDR DTI predictions. This validation proves the practical utility of DDR, suggesting that DDR can be used as an efficient method to identify correct DTIs. Finally, we present the DDR-FE method, which predicts the effect types of a drug on its target. On different representative datasets, under various test setups, and using different performance measures, we show that DDR-FE achieves extremely good performance. Using blind test data, we verified as correct 2,300 out of 3,076 DTI effects predicted by DDR-FE. This suggests that DDR-FE can be used as an efficient method to identify correct effects of a drug on its target.

  8. Machine learning in jet physics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    High energy collider experiments produce several petabytes of data every year. Given the magnitude and complexity of the raw data, machine learning algorithms provide the best available platform to transform and analyse these data to obtain valuable insights to understand Standard Model and Beyond Standard Model theories. These collider experiments produce both quark and gluon initiated hadronic jets as the core components. Deep learning techniques enable us to classify quark/gluon jets through image recognition and help us to differentiate signals and backgrounds in Beyond Standard Model searches at LHC. We are currently working on quark/gluon jet classification and progressing in our studies to find the bias between event generators using domain adversarial neural networks (DANN). We also plan to investigate top tagging, weak supervision on mixed samples in high energy physics, utilizing transfer learning from simulated data to real experimental data.

  9. Learning About Climate and Atmospheric Models Through Machine Learning

    Science.gov (United States)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  10. Food category consumption and obesity prevalence across countries: an application of Machine Learning method to big data analysis

    Science.gov (United States)

    Dunstan, Jocelyn; Fallah-Fini, Saeideh; Nau, Claudia; Glass, Thomas; Global Obesity Prevention Center Team

    The application of sophisticated mathematical and numerical tools in public health has been demonstrated to be useful in predicting the outcomes of public interventions as well as in studying, for example, the main causes of obesity without conducting experiments on the population. In this project, we aim to understand which kinds of food consumed in different countries over time best explain the rate of obesity in those countries. Machine learning is particularly useful here because we do not need to create a hypothesis and test it against the data; instead, we learn from the data which groups of food best describe the prevalence of obesity.

  11. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    …Map and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance, and that each MeSH heading presents a different behavior…

  12. An analysis of feature relevance in the classification of astronomical transients with machine learning methods

    Science.gov (United States)

    D'Isanto, A.; Cavuoti, S.; Brescia, M.; Donalek, C.; Longo, G.; Riccio, G.; Djorgovski, S. G.

    2016-04-01

    The exploitation of present and future synoptic (multiband and multi-epoch) surveys requires an extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well tested methods: Random Forest, MultiLayer Perceptron with Quasi Newton Algorithm and K-Nearest Neighbours, paying special attention to the feature selection phase. In order to do so, several classification experiments were performed. Namely: identification of cataclysmic variables, separation between galactic and extragalactic objects and identification of supernovae.

  13. Harnessing the power of big data: infusing the scientific method with machine learning to transform ecology

    Science.gov (United States)

    Most efforts to harness the power of big data for ecology and environmental sciences focus on data and metadata sharing, standardization, and accuracy. However, many scientists have not accepted the data deluge as an integral part of their research because the current scientific method is not scalab...

  14. Unsupervised machine-learning method for improving the performance of ambulatory fall-detection systems

    Directory of Open Access Journals (Sweden)

    Yuwono Mitchell

    2012-02-01

    Full Text Available Background: Falls can cause trauma, disability and death among older people. Ambulatory accelerometer devices are currently capable of detecting falls in a controlled environment. However, research suggests that most current approaches tend to have insufficient sensitivity and specificity in non-laboratory environments, in part because impacts can be experienced as part of ordinary daily living activities. Method: We used a waist-worn wireless tri-axial accelerometer combined with digital signal processing, clustering and neural network classifiers. The method includes the application of Discrete Wavelet Transform, Regrouping Particle Swarm Optimization, Gaussian Distribution of Clustered Knowledge and an ensemble of classifiers including a multilayer perceptron and Augmented Radial Basis Function (ARBF) neural networks. Results: Preliminary testing with 8 healthy individuals in a home environment yields 98.6% sensitivity to falls and 99.6% specificity for routine Activities of Daily Living (ADL) data. Single ARBF and MLP classifiers were compared with a combined classifier. The combined classifier offers the greatest sensitivity, with a slight reduction in specificity for routine ADL and an increased specificity for exercise activities. In preliminary tests, the approach achieves 100% sensitivity on in-group falls, 97.65% on out-group falls, 99.33% specificity on routine ADL, and 96.59% specificity on exercise ADL. Conclusion: The pre-processing and feature-extraction steps appear to simplify the signal while successfully extracting the essential features that are required to characterize a fall. The results suggest this combination of classifiers can perform better than MLP alone. Preliminary testing suggests these methods may be useful for researchers who are attempting to improve the performance of ambulatory fall-detection systems.

  15. Archetypal Analysis for Machine Learning

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai

    2010-01-01

    Archetypal analysis (AA) proposed by Cutler and Breiman in [1] estimates the principal convex hull of a data set. As such, AA favors features that constitute representative ’corners’ of the data, i.e. distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering … for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensional reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering.

  16. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results.

  17. Applying GIS and Machine Learning Methods to Twitter Data for Multiscale Surveillance of Influenza.

    Directory of Open Access Journals (Sweden)

    Chris Allen

    Full Text Available Traditional methods for monitoring influenza are haphazard and lack fine-grained details regarding the spatial and temporal dynamics of outbreaks. Twitter gives researchers and public health officials an opportunity to examine the spread of influenza in real-time and at multiple geographical scales. In this paper, we introduce an improved framework for monitoring influenza outbreaks using the social media platform Twitter. Relying upon techniques from geographic information science (GIS) and data mining, Twitter messages were collected, filtered, and analyzed for the thirty most populated cities in the United States during the 2013-2014 flu season. The results of this procedure are compared with national, regional, and local flu outbreak reports, revealing a statistically significant correlation between the two data sources. The main contribution of this paper is to introduce a comprehensive data mining process that enhances previous attempts to accurately identify tweets related to influenza. Additionally, geographical information systems allow us to target, filter, and normalize Twitter messages.

  18. Rapid estimation of compost enzymatic activity by spectral analysis method combined with machine learning.

    Science.gov (United States)

    Chakraborty, Somsubhra; Das, Bhabani S; Ali, Md Nasim; Li, Bin; Sarathjith, M C; Majumdar, K; Ray, D P

    2014-03-01

    The aim of this study was to investigate the feasibility of using visible near-infrared (VisNIR) diffuse reflectance spectroscopy (DRS) as an easy, inexpensive, and rapid method to predict compost enzymatic activity, which is traditionally measured by the fluorescein diacetate hydrolysis (FDA-HR) assay. Compost samples representative of five different compost facilities were scanned by DRS, and the raw reflectance spectra were preprocessed using seven spectral transformations for predicting compost FDA-HR with six multivariate algorithms. Although principal component analysis for all spectral pretreatments satisfactorily identified the clusters by compost types, it could not separate different FDA contents. Furthermore, the artificial neural network multilayer perceptron (residual prediction deviation = 3.2, validation r² = 0.91 and RMSE = 13.38 μg g⁻¹ h⁻¹) outperformed other multivariate models in capturing the highly non-linear relationships between compost enzymatic activity and VisNIR reflectance spectra after Savitzky-Golay first derivative pretreatment. This work demonstrates the efficiency of VisNIR DRS for predicting compost enzymatic as well as microbial activity. Copyright © 2013 Elsevier Ltd. All rights reserved.
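
    A rough sketch of the spectral pretreatment and regression workflow, assuming synthetic spectra and activity values rather than the study's data, with SciPy's Savitzky-Golay filter standing in for the preprocessing step:

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder spectra: rows = compost samples, columns = VisNIR reflectance bands.
rng = np.random.default_rng(0)
spectra = rng.random((120, 200))
fda_hr = rng.uniform(10, 150, size=120)      # synthetic enzymatic activity values

# Savitzky-Golay first-derivative pretreatment of each spectrum.
X = savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, fda_hr, test_size=0.3, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000, random_state=0)
mlp.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, mlp.predict(X_te)) ** 0.5
print("RMSE: %.2f" % rmse)
```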

  19. Machine learning in radiation oncology theory and applications

    CERN Document Server

    El Naqa, Issam; Murphy, Martin J

    2015-01-01

    ​This book provides a complete overview of the role of machine learning in radiation oncology and medical physics, covering basic theory, methods, and a variety of applications in medical physics and radiotherapy. An introductory section explains machine learning, reviews supervised and unsupervised learning methods, discusses performance evaluation, and summarizes potential applications in radiation oncology. Detailed individual sections are then devoted to the use of machine learning in quality assurance; computer-aided detection, including treatment planning and contouring; image-guided rad

  20. Randomized Algorithms for Scalable Machine Learning

    OpenAIRE

    Kleiner, Ariel Jacob

    2012-01-01

    Many existing procedures in machine learning and statistics are computationally intractable in the setting of large-scale data. As a result, the advent of rapidly increasing dataset sizes, which should be a boon yielding improved statistical performance, instead severely blunts the usefulness of a variety of existing inferential methods. In this work, we use randomness to ameliorate this lack of scalability by reducing complex, computationally difficult inferential problems to larger sets o...

  1. Two-Dimensional Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Bo Jia

    2015-01-01

    (BP networks. However, like many other methods, ELM was originally proposed to handle vector patterns, while nonvector patterns in real applications, such as image data, still need to be explored. We propose the two-dimensional extreme learning machine (2DELM) based on the very natural idea of dealing with matrix data directly. Unlike the original ELM, which handles vectors, 2DELM takes matrices as input features without vectorization. Empirical studies on several real image datasets show the efficiency and effectiveness of the algorithm.
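
    For readers unfamiliar with the underlying model, a minimal (vector-input) extreme learning machine can be written in a few lines of NumPy; this is a generic sketch, not the 2DELM algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, n_hidden=100):
    """Minimal ELM: fixed random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                  # output weights via pseudo-inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression data; the 2D variant would instead keep image inputs as matrices.
X = rng.normal(size=(500, 20))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
W, b, beta = elm_train(X[:400], y[:400])
print("test MSE: %.3f" % np.mean((elm_predict(X[400:], W, b, beta) - y[400:]) ** 2))
```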

  2. Using Machine learning method to estimate Air Temperature from MODIS over Berlin

    Science.gov (United States)

    Marzban, F.; Preusker, R.; Sodoudi, S.; Taheri, H.; Allahbakhshi, M.

    2015-12-01

    Land Surface Temperature (LST) is defined as the temperature of the interface between the Earth's surface and its atmosphere, and thus it is a critical variable for understanding land-atmosphere interactions and a key parameter in meteorological and hydrological studies involving energy fluxes. Air temperature (Tair) is one of the most important input variables in different spatially distributed hydrological and ecological models. The estimation of near-surface air temperature is useful for a wide range of applications. Some applications, such as traffic or energy management, require Tair data at high spatial and temporal resolution at two meters height above the ground (T2m), sometimes in near-real-time. Thus, a parameterization based on boundary-layer physical principles was developed that determines the air temperature from remote sensing data (MODIS). Tair is commonly obtained from synoptic measurements in weather stations. However, the derivation of near-surface air temperature from satellite-derived LST is far from straightforward. T2m is not driven directly by the sun, but indirectly by LST; thus T2m can be parameterized from the LST and other variables such as albedo, NDVI, water vapor, etc. Most previous studies have focused on estimating T2m based on simple and advanced statistical approaches, temperature-vegetation-index and energy-balance approaches, but the main objective of this research is to explore the relationships between T2m and LST in Berlin using artificial intelligence methods, with the aim of studying key variables that allow us to establish suitable techniques to obtain Tair from satellite products and ground data. Secondly, an attempt was made to identify an individual mix of attributes that reveals a particular pattern, to better understand the variation of T2m during day and nighttime over different areas of Berlin. For this reason, a three-layer feedforward neural network is considered with the LMA algorithm

  3. Revisit of Machine Learning Supported Biological and Biomedical Studies.

    Science.gov (United States)

    Yu, Xiang-Tian; Wang, Lu; Zeng, Tao

    2018-01-01

    Generally, machine learning includes many in silico methods that transform the principles underlying natural phenomena into human-understandable information, aiming to save human labor, to assist human judgment, and to create human knowledge. It should have wide application potential in biological and biomedical studies, especially in the era of big biological data. To trace the application of machine learning alongside biological developments, this review provides a broad set of cases to introduce the selection of machine learning methods in different practice scenarios across the whole biological and biomedical study cycle, and further discusses machine learning strategies for analyzing omics data in some cutting-edge biological studies. Finally, notes on new challenges for machine learning arising from small-sample, high-dimensional data are summarized around the key points of sample imbalance, white-box modeling, and causality.

  4. Source localization in an ocean waveguide using supervised machine learning.

    Science.gov (United States)

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.
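
    The preprocessing step described, forming a normalized sample covariance matrix from array snapshots and feeding it to a classifier, can be sketched as follows (synthetic pressure fields and arbitrary range bins, with a random forest standing in for the FNN/SVM/RF comparison):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_sensors, n_snapshots, n_samples = 16, 20, 400

def covariance_features(pressure):
    """Normalized sample covariance matrix of array snapshots, flattened to a vector."""
    p = pressure / np.linalg.norm(pressure, axis=0, keepdims=True)   # normalize snapshots
    C = (p @ p.conj().T) / p.shape[1]                                # sample covariance
    iu = np.triu_indices(n_sensors)
    return np.concatenate([C.real[iu], C.imag[iu]])

# Synthetic stand-in for received pressure fields and discretized range classes.
X = np.array([covariance_features(rng.normal(size=(n_sensors, n_snapshots))
                                  + 1j * rng.normal(size=(n_sensors, n_snapshots)))
              for _ in range(n_samples)])
y = rng.integers(0, 10, size=n_samples)      # range-bin labels (placeholder)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy: %.2f" % cross_val_score(clf, X, y, cv=5).mean())
```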

  5. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  6. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  7. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China

    Science.gov (United States)

    Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Li, Yuanyao; Catani, Filippo; Pourghasemi, Hamid Reza

    2018-03-01

    Landslides are a common natural hazard responsible for extensive damage and losses in mountainous areas. In this study, Longju in the Three Gorges Reservoir area in China was taken as a case study for landslide susceptibility assessment in order to develop effective risk prevention and mitigation strategies. To begin, 202 landslides were identified, including 95 colluvial landslides and 107 rockfalls. Twelve landslide causal factor maps were prepared initially, and the relationship between these factors and each landslide type was analyzed using the information value model. Later, the unimportant factors were identified and eliminated using the information gain ratio technique. The landslide locations were randomly divided into two groups: 70% for training and 30% for verification. Two machine learning models, the support vector machine (SVM) and artificial neural network (ANN), and a multivariate statistical model, logistic regression (LR), were applied for landslide susceptibility modeling (LSM) for each type. The LSM index maps, obtained from combining the assessment results of the two landslide types, were classified into five levels. The performance of the LSMs was evaluated using the receiver operating characteristic curve and the Friedman test. Results show that the elimination of noise-generating factors and the separated modeling of each landslide type have significantly increased the prediction accuracy. The machine learning models outperformed the multivariate statistical model, and the SVM model was found to be ideal for the case study area.
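
    A simplified sketch of the factor-elimination and model-comparison workflow described, assuming a synthetic causal-factor table and using mutual information in place of the information gain ratio (which scikit-learn does not provide):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder causal-factor table: rows = mapping units, columns = factors
# (slope, lithology, distance to river, ...); labels mark landslide presence.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
y = rng.integers(0, 2, size=400)

# Drop the least informative factors.  The study uses the information gain
# ratio; mutual information is used here as a readily available analogue.
scores = mutual_info_classif(X, y, random_state=0)
X_sel = X[:, scores > np.median(scores)]

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "ANN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression()),
}
for name, model in models.items():
    auc = cross_val_score(model, X_sel, y, cv=5, scoring="roc_auc")
    print("%s AUC: %.2f" % (name, auc.mean()))
```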

  8. Early identification of posttraumatic stress following military deployment: Application of machine learning methods to a prospective study of Danish soldiers.

    Science.gov (United States)

    Karstoft, Karen-Inge; Statnikov, Alexander; Andersen, Søren B; Madsen, Trine; Galatzer-Levy, Isaac R

    2015-09-15

    Pre-deployment identification of soldiers at risk for long-term posttraumatic stress psychopathology after homecoming is important to guide decisions about deployment. Early post-deployment identification can direct early interventions to those in need and thereby prevent the development of chronic psychopathology. Both hold significant public health benefits given the large numbers of deployed soldiers, but neither has so far been achieved. Here, we aim to assess the potential for pre- and early post-deployment prediction of resilience or posttraumatic stress development in soldiers by application of machine learning (ML) methods. ML feature selection and prediction algorithms were applied to a prospective cohort of 561 Danish soldiers deployed to Afghanistan in 2009 to identify unique risk indicators and forecast long-term posttraumatic stress responses. Robust pre- and early post-deployment risk indicators were identified, including individual PTSD symptoms as well as the total level of PTSD symptoms, previous trauma and treatment, negative emotions, and thought suppression. The predictive performance of these risk indicators combined was assessed by cross-validation. Together, these indicators forecasted long-term posttraumatic stress responses with high accuracy (pre-deployment: AUC = 0.84 (95% CI = 0.81-0.87), post-deployment: AUC = 0.88 (95% CI = 0.85-0.91)). This study utilized a previously collected data set and was therefore not designed to exhaust the potential of ML methods. Further, the study relied solely on self-reported measures. Pre-deployment and early post-deployment identification of risk for long-term posttraumatic psychopathology are feasible and could greatly reduce the public health costs of war. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Trends in Machine Learning for Signal Processing

    DEFF Research Database (Denmark)

    Adali, Tulay; Miller, David J.; Diamantaras, Konstantinos I.

    2011-01-01

    By putting the accent on learning from the data and the environment, the Machine Learning for SP (MLSP) Technical Committee (TC) provides the essential bridge between the machine learning and SP communities. While the emphasis in MLSP is on learning and data-driven approaches, SP defines the main applications of interest, and thus the constraints and requirements on solutions, which include computational efficiency, online adaptation, and learning with limited supervision/reference data.

  10. Machine learning in heart failure: ready for prime time.

    Science.gov (United States)

    Awan, Saqib Ejaz; Sohel, Ferdous; Sanfilippo, Frank Mario; Bennamoun, Mohammed; Dwivedi, Girish

    2018-03-01

    The aim of this review is to present an up-to-date overview of the application of machine learning methods in heart failure including diagnosis, classification, readmissions and medication adherence. Recent studies have shown that the application of machine learning techniques may have the potential to improve heart failure outcomes and management, including cost savings by improving existing diagnostic and treatment support systems. Recently developed deep learning methods are expected to yield even better performance than traditional machine learning techniques in performing complex tasks by learning the intricate patterns hidden in big medical data. The review summarizes the recent developments in the application of machine and deep learning methods in heart failure management.

  11. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Full Text Available Learning incorporates a broad range of complex procedures. Machine learning (ML) is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine-readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing maps, decision trees, random forests, case-based reasoning, genetic programming, etc.) that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined, with specific attention to the genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling geosciences and remote sensing problems.

  12. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.

  13. Machine Learning in Medical Imaging.

    Science.gov (United States)

    Giger, Maryellen L

    2018-03-01

    Advances in both imaging and computers have synergistically led to a rapid rise in the potential use of artificial intelligence in various radiological imaging tasks, such as risk assessment, detection, diagnosis, prognosis, and therapy response, as well as in multi-omics disease discovery. A brief overview of the field is given here, allowing the reader to recognize the terminology, the various subfields, and components of machine learning, as well as the clinical potential. Radiomics, an expansion of computer-aided diagnosis, has been defined as the conversion of images to minable data. The ultimate benefit of quantitative radiomics is to (1) yield predictive image-based phenotypes of disease for precision medicine or (2) yield quantitative image-based phenotypes for data mining with other -omics for discovery (ie, imaging genomics). For deep learning in radiology to succeed, note that well-annotated large data sets are needed since deep networks are complex, computer software and hardware are evolving constantly, and subtle differences in disease states are more difficult to perceive than differences in everyday objects. In the future, machine learning in radiology is expected to have a substantial clinical impact with imaging examinations being routinely obtained in clinical practice, providing an opportunity to improve decision support in medical image interpretation. The term of note is decision support, indicating that computers will augment human decision making, making it more effective and efficient. The clinical impact of having computers in the routine clinical practice may allow radiologists to further integrate their knowledge with their clinical colleagues in other medical specialties and allow for precision medicine. Copyright © 2018. Published by Elsevier Inc.

  14. Building machine learning systems with Python

    CERN Document Server

    Coelho, Luis Pedro

    2015-01-01

    This book primarily targets Python developers who want to learn and use Python's machine learning capabilities and gain valuable insights from data to develop effective solutions for business problems.

  15. Machine learning in cardiovascular medicine: are we there yet?

    Science.gov (United States)

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  16. Learning as a Machine: Crossovers between Humans and Machines

    Science.gov (United States)

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  17. The ATLAS Higgs machine learning challenge

    CERN Document Server

    Davey, W; The ATLAS collaboration; Rousseau, D; Cowan, G; Kegl, B; Germain-Renaud, C; Guyon, I

    2014-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s, with Artificial Neural Networks for example, and more recently with Boosted Decision Trees, Random Forests, etc. Meanwhile, Machine Learning has become a full-blown field of computer science. With the emergence of Big Data, Data Scientists are developing new Machine Learning algorithms to extract sense from large heterogeneous data. HEP has exciting and difficult problems, like the extraction of the Higgs boson signal, and data scientists have advanced algorithms: the goal of the HiggsML project is to bring the two together by a “challenge”: participants from all over the world and any scientific background can compete online ( https://www.kaggle.com/c/higgs-boson ) to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Winners with the best scores will receive money prizes; authors of the best method (most usable) will be invited t...

  18. What is the machine learning.

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. In this talk, I explore a procedure for identifying combinations of variables -- aided by physical intuition -- that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. I will demonstrate these features in both an easy to understand toy model and an idealized LHC resonance scenario.

  19. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images.

    Science.gov (United States)

    Wang, Hongkai; Zhou, Zongwei; Li, Yingci; Chen, Zhonghua; Lu, Peiou; Wang, Wenzhi; Liu, Wanyu; Yu, Lijuan

    2017-12-01

    This study aimed to compare one state-of-the-art deep learning method and four classical machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer (NSCLC) from 18F-FDG PET/CT images. Another objective was to compare the discriminative power of the recently popular PET/CT texture features with the widely used diagnostic features such as tumor size, CT value, SUV, image contrast, and intensity standard deviation. The four classical machine learning methods included random forests, support vector machines, adaptive boosting, and artificial neural networks. The deep learning method was the convolutional neural network (CNN). The five methods were evaluated using 1397 lymph nodes collected from PET/CT images of 168 patients, with corresponding pathology analysis results as the gold standard. The comparison was conducted using 10 times 10-fold cross-validation based on the criteria of sensitivity, specificity, accuracy (ACC), and area under the ROC curve (AUC). For each classical method, different input features were compared to select the optimal feature set. Based on the optimal feature set, the classical methods were compared with CNN, as well as with human doctors from our institute. For the classical methods, the diagnostic features resulted in 81~85% ACC and 0.87~0.92 AUC, which were significantly higher than the results of texture features. CNN's sensitivity, specificity, ACC, and AUC were 84%, 88%, 86%, and 0.91, respectively. There was no significant difference between the results of CNN and the best classical method. The sensitivity, specificity, and ACC of human doctors were 73%, 90%, and 82%, respectively. All five machine learning methods had higher sensitivities but lower specificities than human doctors. The present study shows that the performance of CNN is not significantly different from the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images
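
    The evaluation protocol described here (10 repetitions of 10-fold cross-validation over several classical classifiers) can be sketched as follows. This is an illustrative sketch on synthetic stand-in features; the lymph-node data, the exact hyperparameters and the CNN itself are not reproduced.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
        from sklearn.svm import SVC
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

        # Synthetic stand-in for the diagnostic features of 1397 lymph nodes
        X, y = make_classification(n_samples=1397, n_features=12, random_state=0)

        classifiers = {
            "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
            "SVM": SVC(probability=True, random_state=0),
            "AdaBoost": AdaBoostClassifier(random_state=0),
            "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
        }

        # 10 times 10-fold cross-validation, as in the study (slow but straightforward)
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
        for name, clf in classifiers.items():
            auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc", n_jobs=-1)
            print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")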

  20. Teaching machine learning to design students

    NARCIS (Netherlands)

    Vlist, van der B.J.J.; van de Westelaken, H.F.M.; Bartneck, C.; Hu, J.; Ahn, R.M.C.; Barakova, E.I.; Delbressine, F.L.M.; Feijs, L.M.G.; Pan, Z.; Zhang, X.; El Rhalibi, A.

    2008-01-01

    Machine learning is a key technology to design and create intelligent systems, products, and related services. Like many other design departments, we are faced with the challenge to teach machine learning to design students, who often do not have an inherent affinity towards technology. We

  1. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem that has been discussed for about a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism called the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is experimented on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.
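
    The ELM building block used in this record can be sketched in a few lines: a random, untrained hidden layer followed by output weights solved by least squares. The PSO tuning proposed in the paper is omitted, and the dataset below is a UCI-style stand-in rather than the five benchmarks actually used.

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler

        X, y = load_breast_cancer(return_X_y=True)            # UCI-style stand-in medical dataset
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
        scaler = StandardScaler().fit(X_tr)
        X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

        rng = np.random.default_rng(0)
        n_hidden = 100                                         # would be tuned (e.g. by PSO) in the paper
        W = rng.normal(size=(X_tr.shape[1], n_hidden))         # random input weights, never trained
        b = rng.normal(size=n_hidden)                          # random biases

        def hidden(X):
            return np.tanh(X @ W + b)                          # hidden-layer activations

        # Output weights solved in closed form by least squares
        H = hidden(X_tr)
        beta = np.linalg.lstsq(H, y_tr.astype(float), rcond=None)[0]

        pred = (hidden(X_te) @ beta > 0.5).astype(int)
        print("test accuracy:", (pred == y_te).mean())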

  2. A machine learning model with human cognitive biases capable of learning from small and biased datasets.

    Science.gov (United States)

    Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro

    2018-05-09

    Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.
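
    The experimental setting (training standard classifiers on very small samples and testing on a large held-out set) can be sketched as below. The cognitively-biased model itself is not reproduced; the data are synthetic and the 30-sample training size is an arbitrary illustration.

        from sklearn.datasets import make_classification
        from sklearn.naive_bayes import GaussianNB
        from sklearn.svm import SVC
        from sklearn.linear_model import LogisticRegression
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        # Keep only a tiny training sample; evaluate on the large remainder
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=30, stratify=y, random_state=0)

        models = {
            "naive Bayes": GaussianNB(),
            "SVM": SVC(),
            "logistic regression": LogisticRegression(max_iter=1000),
            "random forest": RandomForestClassifier(random_state=0),
        }
        for name, clf in models.items():
            clf.fit(X_tr, y_tr)
            print(f"{name}: accuracy with 30 training samples = {clf.score(X_te, y_te):.3f}")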

  3. Machine vision systems using machine learning for industrial product inspection

    Science.gov (United States)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF), and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we will present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Board (PCB) and Vacuum Fluorescent Display (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its displaying patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies and the PCB inspection system is in the process of being deployed in a manufacturing plant.

  4. Probabilistic models and machine learning in structural bioinformatics

    DEFF Research Database (Denmark)

    Hamelryck, Thomas

    2009-01-01

    Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis...

  5. Using machine learning, neural networks and statistics to predict bankruptcy

    NARCIS (Netherlands)

    Pompe, P.P.M.; Feelders, A.J.; Feelders, A.J.

    1997-01-01

    Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear

  6. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    Directory of Open Access Journals (Sweden)

    Ai-bing Zhang

    Full Text Available Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  7. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    Science.gov (United States)

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  8. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    Science.gov (United States)

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  9. Adaptive Learning Systems: Beyond Teaching Machines

    Science.gov (United States)

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since the 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn and what instructors should do to help students during their learning process. We have adaptive learning technologies that can create much more student-oriented learning environments. The purpose of this article is to present these changes and its…

  10. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Boucher, Thomas F., E-mail: boucher@cs.umass.edu [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Ozanne, Marie V. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Carmosino, Marco L. [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Dyar, M. Darby [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Mahadevan, Sridhar [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Breves, Elly A.; Lepore, Kate H. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Clegg, Samuel M. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

    2015-05-01

    dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user. - Highlights: • We compared 9 machine learning regression models for predicting mineral composition from LIBS. • These models vary over factors: linear/nonlinear, sparse/dense, univariate/multivariate. • The linear models evaluated generalized well for out-of-sample predictions. • The nonlinear models evaluated tended to overfit the training data and generalize poorly. • Sparse models best predicted the elements with a small number of high transition probability emission lines.

  11. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-01-01

    channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user. - Highlights: • We compared 9 machine learning regression models for predicting mineral composition from LIBS. • These models vary over factors: linear/nonlinear, sparse/dense, univariate/multivariate. • The linear models evaluated generalized well for out-of-sample predictions. • The nonlinear models evaluated tended to overfit the training data and generalize poorly. • Sparse models best predicted the elements with a small number of high transition probability emission lines
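
    The model-comparison idea in the two records above can be sketched with scikit-learn: several regression families evaluated by cross-validated error on high-dimensional spectra. The data below are synthetic stand-ins for the 6144-channel LIBS spectra, and the hyperparameters are placeholders.

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.svm import SVR
        from sklearn.linear_model import Lasso, ElasticNet
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.model_selection import KFold, cross_val_score

        # Many channels, few samples, mimicking the 6144-channel LIBS setting
        X, y = make_regression(n_samples=100, n_features=6144, n_informative=50, noise=5.0, random_state=0)

        models = {
            "SVR-Lin": SVR(kernel="linear"),
            "lasso": Lasso(alpha=1.0),
            "elastic net": ElasticNet(alpha=1.0),
            "PLS-1": PLSRegression(n_components=5),
            "kNN": KNeighborsRegressor(n_neighbors=5),
        }

        cv = KFold(n_splits=5, shuffle=True, random_state=0)
        for name, model in models.items():
            rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
            print(f"{name}: cross-validated RMSE = {rmse.mean():.2f}")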

  12. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them.This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  13. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  14. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  15. Robust Matching Pursuit Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Zejian Yuan

    2018-01-01

    Full Text Available Extreme learning machine (ELM) is a popular learning algorithm for single-hidden-layer feedforward networks (SLFNs). It was originally proposed with inspiration from biological learning and has attracted massive attention due to its adaptability to various tasks, its fast learning ability, and its efficient computational cost. As an effective sparse representation method, the orthogonal matching pursuit (OMP) method can be embedded into ELM to overcome the singularity problem and improve stability. Usually OMP recovers a sparse vector by minimizing a least squares (LS) loss, which is efficient for Gaussian distributed data, but may suffer performance deterioration in the presence of non-Gaussian data. To address this problem, a robust matching pursuit method based on a novel kernel risk-sensitive loss (in short, KRSLMP) is first proposed in this paper. The KRSLMP is then applied to ELM to solve the sparse output weight vector, and the new method, named the KRSLMP-ELM, is developed for SLFN learning. Experimental results on synthetic and real-world data sets confirm the effectiveness and superiority of the proposed method.
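
    The sparse-recovery building block that the paper embeds into ELM is standard orthogonal matching pursuit. The sketch below shows plain OMP recovering a sparse output-weight vector from a hidden-layer-like design matrix; the proposed kernel risk-sensitive loss variant (KRSLMP) is not reproduced.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(0)
        H = rng.normal(size=(200, 100))           # e.g. the hidden-layer output matrix of an ELM
        beta_true = np.zeros(100)
        beta_true[rng.choice(100, 8, replace=False)] = rng.normal(size=8)   # sparse output weights
        t = H @ beta_true + rng.normal(0, 0.01, 200)                        # targets

        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=8).fit(H, t)
        print("recovered support:", np.flatnonzero(omp.coef_))
        print("true support:     ", np.flatnonzero(beta_true))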

  16. A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information.

    Science.gov (United States)

    Chen, Gongbo; Li, Shanshan; Knibbs, Luke D; Hamm, N A S; Cao, Wei; Li, Tiantian; Guo, Jianping; Ren, Hongyan; Abramson, Michael J; Guo, Yuming

    2018-04-24

    Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at daily time scale in China at a national level. To estimate daily concentrations of PM2.5 across China during 2005-2016. Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014-2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (non-parametric machine learning algorithms) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then utilized to estimate the daily concentrations of PM2.5 across China with a resolution of 0.1° (≈10 km) during 2005-2016. The daily random forests model showed much higher predictive accuracy than the other two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R2 = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m3]. At the monthly and annual time-scale, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m3 and 6.9 μg/m3, respectively). Taking advantage of a novel application of modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies. Random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy. Copyright © 2018 Elsevier B.V. All rights reserved.
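
    The core estimation step (a random-forest regressor evaluated by 10-fold cross-validation R2 and RMSE) can be sketched as below. All predictor names and the synthetic response are placeholders; the study's AOD, meteorological and land-use inputs are not reproduced.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import KFold, cross_validate

        rng = np.random.default_rng(0)
        n = 5000
        X = np.column_stack([
            rng.uniform(0, 2, n),       # hypothetical AOD
            rng.uniform(-10, 35, n),    # hypothetical temperature
            rng.uniform(0, 15, n),      # hypothetical wind speed
            rng.uniform(0, 100, n),     # hypothetical relative humidity
        ])
        y = 30 * X[:, 0] - 0.5 * X[:, 1] - 2 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 5, n)  # synthetic PM2.5

        rf = RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=0)
        cv = KFold(n_splits=10, shuffle=True, random_state=0)
        scores = cross_validate(rf, X, y, cv=cv, scoring=("r2", "neg_root_mean_squared_error"))
        print("10-fold CV R2:  ", scores["test_r2"].mean())
        print("10-fold CV RMSE:", -scores["test_neg_root_mean_squared_error"].mean())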

  17. International Conference on Extreme Learning Machines 2014

    CERN Document Server

    Mao, Kezhi; Cambria, Erik; Man, Zhihong; Toh, Kar-Ann

    2015-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2014, which was held in Singapore, December 8-10, 2014. This conference brought together the researchers and practitioners of Extreme Learning Machine (ELM) from a variety of fields to promote research and development of “learning without iterative tuning”.  The book covers theories, algorithms and applications of ELM. It gives the readers a glance of the most recent advances of ELM.  

  18. An introduction to quantum machine learning

    OpenAIRE

    Schuld, M.; Sinayskiy, I.; Petruccione, F.

    2014-01-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum compute...

  19. International Conference on Extreme Learning Machine 2015

    CERN Document Server

    Mao, Kezhi; Wu, Jonathan; Lendasse, Amaury; ELM 2015; Theory, Algorithms and Applications (I); Theory, Algorithms and Applications (II)

    2016-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2015, which was held in Hangzhou, China, December 15-17, 2015. This conference brought together researchers and engineers to share and exchange R&D experience on both theoretical studies and practical applications of the Extreme Learning Machine (ELM) technique and brain learning. This book covers theories, algorithms and applications of ELM. It gives readers a glance at the most recent advances in ELM.

  20. Component Pin Recognition Using Algorithms Based on Machine Learning

    Science.gov (United States)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG) to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of a deep learning method known as the convolutional neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  1. Machine learning search for variable stars

    Science.gov (United States)

    Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis

    2018-04-01

    Photometric variability detection is often considered as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (select more variables and fewer false candidates) compared to traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF show a higher efficiency compared to LR and kNN.
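
    The classification setup (several standard classifiers trained on light-curve variability indices, with a heavy class imbalance) can be sketched as follows. The features are synthetic stand-ins for the 18 variability indices, and the sample size and imbalance are scaled down from the OGLE-II training set.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.neural_network import MLPClassifier
        from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
        from sklearn.model_selection import StratifiedKFold, cross_val_score

        # Synthetic stand-in: 18 "variability indices" per light curve, few true variables
        X, y = make_classification(n_samples=5000, n_features=18, weights=[0.97, 0.03], random_state=0)

        classifiers = {
            "LR": LogisticRegression(max_iter=1000),
            "SVM": SVC(),
            "kNN": KNeighborsClassifier(),
            "NN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
            "RF": RandomForestClassifier(n_estimators=200, random_state=0),
            "SGB": GradientBoostingClassifier(random_state=0),
        }

        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        for name, clf in classifiers.items():
            f1 = cross_val_score(clf, X, y, cv=cv, scoring="f1")  # F1 balances purity and completeness
            print(f"{name}: F1 = {f1.mean():.3f}")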

  2. Using Machine Learning in Adversarial Environments.

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Warren Leon [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-02-01

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approaches only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers’ reactions, with the goal of computing optimal moving target defense. One important challenge is to construct a model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.

  3. From Curve Fitting to Machine Learning

    CERN Document Server

    Zielesny, Achim

    2011-01-01

    The analysis of experimental data is at heart of science from its beginnings. But it was the advent of digital computers that allowed the execution of highly non-linear and increasingly complex data analysis procedures - methods that were completely unfeasible before. Non-linear curve fitting, clustering and machine learning belong to these modern techniques which are a further step towards computational intelligence. The goal of this book is to provide an interactive and illustrative guide to these topics. It concentrates on the road from two dimensional curve fitting to multidimensional clus

  4. An introduction to machine learning with Scikit-Learn

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.
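
    A minimal version of the workflow the tutorial covers (data loading, a pipeline, model selection and evaluation with Scikit-Learn) might look like the following; a bundled toy dataset stands in for the High Energy Physics data used in the tutorial.

        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split, GridSearchCV
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.metrics import classification_report

        # Data loading and representation (toy dataset stands in for HEP data)
        X, y = load_digits(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

        # Pipeline: preprocessing followed by a supervised learning algorithm
        pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

        # Model selection via cross-validated grid search
        grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}, cv=5)
        grid.fit(X_train, y_train)

        # Evaluation and model introspection
        print("best parameters:", grid.best_params_)
        print(classification_report(y_test, grid.predict(X_test)))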

  5. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at their earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing the objectivity of the clinical decision-making process. This manuscript presents a review of multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be presented. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allow creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionality (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  6. Machine Learning in Production Systems Design Using Genetic Algorithms

    OpenAIRE

    Abu Qudeiri Jaber; Yamamoto Hidehiko Rizauddin Ramli

    2008-01-01

    To create a solution for a specific problem in machine learning, the solution is constructed from the data or by using a search method. Genetic algorithms are a model of machine learning that can be used to find a near-optimal solution. While the great advantage of genetic algorithms is the fact that they find a solution through evolution, this is also the biggest disadvantage. Evolution is inductive; in nature, life does not evolve towards a good solution but it evolves aw...

  7. A Comparative Analysis of Machine Learning Techniques for Credit Scoring

    OpenAIRE

    Nwulu, Nnamdi; Oroja, Shola; İlkan, Mustafa

    2012-01-01

    Credit Scoring has become an oft-researched topic in light of the increasing volatility of the global economy and the recent world financial crisis. Amidst the many methods used for credit scoring, machine learning techniques are becoming increasingly popular due to their efficient and accurate nature and relative simplicity. Furthermore, machine learning techniques minimize the risk of human bias and error and maximize speed as they are able to perform computation...

  8. Machine learning application in the life time of materials

    OpenAIRE

    Yu, Xiaojiao

    2017-01-01

    Materials design and development typically takes several decades from the initial discovery to commercialization with the traditional trial-and-error development approach. With the accumulation of data from both experimental and computational results, data-based machine learning is becoming an emerging field in materials discovery, design and property prediction. This manuscript reviews the history of materials science as a discipline and the most common machine learning methods used in materials sc...

  9. Machine learning in laboratory medicine: waiting for the flood?

    Science.gov (United States)

    Cabitza, Federico; Banfi, Giuseppe

    2018-03-28

    This review focuses on machine learning and on how methods and models combining data analytics and artificial intelligence have been applied to laboratory medicine so far. Although still in its infancy, the potential for applying machine learning to laboratory data for both diagnostic and prognostic purposes deserves more attention by the readership of this journal, as well as by physician-scientists who will want to take advantage of this new computer-based support in pathology and laboratory medicine.

  10. Producing landslide susceptibility maps by utilizing machine learning methods. The case of Finikas catchment basin, North Peloponnese, Greece.

    Science.gov (United States)

    Tsangaratos, Paraskevas; Ilia, Ioanna; Loupasakis, Constantinos; Papadakis, Michalis; Karimalis, Antonios

    2017-04-01

    The main objective of the present study was to apply two machine learning methods for the production of a landslide susceptibility map in the Finikas catchment basin, located in North Peloponnese, Greece, and to compare their results. Specifically, Logistic Regression and Random Forest were utilized, based on a database of 40 sites classified into two categories, non-landslide and landslide areas, that were separated into a training dataset (70% of the total data) and a validation dataset (remaining 30%). The identification of the areas was established by analyzing airborne imagery, extensive field investigation and the examination of previous research studies. Six landslide-related variables were analyzed, namely: lithology, elevation, slope, aspect, distance to rivers and distance to faults. Within the Finikas catchment basin most of the reported landslides were located along the road network and within the residential complexes, classified as rotational and translational slides, and rockfalls, mainly caused by the physical conditions and the general geotechnical behavior of the geological formations that cover the area. Each landslide susceptibility map was reclassified by applying the Geometric Interval classification technique into five classes, namely: very low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and very high susceptibility. The comparison and validation of the outcomes of each model were achieved using statistical evaluation measures, the receiver operating characteristic and the area under the success and predictive rate curves. The computation process was carried out using RStudio, an integrated development environment for the R language, and ArcGIS 10.1 for compiling the data and producing the landslide susceptibility maps. From the outcomes of the Logistic Regression analysis it was inferred that the highest b coefficients were allocated to lithology and slope, at 2.8423 and 1.5841, respectively. From the
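
    The two-model workflow can be sketched as below: fit Logistic Regression and Random Forest on site data, score held-out sites by susceptibility, and reclassify the continuous scores into five classes. The predictor values and labels are random placeholders, not the Finikas data.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        n = 40                                       # 40 mapped sites, as in the study
        X = rng.normal(size=(n, 6))                  # 6 landslide-related variables (placeholder values)
        y = rng.integers(0, 2, size=n)               # 1 = landslide site, 0 = non-landslide site

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, stratify=y, random_state=0)

        models = {"logistic regression": LogisticRegression(max_iter=1000),
                  "random forest": RandomForestClassifier(n_estimators=200, random_state=0)}
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            proba = model.predict_proba(X_te)[:, 1]  # susceptibility score per validation site
            print(name, "AUC:", roc_auc_score(y_te, proba))

        # Reclassify continuous susceptibility scores into five classes (very low .. very high)
        bins = np.quantile(proba, [0.2, 0.4, 0.6, 0.8])
        classes = np.digitize(proba, bins)           # 0-4 -> very low, low, moderate, high, very high
        print("sites per class:", np.bincount(classes, minlength=5))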

  11. On the Use of Machine Learning for Identifying Botnet Network Traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    contemporary approaches use machine learning techniques for identifying malicious traffic. This paper presents a survey of contemporary botnet detection methods that rely on machine learning for identifying botnet network traffic. The paper provides a comprehensive overview on the existing scientific work, thus contributing to the better understanding of capabilities, limitations and opportunities of using machine learning for identifying botnet traffic. Furthermore, the paper outlines possibilities for the future development of machine learning-based botnet detection systems.

  12. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2016-01-01

    Machine learning techniques relevant for nonlinearity mitigation, carrier recovery, and nanoscale device characterization are reviewed and employed. Markov Chain Monte Carlo in combination with Bayesian filtering is employed within the nonlinear state-space framework and demonstrated for parameter...

  13. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2015-01-01

    Techniques from the machine learning community are reviewed and employed for laser characterization, signal detection in the presence of nonlinear phase noise, and nonlinearity mitigation. Bayesian filtering and expectation maximization are employed within nonlinear state-space framework...

  14. Computer vision and machine learning for archaeology

    NARCIS (Netherlands)

    van der Maaten, L.J.P.; Boon, P.; Lange, G.; Paijmans, J.J.; Postma, E.

    2006-01-01

    Until now, computer vision and machine learning techniques barely contributed to the archaeological domain. The use of these techniques can support archaeologists in their assessment and classification of archaeological finds. The paper illustrates the use of computer vision techniques for

  15. A fast hybrid methodology based on machine learning, quantum methods, and experimental measurements for evaluating material properties

    Science.gov (United States)

    Kong, Chang Sun; Haverty, Michael; Simka, Harsono; Shankar, Sadasivan; Rajan, Krishna

    2017-09-01

    We present a hybrid approach based on both machine learning and targeted ab-initio calculations to determine adhesion energies between dissimilar materials. The goals of this approach are to complement experimental and/or all ab-initio computational efforts, to identify promising materials rapidly and identify in a quantitative manner the relative contributions of the different material attributes affecting adhesion. Applications of the methodology to predict bulk modulus, yield strength, adhesion and wetting properties of copper (Cu) with other materials including metals, nitrides and oxides are discussed in this paper. In the machine learning component of this methodology, the parameters that were chosen can be roughly divided into four types: atomic and crystalline parameters (which are related to specific elements such as electronegativities, electron densities in Wigner-Seitz cells); bulk material properties (e.g. melting point), mechanical properties (e.g. modulus) and those representing atomic characteristics in ab-initio formalisms (e.g. pseudopotentials). The atomic parameters are defined over one dataset to determine property correlation with published experimental data. We then develop a semi-empirical model across multiple datasets to predict adhesion in material interfaces outside the original datasets. Since adhesion is between two materials, we appropriately use parameters which indicate differences between the elements that comprise the materials. These semi-empirical predictions agree reasonably with the trend in chemical work of adhesion predicted using ab-initio techniques and are used for fast materials screening. For the screened candidates, the ab-initio modeling component provides fundamental understanding of the chemical interactions at the interface, and explains the wetting thermodynamics of thin Cu layers on various substrates. Comparison against ultra-high vacuum (UHV) experiments for well-characterized Cu/Ta and Cu/α-Al2O3 interfaces is

  16. Model-Agnostic Interpretability of Machine Learning

    OpenAIRE

    Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos

    2016-01-01

    Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. In some applications, such models are as accurate as non-interpretable ones, and thus are preferred f...

  17. Implementing Machine Learning in the PCWG Tool

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    2016-12-13

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  18. A machine learning method to separate cosmic ray electrons from protons from 10 to 100 GeV using DAMPE data

    Science.gov (United States)

    Zhao, Hao; Peng, Wen-Xi; Wang, Huan-Yu; Qiao, Rui; Guo, Dong-Ya; Xiao, Hong; Wang, Zhao-Min

    2018-06-01

    DArk Matter Particle Explorer (DAMPE) is a general purpose high energy cosmic ray and gamma ray observatory, aiming to detect high energy electrons and gammas in the energy range 5 GeV to 10 TeV and hundreds of TeV for nuclei. This paper provides a method using machine learning to identify electrons and separate them from gammas, protons, helium and heavy nuclei with the DAMPE data acquired from 2016 January 1 to 2017 June 30, in the energy range from 10 to 100 GeV.

  19. IRB Process Improvements: A Machine Learning Analysis.

    Science.gov (United States)

    Shoenbill, Kimberly; Song, Yiqiang; Cobb, Nichelle L; Drezner, Marc K; Mendonca, Eneida A

    2017-06-01

    Clinical research involving humans is critically important, but it is a lengthy and expensive process. Most studies require institutional review board (IRB) approval. Our objective is to identify predictors of delays or accelerations in the IRB review process and apply this knowledge to inform process change in an effort to improve IRB efficiency, transparency, consistency and communication. We analyzed timelines of protocol submissions to determine protocol or IRB characteristics associated with different processing times. Our evaluation included single variable analysis to identify significant predictors of IRB processing time and machine learning methods to predict processing times through the IRB review system. Based on initial identified predictors, changes to IRB workflow and staffing procedures were instituted and we repeated our analysis. Our analysis identified several predictors of delays in the IRB review process including type of IRB review to be conducted, whether a protocol falls under Veteran's Administration purview and specific staff in charge of a protocol's review. We have identified several predictors of delays in IRB protocol review processing times using statistical and machine learning methods. Application of this knowledge to process improvement efforts in two IRBs has led to increased efficiency in protocol review. The workflow and system enhancements that are being made support our four-part goal of improving IRB efficiency, consistency, transparency, and communication.

  20. Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions.

    Science.gov (United States)

    Agarwal, Shashank; Liu, Feifan; Yu, Hong

    2011-10-03

    Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain independent features to develop two open source machine learning frameworks. One performs binary classification to determine whether the given article is PPI relevant or not, named "Simple Classifier", and the other one maps the PPI relevant articles with corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named "OntoNorm". We evaluated our system in the context of BioCreative challenge competition using the standardized data set. Our systems are amongst the top systems reported by the organizers, attaining 60.8% F1-score for identifying relevant documents, and 52.3% F1-score for mapping articles to interaction method ontology. Our results show that domain-independent machine learning frameworks can perform competitively well at the tasks of detecting PPI relevant articles and identifying the methods that were used to study the interaction in such articles. Simple Classifier is available at http://sourceforge.net/p/simpleclassify/home/ and OntoNorm at http://sourceforge.net/p/ontonorm/home/.
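
    The first task (binary classification of PPI-relevant articles scored by F1) can be sketched with a simple TF-IDF plus linear-SVM pipeline. This is not the authors' released code; the toy corpus and labels below are invented for illustration.

        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.model_selection import cross_val_score

        # Toy corpus (invented sentences); real systems train on thousands of abstracts
        docs = [
            "The two proteins were shown to interact by co-immunoprecipitation.",
            "Yeast two-hybrid screening identified a novel binding partner of the kinase.",
            "Patients were randomized to receive either the drug or a placebo.",
            "We measured blood pressure in a cohort of 500 adults over two years.",
        ] * 25
        labels = [1, 1, 0, 0] * 25   # 1 = PPI-relevant article, 0 = not relevant

        clf = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))), ("svm", LinearSVC())])
        f1 = cross_val_score(clf, docs, labels, cv=5, scoring="f1")
        print("cross-validated F1:", f1.mean())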

  1. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate...
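
    The bootstrap-ensemble idea can be sketched as follows: train several regressors on resampled data and take the spread of their predictions as an uncertainty estimate. One-dimensional synthetic data and small MLP regressors stand in for the neural-network potentials used in the paper.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(200, 1))
        y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)          # noisy 1-D target

        ensemble = []
        for k in range(10):                                     # 10 bootstrap replicas
            idx = rng.integers(0, len(X), len(X))               # resample training data with replacement
            model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000, random_state=k)
            model.fit(X[idx], y[idx])
            ensemble.append(model)

        X_new = np.linspace(-4, 4, 9).reshape(-1, 1)            # includes points outside the training range
        preds = np.stack([m.predict(X_new) for m in ensemble])
        mu, sd = preds.mean(axis=0), preds.std(axis=0)          # spread grows where training data are scarce
        for x, m_hat, s_hat in zip(X_new[:, 0], mu, sd):
            print(f"x = {x:+.1f}   prediction = {m_hat:+.2f}   uncertainty ~ {s_hat:.2f}")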

  2. Acceleration of saddle-point searches with machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, Andrew A., E-mail: andrew-peterson@brown.edu [School of Engineering, Brown University, Providence, Rhode Island 02912 (United States)

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.
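
    The accelerate-and-verify loop described in this abstract can be sketched on a toy one-dimensional double-well potential (the Gaussian-process surrogate and the analytic "expensive" function below are stand-ins, not the paper's implementation): the surrogate proposes a barrier location, a single expensive call verifies it, and any disagreement is added to the training data.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        def expensive_pes(x):                 # stand-in for an ab initio energy call
            return (x**2 - 1.0)**2 + 0.1 * x  # double well with a barrier near x = 0

        grid = np.linspace(-0.9, 0.9, 181).reshape(-1, 1)   # search between the two minima
        X = np.array([[-0.9], [-0.5], [0.5], [0.9]])        # a few initial "force calls"
        y = expensive_pes(X).ravel()

        for iteration in range(20):
            surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, y)
            x_saddle = grid[np.argmax(surrogate.predict(grid))]   # barrier top on the surrogate
            e_true = expensive_pes(x_saddle)[0]                   # one verification call
            if abs(surrogate.predict(x_saddle.reshape(1, -1))[0] - e_true) < 1e-3:
                break                              # surrogate agrees with the expensive call
            X = np.vstack([X, [x_saddle]])         # otherwise, learn from the disagreement
            y = np.append(y, e_true)

        print("estimated saddle point:", x_saddle[0], "after", iteration + 1, "verification calls")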

  3. Acceleration of saddle-point searches with machine learning

    International Nuclear Information System (INIS)

    Peterson, Andrew A.

    2016-01-01

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  4. Acceleration of saddle-point searches with machine learning.

    Science.gov (United States)

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  5. Towards Machine Learning of Motor Skills

    Science.gov (United States)

    Peters, Jan; Schaal, Stefan; Schölkopf, Bernhard

    Autonomous robots that can adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only a few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. To do so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

  6. Machine learning techniques for optical communication system optimization

    DEFF Research Database (Denmark)

    Zibar, Darko; Wass, Jesper; Thrane, Jakob

    In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction.

  7. Machine Learning Techniques for Optical Performance Monitoring from Directly Detected PDM-QAM Signals

    DEFF Research Database (Denmark)

    Thrane, Jakob; Wass, Jesper; Piels, Molly

    2017-01-01

    Linear signal processing algorithms are effective in dealing with linear transmission channel and linear signal detection, while the nonlinear signal processing algorithms, from the machine learning community, are effective in dealing with nonlinear transmission channel and nonlinear signal detection. In this paper, a brief overview of the various machine learning methods and their application in optical communication is presented and discussed. Moreover, supervised machine learning methods, such as neural networks and support vector machine, are experimentally demonstrated for in-band optical...

  8. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.
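
    A hedged illustration of the classification framing used here (random feature vectors stand in for the vertical-array recordings): source ranges are binned into discrete classes and a support vector machine is trained to predict the range bin.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(8)
        ranges_km = rng.uniform(0.5, 10.0, 1500)                      # true source ranges
        X = rng.normal(size=(1500, 64)) + 0.05 * ranges_km[:, None]   # hypothetical array features
        y = np.digitize(ranges_km, bins=np.arange(1, 10))             # 1-km range classes

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = SVC(kernel="rbf", C=10.0).fit(X_tr, y_tr)
        print("range-class accuracy:", clf.score(X_te, y_te))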

  9. Predicting the dissolution kinetics of silicate glasses using machine learning

    Science.gov (United States)

    Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu

    2018-05-01

    Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.
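
    The contrast between linear and non-linear learners reported above can be reproduced schematically (synthetic composition/pH features, not the study's glass corrosion data): a linear model and a small neural network are fitted to the same non-linear response and compared on held-out data.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.neural_network import MLPRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        X = rng.uniform(size=(500, 4))        # e.g. scaled oxide fractions plus solution pH
        y = np.sin(3 * X[:, 0]) * X[:, 1] - (X[:, 2] - 0.5)**2 + 0.3 * X[:, 3]  # toy log-rate

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        linear = LinearRegression().fit(X_tr, y_tr)
        ann = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000,
                           random_state=0).fit(X_tr, y_tr)
        print("linear R^2:", linear.score(X_te, y_te))   # poor: misses the non-linearity
        print("ANN R^2   :", ann.score(X_te, y_te))      # markedly better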

  10. Applications of machine learning in cancer prediction and prognosis.

    Science.gov (United States)

    Cruz, Joseph A; Wishart, David S

    2007-02-11

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

  11. A Machine Learning Application Based in Random Forest for Integrating Mass Spectrometry-Based Metabolomic Data: A Simple Screening Method for Patients With Zika Virus

    Directory of Open Access Journals (Sweden)

    Carlos Fernando Odir Rodrigues Melo

    2018-04-01

    Full Text Available Recent Zika outbreaks in South America, accompanied by unexpectedly severe clinical complications, have brought much interest in fast and reliable screening methods for ZIKV (Zika virus) identification. Reverse-transcriptase polymerase chain reaction (RT-PCR) is currently the method of choice to detect ZIKV in biological samples. This approach, nonetheless, demands a considerable amount of time and resources such as kits and reagents that, in endemic areas, may result in a substantial financial burden over affected individuals and health services veering away from RT-PCR analysis. This study presents a powerful combination of high-resolution mass spectrometry and a machine-learning prediction model for data analysis to assess the existence of ZIKV infection across a series of patients that bear similar symptomatic conditions, but not necessarily are infected with the disease. By using mass spectrometric data that are inputted with the developed decision-making algorithm, we were able to provide a set of features that work as a “fingerprint” for this specific pathophysiological condition, even after the acute phase of infection. Since both mass spectrometry and machine learning approaches are well-established and largely utilized tools within their respective fields, this combination of methods emerges as a distinct alternative for clinical applications, providing a faster and more accurate diagnostic screening with improved cost-effectiveness when compared to existing technologies.
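
    A minimal sketch of the random-forest screening step described above, with random numbers standing in for the mass-spectrometric features and hypothetical ZIKV labels: the forest is cross-validated and its feature importances are used to read off a candidate "fingerprint".

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(120, 300))       # 120 patients x 300 spectral features
        y = rng.integers(0, 2, 120)           # hypothetical labels: 1 = ZIKV-positive
        X[y == 1, :5] += 1.0                  # plant a weak "fingerprint" in five features

        forest = RandomForestClassifier(n_estimators=500, random_state=0)
        print("cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())
        forest.fit(X, y)
        top = np.argsort(forest.feature_importances_)[::-1][:5]
        print("most discriminative features:", top)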

  12. A Machine Learning Application Based in Random Forest for Integrating Mass Spectrometry-Based Metabolomic Data: A Simple Screening Method for Patients With Zika Virus.

    Science.gov (United States)

    Melo, Carlos Fernando Odir Rodrigues; Navarro, Luiz Claudio; de Oliveira, Diogo Noin; Guerreiro, Tatiane Melina; Lima, Estela de Oliveira; Delafiori, Jeany; Dabaja, Mohamed Ziad; Ribeiro, Marta da Silva; de Menezes, Maico; Rodrigues, Rafael Gustavo Martins; Morishita, Karen Noda; Esteves, Cibele Zanardi; de Amorim, Aline Lopes Lucas; Aoyagui, Caroline Tiemi; Parise, Pierina Lorencini; Milanez, Guilherme Paier; do Nascimento, Gabriela Mansano; Ribas Freitas, André Ricardo; Angerami, Rodrigo; Costa, Fábio Trindade Maranhão; Arns, Clarice Weis; Resende, Mariangela Ribeiro; Amaral, Eliana; Junior, Renato Passini; Ribeiro-do-Valle, Carolina C; Milanez, Helaine; Moretti, Maria Luiza; Proenca-Modena, Jose Luiz; Avila, Sandra; Rocha, Anderson; Catharino, Rodrigo Ramos

    2018-01-01

    Recent Zika outbreaks in South America, accompanied by unexpectedly severe clinical complications have brought much interest in fast and reliable screening methods for ZIKV (Zika virus) identification. Reverse-transcriptase polymerase chain reaction (RT-PCR) is currently the method of choice to detect ZIKV in biological samples. This approach, nonetheless, demands a considerable amount of time and resources such as kits and reagents that, in endemic areas, may result in a substantial financial burden over affected individuals and health services veering away from RT-PCR analysis. This study presents a powerful combination of high-resolution mass spectrometry and a machine-learning prediction model for data analysis to assess the existence of ZIKV infection across a series of patients that bear similar symptomatic conditions, but not necessarily are infected with the disease. By using mass spectrometric data that are inputted with the developed decision-making algorithm, we were able to provide a set of features that work as a "fingerprint" for this specific pathophysiological condition, even after the acute phase of infection. Since both mass spectrometry and machine learning approaches are well-established and have largely utilized tools within their respective fields, this combination of methods emerges as a distinct alternative for clinical applications, providing a diagnostic screening-faster and more accurate-with improved cost-effectiveness when compared to existing technologies.

  13. Integrated Method for Personal Thermal Comfort Assessment and Optimization through Users' Feedback, IoT and Machine Learning: A Case Study †.

    Science.gov (United States)

    Salamone, Francesco; Belussi, Lorenzo; Currò, Cristian; Danza, Ludovico; Ghellere, Matteo; Guazzi, Giulia; Lenzi, Bruno; Megale, Valentino; Meroni, Italo

    2018-05-17

    Thermal comfort has become a topical issue in building performance assessment as well as energy efficiency. Three methods are mainly recognized for its assessment. Two of them, based on standardized methodologies, face the problem by considering the indoor environment in steady-state conditions (PMV and PPD) and users as active subjects whose thermal perception is influenced by outdoor climatic conditions (adaptive approach). The latter method is the starting point to investigate thermal comfort from an overall perspective by considering endogenous variables besides the traditional physical and environmental ones. Following this perspective, the paper describes the results of an in-field investigation of thermal conditions through the use of nearable and wearable solutions, parametric models and machine learning techniques. The aim of the research is the exploration of the reliability of IoT-based solutions combined with advanced algorithms, in order to create a replicable framework for the assessment and improvement of user thermal satisfaction. For this purpose, an experimental test in real offices was carried out involving eight workers. Parametric models are applied for the assessment of thermal comfort; IoT solutions are used to monitor the environmental variables and the users' parameters; the machine learning CART method is used to predict the users' profile and thermal comfort perception with respect to the indoor environment.

  14. Integrated Method for Personal Thermal Comfort Assessment and Optimization through Users’ Feedback, IoT and Machine Learning: A Case Study †

    Science.gov (United States)

    Currò, Cristian; Danza, Ludovico; Ghellere, Matteo; Guazzi, Giulia; Lenzi, Bruno; Megale, Valentino; Meroni, Italo

    2018-01-01

    Thermal comfort has become a topical issue in building performance assessment as well as energy efficiency. Three methods are mainly recognized for its assessment. Two of them, based on standardized methodologies, face the problem by considering the indoor environment in steady-state conditions (PMV and PPD) and users as active subjects whose thermal perception is influenced by outdoor climatic conditions (adaptive approach). The latter method is the starting point to investigate thermal comfort from an overall perspective by considering endogenous variables besides the traditional physical and environmental ones. Following this perspective, the paper describes the results of an in-field investigation of thermal conditions through the use of nearable and wearable solutions, parametric models and machine learning techniques. The aim of the research is the exploration of the reliability of IoT-based solutions combined with advanced algorithms, in order to create a replicable framework for the assessment and improvement of user thermal satisfaction. For this purpose, an experimental test in real offices was carried out involving eight workers. Parametric models are applied for the assessment of thermal comfort; IoT solutions are used to monitor the environmental variables and the users’ parameters; the machine learning CART method is used to predict the users’ profile and thermal comfort perception with respect to the indoor environment. PMID:29772818

  15. Less is more: regularization perspectives on large scale machine learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Deep learning based techniques provide a possible solution at the expense of theoretical guidance and, especially, of computational requirements. It is then a key challenge for large scale machine learning to devise approaches guaranteed to be accurate and yet computationally efficient. In this talk, we will consider a regularization perspective on machine learning, appealing to classical ideas in linear algebra and inverse problems to dramatically scale up nonparametric methods such as kernel methods, often dismissed because of prohibitive costs. Our analysis derives optimal theoretical guarantees while providing experimental results at par with or outperforming state-of-the-art approaches.

  16. Inverse analysis of turbidites by machine learning

    Science.gov (United States)

    Naruse, H.; Nakao, K.

    2017-12-01

    This study aims to propose a method to estimate paleo-hydraulic conditions of turbidity currents from ancient turbidites by using a machine-learning technique. In this method, numerical simulation is repeated under various initial conditions, which produces a data set of characteristic features of turbidites. This data set of turbidites is then used for supervised training of a deep-learning neural network (NN). Quantities of characteristic features of turbidites in the training data set are given to the input nodes of the NN, and the output nodes are expected to provide estimates of the initial conditions of the turbidity current. The weight coefficients of the NN are then optimized to reduce the root-mean-square difference between the true conditions and the output values of the NN. The empirical relationship between the numerical results and the initial conditions is explored in this method, and the discovered relationship is used for inversion of turbidity currents. This machine learning can potentially produce an NN that estimates paleo-hydraulic conditions from data of ancient turbidites. We produced a preliminary implementation of this methodology. A forward model based on 1D shallow-water equations with a correction for the density-stratification effect was employed. This model calculates the behavior of a surge-like turbidity current transporting mixed-size sediment, and outputs the spatial distribution of volume per unit area of each grain-size class on a uniform slope. The grain-size distribution was discretized into 3 classes. The numerical simulation was repeated 1000 times, and thus 1000 beds of turbidites were used as the training data for an NN that has 21,000 input nodes and 5 output nodes with two hidden layers. After the machine learning finished, independent simulations were conducted 200 times in order to evaluate the performance of the NN. As a result of this test, the initial conditions of the validation data were successfully reconstructed by the NN. The estimated values show very small
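
    The inversion workflow can be miniaturized as follows (a toy analytic forward model replaces the shallow-water simulation, and the layer sizes are illustrative): deposits are generated from known initial conditions, a two-hidden-layer network is trained to map deposits back to those conditions, and independent runs test the inversion.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(3)
        x_grid = np.linspace(0.0, 1.0, 50)

        def forward_model(cond):
            # toy stand-in for the 1D turbidity-current simulation:
            # cond = (thickness, velocity, concentration) -> deposit profile
            h0, u0, c0 = cond
            return h0 * np.exp(-x_grid / (0.2 + 0.5 * u0)) + 0.3 * c0 * x_grid

        conditions = rng.uniform(0.1, 1.0, size=(1000, 3))          # 1000 training "simulations"
        deposits = np.array([forward_model(c) for c in conditions])

        net = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=3000,
                           random_state=0).fit(deposits, conditions)

        test_conditions = rng.uniform(0.1, 1.0, size=(200, 3))      # independent validation runs
        test_deposits = np.array([forward_model(c) for c in test_conditions])
        errors = np.abs(net.predict(test_deposits) - test_conditions)
        print("mean absolute inversion error per condition:", errors.mean(axis=0))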

  17. MACHINE LEARNING TECHNIQUES USED IN BIG DATA

    Directory of Open Access Journals (Sweden)

    STEFANIA LOREDANA NITA

    2016-07-01

    Full Text Available The classical tools used in data analysis are not enough to benefit from all the advantages of big data. The amount of information is too large for a complete investigation, and the possible connections and relations between data could be missed, because it is difficult or even impossible to verify all assumptions about the information. Machine learning is a great solution for finding concealed correlations or relationships between data, because it runs at scale and works very well with large data sets. The more data we have, the more useful the machine learning algorithm is, because it “learns” from the existing data and applies the discovered rules to new entries. In this paper, we present some machine learning algorithms and techniques used in big data.

  18. Development of E-Learning Materials for Machining Safety Education

    Science.gov (United States)

    Nakazawa, Tsuyoshi; Mita, Sumiyoshi; Matsubara, Masaaki; Takashima, Takeo; Tanaka, Koichi; Izawa, Satoru; Kawamura, Takashi

    We developed two e-learning materials for Manufacturing Practice safety education: movie learning materials and hazard-detection learning materials. Using video and sound media, the movie learning materials let students learn how to operate machines safely, which raises the effectiveness of preparation and review for Manufacturing Practice and helps students carry out operations safely. With the hazard-detection learning materials, students can apply knowledge learned in lectures to the detection of hazards and practice methods for detecting hazards during machine operation. In particular, the hazard-detection learning materials raise students' safety consciousness and increase their comprehension of the lecture material and of the operations performed during Manufacturing Practice.

  19. Machine Learning for Quantification of Small Vessel Disease Imaging Biomarkers

    NARCIS (Netherlands)

    Ghafoorian, M.

    2018-01-01

    This thesis is devoted to developing fully automated methods for quantification of small vessel disease imaging biomarkers, namely WMHs and lacunes, using various machine learning/deep learning and computer vision techniques. The rest of the thesis is organized as follows: Chapter 2 describes

  20. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    International Nuclear Information System (INIS)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.; McEwen, Jason D.

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
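
    The second stage of such a pipeline can be sketched with generic tools (synthetic feature vectors rather than SALT2 or wavelet coefficients): boosted decision trees are trained on extracted features and scored with the area under the ROC curve.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(4)
        X = rng.normal(size=(2000, 20))       # hypothetical per-supernova feature vectors
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)  # 1 = Ia

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        bdt = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
        print("AUC:", roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1]))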

  1. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    Energy Technology Data Exchange (ETDEWEB)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K. [Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT (United Kingdom); McEwen, Jason D., E-mail: dr.michelle.lochner@gmail.com [Mullard Space Science Laboratory, University College London, Surrey RH5 6NT (United Kingdom)

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  2. A strategy for quantum algorithm design assisted by machine learning

    International Nuclear Information System (INIS)

    Bang, Jeongho; Lee, Jinhyoung; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin

    2014-01-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum–classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch–Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method. (paper)

  3. A strategy for quantum algorithm design assisted by machine learning

    Science.gov (United States)

    Bang, Jeongho; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin; Lee, Jinhyoung

    2014-07-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum-classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch-Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method.

  4. A method for the evaluation of image quality according to the recognition effectiveness of objects in the optical remote sensing image using machine learning algorithm.

    Directory of Open Access Journals (Sweden)

    Tao Yuan

    Full Text Available Objective and effective image quality assessment (IQA) is directly related to the application of optical remote sensing images (ORSI). In this study, a new IQA method of standardizing the target object recognition rate (ORR) is presented to reflect quality. First, several quality degradation treatments with high-resolution ORSIs are implemented to model the ORSIs obtained in different imaging conditions; then, a machine learning algorithm is adopted for recognition experiments on a chosen target object to obtain ORRs; finally, a comparison with commonly used IQA indicators was performed to reveal their applicability and limitations. The results showed that the ORR of the original ORSI was calculated to be up to 81.95%, whereas the ORR ratios of the quality-degraded images to the original images were 65.52%, 64.58%, 71.21%, and 73.11%. The results show that these data can more accurately reflect the advantages and disadvantages of different images in object identification and information extraction when compared with conventional digital image assessment indexes. By recognizing the difference in image quality from the application effect perspective, using a machine learning algorithm to extract regional gray scale features of typical objects in the image for analysis, and quantitatively assessing quality of ORSI according to the difference, this method provides a new approach for objective ORSI assessment.

  5. A method for the evaluation of image quality according to the recognition effectiveness of objects in the optical remote sensing image using machine learning algorithm.

    Science.gov (United States)

    Yuan, Tao; Zheng, Xinqi; Hu, Xuan; Zhou, Wei; Wang, Wei

    2014-01-01

    Objective and effective image quality assessment (IQA) is directly related to the application of optical remote sensing images (ORSI). In this study, a new IQA method of standardizing the target object recognition rate (ORR) is presented to reflect quality. First, several quality degradation treatments with high-resolution ORSIs are implemented to model the ORSIs obtained in different imaging conditions; then, a machine learning algorithm is adopted for recognition experiments on a chosen target object to obtain ORRs; finally, a comparison with commonly used IQA indicators was performed to reveal their applicability and limitations. The results showed that the ORR of the original ORSI was calculated to be up to 81.95%, whereas the ORR ratios of the quality-degraded images to the original images were 65.52%, 64.58%, 71.21%, and 73.11%. The results show that these data can more accurately reflect the advantages and disadvantages of different images in object identification and information extraction when compared with conventional digital image assessment indexes. By recognizing the difference in image quality from the application effect perspective, using a machine learning algorithm to extract regional gray scale features of typical objects in the image for analysis, and quantitatively assessing quality of ORSI according to the difference, this method provides a new approach for objective ORSI assessment.

  6. Performance comparison of machine learning methods for prognosis of hormone receptor status in breast cancer tissue samples.

    Science.gov (United States)

    Kalinli, Adem; Sarikoc, Fatih; Akgun, Hulya; Ozturk, Figen

    2013-06-01

    We examined the classification and prognostic scoring performances of several computer methods on different feature sets to obtain objective and reproducible analysis of estrogen receptor status in breast cancer tissue samples. Radial basis function network, k-nearest neighborhood search, support vector machines, naive Bayes, functional trees, and k-means clustering algorithm were applied to the test datasets. Several features were employed and the classification accuracies of each method for these features were examined. The assessment results of the methods on test images were also experimentally compared with those of two experts. According to the results of our experimental work, a combination of functional trees and the naive Bayes classifier gave the best prognostic scores, indicating very good kappa agreement values (κ=0.899 and κ=0.949, p<0.001) with the experts. This combination also gave the best dichotomization rate (96.3%) for assessment of estrogen receptor status. Wavelet color features provided better classification accuracy than Laws texture energy and co-occurrence matrix features. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
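
    For reference, the expert-agreement figures quoted above are Cohen's kappa values; a brief sketch of how such agreement can be computed (the score arrays below are invented for illustration):

        from sklearn.metrics import cohen_kappa_score

        expert_scores  = [0, 1, 2, 2, 3, 1, 0, 2, 3, 3, 1, 0]   # hypothetical receptor scores
        machine_scores = [0, 1, 2, 2, 3, 1, 1, 2, 3, 3, 1, 0]
        print("kappa agreement:", cohen_kappa_score(expert_scores, machine_scores))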

  7. A distributed algorithm for machine learning

    Science.gov (United States)

    Chen, Shihong

    2018-04-01

    This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.
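
    A compact sketch of this kind of consensus learning (ring network, least-squares loss and step size are all hypothetical choices): each machine takes a gradient step on its local data and then averages its model with its neighbours, so raw data never leave the machines.

        import numpy as np

        rng = np.random.default_rng(5)
        true_w = np.array([2.0, -1.0, 0.5])
        n_machines, n_local = 6, 40
        local_X = [rng.normal(size=(n_local, 3)) for _ in range(n_machines)]
        local_y = [X @ true_w + rng.normal(scale=0.1, size=n_local) for X in local_X]

        w = [np.zeros(3) for _ in range(n_machines)]   # one model copy per machine
        step = 0.05
        for _ in range(300):
            # local gradient step on each machine's own least-squares loss
            grads = [2 * X.T @ (X @ wi - y) / n_local
                     for X, y, wi in zip(local_X, local_y, w)]
            w = [wi - step * g for wi, g in zip(w, grads)]
            # consensus step: average only with the two ring neighbours
            w = [(w[i - 1] + w[i] + w[(i + 1) % n_machines]) / 3
                 for i in range(n_machines)]

        print("machine 0 estimate:", w[0], " true model:", true_w)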

  8. Interactive Algorithms for Unsupervised Machine Learning

    Science.gov (United States)

    2015-06-01

  9. Efficient tuning in supervised machine learning

    NARCIS (Netherlands)

    Koch, Patrick

    2013-01-01

    The tuning of learning algorithm parameters has become more and more important in recent years. With the fast growth of computational power and available memory, databases have grown dramatically. This is very challenging for the tuning of parameters arising in machine learning, since the

  10. Applications of Machine Learning in Cancer Prediction and Prognosis

    Directory of Open Access Journals (Sweden)

    Joseph A. Cruz

    2006-01-01

    Full Text Available Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

  11. Machine Learning Phases of Strongly Correlated Fermions

    Directory of Open Access Journals (Sweden)

    Kelvin Ch’ng

    2017-08-01

    Full Text Available Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling. We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  12. Improving Machining Accuracy of CNC Machines with Innovative Design Methods

    Science.gov (United States)

    Yemelyanov, N. V.; Yemelyanova, I. V.; Zubenko, V. L.

    2018-03-01

    The article considers achieving the machining accuracy of CNC machines by applying innovative methods in modelling and design of machining systems, drives and machine processes. The topological method of analysis involves visualizing the system as matrices of block graphs with a varying degree of detail between the upper and lower hierarchy levels. This approach combines the advantages of graph theory and the efficiency of decomposition methods, it also has visual clarity, which is inherent in both topological models and structural matrices, as well as the resiliency of linear algebra as part of the matrix-based research. The focus of the study is on the design of automated machine workstations, systems, machines and units, which can be broken into interrelated parts and presented as algebraic, topological and set-theoretical models. Every model can be transformed into a model of another type, and, as a result, can be interpreted as a system of linear and non-linear equations which solutions determine the system parameters. This paper analyses the dynamic parameters of the 1716PF4 machine at the stages of design and exploitation. Having researched the impact of the system dynamics on the component quality, the authors have developed a range of practical recommendations which have enabled one to reduce considerably the amplitude of relative motion, exclude some resonance zones within the spindle speed range of 0...6000 min-1 and improve machining accuracy.

  13. Serum levels of chemical elements in esophageal squamous cell carcinoma in Anyang, China: a case-control study based on machine learning methods.

    Science.gov (United States)

    Lin, Tong; Liu, Tiebing; Lin, Yucheng; Zhang, Chaoting; Yan, Lailai; Chen, Zhongxue; He, Zhonghu; Wang, Jingyu

    2017-09-24

    Esophageal squamous cell carcinoma (ESCC) is the predominant form of esophageal carcinoma with extremely aggressive nature and low survival rate. The risk factors for ESCC in the high-incidence areas of China remain unclear. We used machine learning methods to investigate whether there was an association between the alterations of serum levels of certain chemical elements and ESCC. Primary healthcare unit in Anyang city, Henan Province of China. 100 patients with ESCC and 100 healthy controls matched for age, sex and region were included. Primary outcome was the classification accuracy. Secondary outcome was the p value of the t-test or rank-sum test. Both traditional statistical methods of t-test and rank-sum test and fashionable machine learning approaches were employed. Random Forest achieves the best accuracy of 98.38% on the original feature vectors (without dimensionality reduction), and support vector machine outperforms other classifiers by yielding accuracy of 96.56% on embedding spaces (with dimensionality reduction). All six classifiers can achieve accuracies of more than 90% based on the single most important element Sr. The other two elements with distinctive difference are S and P, providing accuracies around 80%. More than half of chemical elements were found to be significantly different between patients with ESCC and the controls. These results suggest clear differences between patients with ESCC and controls, implying some potential promising applications in diagnosis, prognosis, pharmacy and nutrition of ESCC. However, the results should be interpreted with caution due to the retrospective design nature, limited sample size and the lack of several potential confounding factors (including obesity, nutritional status, and fruit and vegetable consumption and potential regional carcinogen contacts). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted

  14. Contemporary machine learning: techniques for practitioners in the physical sciences

    Science.gov (United States)

    Spears, Brian

    2017-10-01

    Machine learning is the science of using computers to find relationships in data without explicitly knowing or programming those relationships in advance. Often without realizing it, we employ machine learning every day as we use our phones or drive our cars. Over the last few years, machine learning has found increasingly broad application in the physical sciences. This most often involves building a model relationship between a dependent, measurable output and an associated set of controllable, but complicated, independent inputs. The methods are applicable both to experimental observations and to databases of simulated output from large, detailed numerical simulations. In this tutorial, we will present an overview of current tools and techniques in machine learning - a jumping-off point for researchers interested in using machine learning to advance their work. We will discuss supervised learning techniques for modeling complicated functions, beginning with familiar regression schemes, then advancing to more sophisticated decision trees, modern neural networks, and deep learning methods. Next, we will cover unsupervised learning and techniques for reducing the dimensionality of input spaces and for clustering data. We'll show example applications from both magnetic and inertial confinement fusion. Along the way, we will describe methods for practitioners to help ensure that their models generalize from their training data to as-yet-unseen test data. We will finally point out some limitations to modern machine learning and speculate on some ways that practitioners from the physical sciences may be particularly suited to help. This work was performed by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  15. Using Machine Learning to Predict MCNP Bias

    Energy Technology Data Exchange (ETDEWEB)

    Grechanuk, Pavel Aleksandrovi [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2018-01-09

    For many real-world applications in radiation transport where simulations are compared to experimental measurements, like in nuclear criticality safety, the bias (simulated - experimental keff) in the calculation is an extremely important quantity used for code validation. The objective of this project is to accurately predict the bias of MCNP6 [1] criticality calculations using machine learning (ML) algorithms, with the intention of creating a tool that can complement the current nuclear criticality safety methods. In the latest release of MCNP6, the Whisper tool is available for criticality safety analysts and includes a large catalogue of experimental benchmarks, sensitivity profiles, and nuclear data covariance matrices. This data, coming from 1100+ benchmark cases, is used in this study of ML algorithms for criticality safety bias predictions.

  16. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents Applications in Recommender Systems which are making recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the book's approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  17. Parsimonious Wavelet Kernel Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Wang Qin

    2015-11-01

    Full Text Available In this study, a parsimonious scheme for wavelet kernel extreme learning machine (named PWKELM) was introduced by combining wavelet theory and a parsimonious algorithm into kernel extreme learning machine (KELM). In the wavelet analysis, bases localized in time and frequency were used to represent various signals effectively. Wavelet kernel extreme learning machine (WELM) maximized its capability to capture the essential features in “frequency-rich” signals. The proposed parsimonious algorithm also incorporated significant wavelet kernel functions via iteration by virtue of the Householder matrix, thus producing a sparse solution that eased the computational burden and improved numerical stability. The experimental results achieved from the synthetic dataset and a gas furnace instance demonstrated that the proposed PWKELM is efficient and feasible in terms of improving generalization accuracy and real-time performance.
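
    A bare-bones sketch of the wavelet-kernel ELM idea (without the parsimonious Householder pruning step; the Morlet-style kernel and regularization constant are illustrative choices): a wavelet kernel matrix is built and the regularized KELM system is solved for the output weights.

        import numpy as np

        def wavelet_kernel(A, B, a=1.0):
            # translation-invariant "mother wavelet" kernel:
            # K(x, z) = prod_j cos(1.75 d_j / a) * exp(-d_j^2 / (2 a^2)), with d = x - z
            d = A[:, None, :] - B[None, :, :]
            return np.prod(np.cos(1.75 * d / a) * np.exp(-d**2 / (2 * a**2)), axis=2)

        rng = np.random.default_rng(6)
        X = rng.uniform(-1, 1, size=(200, 2))
        y = np.sin(3 * X[:, 0]) + 0.5 * np.cos(5 * X[:, 1])   # a "frequency-rich" target

        C = 100.0                                             # regularization constant
        K = wavelet_kernel(X, X)
        beta = np.linalg.solve(K + np.eye(len(X)) / C, y)     # KELM output weights

        X_new = rng.uniform(-1, 1, size=(5, 2))
        print("predictions:", wavelet_kernel(X_new, X) @ beta)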

  18. Introduction to machine learning for brain imaging.

    Science.gov (United States)

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a working horse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally machine learning techniques should be usable for any non-expert, however, unfortunately they are typically not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.

  19. Machine learning in the string landscape

    Science.gov (United States)

    Carifio, Jonathan; Halverson, James; Krioukov, Dmitri; Nelson, Brent D.

    2017-09-01

    We utilize machine learning to study the string landscape. Deep data dives and conjecture generation are proposed as useful frameworks for utilizing machine learning in the landscape, and examples of each are presented. A decision tree accurately predicts the number of weak Fano toric threefolds arising from reflexive polytopes, each of which determines a smooth F-theory compactification, and linear regression generates a previously proven conjecture for the gauge group rank in an ensemble of 4/3 × 2.96 × 10^755 F-theory compactifications. Logistic regression generates a new conjecture for when E6 arises in the large ensemble of F-theory compactifications, which is then rigorously proven. This result may be relevant for the appearance of visible sectors in the ensemble. Through conjecture generation, machine learning is useful not only for numerics, but also for rigorous results.

  20. Biochemical Profile of Heritage and Modern Apple Cultivars and Application of Machine Learning Methods To Predict Usage, Age, and Harvest Season.

    Science.gov (United States)

    Anastasiadi, Maria; Mohareb, Fady; Redfern, Sally P; Berry, Mark; Simmonds, Monique S J; Terry, Leon A

    2017-07-05

    The present study represents the first major attempt to characterize the biochemical profile in different tissues of a large selection of apple cultivars sourced from the United Kingdom's National Fruit Collection comprising dessert, ornamental, cider, and culinary apples. Furthermore, advanced machine learning methods were applied with the objective to identify whether the phenolic and sugar composition of an apple cultivar could be used as a biomarker fingerprint to differentiate between heritage and mainstream commercial cultivars as well as govern the separation among primary usage groups and harvest season. A prediction accuracy of >90% was achieved with the random forest method for all three models. The results highlighted the extraordinary phytochemical potency and unique profile of some heritage, cider, and ornamental apple cultivars, especially in comparison to more mainstream apple cultivars. Therefore, these findings could guide future cultivar selection on the basis of health-promoting phytochemical content.

  1. Modeling Geomagnetic Variations using a Machine Learning Framework

    Science.gov (United States)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  2. Machine learning-based dual-energy CT parametric mapping.

    Science.gov (United States)

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Al Helo, Rose; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C; Rassouli, Negin; Gilkeson, Robert C; Traughber, Bryan J; Cheng, Chee-Wai; Muzic, Raymond F

    2018-05-22

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Zeff), relative electron density (ρe), mean excitation energy (Ix), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method, which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 seconds. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency.
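
    A toy version of the learning step (made-up attenuation numbers and linear target relations, not calibrated phantom data): a random forest is fitted to map low/high-energy CT numbers of known tissue substitutes to effective atomic number and relative electron density.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(9)
        hu = rng.uniform(-1000, 1500, size=(5000, 2))          # (HU_low, HU_high) input voxels
        z_eff = 7.5 + 0.002 * hu[:, 0] + 0.001 * hu[:, 1]      # invented target relations
        rho_e = 1.0 + 0.0009 * hu.mean(axis=1)
        targets = np.column_stack([z_eff, rho_e])              # "ideal" parametric values

        mapper = RandomForestRegressor(n_estimators=200, random_state=0).fit(hu, targets)
        voxel = np.array([[120.0, 80.0]])                      # one dual-energy measurement
        print("predicted (Zeff, rho_e):", mapper.predict(voxel)[0])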

  3. Virtual Things for Machine Learning Applications

    OpenAIRE

    Bovet , Gérôme; Ridi , Antonio; Hennebert , Jean

    2014-01-01

    International audience; Internet-of-Things (IoT) devices, especially sensors, are producing large quantities of data that can be used for gathering knowledge. In this field, machine learning technologies are increasingly used to build versatile data-driven models. In this paper, we present a novel architecture able to execute machine learning algorithms within the sensor network, presenting advantages in terms of privacy and data transfer efficiency. We first argue that some classes of ...

  4. Machine Learning Optimization of Evolvable Artificial Cells

    DEFF Research Database (Denmark)

    Caschera, F.; Rasmussen, S.; Hanczyc, M.

    2011-01-01

    can be explored. A machine learning approach (Evo-DoE) could be applied to explore this experimental space and define optimal interactions according to a specific fitness function. Herein, an implementation of an evolutionary design of experiments to optimize chemical and biochemical systems based...... on a machine learning process is presented. The optimization proceeds over generations of experiments in an iterative loop until optimal compositions are discovered. The fitness function is experimentally measured every time the loop is closed. Two examples of complex systems, namely a liposomal drug formulation...

  5. Machine learning enhanced optical distance sensor

    Science.gov (United States)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized training model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces a training set and testing set distance measurement errors of learning. Applications for the proposed sensor include industrial scenario distance sensing where target material specific training models can be generated to realize distance measurements with <1% measurement error.
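
    The regularized polynomial regression mentioned in the abstract can be sketched as below, with a synthetic spot-size-versus-distance relation standing in for the camera-derived features; the degree and regularization strength are illustrative assumptions, not the sensor's actual settings.

      # Illustrative sketch: ridge-regularized polynomial regression from a
      # synthetic spot-size feature to target distance over a 1 m range.
      import numpy as np
      from sklearn.preprocessing import PolynomialFeatures
      from sklearn.linear_model import Ridge
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(2)
      distance_mm = rng.uniform(0, 1000, size=400)              # target distances over a 1 m range
      spot_size = 2.0 + 0.004 * distance_mm + 2e-6 * distance_mm**2 + rng.normal(0, 0.02, 400)

      model = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0))
      model.fit(spot_size[:300, None], distance_mm[:300])
      pred = model.predict(spot_size[300:, None])
      print("mean absolute error [mm]:", np.abs(pred - distance_mm[300:]).mean())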

  6. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    Science.gov (United States)

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.
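
    A hedged sketch of the n-gram-plus-ensemble idea: character n-gram counts over protein sequences feeding a random-forest classifier. The toy sequences, labels and hyperparameters below are placeholders, not data or settings from the study.

      # Illustrative sketch: protein n-gram features with an ensemble classifier.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.pipeline import make_pipeline

      sequences = ["MKTLLVAGGC", "MASNTTSPGA", "MKVLLAAGGS", "MPSNQTSPGV"]  # toy peptide sequences
      labels = [1, 0, 1, 0]                                                 # 1 = cancerlectin (hypothetical)

      model = make_pipeline(
          CountVectorizer(analyzer="char", ngram_range=(1, 3)),  # character n-gram counts
          RandomForestClassifier(n_estimators=100, random_state=0),
      )
      model.fit(sequences, labels)
      print(model.predict(["MKTLLVSGGC"]))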

  7. MoleculeNet: a benchmark for molecular machine learning.

    Science.gov (United States)

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay

    2018-01-14

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

  8. Machine learning with quantum relative entropy

    Energy Technology Data Exchange (ETDEWEB)

    Tsuda, Koji [Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tuebingen, 72076 (Germany)], E-mail: koji.tsuda@tuebingen.mpg.de

    2009-12-01

    Density matrices are a central tool in quantum physics, but they are also used in machine learning. A positive definite matrix called the kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in a Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called the matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upper bound on the generalization error of online learning.
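
    For reference, the matrix exponentiated gradient update is commonly written as a trace-normalized matrix exponential, which keeps the learned matrix positive definite with unit trace; the notation below is a standard statement of the rule and the quantum relative entropy that motivates it, not a transcription from this record:

      W_{t+1} \;=\; \frac{\exp\!\big(\log W_t - \eta\,\nabla L(W_t)\big)}
                         {\operatorname{tr}\!\big[\exp\!\big(\log W_t - \eta\,\nabla L(W_t)\big)\big]},
      \qquad
      \Delta(W \,\|\, W_t) \;=\; \operatorname{tr}\!\big[W\,(\log W - \log W_t)\big].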

  9. Machine learning with quantum relative entropy

    International Nuclear Information System (INIS)

    Tsuda, Koji

    2009-01-01

    Density matrices are a central tool in quantum physics, but they are also used in machine learning. A positive definite matrix called the kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in a Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called the matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upper bound on the generalization error of online learning.

  10. Machine learning applied to crime prediction

    OpenAIRE

    Vaquero Barnadas, Miquel

    2016-01-01

    Machine Learning is a cornerstone when it comes to artificial intelligence and big data analysis. It provides powerful algorithms that are capable of recognizing patterns, classifying data, and, basically, learning by themselves to perform a specific task. This field has grown incredibly in popularity in recent years; however, it still remains unknown to the majority of people, and even to most professionals. This project intends to provide an understandable explanation of what it is, what types ar...

  11. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    Science.gov (United States)

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
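
    A minimal sketch of the kind of comparison described, assuming synthetic covariates in place of the Ontario heart-failure data: a tree ensemble and conventional logistic regression are scored with cross-validated AUC.

      # Illustrative sketch: compare a tree ensemble with logistic regression
      # using cross-validated AUC on synthetic stand-in data.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in for HFPEF vs HFREF
      for name, model in [("random forest", RandomForestClassifier(n_estimators=300, random_state=0)),
                          ("logistic regression", LogisticRegression(max_iter=1000))]:
          auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          print(f"{name}: mean AUC = {auc:.3f}")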

  12. Quantum machine learning for quantum anomaly detection

    Science.gov (United States)

    Liu, Nana; Rebentrost, Patrick

    2018-04-01

    Anomaly detection is used for identifying data that deviate from "normal" data patterns. Its usage on classical data finds diverse applications in many important areas such as finance, fraud detection, medical diagnoses, data cleaning, and surveillance. With the advent of quantum technologies, anomaly detection of quantum data, in the form of quantum states, may become an important component of quantum applications. Machine-learning algorithms are playing pivotal roles in anomaly detection using classical data. Two widely used algorithms are the kernel principal component analysis and the one-class support vector machine. We find corresponding quantum algorithms to detect anomalies in quantum states. We show that these two quantum algorithms can be performed using resources that are logarithmic in the dimensionality of quantum states. For pure quantum states, these resources can also be logarithmic in the number of quantum states used for training the machine-learning algorithm. This makes these algorithms potentially applicable to big quantum data applications.
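
    For orientation, the classical counterpart of one of the two algorithms discussed, the one-class support vector machine, can be sketched as below on synthetic vectors; the paper's contribution is the quantum-state version, which this sketch does not attempt.

      # Illustrative sketch: classical one-class SVM anomaly detection.
      import numpy as np
      from sklearn.svm import OneClassSVM

      rng = np.random.default_rng(3)
      normal_data = rng.normal(0, 1, size=(500, 4))            # "normal" training patterns
      test_points = np.vstack([rng.normal(0, 1, (5, 4)),        # in-distribution points
                               rng.normal(6, 1, (5, 4))])       # anomalous points
      detector = OneClassSVM(kernel="rbf", nu=0.05).fit(normal_data)
      print(detector.predict(test_points))                      # +1 = normal, -1 = anomaly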

  13. Applying machine learning methods for characterization of hexagonal prisms from their 2D scattering patterns - an investigation using modelled scattering data

    Science.gov (United States)

    Salawu, Emmanuel Oluwatobi; Hesse, Evelyn; Stopford, Chris; Davey, Neil; Sun, Yi

    2017-11-01

    Better understanding and characterization of cloud particles, whose properties and distributions affect climate and weather, are essential for the understanding of present climate and climate change. Since imaging cloud probes have limitations of optical resolution, especially for small particles (with diameter < 25 μm), instruments like the Small Ice Detector (SID) probes, which capture high-resolution spatial light scattering patterns from individual particles down to 1 μm in size, have been developed. In this work, we have proposed a method using Machine Learning techniques to estimate simulated particles' orientation-averaged projected sizes (PAD) and aspect ratio from their 2D scattering patterns. The two-dimensional light scattering patterns (2DLSP) of hexagonal prisms are computed using the Ray Tracing with Diffraction on Facets (RTDF) model. The 2DLSP cover the same angular range as the SID probes. We generated 2DLSP for 162 hexagonal prisms at 133 orientations for each. In a first step, the 2DLSP were transformed into rotation-invariant Zernike moments (ZMs), which are particularly suitable for analyses of pattern symmetry. Then we used ZMs, summed intensities, and root mean square contrast as inputs to the advanced Machine Learning methods. We created one random forests classifier for predicting prism orientation, 133 orientation-specific (OS) support vector classification models for predicting the prism aspect-ratios, 133 OS support vector regression models for estimating prism sizes, and another 133 OS Support Vector Regression (SVR) models for estimating the size PADs. We have achieved a high accuracy of 0.99 in predicting prism aspect ratios, and a low value of normalized mean square error of 0.004 for estimating the particle's size and size PADs.
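
    A simplified sketch of the two-stage pipeline (orientation classification with a random forest, size regression with SVR), where random vectors stand in for the Zernike-moment, summed-intensity and contrast features; the dimensions and class counts are assumptions for illustration.

      # Illustrative sketch: orientation classification plus size regression
      # from precomputed pattern features (placeholders for Zernike moments).
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.svm import SVR

      rng = np.random.default_rng(4)
      features = rng.normal(size=(1000, 25))                    # stand-in for ZMs + intensity + contrast
      orientation = rng.integers(0, 133, size=1000)             # orientation class labels
      size_um = 10 + 2 * features[:, 0] + rng.normal(0, 0.1, 1000)

      orientation_clf = RandomForestClassifier(n_estimators=200, random_state=0)
      orientation_clf.fit(features[:800], orientation[:800])

      size_reg = SVR(C=10.0).fit(features[:800], size_um[:800])
      print("size R^2:", size_reg.score(features[800:], size_um[800:]))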

  14. Machine learning and data science in soft materials engineering.

    Science.gov (United States)

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  15. Machine learning and data science in soft materials engineering

    Science.gov (United States)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  16. Machine learning on geospatial big data

    CSIR Research Space (South Africa)

    Van Zyl, T

    2014-02-01

    Full Text Available When trying to understand the difference between machine learning and statistics, it is important to note that it is not so much the set of techniques and theory that are used but more importantly the intended use of the results. In fact, many...

  17. ML Confidential : machine learning on encrypted data

    NARCIS (Netherlands)

    Graepel, T.; Lauter, K.; Naehrig, M.; Kwon, T.; Lee, M.-K.; Kwon, D.

    2013-01-01

    We demonstrate that, by using a recently proposed leveled homomorphic encryption scheme, it is possible to delegate the execution of a machine learning algorithm to a computing service while retaining confidentiality of the training and test data. Since the computational complexity of the homomorphic

  18. Machine Learning for Flapping Wing Flight Control

    NARCIS (Netherlands)

    Goedhart, Menno; van Kampen, E.; Armanini, S.F.; de Visser, C.C.; Chu, Q.

    2018-01-01

    Flight control of Flapping Wing Micro Air Vehicles is challenging, because of their complex dynamics and variability due to manufacturing inconsistencies. Machine Learning algorithms can be used to tackle these challenges. A Policy Gradient algorithm is used to tune the gains of a

  19. ML Confidential : machine learning on encrypted data

    NARCIS (Netherlands)

    Graepel, T.; Lauter, K.; Naehrig, M.

    2012-01-01

    We demonstrate that by using a recently proposed somewhat homomorphic encryption (SHE) scheme it is possible to delegate the execution of a machine learning (ML) algorithm to a compute service while retaining confidentiality of the training and test data. Since the computational complexity of the

  20. Document Classification Using Distributed Machine Learning

    OpenAIRE

    Aydin, Galip; Hallac, Ibrahim Riza

    2018-01-01

    In this paper, we investigate the performance and success rates of Naïve Bayes Classification Algorithm for automatic classification of Turkish news into predetermined categories like economy, life, health etc. We use Apache Big Data technologies such as Hadoop, HDFS, Spark and Mahout, and apply these distributed technologies to Machine Learning.

  1. The ATLAS Higgs Machine Learning Challenge

    CERN Document Server

    Cowan, Glen; The ATLAS collaboration; Bourdarios, Claire

    2015-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s with Artificial Neural Net and more recently with Boosted Decision Trees, Random Forest etc. Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, data scientists are developing new Machine Learning algorithms to extract meaning from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, and at the same time data scientists have advanced algorithms: the goal of the HiggsML project was to bring the two together by a “challenge”: participants from all over the world and any scientific background could compete online to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Instead of HEP physicists browsing through machine learning papers and trying to infer which new algorithms might be useful for HEP, then c...

  2. Parallelization of TMVA Machine Learning Algorithms

    CERN Document Server

    Hajili, Mammad

    2017-01-01

    This report reflects my work on Parallelization of TMVA Machine Learning Algorithms integrated into the ROOT Data Analysis Framework during a summer internship at CERN. The report consists of 4 important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the results of execution-time changes as a function of the number of workers.

  3. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of

  4. Supporting visual quality assessment with machine learning

    NARCIS (Netherlands)

    Gastaldo, P.; Zunino, R.; Redi, J.

    2013-01-01

    Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms allow to tackle the quality

  5. Machine learning a theoretical approach

    CERN Document Server

    Natarajan, Balas K

    2014-01-01

    This is the first comprehensive introduction to computational learning theory. The author's uniform presentation of fundamental results and their applications offers AI researchers a theoretical perspective on the problems they study. The book presents tools for the analysis of probabilistic models of learning, tools that crisply classify what is and is not efficiently learnable. After a general introduction to Valiant's PAC paradigm and the important notion of the Vapnik-Chervonenkis dimension, the author explores specific topics such as finite automata and neural networks. The presentation

  6. Machine Learning and Quantum Mechanics

    Science.gov (United States)

    Chapline, George

    The author has previously pointed out some similarities between self-organizing neural networks and quantum mechanics. These types of neural networks were originally conceived of as a way of emulating the cognitive capabilities of the human brain. Recently, extensions of these networks, collectively referred to as deep learning networks, have strengthened the connection between self-organizing neural networks and human cognitive capabilities. In this note we consider whether hardware quantum devices might be useful for emulating neural networks with human-like cognitive capabilities, or alternatively whether implementations of deep learning neural networks using conventional computers might lead to better algorithms for solving the many-body Schrodinger equation.

  7. Machine learning and computer vision approaches for phenotypic profiling.

    Science.gov (United States)

    Grys, Ben T; Lo, Dara S; Sahin, Nil; Kraus, Oren Z; Morris, Quaid; Boone, Charles; Andrews, Brenda J

    2017-01-02

    With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. © 2017 Grys et al.

  8. Advances in independent component analysis and learning machines

    CERN Document Server

    Bingham, Ella; Laaksonen, Jorma; Lampinen, Jouko

    2015-01-01

    In honour of Professor Erkki Oja, one of the pioneers of Independent Component Analysis (ICA), this book reviews key advances in the theory and application of ICA, as well as its influence on signal processing, pattern recognition, machine learning, and data mining. Examples of topics which have developed from the advances of ICA, and which are covered in the book, are: a unifying probabilistic model for PCA and ICA; optimization methods for matrix decompositions; insights into the FastICA algorithm; unsupervised deep learning; machine vision and image retrieval; a review of developments in the t

  9. Energy landscapes for a machine learning application to series data

    Energy Technology Data Exchange (ETDEWEB)

    Ballard, Andrew J.; Stevenson, Jacob D.; Das, Ritankar; Wales, David J., E-mail: dw34@cam.ac.uk [University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW (United Kingdom)

    2016-03-28

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  10. Energy landscapes for a machine learning application to series data

    International Nuclear Information System (INIS)

    Ballard, Andrew J.; Stevenson, Jacob D.; Das, Ritankar; Wales, David J.

    2016-01-01

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  11. Machine Learning Applications to Resting-State Functional MR Imaging Analysis.

    Science.gov (United States)

    Billings, John M; Eder, Maxwell; Flood, William C; Dhami, Devendra Singh; Natarajan, Sriraam; Whitlow, Christopher T

    2017-11-01

    Machine learning is one of the most exciting and rapidly expanding fields within computer science. Academic and commercial research entities are investing in machine learning methods, especially in personalized medicine via patient-level classification. There is great promise that machine learning methods combined with resting state functional MR imaging will aid in diagnosis of disease and guide potential treatment for conditions thought to be impossible to identify based on imaging alone, such as psychiatric disorders. We discuss machine learning methods and explore recent advances. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression

  13. Evaluation on knowledge extraction and machine learning in ...

    African Journals Online (AJOL)

    Evaluation on knowledge extraction and machine learning in resolving Malay word ambiguity. ... Keywords: ambiguity; lexical knowledge; machine learning; Malay word ...

  14. Financial signal processing and machine learning

    CERN Document Server

    Kulkarni, Sanjeev R; Malioutov, Dmitry M

    2016-01-01

    The modern financial industry has been required to deal with large and diverse portfolios in a variety of asset classes often with limited market data available. Financial Signal Processing and Machine Learning unifies a number of recent advances made in signal processing and machine learning for the design and management of investment portfolios and financial engineering. This book bridges the gap between these disciplines, offering the latest information on key topics including characterizing statistical dependence and correlation in high dimensions, constructing effective and robust risk measures, and their use in portfolio optimization and rebalancing. The book focuses on signal processing approaches to model return, momentum, and mean reversion, addressing theoretical and implementation aspects. It highlights the connections between portfolio theory, sparse learning and compressed sensing, sparse eigen-portfolios, robust optimization, non-Gaussian data-driven risk measures, graphical models, causal analy...

  15. MLnet report: training in Europe on machine learning

    OpenAIRE

    Ellebrecht, Mario; Morik, Katharina

    1999-01-01

    Machine learning techniques offer opportunities for a variety of applications and the theory of machine learning investigates problems that are of interest for other fields of computer science (e.g., complexity theory, logic programming, pattern recognition). However, the impacts of machine learning can only be recognized by those who know the techniques and are able to apply them. Hence, teaching machine learning is necessary before this field can diversify computer science. In order ...

  16. Optimal interference code based on machine learning

    Science.gov (United States)

    Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

    2016-10-01

    In this paper, we analyze the characteristics of pseudo-random codes, using the m-sequence as a case study. Drawing on coding theory, we introduce jamming methods and simulate the interference effect and probability model in MATLAB. Based on the length of decoding time the adversary spends, we derive the optimal formula and optimal coefficients with machine learning, and thus obtain a new optimal interference code. First, in the recognition phase, this study judges the effect of interference by simulating the length of time required for the decoding period of the laser seeker. Then, in the tracking phase, we use laser active deception jamming to simulate the interference process; this is the jamming method adopted throughout the study. To improve the interference performance, the model is simulated in MATLAB. We determine the least number of pulse intervals that must be received, and from this deduce the precise interval number of the laser pointer for m-sequence encoding. To find the shortest spacing, we use the greatest common divisor method. Then, combining this with the coding regularity found earlier, we restore the pulse intervals of the pseudo-random code that has already been received. Finally, we can control the time period of the laser interference, obtain the optimal interference code, and increase the probability of successful interference.

  17. A Machine Learning Concept for DTN Routing

    Science.gov (United States)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.

  18. A machine learning approach to the accurate prediction of monitor units for a compact proton machine.

    Science.gov (United States)

    Sun, Baozhou; Lam, Dao; Yang, Deshan; Grantham, Kevin; Zhang, Tiezhi; Mutic, Sasa; Zhao, Tianyu

    2018-05-01

    Clinical treatment planning systems for proton therapy currently do not calculate monitor units (MUs) in passive scatter proton therapy due to the complexity of the beam delivery systems. Physical phantom measurements are commonly employed to determine the field-specific output factors (OFs) but are often subject to limited machine time, measurement uncertainties and intensive labor. In this study, a machine learning-based approach was developed to predict output (cGy/MU) and derive MUs, incorporating the dependencies on gantry angle and field size for a single-room proton therapy system. The goal of this study was to develop a secondary check tool for OF measurements and eventually eliminate patient-specific OF measurements. The OFs of 1754 fields previously measured in a water phantom with calibrated ionization chambers and electrometers for patient-specific fields with various range and modulation width combinations for 23 options were included in this study. The training data sets for machine learning models in three different methods (Random Forest, XGBoost and Cubist) included 1431 (~81%) OFs. Ten-fold cross-validation was used to prevent "overfitting" and to validate each model. The remaining 323 (~19%) OFs were used to test the trained models. The difference between the measured and predicted values from machine learning models was analyzed. Model prediction accuracy was also compared with that of the semi-empirical model developed by Kooy (Phys. Med. Biol. 50, 2005). Additionally, gantry angle dependence of OFs was measured for three groups of options categorized on the selection of the second scatters. Field size dependence of OFs was investigated for the measurements with and without patient-specific apertures. All three machine learning methods showed higher accuracy than the semi-empirical model, which shows a considerable discrepancy of up to 7.7% for the treatment fields with full range and full modulation width. The Cubist-based solution
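
    A hedged sketch of the regression task described, with synthetic field parameters and a made-up output relation in place of the 1754 measured fields: a random forest scored by ten-fold cross-validation, echoing the study's validation scheme.

      # Illustrative sketch: predict output (cGy/MU) from field parameters with
      # a random forest and 10-fold cross-validation; all values are synthetic.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(5)
      n = 1500
      X = np.column_stack([
          rng.uniform(5, 25, n),     # range [cm]
          rng.uniform(2, 16, n),     # modulation width [cm]
          rng.uniform(5, 25, n),     # field size [cm]
          rng.uniform(0, 360, n),    # gantry angle [deg]
      ])
      output = 1.0 + 0.02 * X[:, 0] - 0.01 * X[:, 1] + 0.005 * X[:, 2] + rng.normal(0, 0.01, n)

      model = RandomForestRegressor(n_estimators=300, random_state=0)
      scores = cross_val_score(model, X, output, cv=10, scoring="r2")
      print("10-fold mean R^2:", scores.mean())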

  19. Geometrical methods in learning theory

    International Nuclear Information System (INIS)

    Burdet, G.; Combe, Ph.; Nencka, H.

    2001-01-01

    The methods of information theory provide natural approaches to learning algorithms in the case of stochastic formal neural networks. Most of the classical techniques are based on some extremization principle. A geometrical interpretation of the associated algorithms provides a powerful tool for understanding the learning process and its stability and offers a framework for discussing possible new learning rules. An illustration is given using sequential and parallel learning in the Boltzmann machine

  20. Machine learning for network-based malware detection

    DEFF Research Database (Denmark)

    Stevanovic, Matija

    and based on different, mutually complementary, principles of traffic analysis. The proposed approaches rely on machine learning algorithms (MLAs) for automated and resource-efficient identification of the patterns of malicious network traffic. We evaluated the proposed methods through extensive evaluations...

  1. RSO Characterization with Photometric Data Using Machine Learning

    Science.gov (United States)

    2015-10-18

    Michael Howard (Charles River Analytics, Inc.); Bernie Klem (SASSO, Inc.); Joe...and its behavior. This paper explores object characterization methods using photometric data. An important property of RSO photometric signatures is... photometric signature include geometry, orientation, material characteristics and stability. For this reason, it should be possible to recover these

  2. Machine learning of the reactor core loading pattern critical parameters

    International Nuclear Information System (INIS)

    Trontl, K.; Pevec, D.; Smuc, T.

    2007-01-01

    The usual approach to loading pattern optimization involves high degree of engineering judgment, a set of heuristic rules, an optimization algorithm and a computer code used for evaluating proposed loading patterns. The speed of the optimization process is highly dependent on the computer code used for the evaluation. In this paper we investigate the applicability of a machine learning model which could be used for fast loading pattern evaluation. We employed a recently introduced machine learning technique, Support Vector Regression (SVR), which has a strong theoretical background in statistical learning theory. Superior empirical performance of the method has been reported on difficult regression problems in different fields of science and technology. SVR is a data driven, kernel based, nonlinear modelling paradigm, in which model parameters are automatically determined by solving a quadratic optimization problem. The main objective of the work reported in this paper was to evaluate the possibility of applying SVR method for reactor core loading pattern modelling. The starting set of experimental data for training and testing of the machine learning algorithm was obtained using a two-dimensional diffusion theory reactor physics computer code. We illustrate the performance of the solution and discuss its applicability, i.e., complexity, speed and accuracy, with a projection to a more realistic scenario involving machine learning from the results of more accurate and time consuming three-dimensional core modelling code. (author)
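
    The surrogate-modelling idea can be sketched as follows, assuming a synthetic loading-pattern encoding and a fabricated critical parameter in place of diffusion-code results; the real work trains SVR on the outputs of a two-dimensional reactor physics code.

      # Illustrative sketch: support vector regression as a fast surrogate for
      # loading-pattern evaluation; pattern encoding and target are synthetic.
      import numpy as np
      from sklearn.svm import SVR
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(6)
      patterns = rng.uniform(1.5, 4.5, size=(800, 30))          # e.g. assembly enrichments per position
      k_eff = 0.9 + 0.01 * patterns.mean(axis=1) + rng.normal(0, 0.001, 800)

      surrogate = make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=1e-4))
      surrogate.fit(patterns[:600], k_eff[:600])
      print("surrogate R^2:", surrogate.score(patterns[600:], k_eff[600:]))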

  3. Improved Extreme Learning Machine based on the Sensitivity Analysis

    Science.gov (United States)

    Cui, Licheng; Zhai, Huawei; Wang, Benchao; Qu, Zengtang

    2018-03-01

    The extreme learning machine and its improved variants are weak in some respects, such as computational complexity and learning error. After deep analysis, and referencing the importance of hidden nodes in SVM, a novel sensitivity analysis method is proposed that matches people's cognitive habits. Based on this, an improved ELM is proposed: it can remove hidden nodes before the learning error target is met, and it can efficiently manage the number of hidden nodes, so as to improve its performance. Comparative tests show that it performs better in learning time, accuracy, and other measures.
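
    For context, a plain extreme learning machine (the baseline this paper improves by pruning hidden nodes) can be written in a few lines: random, untrained hidden weights and least-squares output weights. The pruning criterion proposed in the paper is not reproduced here.

      # Illustrative sketch: a basic extreme learning machine on synthetic data.
      import numpy as np

      rng = np.random.default_rng(7)
      X = rng.uniform(-1, 1, size=(500, 3))
      y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

      n_hidden = 100
      W = rng.normal(size=(3, n_hidden))            # random input weights (never trained)
      b = rng.normal(size=n_hidden)                 # random biases
      H = np.tanh(X @ W + b)                        # hidden-layer activations
      beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights by least squares

      print("training RMSE:", np.sqrt(np.mean((H @ beta - y) ** 2)))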

  4. (Machine) learning to do more with less

    Science.gov (United States)

    Cohen, Timothy; Freytsis, Marat; Ostdiek, Bryan

    2018-02-01

    Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard "fully supervised" approach (which relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called "weakly supervised" technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail — both analytically and numerically — with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to a class of systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided on GitHub.

  5. Machine Learning and Inverse Problem in Geodynamics

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R.

    2017-12-01

    During the past few decades numerical modeling and traditional HPC have been widely deployed in many diverse fields for problem solutions. However, in recent years the rapid emergence of machine learning (ML), a subfield of the artificial intelligence (AI), in many fields of sciences, engineering, and finance seems to mark a turning point in the replacement of traditional modeling procedures with artificial intelligence-based techniques. The study of the circulation in the interior of Earth relies on the study of high pressure mineral physics, geochemistry, and petrology where the number of the mantle parameters is large and the thermoelastic parameters are highly pressure- and temperature-dependent. More complexity arises from the fact that many of these parameters that are incorporated in the numerical models as input parameters are not yet well established. In such complex systems the application of machine learning algorithms can play a valuable role. Our focus in this study is the application of supervised machine learning (SML) algorithms in predicting mantle properties with the emphasis on SML techniques in solving the inverse problem. As a sample problem we focus on the spin transition in ferropericlase and perovskite that may cause slab and plume stagnation at mid-mantle depths. The degree of the stagnation depends on the degree of negative density anomaly at the spin transition zone. The training and testing samples for the machine learning models are produced by the numerical convection models with known magnitudes of density anomaly (as the class labels of the samples). The volume fractions of the stagnated slabs and plumes which can be considered as measures for the degree of stagnation are assigned as sample features. The machine learning models can determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine (SVM) algorithms we show that SML techniques

  6. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

    Science.gov (United States)

    Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao

    2017-11-01

    Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Machine learning analysis of binaural rowing sounds

    DEFF Research Database (Denmark)

    Johard, Leonard; Ruffaldi, Emanuele; Hoffmann, Pablo F.

    2011-01-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks performs equally well as deep architectures, while being much faster to train.

  8. Electrical machining method of insulating ceramics

    International Nuclear Information System (INIS)

    Fukuzawa, Y.; Mohri, N.; Tani, T.

    1999-01-01

    This paper describes a new electrical discharge machining method for insulating ceramics using an assisting electrode with either a sinking electrical discharge machine or a wire electrical discharge machine. In this method, the metal sheet or mesh is attached to the ceramic surface as an assisting material for the discharge generation around the insulator surface. When the machining condition changes from the attached material to the workpiece, a cracked carbon layer is formed on the workpiece surface. As this layer has an electrical conductivity, electrical discharge occurs in working oil between the tool electrode and the surface of the workpiece. The carbon is formed from the working oil during this electrical discharge. Even after the material is machined, an electrical discharge occurs in the gap region between the tool electrode and the ceramic because an electrically conductive layer is generated continuously. Insulating ceramics can be machined by the electrical discharge machining method using the above mentioned surface modification phenomenon. In this paper the authors show a machined example demonstrating that the proposed method is available for machining a complex shape on insulating ceramics. Copyright (1999) AD-TECH - International Foundation for the Advancement of Technology Ltd

  9. A Teaching System To Learn Programming: the Programmer's Learning Machine

    OpenAIRE

    Quinson , Martin; Oster , Gérald

    2015-01-01

    International audience; The Programmer's Learning Machine (PLM) is an interactive exerciser for learning programming and algorithms. Using an integrated and graphical environment that provides a short feedback loop, it allows students to learn in a (semi)-autonomous way. This generic platform also enables teachers to create specific programming microworlds that match their teaching goals. This paper discusses our design goals and motivations, introduces the existing material and the proposed ...

  10. Comparative Study on Theoretical and Machine Learning Methods for Acquiring Compressed Liquid Densities of 1,1,1,2,3,3,3-Heptafluoropropane (R227ea) via Song and Mason Equation, Support Vector Machine, and Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Hao Li

    2016-01-01

    Full Text Available 1,1,1,2,3,3,3-Heptafluoropropane (R227ea) is a good refrigerant that reduces greenhouse effects and ozone depletion. In practical applications, we usually have to know the compressed liquid densities at different temperatures and pressures. However, the measurement requires a series of complex apparatus and operations, wasting too much manpower and resources. To solve these problems, here, Song and Mason equation, support vector machine (SVM), and artificial neural networks (ANNs) were used to develop theoretical and machine learning models, respectively, in order to predict the compressed liquid densities of R227ea with only the inputs of temperatures and pressures. Results show that compared with the Song and Mason equation, appropriate machine learning models trained with precise experimental samples have better predicted results, with lower root mean square errors (RMSEs) (e.g., the RMSE of the SVM trained with data provided by Fedele et al. [1] is 0.11, while the RMSE of the Song and Mason equation is 196.26). Compared to advanced conventional measurements, knowledge-based machine learning models are proved to be more time-saving and user-friendly.
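
    A minimal sketch of the SVM-based branch, assuming a fabricated linear density relation in place of the experimental R227ea measurements: support vector regression from temperature and pressure to compressed-liquid density.

      # Illustrative sketch: SVR mapping (temperature, pressure) -> density.
      import numpy as np
      from sklearn.svm import SVR
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(8)
      T = rng.uniform(263, 363, size=600)                 # temperature [K]
      P = rng.uniform(1, 35, size=600)                    # pressure [MPa]
      rho = 1600 - 3.0 * (T - 263) + 1.5 * P + rng.normal(0, 0.5, 600)  # synthetic density [kg/m^3]

      X = np.column_stack([T, P])
      model = make_pipeline(StandardScaler(), SVR(C=1000.0, epsilon=0.1))
      model.fit(X[:450], rho[:450])
      print("test R^2:", model.score(X[450:], rho[450:]))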

  11. Outsmarting neural networks: an alternative paradigm for machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably and Approximately Correct (PAC) framework. We show that, under certain conditions, one can reduce the three problems above to the regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria, but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs to solve the regression estimation. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  12. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution.

    Science.gov (United States)

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis.
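
    One common way to estimate individualized treatment effects with machine learning, shown here only as an illustration and not necessarily the estimator used in the paper, is a "T-learner": fit separate outcome models for treated and untreated patients and difference their predictions per patient. The statin/MI data below are synthetic.

      # Illustrative sketch: T-learner estimate of individualized treatment effects.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(9)
      X = rng.normal(size=(3000, 5))                          # patient covariates
      treated = rng.integers(0, 2, size=3000)                 # treatment indicator (0/1)
      true_ite = 0.5 * X[:, 0]                                # effect varies per patient
      outcome = X[:, 1] + treated * true_ite + rng.normal(0, 0.5, 3000)

      m_treated = RandomForestRegressor(random_state=0).fit(X[treated == 1], outcome[treated == 1])
      m_control = RandomForestRegressor(random_state=0).fit(X[treated == 0], outcome[treated == 0])
      ite_hat = m_treated.predict(X) - m_control.predict(X)   # individualized effect estimate
      print("ATE estimate:", ite_hat.mean(), "corr with true ITE:", np.corrcoef(ite_hat, true_ite)[0, 1])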

  13. Manifold learning in machine vision and robotics

    Science.gov (United States)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool to automate the extraction of patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensor data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of image content, navigation of mobile autonomous robots in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and others. Usually such data have high dimensionality; however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part of the high dimensional "observation space" with smaller intrinsic dimensionality. A generally accepted model of such data is the manifold model, in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of Manifold learning techniques in Machine vision and Robotics, which discover a low-dimensional structure of high dimensional data and result in effective algorithms for solving a large number of various subject-oriented tasks, is the content of the conference plenary speech, some topics of which are presented in the paper.

  14. Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    Science.gov (United States)

    Howard, Rebecca; Rattray, Magnus; Prosperi, Mattia; Custovic, Adnan

    2015-07-01

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as 'asthma endotypes'. The discovery of different asthma subtypes has moved from subjective approaches, in which putative phenotypes are assigned by experts, to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.

  15. Designing anticancer peptides by constructive machine learning.

    Science.gov (United States)

    Grisoni, Francesca; Neuhaus, Claudia; Gabernet, Gisela; Müller, Alex; Hiss, Jan; Schneider, Gisbert

    2018-04-21

    Constructive machine learning enables the automated generation of novel chemical structures without the need for explicit molecular design rules. This study presents the experimental application of such a generative model to design membranolytic anticancer peptides (ACPs) de novo. A recurrent neural network with long short-term memory cells was trained on alpha-helical cationic amphipathic peptide sequences and then fine-tuned with 26 known ACPs. This optimized model was used to generate unique and novel amino acid sequences. Twelve of the peptides were synthesized and tested for their activity on MCF7 human breast adenocarcinoma cells and selectivity against human erythrocytes. Ten of these peptides were active against cancer cells. Six of the active peptides killed MCF7 cancer cells without affecting human erythrocytes with at least threefold selectivity. These results advocate constructive machine learning for the automated design of peptides with desired biological activities. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Machine Learning for ATLAS DDM Network Metrics

    CERN Document Server

    Lassnig, Mario; The ATLAS collaboration; Vamosi, Ralf

    2016-01-01

    The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from our ongoing automation efforts. First, we describe our framework for distributed data management and network metrics, automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  17. Machine learning for micro-tomography

    Science.gov (United States)

    Parkinson, Dilworth Y.; Pelt, Daniël. M.; Perciano, Talita; Ushizima, Daniela; Krishnan, Harinarayan; Barnard, Harold S.; MacDowell, Alastair A.; Sethian, James

    2017-09-01

    Machine learning has revolutionized a number of fields, but many micro-tomography users have never used it for their work. The micro-tomography beamline at the Advanced Light Source (ALS), in collaboration with the Center for Applied Mathematics for Energy Research Applications (CAMERA) at Lawrence Berkeley National Laboratory, has now deployed a series of tools to automate data processing for ALS users using machine learning. This includes new reconstruction algorithms, feature extraction tools, and image classification and recommendation systems for scientific images. Some of these tools run either in automated pipelines that operate on data as it is collected or as stand-alone software. Others are deployed on computing resources at Berkeley Lab, from workstations to supercomputers, and made accessible to users through either scripting or easy-to-use graphical interfaces. This paper presents a progress report on this work.
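    For the image-classification component mentioned above, a minimal hedged sketch is given below: tomography slices are labelled (for example, "sample present" vs "empty") from simple intensity histograms. The toy data, the histogram features and the random-forest classifier are illustrative assumptions, not the ALS/CAMERA production tools.

```python
# Toy slice classification from intensity histograms; data and features are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

def toy_slice(has_sample):
    img = rng.normal(0.1, 0.02, (64, 64))
    if has_sample:
        img[16:48, 16:48] += rng.normal(0.5, 0.05, (32, 32))  # bright region = sample
    return img

labels = rng.integers(0, 2, 200)
features = np.array([np.histogram(toy_slice(l), bins=32, range=(0, 1))[0] for l in labels])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, features, labels, cv=5).mean())
```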

  18. The development of damage identification methods for buildings with image recognition and machine learning techniques utilizing aerial photographs of the 2016 Kumamoto earthquake

    Science.gov (United States)

    Shohei, N.; Nakamura, H.; Fujiwara, H.; Naoichi, M.; Hiromitsu, T.

    2017-12-01

    It is important to obtain an overview of the damage situation immediately after an earthquake from photographs taken from aircraft, to support investigation and decision-making by the authorities. In the case of the 2016 Kumamoto earthquake, we acquired more than 1,800 orthographic projection photographs covering the damaged areas. These photographs were taken by airplane between April 16th and 19th; we then classified the damage of every building into 4 levels and organized the results as approximately 296,000 GIS records corresponding to the fundamental geospatial data published by the Geospatial Information Authority of Japan. This work required the effort of hundreds of engineers, and relying on human labour alone is not considered practical for more extensive disasters such as a Nankai Trough earthquake. We have therefore been developing an automatic damage identification method using image recognition and machine learning techniques. First, we extracted training data for more than 10,000 buildings, distributed evenly across the 4 damage grades. With these training data, we raster-scan the entire images within each scanning range and clip patch images representing each damage level. Using these patch images, we have been developing discriminant models in two ways (a sketch of the first follows this record). One is a model based on the Support Vector Machine (SVM): a feature vector is extracted from each patch image, histogram densities are computed following the Bag of Visual Words (BoVW) approach, and the boundaries between damage grades are classified by the SVM. The other is a model based on a multi-layered neural network: the network is designed, patch images and damage levels assigned by visual judgement are given as input, and the learning parameters are optimized with the error backpropagation method. Using both discriminant models, we are going to discriminate the damage level of each patch and then create the image that shows
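    A hedged sketch of the SVM route described above: local descriptors are clustered into a visual vocabulary, each aerial-photo patch is encoded as a normalised bag-of-visual-words histogram, and an SVM separates the four damage grades. The random toy patches, the block-based descriptors and the vocabulary size are assumptions for illustration.

```python
# BoVW + SVM damage-grade classification sketch; toy patches and descriptors are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
patches = rng.random((400, 32, 32))          # toy aerial-photo patches
grades = rng.integers(0, 4, 400)             # damage grades 0..3 (visual judgement)

def local_descriptors(patch, size=8):
    """Split a patch into non-overlapping blocks and flatten each block."""
    blocks = [patch[i:i + size, j:j + size].ravel()
              for i in range(0, patch.shape[0], size)
              for j in range(0, patch.shape[1], size)]
    return np.array(blocks)

# Build the visual vocabulary from all local descriptors
all_desc = np.vstack([local_descriptors(p) for p in patches])
vocab = KMeans(n_clusters=50, n_init=10, random_state=0).fit(all_desc)

def bovw_histogram(patch):
    words = vocab.predict(local_descriptors(patch))
    hist = np.bincount(words, minlength=50).astype(float)
    return hist / hist.sum()

X = np.array([bovw_histogram(p) for p in patches])
svm = SVC(kernel="rbf")
print("cross-validated accuracy:", cross_val_score(svm, X, grades, cv=5).mean())
```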

  19. Simulation-driven machine learning: Bearing fault classification

    Science.gov (United States)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications, including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data are often only valid for the specific circumstances and components for which they were collected. This work directly addresses these challenges for roller bearings with race faults: training data are generated from high-resolution simulations of roller bearing dynamics and used to train machine learning algorithms, which are then validated against four experimental datasets. Several machine learning methodologies are compared, ranging from well-established statistical feature-based methods to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter-free method for race fault detection.
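    A hedged sketch of the DTW idea mentioned above: a vibration signal is classified by its dynamic-time-warping distance to labelled reference signals, i.e. a 1-nearest-neighbour rule. The synthetic signals and the window-free DTW are illustrative assumptions, not the paper's simulation-trained pipeline.

```python
# DTW-based 1-NN fault classification sketch; signals below are synthetic assumptions.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify(signal, references):
    """references: list of (label, signal) pairs, e.g. obtained from simulation."""
    return min(references, key=lambda ref: dtw_distance(signal, ref[1]))[0]

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 200)
healthy = np.sin(2 * np.pi * 5 * t)
outer_race_fault = healthy + 0.5 * (np.sin(2 * np.pi * 40 * t) > 0.95)  # periodic impacts
refs = [("healthy", healthy), ("outer race fault", outer_race_fault)]

test = outer_race_fault + rng.normal(0, 0.05, t.size)
print(classify(test, refs))   # expected: "outer race fault"
```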

  20. Network anomaly detection a machine learning perspective

    CERN Document Server

    Bhattacharyya, Dhruba Kumar

    2013-01-01

    With the rapid rise in the ubiquity and sophistication of Internet technology and the accompanying growth in the number of network attacks, network intrusion detection has become increasingly important. Anomaly-based network intrusion detection refers to finding exceptional or nonconforming patterns in network traffic data compared to normal behavior. Finding these anomalies has extensive applications in areas such as cyber security, credit card and insurance fraud detection, and military surveillance for enemy activities. Network Anomaly Detection: A Machine Learning Perspective presents mach
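    As a hedged sketch of anomaly-based detection as characterised above, the example below learns what "normal" traffic looks like and flags nonconforming records. The synthetic flow features and the isolation-forest detector are illustrative assumptions, not methods taken from the book.

```python
# Toy network anomaly detection with an isolation forest; data and detector are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
# Toy flow records: [bytes sent, packets, duration in seconds]
normal = rng.normal([5_000, 40, 2.0], [1_000, 8, 0.5], size=(1_000, 3))
attack = rng.normal([80_000, 900, 0.1], [5_000, 50, 0.02], size=(10, 3))  # e.g. a flood

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)
flags = detector.predict(np.vstack([normal[:5], attack]))   # -1 marks anomalies
print(flags)
```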