WorldWideScience

Sample records for machine-learning based methods

  1. Learning Algorithm of Boltzmann Machine Based on Spatial Monte Carlo Integration Method

    Directory of Open Access Journals (Sweden)

    Muneki Yasuda

    2018-04-01

    Full Text Available The machine learning techniques for Markov random fields are fundamental in various fields involving pattern recognition, image processing, sparse modeling, and earth science, and a Boltzmann machine is one of the most important models in Markov random fields. However, the inference and learning problems in the Boltzmann machine are NP-hard. The investigation of an effective learning algorithm for the Boltzmann machine is one of the most important challenges in the field of statistical machine learning. In this paper, we study Boltzmann machine learning based on the (first-order spatial Monte Carlo integration method, referred to as the 1-SMCI learning method, which was proposed in the author’s previous paper. In the first part of this paper, we compare the method with the maximum pseudo-likelihood estimation (MPLE method using a theoretical and a numerical approaches, and show the 1-SMCI learning method is more effective than the MPLE. In the latter part, we compare the 1-SMCI learning method with other effective methods, ratio matching and minimum probability flow, using a numerical experiment, and show the 1-SMCI learning method outperforms them.

  2. Model-based machine learning.

    Science.gov (United States)

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

  3. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning.Organized into 15 chapters, this book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credi

  4. Performance of machine learning methods for ligand-based virtual screening.

    Science.gov (United States)

    Plewczynski, Dariusz; Spieser, Stéphane A H; Koch, Uwe

    2009-05-01

    Computational screening of compound databases has become increasingly popular in pharmaceutical research. This review focuses on the evaluation of ligand-based virtual screening using active compounds as templates in the context of drug discovery. Ligand-based screening techniques are based on comparative molecular similarity analysis of compounds with known and unknown activity. We provide an overview of publications that have evaluated different machine learning methods, such as support vector machines, decision trees, ensemble methods such as boosting, bagging and random forests, clustering methods, neuronal networks, naïve Bayesian, data fusion methods and others.

  5. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    Science.gov (United States)

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progresses in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility to improve the performance of machine learning methods in screening large libraries is discussed.

  6. In silico machine learning methods in drug development.

    Science.gov (United States)

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures, "noisy" and high-dimensional data to complement QSAR methods, and in cases of unavailable receptor 3D structure to complement structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also regard a number of applications of the machine learning algorithms based on common classes of diseases.

  7. BEBP: An Poisoning Method Against Machine Learning Based IDSs

    OpenAIRE

    Li, Pan; Liu, Qiang; Zhao, Wentao; Wang, Dongxu; Wang, Siqi

    2018-01-01

    In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). However, practical IDSs generally update their decision module by feeding new data then retraining learning models in a periodical way. Hence, some attacks that comprise the data for training or testing classifiers significantly challenge the detecting capability of machine learning-based IDSs. Poisoning attack, which is one of the most recognized security threats towards machine learning...

  8. Machine learning-based methods for prediction of linear B-cell epitopes.

    Science.gov (United States)

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.

  9. Improved method for SNR prediction in machine-learning-based test

    NARCIS (Netherlands)

    Sheng, Xiaoqin; Kerkhoff, Hans G.

    2010-01-01

    This paper applies an improved method for testing the signal-to-noise ratio (SNR) of Analogue-to-Digital Converters (ADC). In previous work, a noisy and nonlinear pulse signal is exploited as the input stimulus to obtain the signature results of ADC. By applying a machine-learning-based approach,

  10. Gamma/hadron segregation for a ground based imaging atmospheric Cherenkov telescope using machine learning methods: Random Forest leads

    International Nuclear Information System (INIS)

    Sharma Mradul; Koul Maharaj Krishna; Mitra Abhas; Nayak Jitadeepa; Bose Smarajit

    2014-01-01

    A detailed case study of γ-hadron segregation for a ground based atmospheric Cherenkov telescope is presented. We have evaluated and compared various supervised machine learning methods such as the Random Forest method, Artificial Neural Network, Linear Discriminant method, Naive Bayes Classifiers, Support Vector Machines as well as the conventional dynamic supercut method by simulating triggering events with the Monte Carlo method and applied the results to a Cherenkov telescope. It is demonstrated that the Random Forest method is the most sensitive machine learning method for γ-hadron segregation. (research papers)

  11. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles.

    Science.gov (United States)

    Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua

    2018-04-25

    Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy  for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave better way forward for the application of machine learning towards species identification and food safety.

  12. An Interactive Web-based Learning System for Assisting Machining Technology Education

    Directory of Open Access Journals (Sweden)

    Min Jou

    2008-05-01

    Full Text Available The key technique of manufacturing methods is machining. The degree of technique of machining directly affects the quality of the product. Therefore, the machining technique is of primary importance in promoting student practice ability during the training process. Currently, practical training is applied in shop floor to discipline student’s practice ability. Much time and cost are used to teach these techniques. Particularly, computerized machines are continuously increasing in use. The development of educating engineers on computerized machines becomes much more difficult than with traditional machines. This is because of the limitation of the extremely expensive cost of teaching. The quality and quantity of teaching cannot always be promoted in this respect. The traditional teaching methods can not respond well to the needs of the future. Therefore, this research aims to the following topics; (1.Propose the teaching strategies for the students to learning machining processing planning through web-based learning system. (2.Establish on-line teaching material for the computer-aided manufacturing courses including CNC coding method, CNC simulation. (3.Develop the virtual machining laboratory to bring the machining practical training to web-based learning system. (4.Integrate multi-media and virtual laboratory in the developed e-learning web-based system to enhance the effectiveness of machining education through web-based system.

  13. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn.If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this the book for you. No previous experience with machine-learning algorithms is required.

  14. Machine learning-based dual-energy CT parametric mapping.

    Science.gov (United States)

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Al Helo, Rose; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C; Rassouli, Negin; Gilkeson, Robert C; Traughber, Bryan J; Cheng, Chee-Wai; Muzic, Raymond F

    2018-05-22

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Zeff), relative electron density (ρe), mean excitation energy (Ix), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 seconds. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency. . © 2018 Institute of Physics and Engineering in

  15. A multilevel-ROI-features-based machine learning method for detection of morphometric biomarkers in Parkinson's disease.

    Science.gov (United States)

    Peng, Bo; Wang, Suhong; Zhou, Zhiyong; Liu, Yan; Tong, Baotong; Zhang, Tao; Dai, Yakang

    2017-06-09

    Machine learning methods have been widely used in recent years for detection of neuroimaging biomarkers in regions of interest (ROIs) and assisting diagnosis of neurodegenerative diseases. The innovation of this study is to use multilevel-ROI-features-based machine learning method to detect sensitive morphometric biomarkers in Parkinson's disease (PD). Specifically, the low-level ROI features (gray matter volume, cortical thickness, etc.) and high-level correlative features (connectivity between ROIs) are integrated to construct the multilevel ROI features. Filter- and wrapper- based feature selection method and multi-kernel support vector machine (SVM) are used in the classification algorithm. T1-weighted brain magnetic resonance (MR) images of 69 PD patients and 103 normal controls from the Parkinson's Progression Markers Initiative (PPMI) dataset are included in the study. The machine learning method performs well in classification between PD patients and normal controls with an accuracy of 85.78%, a specificity of 87.79%, and a sensitivity of 87.64%. The most sensitive biomarkers between PD patients and normal controls are mainly distributed in frontal lobe, parental lobe, limbic lobe, temporal lobe, and central region. The classification performance of our method with multilevel ROI features is significantly improved comparing with other classification methods using single-level features. The proposed method shows promising identification ability for detecting morphometric biomarkers in PD, thus confirming the potentiality of our method in assisting diagnosis of the disease. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research. Specially in Particle Physics when there is a classification problem like discrimination of Signals from Backgrounds originating from the collisions of particles. On the other hand, Monte Carlo simulations can be used in order to generate a known data set of Signals and Backgrounds based on theoretical physics. The aim of Machine Learning is to train some algorithms on known data set and then apply these trained algorithms to the unknown data sets. However, the most common framework for data analysis in Particle Physics is ROOT. In order to use Machine Learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, specially Cross-Validation and BDT.

  17. Housing Value Forecasting Based on Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Jingyi Mu

    2014-01-01

    Full Text Available In the era of big data, many urgent issues to tackle in all walks of life all can be solved via big data technique. Compared with the Internet, economy, industry, and aerospace fields, the application of big data in the area of architecture is relatively few. In this paper, on the basis of the actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can make decisions about whether developing the real estate on corresponding regions or not. In this paper, support vector machine (SVM, least squares support vector machine (LSSVM, and partial least squares (PLS methods are used to forecast the home values. And these algorithms are compared according to the predicted results. Experiment shows that although the data set exists serious nonlinearity, the experiment result also show SVM and LSSVM methods are superior to PLS on dealing with the problem of nonlinearity. The global optimal solution can be found and best forecasting effect can be achieved by SVM because of solving a quadratic programming problem. In this paper, the different computation efficiencies of the algorithms are compared according to the computing times of relevant algorithms.

  18. Deep learning versus traditional machine learning methods for aggregated energy demand prediction

    NARCIS (Netherlands)

    Paterakis, N.G.; Mocanu, E.; Gibescu, M.; Stappers, B.; van Alst, W.

    2018-01-01

    In this paper the more advanced, in comparison with traditional machine learning approaches, deep learning methods are explored with the purpose of accurately predicting the aggregated energy consumption. Despite the fact that a wide range of machine learning methods have been applied to

  19. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    Science.gov (United States)

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.

  20. Studying depression using imaging and machine learning methods.

    Science.gov (United States)

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  1. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides

    Directory of Open Access Journals (Sweden)

    Stanislawski Jerzy

    2013-01-01

    Full Text Available Abstract Background Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of aminoacids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible aminoacid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. Results We generated a new dataset of hexapeptides, using more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%. The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile to 0.5 CPU-hours (simplified 3D profile to seconds (machine learning. Conclusions We showed that the simplified profile generation method does not introduce an error with regard to the original method, while

  2. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.

    Science.gov (United States)

    Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

    2013-01-17

    Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of aminoacids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible aminoacid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset

  3. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    Science.gov (United States)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  4. Housing Value Forecasting Based on Machine Learning Methods

    OpenAIRE

    Mu, Jingyi; Wu, Fang; Zhang, Aihua

    2014-01-01

    In the era of big data, many urgent issues to tackle in all walks of life all can be solved via big data technique. Compared with the Internet, economy, industry, and aerospace fields, the application of big data in the area of architecture is relatively few. In this paper, on the basis of the actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can make decisions about whether developing...

  5. Machine learning for network-based malware detection

    DEFF Research Database (Denmark)

    Stevanovic, Matija

    and based on different, mutually complementary, principles of traffic analysis. The proposed approaches rely on machine learning algorithms (MLAs) for automated and resource-efficient identification of the patterns of malicious network traffic. We evaluated the proposed methods through extensive evaluations...

  6. Studying depression using imaging and machine learning methods

    Directory of Open Access Journals (Sweden)

    Meenal J. Patel

    2016-01-01

    Full Text Available Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1 presents a background on depression, imaging, and machine learning methodologies; (2 reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3 suggests directions for future depression-related studies.

  7. Use of machine learning methods to classify Universities based on the income structure

    Science.gov (United States)

    Terlyga, Alexandra; Balk, Igor

    2017-10-01

    In this paper we discuss use of machine learning methods such as self organizing maps, k-means and Ward’s clustering to perform classification of universities based on their income. This classification will allow us to quantitate classification of universities as teaching, research, entrepreneur, etc. which is important tool for government, corporations and general public alike in setting expectation and selecting universities to achieve different goals.

  8. DROUGHT FORECASTING BASED ON MACHINE LEARNING OF REMOTE SENSING AND LONG-RANGE FORECAST DATA

    Directory of Open Access Journals (Sweden)

    J. Rhee

    2016-06-01

    Full Text Available The reduction of drought impacts may be achieved through sustainable drought management and proactive measures against drought disaster. Accurate and timely provision of drought information is essential. In this study, drought forecasting models to provide high-resolution drought information based on drought indicators for ungauged areas were developed. The developed models predict drought indices of the 6-month Standardized Precipitation Index (SPI6 and the 6-month Standardized Precipitation Evapotranspiration Index (SPEI6. An interpolation method based on multiquadric spline interpolation method as well as three machine learning models were tested. Three machine learning models of Decision Tree, Random Forest, and Extremely Randomized Trees were tested to enhance the provision of drought initial conditions based on remote sensing data, since initial conditions is one of the most important factors for drought forecasting. Machine learning-based methods performed better than interpolation methods for both classification and regression, and the methods using climatology data outperformed the methods using long-range forecast. The model based on climatological data and the machine learning method outperformed overall.

  9. Sample-Based Extreme Learning Machine with Missing Data

    Directory of Open Access Journals (Sweden)

    Hang Gao

    2015-01-01

    Full Text Available Extreme learning machine (ELM has been extensively studied in machine learning community during the last few decades due to its high efficiency and the unification of classification, regression, and so forth. Though bearing such merits, existing ELM algorithms cannot efficiently handle the issue of missing data, which is relatively common in practical applications. The problem of missing data is commonly handled by imputation (i.e., replacing missing values with substituted values according to available information. However, imputation methods are not always effective. In this paper, we propose a sample-based learning framework to address this issue. Based on this framework, we develop two sample-based ELM algorithms for classification and regression, respectively. Comprehensive experiments have been conducted in synthetic data sets, UCI benchmark data sets, and a real world fingerprint image data set. As indicated, without introducing extra computational complexity, the proposed algorithms do more accurate and stable learning than other state-of-the-art ones, especially in the case of higher missing ratio.

  10. Unsupervised process monitoring and fault diagnosis with machine learning methods

    CERN Document Server

    Aldrich, Chris

    2013-01-01

    This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data

  11. Trip Travel Time Forecasting Based on Selective Forgetting Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Zhiming Gui

    2014-01-01

    Full Text Available Travel time estimation on road networks is a valuable traffic metric. In this paper, we propose a machine learning based method for trip travel time estimation in road networks. The method uses the historical trip information extracted from taxis trace data as the training data. An optimized online sequential extreme machine, selective forgetting extreme learning machine, is adopted to make the prediction. Its selective forgetting learning ability enables the prediction algorithm to adapt to trip conditions changes well. Experimental results using real-life taxis trace data show that the forecasting model provides an effective and practical way for the travel time forecasting.

  12. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    Directory of Open Access Journals (Sweden)

    Zekić-Sušac Marijana

    2014-09-01

    Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour on the same dataset in order to compare their efficiency in the sense of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess computing sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistical significance between the artificial neural network and the k-nearest neighbour model, while the difference among other methods was not statistically significant. Conclusions: Tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.

  13. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods

    Directory of Open Access Journals (Sweden)

    Pontil Massimiliano

    2009-10-01

    Full Text Available Abstract Background Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino-acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots" at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. Results We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall respectively of 56% and 65%. Conclusion We have developed an hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been

  14. Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

    Science.gov (United States)

    Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

    2017-05-01

    Four accurate, sensitive and reliable stability indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM) whether in pure form or in pharmaceutical formulations. Two supervised learning machines' methods; linear artificial neural networks (PC-linANN) preceded by principle component analysis and linear support vector regression (linSVR), were compared with two principle component based methods; principle component regression (PCR) as well as partial least squares (PLS) for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits behind using linear learning machines' methods and the inherent merits of their algorithms in handling overlapped noisy spectral data especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). Relative mean squared error of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, SVR and PC-linANN; respectively. The results showed the superiority of supervised learning machines' methods over principle component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.

  15. Different protein-protein interface patterns predicted by different machine learning methods.

    Science.gov (United States)

    Wang, Wei; Yang, Yongxiao; Yin, Jianxin; Gong, Xinqi

    2017-11-22

    Different types of protein-protein interactions make different protein-protein interface patterns. Different machine learning methods are suitable to deal with different types of data. Then, is it the same situation that different interface patterns are preferred for prediction by different machine learning methods? Here, four different machine learning methods were employed to predict protein-protein interface residue pairs on different interface patterns. The performances of the methods for different types of proteins are different, which suggest that different machine learning methods tend to predict different protein-protein interface patterns. We made use of ANOVA and variable selection to prove our result. Our proposed methods taking advantages of different single methods also got a good prediction result compared to single methods. In addition to the prediction of protein-protein interactions, this idea can be extended to other research areas such as protein structure prediction and design.

  16. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets.

    Science.gov (United States)

    Korotcov, Alexandru; Tkachenko, Valery; Russo, Daniel P; Ekins, Sean

    2017-12-04

    Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many of pharmaceutical applications from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, physicochemical properties as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or data sets Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar type plots indicates when models are inferior or perhaps over trained. These results also suggest the need for assessing deep learning further

  17. A SEMI-AUTOMATIC RULE SET BUILDING METHOD FOR URBAN LAND COVER CLASSIFICATION BASED ON MACHINE LEARNING AND HUMAN KNOWLEDGE

    Directory of Open Access Journals (Sweden)

    H. Y. Gu

    2017-09-01

    Full Text Available Classification rule set is important for Land Cover classification, which refers to features and decision rules. The selection of features and decision are based on an iterative trial-and-error approach that is often utilized in GEOBIA, however, it is time-consuming and has a poor versatility. This study has put forward a rule set building method for Land cover classification based on human knowledge and machine learning. The use of machine learning is to build rule sets effectively which will overcome the iterative trial-and-error approach. The use of human knowledge is to solve the shortcomings of existing machine learning method on insufficient usage of prior knowledge, and improve the versatility of rule sets. A two-step workflow has been introduced, firstly, an initial rule is built based on Random Forest and CART decision tree. Secondly, the initial rule is analyzed and validated based on human knowledge, where we use statistical confidence interval to determine its threshold. The test site is located in Potsdam City. We utilised the TOP, DSM and ground truth data. The results show that the method could determine rule set for Land Cover classification semi-automatically, and there are static features for different land cover classes.

  18. Recent machine learning advancements in sensor-based mobility analysis: Deep learning for Parkinson's disease assessment.

    Science.gov (United States)

    Eskofier, Bjoern M; Lee, Sunghoon I; Daneault, Jean-Francois; Golabchi, Fatemeh N; Ferreira-Carvalho, Gabriela; Vergara-Diaz, Gloria; Sapienza, Stefano; Costante, Gianluca; Klucken, Jochen; Kautz, Thomas; Bonato, Paolo

    2016-08-01

    The development of wearable sensors has opened the door for long-term assessment of movement disorders. However, there is still a need for developing methods suitable to monitor motor symptoms in and outside the clinic. The purpose of this paper was to investigate deep learning as a method for this monitoring. Deep learning recently broke records in speech and image classification, but it has not been fully investigated as a potential approach to analyze wearable sensor data. We collected data from ten patients with idiopathic Parkinson's disease using inertial measurement units. Several motor tasks were expert-labeled and used for classification. We specifically focused on the detection of bradykinesia. For this, we compared standard machine learning pipelines with deep learning based on convolutional neural networks. Our results showed that deep learning outperformed other state-of-the-art machine learning algorithms by at least 4.6 % in terms of classification rate. We contribute a discussion of the advantages and disadvantages of deep learning for sensor-based movement assessment and conclude that deep learning is a promising method for this field.

  19. Research on machine learning framework based on random forest algorithm

    Science.gov (United States)

    Ren, Qiong; Cheng, Hui; Han, Hai

    2017-03-01

    With the continuous development of machine learning, industry and academia have released a lot of machine learning frameworks based on distributed computing platform, and have been widely used. However, the existing framework of machine learning is limited by the limitations of machine learning algorithm itself, such as the choice of parameters and the interference of noises, the high using threshold and so on. This paper introduces the research background of machine learning framework, and combined with the commonly used random forest algorithm in machine learning classification algorithm, puts forward the research objectives and content, proposes an improved adaptive random forest algorithm (referred to as ARF), and on the basis of ARF, designs and implements the machine learning framework.

  20. Advanced methods in NDE using machine learning approaches

    Science.gov (United States)

    Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

    2018-04-01

    Machine learning (ML) methods and algorithms have been applied recently with great success in quality control and predictive maintenance. Its goal to build new and/or leverage existing algorithms to learn from training data and give accurate predictions, or to find patterns, particularly with new and unseen similar data, fits perfectly to Non-Destructive Evaluation. The advantages of ML in NDE are obvious in such tasks as pattern recognition in acoustic signals or automated processing of images from X-ray, Ultrasonics or optical methods. Fraunhofer IKTS is using machine learning algorithms in acoustic signal analysis. The approach had been applied to such a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and secondary analysis step followed by a cognitive system to create model data. Already in the second analysis steps unsupervised learning algorithms as principal component analysis are used to simplify data structures. In the cognitive part of the software further unsupervised and supervised learning algorithms will be trained. Later the sensor signals from unknown samples can be recognized and classified automatically by the algorithms trained before. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). Still, algorithms will be trained on an ordinary PC; however, trained algorithms run on the Digital Signal Processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however. A sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components are required. The automated testing can be done subsequently by the machine. By integrating the test data of many components along the value chain further optimization including lifetime and durability

  1. Logic Learning Machine and standard supervised methods for Hodgkin's lymphoma prognosis using gene expression data and clinical variables.

    Science.gov (United States)

    Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco

    2018-03-01

    This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms ( k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene ( XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.

  2. Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches

    DEFF Research Database (Denmark)

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna

    2013-01-01

    machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region......-merging with VisualNow, and peak model estimation (PME).We manually generated Metabolites 2013, 3 278 a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods...

  3. Automatic vetting of planet candidates from ground based surveys: Machine learning with NGTS

    Science.gov (United States)

    Armstrong, David J.; Günther, Maximilian N.; McCormac, James; Smith, Alexis M. S.; Bayliss, Daniel; Bouchy, François; Burleigh, Matthew R.; Casewell, Sarah; Eigmüller, Philipp; Gillen, Edward; Goad, Michael R.; Hodgkin, Simon T.; Jenkins, James S.; Louden, Tom; Metrailler, Lionel; Pollacco, Don; Poppenhaeger, Katja; Queloz, Didier; Raynard, Liam; Rauer, Heike; Udry, Stéphane; Walker, Simon R.; Watson, Christopher A.; West, Richard G.; Wheatley, Peter J.

    2018-05-01

    State of the art exoplanet transit surveys are producing ever increasing quantities of data. To make the best use of this resource, in detecting interesting planetary systems or in determining accurate planetary population statistics, requires new automated methods. Here we describe a machine learning algorithm that forms an integral part of the pipeline for the NGTS transit survey, demonstrating the efficacy of machine learning in selecting planetary candidates from multi-night ground based survey data. Our method uses a combination of random forests and self-organising-maps to rank planetary candidates, achieving an AUC score of 97.6% in ranking 12368 injected planets against 27496 false positives in the NGTS data. We build on past examples by using injected transit signals to form a training set, a necessary development for applying similar methods to upcoming surveys. We also make the autovet code used to implement the algorithm publicly accessible. autovet is designed to perform machine learned vetting of planetary candidates, and can utilise a variety of methods. The apparent robustness of machine learning techniques, whether on space-based or the qualitatively different ground-based data, highlights their importance to future surveys such as TESS and PLATO and the need to better understand their advantages and pitfalls in an exoplanetary context.

  4. Robust Visual Knowledge Transfer via Extreme Learning Machine Based Domain Adaptation.

    Science.gov (United States)

    Zhang, Lei; Zhang, David

    2016-08-10

    We address the problem of visual knowledge adaptation by leveraging labeled patterns from source domain and a very limited number of labeled instances in target domain to learn a robust classifier for visual categorization. This paper proposes a new extreme learning machine based cross-domain network learning framework, that is called Extreme Learning Machine (ELM) based Domain Adaptation (EDA). It allows us to learn a category transformation and an ELM classifier with random projection by minimizing the -norm of the network output weights and the learning error simultaneously. The unlabeled target data, as useful knowledge, is also integrated as a fidelity term to guarantee the stability during cross domain learning. It minimizes the matching error between the learned classifier and a base classifier, such that many existing classifiers can be readily incorporated as base classifiers. The network output weights cannot only be analytically determined, but also transferrable. Additionally, a manifold regularization with Laplacian graph is incorporated, such that it is beneficial to semi-supervised learning. Extensively, we also propose a model of multiple views, referred as MvEDA. Experiments on benchmark visual datasets for video event recognition and object recognition, demonstrate that our EDA methods outperform existing cross-domain learning methods.

  5. Identification of Village Building via Google Earth Images and Supervised Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Zhiling Guo

    2016-03-01

    Full Text Available In this study, a method based on supervised machine learning is proposed to identify village buildings from open high-resolution remote sensing images. We select Google Earth (GE RGB images to perform the classification in order to examine its suitability for village mapping, and investigate the feasibility of using machine learning methods to provide automatic classification in such fields. By analyzing the characteristics of GE images, we design different features on the basis of two kinds of supervised machine learning methods for classification: adaptive boosting (AdaBoost and convolutional neural networks (CNN. To recognize village buildings via their color and texture information, the RGB color features and a large number of Haar-like features in a local window are utilized in the AdaBoost method; with multilayer trained networks based on gradient descent algorithms and back propagation, CNN perform the identification by mining deeper information from buildings and their neighborhood. Experimental results from the testing area at Savannakhet province in Laos show that our proposed AdaBoost method achieves an overall accuracy of 96.22% and the CNN method is also competitive with an overall accuracy of 96.30%.

  6. Machine Learning Method Applied in Readout System of Superheated Droplet Detector

    Science.gov (United States)

    Liu, Yi; Sullivan, Clair Julia; d'Errico, Francesco

    2017-07-01

    Direct readability is one advantage of superheated droplet detectors in neutron dosimetry. Utilizing such a distinct characteristic, an imaging readout system analyzes image of the detector for neutron dose readout. To improve the accuracy and precision of algorithms in the imaging readout system, machine learning algorithms were developed. Deep learning neural network and support vector machine algorithms are applied and compared with generally used Hough transform and curvature analysis methods. The machine learning methods showed a much higher accuracy and better precision in recognizing circular gas bubbles.

  7. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    Directory of Open Access Journals (Sweden)

    Vladimir S. Kublanov

    2017-01-01

    Full Text Available The paper presents results of machine learning approach accuracy applied analysis of cardiac activity. The study evaluates the diagnostics possibilities of the arterial hypertension by means of the short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from the arterial hypertension of II-III degree. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, in the study, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of noncorrelated feature set search achieved higher results than data set based on the principal components.

  8. Supervised machine learning algorithms to diagnose stress for vehicle drivers based on physiological sensor signals.

    Science.gov (United States)

    Barua, Shaibal; Begum, Shahina; Ahmed, Mobyen Uddin

    2015-01-01

    Machine learning algorithms play an important role in computer science research. Recent advancement in sensor data collection in clinical sciences lead to a complex, heterogeneous data processing, and analysis for patient diagnosis and prognosis. Diagnosis and treatment of patients based on manual analysis of these sensor data are difficult and time consuming. Therefore, development of Knowledge-based systems to support clinicians in decision-making is important. However, it is necessary to perform experimental work to compare performances of different machine learning methods to help to select appropriate method for a specific characteristic of data sets. This paper compares classification performance of three popular machine learning methods i.e., case-based reasoning, neutral networks and support vector machine to diagnose stress of vehicle drivers using finger temperature and heart rate variability. The experimental results show that case-based reasoning outperforms other two methods in terms of classification accuracy. Case-based reasoning has achieved 80% and 86% accuracy to classify stress using finger temperature and heart rate variability. On contrary, both neural network and support vector machine have achieved less than 80% accuracy by using both physiological signals.

  9. IoT Security Techniques Based on Machine Learning

    OpenAIRE

    Xiao, Liang; Wan, Xiaoyue; Lu, Xiaozhen; Zhang, Yanyong; Wu, Di

    2018-01-01

    Internet of things (IoT) that integrate a variety of devices into networks to provide advanced and intelligent services have to protect user privacy and address attacks such as spoofing attacks, denial of service attacks, jamming and eavesdropping. In this article, we investigate the attack model for IoT systems, and review the IoT security solutions based on machine learning techniques including supervised learning, unsupervised learning and reinforcement learning. We focus on the machine le...

  10. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance and that each MeSH heading presents a different behavior......Map and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI...

  11. Research progress in machine learning methods for gene-gene interaction detection.

    Science.gov (United States)

    Peng, Zhe-Ye; Tang, Zi-Jun; Xie, Min-Zhu

    2018-03-20

    Complex diseases are results of gene-gene and gene-environment interactions. However, the detection of high-dimensional gene-gene interactions is computationally challenging. In the last two decades, machine-learning approaches have been developed to detect gene-gene interactions with some successes. In this review, we summarize the progress in research on machine learning methods, as applied to gene-gene interaction detection. It systematically examines the principles and limitations of the current machine learning methods used in genome wide association studies (GWAS) to detect gene-gene interactions, such as neural networks (NN), random forest (RF), support vector machines (SVM) and multifactor dimensionality reduction (MDR), and provides some insights on the future research directions in the field.

  12. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

    Science.gov (United States)

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-06-19

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.

  13. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.

  14. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  15. Improved Extreme Learning Machine based on the Sensitivity Analysis

    Science.gov (United States)

    Cui, Licheng; Zhai, Huawei; Wang, Benchao; Qu, Zengtang

    2018-03-01

    Extreme learning machine and its improved ones is weak in some points, such as computing complex, learning error and so on. After deeply analyzing, referencing the importance of hidden nodes in SVM, an novel analyzing method of the sensitivity is proposed which meets people’s cognitive habits. Based on these, an improved ELM is proposed, it could remove hidden nodes before meeting the learning error, and it can efficiently manage the number of hidden nodes, so as to improve the its performance. After comparing tests, it is better in learning time, accuracy and so on.

  16. Reverse hypothesis machine learning a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge- acquisition -based learning is constrained by large volumes of data and is time consuming. Hence Knowledge innovation based learning is the need of time. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change the paradigm remains the same—the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  17. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology.

    Science.gov (United States)

    Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi; Salleh, Abdul Hakim Mohamed

    2013-01-01

    Recently, the greatest statistical computational challenge in genetic epidemiology is to identify and characterize the genes that interact with other genes and environment factors that bring the effect on complex multifactorial disease. These gene-gene interactions are also denoted as epitasis in which this phenomenon cannot be solved by traditional statistical method due to the high dimensionality of the data and the occurrence of multiple polymorphism. Hence, there are several machine learning methods to solve such problems by identifying such susceptibility gene which are neural networks (NNs), support vector machine (SVM), and random forests (RFs) in such common and multifactorial disease. This paper gives an overview on machine learning methods, describing the methodology of each machine learning methods and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discussed each machine learning method and presents the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease.

  18. A Review for Detecting Gene-Gene Interactions Using Machine Learning Methods in Genetic Epidemiology

    Directory of Open Access Journals (Sweden)

    Ching Lee Koo

    2013-01-01

    Full Text Available Recently, the greatest statistical computational challenge in genetic epidemiology is to identify and characterize the genes that interact with other genes and environment factors that bring the effect on complex multifactorial disease. These gene-gene interactions are also denoted as epitasis in which this phenomenon cannot be solved by traditional statistical method due to the high dimensionality of the data and the occurrence of multiple polymorphism. Hence, there are several machine learning methods to solve such problems by identifying such susceptibility gene which are neural networks (NNs, support vector machine (SVM, and random forests (RFs in such common and multifactorial disease. This paper gives an overview on machine learning methods, describing the methodology of each machine learning methods and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discussed each machine learning method and presents the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease.

  19. Surface electromyography based muscle fatigue detection using high-resolution time-frequency methods and machine learning algorithms.

    Science.gov (United States)

    Karthick, P A; Ghosh, Diptasree Maitra; Ramakrishnan, S

    2018-02-01

    Surface electromyography (sEMG) based muscle fatigue research is widely preferred in sports science and occupational/rehabilitation studies due to its noninvasiveness. However, these signals are complex, multicomponent and highly nonstationary with large inter-subject variations, particularly during dynamic contractions. Hence, time-frequency based machine learning methodologies can improve the design of automated system for these signals. In this work, the analysis based on high-resolution time-frequency methods, namely, Stockwell transform (S-transform), B-distribution (BD) and extended modified B-distribution (EMBD) are proposed to differentiate the dynamic muscle nonfatigue and fatigue conditions. The nonfatigue and fatigue segments of sEMG signals recorded from the biceps brachii of 52 healthy volunteers are preprocessed and subjected to S-transform, BD and EMBD. Twelve features are extracted from each method and prominent features are selected using genetic algorithm (GA) and binary particle swarm optimization (BPSO). Five machine learning algorithms, namely, naïve Bayes, support vector machine (SVM) of polynomial and radial basis kernel, random forest and rotation forests are used for the classification. The results show that all the proposed time-frequency distributions (TFDs) are able to show the nonstationary variations of sEMG signals. Most of the features exhibit statistically significant difference in the muscle fatigue and nonfatigue conditions. The maximum number of features (66%) is reduced by GA and BPSO for EMBD and BD-TFD respectively. The combination of EMBD- polynomial kernel based SVM is found to be most accurate (91% accuracy) in classifying the conditions with the features selected using GA. The proposed methods are found to be capable of handling the nonstationary and multicomponent variations of sEMG signals recorded in dynamic fatiguing contractions. Particularly, the combination of EMBD- polynomial kernel based SVM could be used to

  20. Component Pin Recognition Using Algorithms Based on Machine Learning

    Science.gov (United States)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  1. An Android malware detection system based on machine learning

    Science.gov (United States)

    Wen, Long; Yu, Haiyang

    2017-08-01

    The Android smartphone, with its open source character and excellent performance, has attracted many users. However, the convenience of the Android platform also has motivated the development of malware. The traditional method which detects the malware based on the signature is unable to detect unknown applications. The article proposes a machine learning-based lightweight system that is capable of identifying malware on Android devices. In this system we extract features based on the static analysis and the dynamitic analysis, then a new feature selection approach based on principle component analysis (PCA) and relief are presented in the article to decrease the dimensions of the features. After that, a model will be constructed with support vector machine (SVM) for classification. Experimental results show that our system provides an effective method in Android malware detection.

  2. A Comparison Study of Machine Learning Based Algorithms for Fatigue Crack Growth Calculation.

    Science.gov (United States)

    Wang, Hongxun; Zhang, Weifang; Sun, Fuqiang; Zhang, Wei

    2017-05-18

    The relationships between the fatigue crack growth rate ( d a / d N ) and stress intensity factor range ( Δ K ) are not always linear even in the Paris region. The stress ratio effects on fatigue crack growth rate are diverse in different materials. However, most existing fatigue crack growth models cannot handle these nonlinearities appropriately. The machine learning method provides a flexible approach to the modeling of fatigue crack growth because of its excellent nonlinear approximation and multivariable learning ability. In this paper, a fatigue crack growth calculation method is proposed based on three different machine learning algorithms (MLAs): extreme learning machine (ELM), radial basis function network (RBFN) and genetic algorithms optimized back propagation network (GABP). The MLA based method is validated using testing data of different materials. The three MLAs are compared with each other as well as the classical two-parameter model ( K * approach). The results show that the predictions of MLAs are superior to those of K * approach in accuracy and effectiveness, and the ELM based algorithms show overall the best agreement with the experimental data out of the three MLAs, for its global optimization and extrapolation ability.

  3. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  4. Machine Learning Methods for Identifying Composition of Uranium Deposits in Kazakhstan

    Directory of Open Access Journals (Sweden)

    Kuchin Yan

    2017-12-01

    Full Text Available The paper explores geophysical methods of wells survey, as well as their role in the development of Kazakhstan’s uranium deposit mining efforts. An analysis of the existing methods for solving the problem of interpreting geophysical data using machine learning in petroleum geophysics is made. The requirements and possible applications of machine learning methods in regard to uranium deposits of Kazakhstan are formulated in the paper.

  5. Considerations upon the Machine Learning Technologies

    OpenAIRE

    Alin Munteanu; Cristina Ofelia Sofran

    2006-01-01

    Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  6. Machine Learning-Empowered Biometric Methods for Biomedicine Applications

    Directory of Open Access Journals (Sweden)

    Qingxue Zhang

    2017-07-01

    Full Text Available Nowadays, pervasive computing technologies are paving a promising way for advanced smart health applications. However, a key impediment faced by wide deployment of these assistive smart devices, is the increasing privacy and security issue, such as how to protect access to sensitive patient data in the health record. Focusing on this challenge, biometrics are attracting intense attention in terms of effective user identification to enable confidential health applications. In this paper, we take special interest in two bio-potential-based biometric modalities, electrocardiogram (ECG and electroencephalogram (EEG, considering that they are both unique to individuals, and more reliable than token (identity card and knowledge-based (username/password methods. After extracting effective features in multiple domains from ECG/EEG signals, several advanced machine learning algorithms are introduced to perform the user identification task, including Neural Network, K-nearest Neighbor, Bagging, Random Forest and AdaBoost. Experimental results on two public ECG and EEG datasets show that ECG is a more robust biometric modality compared to EEG, leveraging a higher signal to noise ratio and also more distinguishable morphological patterns. Among different machine learning classifiers, the random forest greatly outperforms the others and owns an identification rate as high as 98%. This study is expected to demonstrate that properly selected biometric empowered by an effective machine learner owns a great potential, to enable confidential biomedicine applications in the era of smart digital health.

  7. Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

    Science.gov (United States)

    Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

    2010-10-01

    Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organisation maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained from the listwise deletion (LD) imputation method. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracies of predictions on early cancer relapse were measured using artificial neural networks (ANNs), in which different ANNs were estimated using the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed imputation statistical methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.

  8. Considerations upon the Machine Learning Technologies

    Directory of Open Access Journals (Sweden)

    Alin Munteanu

    2006-01-01

    Full Text Available Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  9. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.

  10. An Extreme Learning Machine Based on the Mixed Kernel Function of Triangular Kernel and Generalized Hermite Dirichlet Kernel

    Directory of Open Access Journals (Sweden)

    Senyue Zhang

    2016-01-01

    Full Text Available According to the characteristics that the kernel function of extreme learning machine (ELM and its performance have a strong correlation, a novel extreme learning machine based on a generalized triangle Hermitian kernel function was proposed in this paper. First, the generalized triangle Hermitian kernel function was constructed by using the product of triangular kernel and generalized Hermite Dirichlet kernel, and the proposed kernel function was proved as a valid kernel function of extreme learning machine. Then, the learning methodology of the extreme learning machine based on the proposed kernel function was presented. The biggest advantage of the proposed kernel is its kernel parameter values only chosen in the natural numbers, which thus can greatly shorten the computational time of parameter optimization and retain more of its sample data structure information. Experiments were performed on a number of binary classification, multiclassification, and regression datasets from the UCI benchmark repository. The experiment results demonstrated that the robustness and generalization performance of the proposed method are outperformed compared to other extreme learning machines with different kernels. Furthermore, the learning speed of proposed method is faster than support vector machine (SVM methods.

  11. Functional networks inference from rule-based machine learning models.

    Science.gov (United States)

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables, that often are different/complementary to the similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes, that genes used together within a rule-based machine learning model to classify the samples, might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in a context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The

  12. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.

    Science.gov (United States)

    Cuperlovic-Culf, Miroslava

    2018-01-11

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.

  13. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    Science.gov (United States)

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  14. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    Science.gov (United States)

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  15. Machine learning and medical imaging

    CERN Document Server

    Shen, Dinggang; Sabuncu, Mert

    2016-01-01

    Machine Learning and Medical Imaging presents state-of- the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, a...

  16. An efficient flow-based botnet detection using supervised machine learning

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2014-01-01

    Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper...... introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs...... to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates...

  17. Rule based systems for big data a machine learning approach

    CERN Document Server

    Liu, Han; Cocea, Mihaela

    2016-01-01

    The ideas introduced in this book explore the relationships among rule based systems, machine learning and big data. Rule based systems are seen as a special type of expert systems, which can be built by using expert knowledge or learning from real data. The book focuses on the development and evaluation of rule based systems in terms of accuracy, efficiency and interpretability. In particular, a unified framework for building rule based systems, which consists of the operations of rule generation, rule simplification and rule representation, is presented. Each of these operations is detailed using specific methods or techniques. In addition, this book also presents some ensemble learning frameworks for building ensemble rule based systems.

  18. Classification of older adults with/without a fall history using machine learning methods.

    Science.gov (United States)

    Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.

  19. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    Science.gov (United States)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company is a company based on manufacturing part for airplane, one of the machine that is categorized as key facility in the company is Millac 5H6P. As a key facility, the machines should be assured to work well and in peak condition, therefore, maintenance process is needed periodically. From the data gathering, it is known that there are lack of competency from the maintenance staff to maintain different type of machine which is not assigned by the supervisor, this indicate that knowledge which possessed by maintenance staff are uneven. The purpose of this research is to create knowledge-based e-learning application as a realization from externalization process in knowledge transfer process to maintain the machine. The application feature are adjusted for maintenance purpose using e-learning framework for maintenance process, the content of the application support multimedia for learning purpose. QFD is used in this research to understand the needs from user. The application is built using moodle with iterative method for software development cycle and UML Diagram. The result from this research is e-learning application as sharing knowledge media for maintenance staff in the company. From the test, it is known that the application make maintenance staff easy to understand the competencies.

  20. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

    OpenAIRE

    Weng, Wei-Hung; Wagholikar, Kavishwar B.; McCray, Alexa T.; Szolovits, Peter; Chueh, Henry C.

    2017-01-01

    Background The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. Methods We constructed the pipeline using the clinical ...

  1. Evaluation of Machine Learning Methods for LHC Optics Measurements and Corrections Software

    CERN Document Server

    AUTHOR|(CDS)2206853; Henning, Peter

    The field of artificial intelligence is driven by the goal to provide machines with human-like intelligence. However modern science is currently facing problems with high complexity that cannot be solved by humans in the same timescale as by machines. Therefore there is a demand on automation of complex tasks. To identify the category of tasks which can be performed by machines in the domain of optics measurements and correction on the Large Hadron Collider (LHC) is one of the central research subjects of this thesis. The application of machine learning methods and concepts of artificial intelligence can be found in various industry and scientific branches. In High Energy Physics these concepts are mostly used in offline analysis of experiments data and to perform regression tasks. In Accelerator Physics the machine learning approach has not found a wide application yet. Therefore potential tasks for machine learning solutions can be specified in this domain. The appropriate methods and their suitability for...

  2. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Science.gov (United States)

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to infer missing genotypes. In this research the performance of eight machine learning methods: Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost compared in terms of the imputation accuracy, computation time and the factors affecting imputation accuracy. The methods employed using real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. The Random Forest and Support Vector Machine were more accurate than the other machine learning methods. The TotalBoost performed slightly worse than the other methods.The running times were different between methods. The ELM was always most fast algorithm. In case of increasing the sample size, the RBF requires long imputation time.The tested methods in this research can be an alternative for imputation of un-typed SNPs in low missing rate of data. However, it is recommended that other machine learning methods to be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Machine Learning Applications to Resting-State Functional MR Imaging Analysis.

    Science.gov (United States)

    Billings, John M; Eder, Maxwell; Flood, William C; Dhami, Devendra Singh; Natarajan, Sriraam; Whitlow, Christopher T

    2017-11-01

    Machine learning is one of the most exciting and rapidly expanding fields within computer science. Academic and commercial research entities are investing in machine learning methods, especially in personalized medicine via patient-level classification. There is great promise that machine learning methods combined with resting state functional MR imaging will aid in diagnosis of disease and guide potential treatment for conditions thought to be impossible to identify based on imaging alone, such as psychiatric disorders. We discuss machine learning methods and explore recent advances. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Building Customer Churn Prediction Models in Fitness Industry with Machine Learning Methods

    OpenAIRE

    Shan, Min

    2017-01-01

    With the rapid growth of digital systems, churn management has become a major focus within customer relationship management in many industries. Ample research has been conducted for churn prediction in different industries with various machine learning methods. This thesis aims to combine feature selection and supervised machine learning methods for defining models of churn prediction and apply them on fitness industry. Forward selection is chosen as feature selection methods. Support Vector ...

  5. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  6. Machine Learning Methods for Attack Detection in the Smart Grid.

    Science.gov (United States)

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  7. Machine learning methods without tears: a primer for ecologists.

    Science.gov (United States)

    Olden, Julian D; Lawler, Joshua J; Poff, N LeRoy

    2008-06-01

    Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machin learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

  8. Machine learning in cardiovascular medicine: are we there yet?

    Science.gov (United States)

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Mlifdect: Android Malware Detection Based on Parallel Machine Learning and Information Fusion

    Directory of Open Access Journals (Sweden)

    Xin Wang

    2017-01-01

    Full Text Available In recent years, Android malware has continued to grow at an alarming rate. More recent malicious apps’ employing highly sophisticated detection avoidance techniques makes the traditional machine learning based malware detection methods far less effective. More specifically, they cannot cope with various types of Android malware and have limitation in detection by utilizing a single classification algorithm. To address this limitation, we propose a novel approach in this paper that leverages parallel machine learning and information fusion techniques for better Android malware detection, which is named Mlifdect. To implement this approach, we first extract eight types of features from static analysis on Android apps and build two kinds of feature sets after feature selection. Then, a parallel machine learning detection model is developed for speeding up the process of classification. Finally, we investigate the probability analysis based and Dempster-Shafer theory based information fusion approaches which can effectively obtain the detection results. To validate our method, other state-of-the-art detection works are selected for comparison with real-world Android apps. The experimental results demonstrate that Mlifdect is capable of achieving higher detection accuracy as well as a remarkable run-time efficiency compared to the existing malware detection solutions.

  10. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of

  11. Extreme Learning Machine and Moving Least Square Regression Based Solar Panel Vision Inspection

    Directory of Open Access Journals (Sweden)

    Heng Liu

    2017-01-01

    Full Text Available In recent years, learning based machine intelligence has aroused a lot of attention across science and engineering. Particularly in the field of automatic industry inspection, the machine learning based vision inspection plays a more and more important role in defect identification and feature extraction. Through learning from image samples, many features of industry objects, such as shapes, positions, and orientations angles, can be obtained and then can be well utilized to determine whether there is defect or not. However, the robustness and the quickness are not easily achieved in such inspection way. In this work, for solar panel vision inspection, we present an extreme learning machine (ELM and moving least square regression based approach to identify solder joint defect and detect the panel position. Firstly, histogram peaks distribution (HPD and fractional calculus are applied for image preprocessing. Then an ELM-based defective solder joints identification is discussed in detail. Finally, moving least square regression (MLSR algorithm is introduced for solar panel position determination. Experimental results and comparisons show that the proposed ELM and MLSR based inspection method is efficient not only in detection accuracy but also in processing speed.

  12. Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model

    Science.gov (United States)

    Pathak, Jaideep; Wikner, Alexander; Fussell, Rebeckah; Chandra, Sarthak; Hunt, Brian R.; Girvan, Michelle; Ott, Edward

    2018-04-01

    A model-based approach to forecasting chaotic dynamical systems utilizes knowledge of the mechanistic processes governing the dynamics to build an approximate mathematical model of the system. In contrast, machine learning techniques have demonstrated promising results for forecasting chaotic systems purely from past time series measurements of system state variables (training data), without prior knowledge of the system dynamics. The motivation for this paper is the potential of machine learning for filling in the gaps in our underlying mechanistic knowledge that cause widely-used knowledge-based models to be inaccurate. Thus, we here propose a general method that leverages the advantages of these two approaches by combining a knowledge-based model and a machine learning technique to build a hybrid forecasting scheme. Potential applications for such an approach are numerous (e.g., improving weather forecasting). We demonstrate and test the utility of this approach using a particular illustrative version of a machine learning known as reservoir computing, and we apply the resulting hybrid forecaster to a low-dimensional chaotic system, as well as to a high-dimensional spatiotemporal chaotic system. These tests yield extremely promising results in that our hybrid technique is able to accurately predict for a much longer period of time than either its machine-learning component or its model-based component alone.

  13. Airline Passenger Profiling Based on Fuzzy Deep Machine Learning.

    Science.gov (United States)

    Zheng, Yu-Jun; Sheng, Wei-Guo; Sun, Xing-Ming; Chen, Sheng-Yong

    2017-12-01

    Passenger profiling plays a vital part of commercial aviation security, but classical methods become very inefficient in handling the rapidly increasing amounts of electronic records. This paper proposes a deep learning approach to passenger profiling. The center of our approach is a Pythagorean fuzzy deep Boltzmann machine (PFDBM), whose parameters are expressed by Pythagorean fuzzy numbers such that each neuron can learn how a feature affects the production of the correct output from both the positive and negative sides. We propose a hybrid algorithm combining a gradient-based method and an evolutionary algorithm for training the PFDBM. Based on the novel learning model, we develop a deep neural network (DNN) for classifying normal passengers and potential attackers, and further develop an integrated DNN for identifying group attackers whose individual features are insufficient to reveal the abnormality. Experiments on data sets from Air China show that our approach provides much higher learning ability and classification accuracy than existing profilers. It is expected that the fuzzy deep learning approach can be adapted for a variety of complex pattern analysis tasks.

  14. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  15. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra..

  16. Fault Diagnosis for Engine Based on Single-Stage Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Fei Gao

    2016-01-01

    Full Text Available Single-Stage Extreme Learning Machine (SS-ELM is presented to dispose of the mechanical fault diagnosis in this paper. Based on it, the traditional mapping type of extreme learning machine (ELM has been changed and the eigenvectors extracted from signal processing methods are directly regarded as outputs of the network’s hidden layer. Then the uncertainty that training data transformed from the input space to the ELM feature space with the ELM mapping and problem of the selection of the hidden nodes are avoided effectively. The experiment results of diesel engine fault diagnosis show good performance of the SS-ELM algorithm.

  17. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artifical intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. Basic for various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  18. PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research.

    Science.gov (United States)

    Koul, Atesh; Becchio, Cristina; Cavallo, Andrea

    2017-12-12

    Recent years have seen an increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible implementation. The aim of current work was to build an open-source R toolbox - "PredPsych" - that could make these methods readily available to all psychologists. PredPsych is a user-friendly, R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of possible research questions that can be addressed with the machine-learning algorithms implemented in PredPsych and cannot be easily addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience not only in the field of psychology, but also in that of clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.

  19. Machine learning and radiology.

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  20. Machine Learning Based Localization and Classification with Atomic Magnetometers

    Science.gov (United States)

    Deans, Cameron; Griffin, Lewis D.; Marmugi, Luca; Renzoni, Ferruccio

    2018-01-01

    We demonstrate identification of position, material, orientation, and shape of objects imaged by a Rb 85 atomic magnetometer performing electromagnetic induction imaging supported by machine learning. Machine learning maximizes the information extracted from the images created by the magnetometer, demonstrating the use of hidden data. Localization 2.6 times better than the spatial resolution of the imaging system and successful classification up to 97% are obtained. This circumvents the need of solving the inverse problem and demonstrates the extension of machine learning to diffusive systems, such as low-frequency electrodynamics in media. Automated collection of task-relevant information from quantum-based electromagnetic imaging will have a relevant impact from biomedicine to security.

  1. Comparisons of likelihood and machine learning methods of individual classification

    Science.gov (United States)

    Guinand, B.; Topchy, A.; Page, K.S.; Burnham-Curtis, M. K.; Punch, W.F.; Scribner, K.T.

    2002-01-01

    Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin (“assignment tests”). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high FST), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0–2.8% lower error rates). The relative performance of each machine learning classifier improved relative likelihood estimators for empirical data sets, suggesting an ability to “learn” and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks. In recent years, characterization of highly polymorphic molecular markers such as mini- and microsatellites and development of novel methods of analysis have enabled researchers to extend investigations of ecological and evolutionary processes below the population level to the level of

  2. The influence of negative training set size on machine learning-based virtual screening.

    Science.gov (United States)

    Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J

    2014-01-01

    The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.

  3. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    Directory of Open Access Journals (Sweden)

    Philippe Burlina

    Full Text Available To evaluate the use of ultrasound coupled with machine learning (ML and deep learning (DL techniques for automated or semi-automated classification of myositis.Eighty subjects comprised of 19 with inclusion body myositis (IBM, 14 with polymyositis (PM, 14 with dermatomyositis (DM, and 33 normal (N subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally were acquired. We considered three problems of classification including (A normal vs. affected (DM, PM, IBM; (B normal vs. IBM patients; and (C IBM vs. other types of myositis (DM or PM. We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification.The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A, 86.6% ± 2.4% for (B and 74.8% ± 3.9% for (C, while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A, 84.3% ± 2.3% for (B and 68.9% ± 2.5% for (C.This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  4. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods.

    Science.gov (United States)

    Burlina, Philippe; Billings, Seth; Joshi, Neil; Albayda, Jemima

    2017-01-01

    To evaluate the use of ultrasound coupled with machine learning (ML) and deep learning (DL) techniques for automated or semi-automated classification of myositis. Eighty subjects comprised of 19 with inclusion body myositis (IBM), 14 with polymyositis (PM), 14 with dermatomyositis (DM), and 33 normal (N) subjects were included in this study, where 3214 muscle ultrasound images of 7 muscles (observed bilaterally) were acquired. We considered three problems of classification including (A) normal vs. affected (DM, PM, IBM); (B) normal vs. IBM patients; and (C) IBM vs. other types of myositis (DM or PM). We studied the use of an automated DL method using deep convolutional neural networks (DL-DCNNs) for diagnostic classification and compared it with a semi-automated conventional ML method based on random forests (ML-RF) and "engineered" features. We used the known clinical diagnosis as the gold standard for evaluating performance of muscle classification. The performance of the DL-DCNN method resulted in accuracies ± standard deviation of 76.2% ± 3.1% for problem (A), 86.6% ± 2.4% for (B) and 74.8% ± 3.9% for (C), while the ML-RF method led to accuracies of 72.3% ± 3.3% for problem (A), 84.3% ± 2.3% for (B) and 68.9% ± 2.5% for (C). This study demonstrates the application of machine learning methods for automatically or semi-automatically classifying inflammatory muscle disease using muscle ultrasound. Compared to the conventional random forest machine learning method used here, which has the drawback of requiring manual delineation of muscle/fat boundaries, DCNN-based classification by and large improved the accuracies in all classification problems while providing a fully automated approach to classification.

  5. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    Science.gov (United States)

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    Abstract The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie

  6. Introduction to machine learning

    OpenAIRE

    Baştanlar, Yalın; Özuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning app...

  7. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

    Science.gov (United States)

    Tiun, Sabrina; AL-Dhief, Fahad Taha; Sammour, Mahmoud A. M.

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%. PMID:29672546

  8. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach.

    Science.gov (United States)

    Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%.

  9. Classification of follicular lymphoma images: a holistic approach with symbol-based machine learning methods.

    Science.gov (United States)

    Zorman, Milan; Sánchez de la Rosa, José Luis; Dinevski, Dejan

    2011-12-01

    It is not very often to see a symbol-based machine learning approach to be used for the purpose of image classification and recognition. In this paper we will present such an approach, which we first used on the follicular lymphoma images. Lymphoma is a broad term encompassing a variety of cancers of the lymphatic system. Lymphoma is differentiated by the type of cell that multiplies and how the cancer presents itself. It is very important to get an exact diagnosis regarding lymphoma and to determine the treatments that will be most effective for the patient's condition. Our work was focused on the identification of lymphomas by finding follicles in microscopy images provided by the Laboratory of Pathology in the University Hospital of Tenerife, Spain. We divided our work in two stages: in the first stage we did image pre-processing and feature extraction, and in the second stage we used different symbolic machine learning approaches for pixel classification. Symbolic machine learning approaches are often neglected when looking for image analysis tools. They are not only known for a very appropriate knowledge representation, but also claimed to lack computational power. The results we got are very promising and show that symbolic approaches can be successful in image analysis applications.

  10. Predicting Solar Activity Using Machine-Learning Methods

    Science.gov (United States)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.

  11. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  12. Deep learning: Using machine learning to study biological vision

    OpenAIRE

    Majaj, Najib; Pelli, Denis

    2017-01-01

    Today most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a reference of attainable performance based on learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

  13. A deep learning-based multi-model ensemble method for cancer prediction.

    Science.gov (United States)

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Teraflop-scale Incremental Machine Learning

    OpenAIRE

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-u...

  15. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  16. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  17. Advanced Cell Classifier: User-Friendly Machine-Learning-Based Software for Discovering Phenotypes in High-Content Imaging Data.

    Science.gov (United States)

    Piccinini, Filippo; Balassa, Tamas; Szkalisity, Abel; Molnar, Csaba; Paavolainen, Lassi; Kujala, Kaisa; Buzas, Krisztina; Sarazova, Marie; Pietiainen, Vilja; Kutay, Ulrike; Smith, Kevin; Horvath, Peter

    2017-06-28

    High-content, imaging-based screens now routinely generate data on a scale that precludes manual verification and interrogation. Software applying machine learning has become an essential tool to automate analysis, but these methods require annotated examples to learn from. Efficiently exploring large datasets to find relevant examples remains a challenging bottleneck. Here, we present Advanced Cell Classifier (ACC), a graphical software package for phenotypic analysis that addresses these difficulties. ACC applies machine-learning and image-analysis methods to high-content data generated by large-scale, cell-based experiments. It features methods to mine microscopic image data, discover new phenotypes, and improve recognition performance. We demonstrate that these features substantially expedite the training process, successfully uncover rare phenotypes, and improve the accuracy of the analysis. ACC is extensively documented, designed to be user-friendly for researchers without machine-learning expertise, and distributed as a free open-source tool at www.cellclassifier.org. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Arctic Sea Ice Thickness Estimation from CryoSat-2 Satellite Data Using Machine Learning-Based Lead Detection

    Directory of Open Access Journals (Sweden)

    Sanggyun Lee

    2016-08-01

    Full Text Available Satellite altimeters have been used to monitor Arctic sea ice thickness since the early 2000s. In order to estimate sea ice thickness from satellite altimeter data, leads (i.e., cracks between ice floes should first be identified for the calculation of sea ice freeboard. In this study, we proposed novel approaches for lead detection using two machine learning algorithms: decision trees and random forest. CryoSat-2 satellite data collected in March and April of 2011–2014 over the Arctic region were used to extract waveform parameters that show the characteristics of leads, ice floes and ocean, including stack standard deviation, stack skewness, stack kurtosis, pulse peakiness and backscatter sigma-0. The parameters were used to identify leads in the machine learning models. Results show that the proposed approaches, with overall accuracy >90%, produced much better performance than existing lead detection methods based on simple thresholding approaches. Sea ice thickness estimated based on the machine learning-detected leads was compared to the averaged Airborne Electromagnetic (AEM-bird data collected over two days during the CryoSat Validation experiment (CryoVex field campaign in April 2011. This comparison showed that the proposed machine learning methods had better performance (up to r = 0.83 and Root Mean Square Error (RMSE = 0.29 m compared to thickness estimation based on existing lead detection methods (RMSE = 0.86–0.93 m. Sea ice thickness based on the machine learning approaches showed a consistent decline from 2011–2013 and rebounded in 2014.

  19. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  20. Kernel methods for interpretable machine learning of order parameters

    Science.gov (United States)

    Ponte, Pedro; Melko, Roger G.

    2017-11-01

    Machine learning is capable of discriminating phases of matter, and finding associated phase transitions, directly from large data sets of raw state configurations. In the context of condensed matter physics, most progress in the field of supervised learning has come from employing neural networks as classifiers. Although very powerful, such algorithms suffer from a lack of interpretability, which is usually desired in scientific applications in order to associate learned features with physical phenomena. In this paper, we explore support vector machines (SVMs), which are a class of supervised kernel methods that provide interpretable decision functions. We find that SVMs can learn the mathematical form of physical discriminators, such as order parameters and Hamiltonian constraints, for a set of two-dimensional spin models: the ferromagnetic Ising model, a conserved-order-parameter Ising model, and the Ising gauge theory. The ability of SVMs to provide interpretable classification highlights their potential for automating feature detection in both synthetic and experimental data sets for condensed matter and other many-body systems.

  1. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem being discussed about for a decade that has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learning capability of the particle swarm optimization (PSO algorithm with the extreme learning machine (ELM classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN, proved to be an excellent classifier with large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is experimented on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  2. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    Science.gov (United States)

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Prototype-based models in machine learning.

    Science.gov (United States)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We discuss basic schemes of competitive vector quantization as well as the so-called neural gas approach and Kohonen's topology-preserving self-organizing map. Supervised learning in prototype systems is exemplified in terms of learning vector quantization. Most frequently, the familiar Euclidean distance serves as a dissimilarity measure. We present extensions of the framework to nonstandard measures and give an introduction to the use of adaptive distances in relevance learning. © 2016 Wiley Periodicals, Inc.

  4. Studying depression using imaging and machine learning methods

    OpenAIRE

    Patel, Meenal J.; Khalaf, Alexander; Aizenstein, Howard J.

    2015-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presen...

  5. Estimating building energy consumption using extreme learning machine method

    International Nuclear Information System (INIS)

    Naji, Sareh; Keivani, Afram; Shamshirband, Shahaboddin; Alengaram, U. Johnson; Jumaat, Mohd Zamin; Mansor, Zulkefli; Lee, Malrey

    2016-01-01

    The current energy requirements of buildings comprise a large percentage of the total energy consumed around the world. The demand of energy, as well as the construction materials used in buildings, are becoming increasingly problematic for the earth's sustainable future, and thus have led to alarming concern. The energy efficiency of buildings can be improved, and in order to do so, their operational energy usage should be estimated early in the design phase, so that buildings are as sustainable as possible. An early energy estimate can greatly help architects and engineers create sustainable structures. This study proposes a novel method to estimate building energy consumption based on the ELM (Extreme Learning Machine) method. This method is applied to building material thicknesses and their thermal insulation capability (K-value). For this purpose up to 180 simulations are carried out for different material thicknesses and insulation properties, using the EnergyPlus software application. The estimation and prediction obtained by the ELM model are compared with GP (genetic programming) and ANNs (artificial neural network) models for accuracy. The simulation results indicate that an improvement in predictive accuracy is achievable with the ELM approach in comparison with GP and ANN. - Highlights: • Buildings consume huge amounts of energy for operation. • Envelope materials and insulation influence building energy consumption. • Extreme learning machine is used to estimate energy usage of a sample building. • The key effective factors in this study are insulation thickness and K-value.

  6. Machine learning a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  7. Pressure Prediction of Coal Slurry Transportation Pipeline Based on Particle Swarm Optimization Kernel Function Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Xue-cun Yang

    2015-01-01

    Full Text Available For coal slurry pipeline blockage prediction problem, through the analysis of actual scene, it is determined that the pressure prediction from each measuring point is the premise of pipeline blockage prediction. Kernel function of support vector machine is introduced into extreme learning machine, the parameters are optimized by particle swarm algorithm, and blockage prediction method based on particle swarm optimization kernel function extreme learning machine (PSOKELM is put forward. The actual test data from HuangLing coal gangue power plant are used for simulation experiments and compared with support vector machine prediction model optimized by particle swarm algorithm (PSOSVM and kernel function extreme learning machine prediction model (KELM. The results prove that mean square error (MSE for the prediction model based on PSOKELM is 0.0038 and the correlation coefficient is 0.9955, which is superior to prediction model based on PSOSVM in speed and accuracy and superior to KELM prediction model in accuracy.

  8. Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

    Science.gov (United States)

    Karim, Mohammad Ehsanul; Pang, Menglan; Platt, Robert W

    2018-03-01

    The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. Utilizing patient's health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of postmyocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator, and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of machine learning and high-dimensional propensity score algorithms generally perform slightly better than both in terms of mean squared error, when a bias-based analysis is used.

  9. Fifty years of computer analysis in chest imaging: rule-based, machine learning, deep learning.

    Science.gov (United States)

    van Ginneken, Bram

    2017-03-01

    Half a century ago, the term "computer-aided diagnosis" (CAD) was introduced in the scientific literature. Pulmonary imaging, with chest radiography and computed tomography, has always been one of the focus areas in this field. In this study, I describe how machine learning became the dominant technology for tackling CAD in the lungs, generally producing better results than do classical rule-based approaches, and how the field is now rapidly changing: in the last few years, we have seen how even better results can be obtained with deep learning. The key differences among rule-based processing, machine learning, and deep learning are summarized and illustrated for various applications of CAD in the chest.

  10. Kernel Methods for Machine Learning with Life Science Applications

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie

    Kernel methods refer to a family of widely used nonlinear algorithms for machine learning tasks like classification, regression, and feature extraction. By exploiting the so-called kernel trick straightforward extensions of classical linear algorithms are enabled as long as the data only appear a...

  11. Discrimination of Rock Fracture and Blast Events Based on Signal Complexity and Machine Learning

    Directory of Open Access Journals (Sweden)

    Zilong Zhou

    2018-01-01

    Full Text Available The automatic discrimination of rock fracture and blast events is complex and challenging due to the similar waveform characteristics. To solve this problem, a new method based on the signal complexity analysis and machine learning has been proposed in this paper. First, the permutation entropy values of signals at different scale factors are calculated to reflect complexity of signals and constructed into a feature vector set. Secondly, based on the feature vector set, back-propagation neural network (BPNN as a means of machine learning is applied to establish a discriminator for rock fracture and blast events. Then to evaluate the classification performances of the new method, the classifying accuracies of support vector machine (SVM, naive Bayes classifier, and the new method are compared, and the receiver operating characteristic (ROC curves are also analyzed. The results show the new method obtains the best classification performances. In addition, the influence of different scale factor q and number of training samples n on discrimination results is discussed. It is found that the classifying accuracy of the new method reaches the highest value when q = 8–15 or 8–20 and n=140.

  12. Machine Learning and Applied Linguistics

    OpenAIRE

    Vajjala, Sowmya

    2018-01-01

    This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.

  13. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  14. Feasibility of Machine Learning Methods for Separating Wood and Leaf Points from Terrestrial Laser Scanning Data

    Science.gov (United States)

    Wang, D.; Hollaus, M.; Pfeifer, N.

    2017-09-01

    Classification of wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI) and woody-to-total area. Laser scanning emerges to be a promising solution for such a request. Intensity based approaches are widely proposed, as different components of a tree can feature discriminatory optical properties at the operating wavelengths of a sensor system. For geometry based methods, machine learning algorithms are often used to separate wood and leaf points, by providing proper training samples. However, it remains unclear how the chosen machine learning classifier and features used would influence classification results. To this purpose, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM), Na¨ıve Bayes (NB), Random Forest (RF), and Gaussian Mixture Model (GMM), for separating wood and leaf points from terrestrial laser scanning (TLS) data. Two trees, an Erytrophleum fordii and a Betula pendula (silver birch) are used to test the impacts from classifier, feature set, and training samples. Our results showed that RF is the best model in terms of accuracy, and local density related features are important. Experimental results confirmed the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. It is also noted that our studies are based on isolated trees. Further tests should be performed on more tree species and data from more complex environments.

  15. FEASIBILITY OF MACHINE LEARNING METHODS FOR SEPARATING WOOD AND LEAF POINTS FROM TERRESTRIAL LASER SCANNING DATA

    Directory of Open Access Journals (Sweden)

    D. Wang

    2017-09-01

    Full Text Available Classification of wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI and woody-to-total area. Laser scanning emerges to be a promising solution for such a request. Intensity based approaches are widely proposed, as different components of a tree can feature discriminatory optical properties at the operating wavelengths of a sensor system. For geometry based methods, machine learning algorithms are often used to separate wood and leaf points, by providing proper training samples. However, it remains unclear how the chosen machine learning classifier and features used would influence classification results. To this purpose, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM, Na¨ıve Bayes (NB, Random Forest (RF, and Gaussian Mixture Model (GMM, for separating wood and leaf points from terrestrial laser scanning (TLS data. Two trees, an Erytrophleum fordii and a Betula pendula (silver birch are used to test the impacts from classifier, feature set, and training samples. Our results showed that RF is the best model in terms of accuracy, and local density related features are important. Experimental results confirmed the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. It is also noted that our studies are based on isolated trees. Further tests should be performed on more tree species and data from more complex environments.

  16. On the Use of Machine Learning for Identifying Botnet Network Traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    contemporary approaches use machine learning techniques for identifying malicious traffic. This paper presents a survey of contemporary botnet detection methods that rely on machine learning for identifying botnet network traffic. The paper provides a comprehensive overview on the existing scientific work thus...... contributing to the better understanding of capabilities, limitations and opportunities of using machine learning for identifying botnet traffic. Furthermore, the paper outlines possibilities for the future development of machine learning-based botnet detection systems....

  17. Machine learning methods for credibility assessment of interviewees based on posturographic data.

    Science.gov (United States)

    Saripalle, Sashi K; Vemulapalli, Spandana; King, Gregory W; Burgoon, Judee K; Derakhshani, Reza

    2015-01-01

    This paper discusses the advantages of using posturographic signals from force plates for non-invasive credibility assessment. The contributions of our work are two fold: first, the proposed method is highly efficient and non invasive. Second, feasibility for creating an autonomous credibility assessment system using machine-learning algorithms is studied. This study employs an interview paradigm that includes subjects responding with truthful and deceptive intent while their center of pressure (COP) signal is being recorded. Classification models utilizing sets of COP features for deceptive responses are derived and best accuracy of 93.5% for test interval is reported.

  18. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' < or = 18 mag), with 4500 K < or = Teff < or = 7000 K, corresponding to those with the most reliable SSPP estimates, I find that the model predicts [Fe/H] values with a root-mean-squared-error (RMSE) of approx.0.27 dex. The RMSE from this machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra..

  19. Quantum Machine Learning

    OpenAIRE

    Romero García, Cristian

    2017-01-01

    [EN] In a world in which accessible information grows exponentially, the selection of the appropriate information turns out to be an extremely relevant problem. In this context, the idea of Machine Learning (ML), a subfield of Artificial Intelligence, emerged to face problems in data mining, pattern recognition, automatic prediction, among others. Quantum Machine Learning is an interdisciplinary research area combining quantum mechanics with methods of ML, in which quantum properties allow fo...

  20. Machine-Learning Research

    OpenAIRE

    Dietterich, Thomas G.

    1997-01-01

    Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

  1. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users or consumers are now demanding smarter devices, intelligent systems are revolutionizing by utilizing machine learning. Machine learning as part of intelligent systems is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. You can train machines to perform some things, so that they can automatically detect, diagnose, and solve a variety of problems. The intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, the intelligent systems make widely applications in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines and to improve decision making models. And this book specializes in the developments of advanced intelligent systems through machine learning. It...

  2. Improved machine learning method for analysis of gas phase chemistry of peptides

    Directory of Open Access Journals (Sweden)

    Ahn Natalie

    2008-12-01

    Full Text Available Abstract Background Accurate peptide identification is important to high-throughput proteomics analyses that use mass spectrometry. Search programs compare fragmentation spectra (MS/MS of peptides from complex digests with theoretically derived spectra from a database of protein sequences. Improved discrimination is achieved with theoretical spectra that are based on simulating gas phase chemistry of the peptides, but the limited understanding of those processes affects the accuracy of predictions from theoretical spectra. Results We employed a robust data mining strategy using new feature annotation functions of MAE software, which revealed under-prediction of the frequency of occurrence in fragmentation of the second peptide bond. We applied methods of exploratory data analysis to pre-process the information in the MS/MS spectra, including data normalization and attribute selection, to reduce the attributes to a smaller, less correlated set for machine learning studies. We then compared our rule building machine learning program, DataSqueezer, with commonly used association rules and decision tree algorithms. All used machine learning algorithms produced similar results that were consistent with expected properties for a second gas phase mechanism at the second peptide bond. Conclusion The results provide compelling evidence that we have identified underlying chemical properties in the data that suggest the existence of an additional gas phase mechanism for the second peptide bond. Thus, the methods described in this study provide a valuable approach for analyses of this kind in the future.

  3. Application of machine learning methods in bioinformatics

    Science.gov (United States)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  4. Alumina Concentration Detection Based on the Kernel Extreme Learning Machine.

    Science.gov (United States)

    Zhang, Sen; Zhang, Tao; Yin, Yixin; Xiao, Wendong

    2017-09-01

    The concentration of alumina in the electrolyte is of great significance during the production of aluminum. The amount of the alumina concentration may lead to unbalanced material distribution and low production efficiency and affect the stability of the aluminum reduction cell and current efficiency. The existing methods cannot meet the needs for online measurement because industrial aluminum electrolysis has the characteristics of high temperature, strong magnetic field, coupled parameters, and high nonlinearity. Currently, there are no sensors or equipment that can detect the alumina concentration on line. Most companies acquire the alumina concentration from the electrolyte samples which are analyzed through an X-ray fluorescence spectrometer. To solve the problem, the paper proposes a soft sensing model based on a kernel extreme learning machine algorithm that takes the kernel function into the extreme learning machine. K-fold cross validation is used to estimate the generalization error. The proposed soft sensing algorithm can detect alumina concentration by the electrical signals such as voltages and currents of the anode rods. The predicted results show that the proposed approach can give more accurate estimations of alumina concentration with faster learning speed compared with the other methods such as the basic ELM, BP, and SVM.

  5. Online transfer learning with extreme learning machine

    Science.gov (United States)

    Yin, Haibo; Yang, Yun-an

    2017-05-01

    In this paper, we propose a new transfer learning algorithm for online training. The proposed algorithm, which is called Online Transfer Extreme Learning Machine (OTELM), is based on Online Sequential Extreme Learning Machine (OSELM) while it introduces Semi-Supervised Extreme Learning Machine (SSELM) to transfer knowledge from the source to the target domain. With the manifold regularization, SSELM picks out instances from the source domain that are less relevant to those in the target domain to initialize the online training, so as to improve the classification performance. Experimental results demonstrate that the proposed OTELM can effectively use instances in the source domain to enhance the learning performance.

  6. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    Science.gov (United States)

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS), to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADs categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84 from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with the AUC 0.83. Lesion margin and orientation were optimum features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. All rights reserved.

  7. A Novel Bearing Fault Diagnosis Method Based on Gaussian Restricted Boltzmann Machine

    Directory of Open Access Journals (Sweden)

    Xiao-hui He

    2016-01-01

    Full Text Available To realize the fault diagnosis of bearing effectively, this paper presents a novel bearing fault diagnosis method based on Gaussian restricted Boltzmann machine (Gaussian RBM. Vibration signals are firstly resampled to the same equivalent speed. Subsequently, the envelope spectrums of the resampled data are used directly as the feature vectors to represent the fault types of bearing. Finally, in order to deal with the high-dimensional feature vectors based on envelope spectrum, a classifier model based on Gaussian RBM is applied. Gaussian RBM has the ability to provide a closed-form representation of the distribution underlying the training data, and it is very convenient for modeling high-dimensional real-valued data. Experiments on 10 different data sets verify the performance of the proposed method. The superiority of Gaussian RBM classifier is also confirmed by comparing with other classifiers, such as extreme learning machine, support vector machine, and deep belief network. The robustness of the proposed method is also studied in this paper. It can be concluded that the proposed method can realize the bearing fault diagnosis accurately and effectively.

  8. Towards large-scale FAME-based bacterial species identification using machine learning techniques.

    Science.gov (United States)

    Slabbinck, Bram; De Baets, Bernard; Dawyndt, Peter; De Vos, Paul

    2009-05-01

    In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values, respectively, 0.847, 0.901 and 0.708. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species

  9. Machine learning methods for metabolic pathway prediction

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2010-01-01

    Full Text Available Abstract Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.

  10. Machine learning methods for metabolic pathway prediction

    Science.gov (United States)

    2010-01-01

    Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations. PMID:20064214

  11. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    . Due to promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn a special attention of the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as the mean of inferring botnet......-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool of identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview on the field by summarizing current scientific efforts....... The contribution of the paper is three-fold. First, the paper provides a detailed insight on the existing detection methods by investigating which bot-related heuristic were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  12. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    Science.gov (United States)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  13. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights

  14. Machine Learning Methods for Production Cases Analysis

    Science.gov (United States)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    Approach to analysis of events occurring during the production process were proposed. Described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of applied models. k-Nearest Neighbors and Random forest methods were used to illustrate and analyze proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.

  15. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    Science.gov (United States)

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313

  16. Summary of vulnerability related technologies based on machine learning

    Science.gov (United States)

    Zhao, Lei; Chen, Zhihao; Jia, Qiong

    2018-04-01

    As the scale of information system increases by an order of magnitude, the complexity of system software is getting higher. The vulnerability interaction from design, development and deployment to implementation stages greatly increases the risk of the entire information system being attacked successfully. Considering the limitations and lags of the existing mainstream security vulnerability detection techniques, this paper summarizes the development and current status of related technologies based on the machine learning methods applied to deal with massive and irregular data, and handling security vulnerabilities.

  17. A User-Oriented Splog Filtering Based on a Machine Learning

    Science.gov (United States)

    Yoshinaka, Takayuki; Ishii, Soichi; Fukuhara, Tomohiro; Masuda, Hidetaka; Nakagawa, Hiroshi

    A method for filtering spam blogs (splogs) based on a machine learning technique, and its evaluation results are described. Today, spam blogs (splogs) became one of major issues on the Web. The problem of splogs is that values of blog sites are different by people. We propose a novel user-oriented splog filtering method that can adapt each user's preference for valuable blogs. We use the SVM(Support Vector Machine) for creating a personalized splog filter for each user. We had two experiments: (1) an experiment of individual splog judgement, and (2) an experiment for user oriented splog filtering. From the former experiment, we found existence of 'gray' blogs that are needed to treat by persons. From the latter experiment, we found that we can provide appropriate personalized filters by choosing the best feature set for each user. An overview of proposed method, and evaluation results are described.

  18. Towards a Standard-based Domain-specific Platform to Solve Machine Learning-based Problems

    Directory of Open Access Journals (Sweden)

    Vicente García-Díaz

    2015-12-01

    Full Text Available Machine learning is one of the most important subfields of computer science and can be used to solve a variety of interesting artificial intelligence problems. There are different languages, framework and tools to define the data needed to solve machine learning-based problems. However, there is a great number of very diverse alternatives which makes it difficult the intercommunication, portability and re-usability of the definitions, designs or algorithms that any developer may create. In this paper, we take the first step towards a language and a development environment independent of the underlying technologies, allowing developers to design solutions to solve machine learning-based problems in a simple and fast way, automatically generating code for other technologies. That can be considered a transparent bridge among current technologies. We rely on Model-Driven Engineering approach, focusing on the creation of models to abstract the definition of artifacts from the underlying technologies.

  19. Machine learning methods in predicting the student academic motivation

    Directory of Open Access Journals (Sweden)

    Ivana Đurđević Babić

    2017-01-01

    Full Text Available Academic motivation is closely related to academic performance. For educators, it is equally important to detect early students with a lack of academic motivation as it is to detect those with a high level of academic motivation. In endeavouring to develop a classification model for predicting student academic motivation based on their behaviour in learning management system (LMS courses, this paper intends to establish links between the predicted student academic motivation and their behaviour in the LMS course. Students from all years at the Faculty of Education in Osijek participated in this research. Three machine learning classifiers (neural networks, decision trees, and support vector machines were used. To establish whether a significant difference in the performance of models exists, a t-test of the difference in proportions was used. Although, all classifiers were successful, the neural network model was shown to be the most successful in detecting the student academic motivation based on their behaviour in LMS course.

  20. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Full Text Available Learning incorporates a broad range of complex procedures. Machine learning (ML is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc. that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  1. Machine Learning Methods to Predict Diabetes Complications.

    Science.gov (United States)

    Dagliati, Arianna; Marini, Simone; Sacchi, Lucia; Cogni, Giulia; Teliti, Marsida; Tibollo, Valentina; De Cata, Pasquale; Chiovato, Luca; Bellazzi, Riccardo

    2018-03-01

    One of the areas where Artificial Intelligence is having more impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies, to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. Such pipeline comprises clinical center profiling, predictive model targeting, predictive model construction and model validation. After having dealt with missing data by means of random forest (RF) and having applied suitable strategies to handle class imbalance, we have used Logistic Regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy, at different time scenarios, at 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). Considered variables are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. Final models, tailored in accordance with the complications, provided an accuracy up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models easy to translate to the clinical practice.

  2. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

  3. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This  book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The  multidisciplinary nature of machine learning makes it a very fascinating and popular area for research.  The book is aiming at students, practitioners and researchers and captures the diversity and richness of the field of machine learning and intelligent systems.  Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems, spam classification including an extremely well-written survey on Bayesian networks shed light on the strengths and weaknesses of the methods. Practical studies yielding insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and on-line mining of non-stationary ...

  4. Simulation-driven machine learning: Bearing fault classification

    Science.gov (United States)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data is often only valid for the specific circumstances and components for which it was collected. This work directly addresses these challenges for roller bearings with race faults by generating training data using information gained from high resolution simulations of roller bearing dynamics, which is used to train machine learning algorithms that are then validated against four experimental datasets. Several different machine learning methodologies are compared starting from well-established statistical feature-based methods to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter free method for race fault detection.

  5. Impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines

    Science.gov (United States)

    Marulcu, Ismail; Barnett, Michael

    2016-01-01

    Background: Elementary Science Education is struggling with multiple challenges. National and State test results confirm the need for deeper understanding in elementary science education. Moreover, national policy statements and researchers call for increased exposure to engineering and technology in elementary science education. The basic motivation of this study is to suggest a solution to both improving elementary science education and increasing exposure to engineering and technology in it. Purpose/Hypothesis: This mixed-method study examined the impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines. We hypothesize that the LEGO-engineering design unit is as successful as the inquiry-based unit in terms of students' science content learning of simple machines. Design/Method: We used a mixed-methods approach to investigate our research questions; we compared the control and the experimental groups' scores from the tests and interviews by using Analysis of Covariance (ANCOVA) and compared each group's pre- and post-scores by using paired t-tests. Results: Our findings from the paired t-tests show that both the experimental and comparison groups significantly improved their scores from the pre-test to post-test on the multiple-choice, open-ended, and interview items. Moreover, ANCOVA results show that students in the experimental group, who learned simple machines with the design-based unit, performed significantly better on the interview questions. Conclusions: Our analyses revealed that the design-based Design a people mover: Simple machines unit was, if not better, as successful as the inquiry-based FOSS Levers and pulleys unit in terms of students' science content learning.

  6. Newton Methods for Large Scale Problems in Machine Learning

    Science.gov (United States)

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  7. Health Informatics via Machine Learning for the Clinical Management of Patients.

    Science.gov (United States)

    Clifton, D A; Niehaus, K E; Charlton, P; Colopy, G W

    2015-08-13

    To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. We reviewed literature from 2010-2015 from databases such as Pubmed, IEEE xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale.

  8. Runtime Optimizations for Tree-Based Machine Learning Models

    NARCIS (Netherlands)

    N. Asadi; J.J.P. Lin (Jimmy); A.P. de Vries (Arjen)

    2014-01-01

    htmlabstractTree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression

  9. A strategy for quantum algorithm design assisted by machine learning

    International Nuclear Information System (INIS)

    Bang, Jeongho; Lee, Jinhyoung; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin

    2014-01-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum–classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch–Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method. (paper)

  10. A strategy for quantum algorithm design assisted by machine learning

    Science.gov (United States)

    Bang, Jeongho; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin; Lee, Jinhyoung

    2014-07-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum-classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch-Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method.

  11. Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

    Science.gov (United States)

    Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

    2011-01-01

    Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433

  12. Extremely Randomized Machine Learning Methods for Compound Activity Prediction

    Directory of Open Access Journals (Sweden)

    Wojciech M. Czarnecki

    2015-11-01

    Full Text Available Speed, a relatively low requirement for computational resources and high effectiveness of the evaluation of the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, due to the growth of the amount of data also in cheminformatics and related fields, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of previously existing methods to obtain results more quickly. In the study, we tested two approaches belonging to the group of so-called ‘extremely randomized methods’—Extreme Entropy Machine and Extremely Randomized Trees—for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their ‘non-extreme’ competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches were not only found out to improve the efficiency of the classification of bioactive compounds, but they were also proved to be less computationally complex, requiring fewer steps to perform an optimization procedure.

  13. MACHINE LEARNING METHODS IN DIGITAL AGRICULTURE: ALGORITHMS AND CASES

    Directory of Open Access Journals (Sweden)

    Aleksandr Vasilyevich Koshkarov

    2018-05-01

    Full Text Available Ensuring food security is a major challenge in many countries. With a growing global population, the issues of improving the efficiency of agriculture have become most relevant. Farmers are looking for new ways to increase yields, and governments of different countries are developing new programs to support agriculture. This contributes to a more active implementation of digital technologies in agriculture, helping farmers to make better decisions, increase yields and take care of the environment. The central point is the collection and analysis of data. In the industry of agriculture, data can be collected from different sources and may contain useful patterns that identify potential problems or opportunities. Data should be analyzed using machine learning algorithms to extract useful insights. Such methods of precision farming allow the farmer to monitor individual parts of the field, optimize the consumption of water and chemicals, and identify problems quickly. Purpose: to make an overview of the machine learning algorithms used for data analysis in agriculture. Methodology: an overview of the relevant literature; a survey of farmers. Results: relevant algorithms of machine learning for the analysis of data in agriculture at various levels were identified: soil analysis (soil assessment, soil classification, soil fertility predictions, weather forecast (simulation of climate change, temperature and precipitation prediction, and analysis of vegetation (weed identification, vegetation classification, plant disease identification, crop forecasting. Practical implications: agriculture, crop production.

  14. NetiNeti: discovery of scientific names from text using machine learning methods

    Directory of Open Access Journals (Sweden)

    Akella Lakshmi

    2012-08-01

    Full Text Available Abstract Background A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. Results We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing, a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE. NetiNeti performs better (precision = 98.9% and recall = 70.5% compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3% on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central’s full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Conclusions We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.

  15. Radar HRRP Target Recognition Based on Stacked Autoencoder and Extreme Learning Machine.

    Science.gov (United States)

    Zhao, Feixiang; Liu, Yongxiang; Huo, Kai; Zhang, Shuanghui; Zhang, Zhongshuai

    2018-01-10

    A novel radar high-resolution range profile (HRRP) target recognition method based on a stacked autoencoder (SAE) and extreme learning machine (ELM) is presented in this paper. As a key component of deep structure, the SAE does not only learn features by making use of data, it also obtains feature expressions at different levels of data. However, with the deep structure, it is hard to achieve good generalization performance with a fast learning speed. ELM, as a new learning algorithm for single hidden layer feedforward neural networks (SLFNs), has attracted great interest from various fields for its fast learning speed and good generalization performance. However, ELM needs more hidden nodes than conventional tuning-based learning algorithms due to the random set of input weights and hidden biases. In addition, the existing ELM methods cannot utilize the class information of targets well. To solve this problem, a regularized ELM method based on the class information of the target is proposed. In this paper, SAE and the regularized ELM are combined to make full use of their advantages and make up for each of their shortcomings. The effectiveness of the proposed method is demonstrated by experiments with measured radar HRRP data. The experimental results show that the proposed method can achieve good performance in the two aspects of real-time and accuracy, especially when only a few training samples are available.

  16. Book review: A first course in Machine Learning

    DEFF Research Database (Denmark)

    Ortiz-Arroyo, Daniel

    2016-01-01

    "The new edition of A First Course in Machine Learning by Rogers and Girolami is an excellent introduction to the use of statistical methods in machine learning. The book introduces concepts such as mathematical modeling, inference, and prediction, providing ‘just in time’ the essential background...... to change models and parameter values to make [it] easier to understand and apply these models in real applications. The authors [also] introduce more advanced, state-of-the-art machine learning methods, such as Gaussian process models and advanced mixture models, which are used across machine learning....... This makes the book interesting not only to students with little or no background in machine learning but also to more advanced graduate students interested in statistical approaches to machine learning." —Daniel Ortiz-Arroyo, Associate Professor, Aalborg University Esbjerg, Denmark...

  17. Machine Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  18. Machine Learning an algorithmic perspective

    CERN Document Server

    Marsland, Stephen

    2009-01-01

    Traditional books on machine learning can be divided into two groups - those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.Theory Backed up by Practical ExamplesThe book covers neural networks, graphical models, reinforcement le

  19. Using Machine Learning to Predict Student Performance

    OpenAIRE

    Pojon, Murat

    2017-01-01

    This thesis examines the application of machine learning algorithms to predict whether a student will be successful or not. The specific focus of the thesis is the comparison of machine learning methods and feature engineering techniques in terms of how much they improve the prediction performance. Three different machine learning methods were used in this thesis. They are linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modification ...

  20. Machine learning versus knowledge based classification of legal texts

    NARCIS (Netherlands)

    de Maat, E.; Krabben, K.; Winkels, R.; Winkels, R.G.F.

    2010-01-01

    This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurate (>90%) as the pattern based one, but

  1. Rapid tomographic reconstruction based on machine learning for time-resolved combustion diagnostics

    Science.gov (United States)

    Yu, Tao; Cai, Weiwei; Liu, Yingzheng

    2018-04-01

    Optical tomography has attracted surged research efforts recently due to the progress in both the imaging concepts and the sensor and laser technologies. The high spatial and temporal resolutions achievable by these methods provide unprecedented opportunity for diagnosis of complicated turbulent combustion. However, due to the high data throughput and the inefficiency of the prevailing iterative methods, the tomographic reconstructions which are typically conducted off-line are computationally formidable. In this work, we propose an efficient inversion method based on a machine learning algorithm, which can extract useful information from the previous reconstructions and build efficient neural networks to serve as a surrogate model to rapidly predict the reconstructions. Extreme learning machine is cited here as an example for demonstrative purpose simply due to its ease of implementation, fast learning speed, and good generalization performance. Extensive numerical studies were performed, and the results show that the new method can dramatically reduce the computational time compared with the classical iterative methods. This technique is expected to be an alternative to existing methods when sufficient training data are available. Although this work is discussed under the context of tomographic absorption spectroscopy, we expect it to be useful also to other high speed tomographic modalities such as volumetric laser-induced fluorescence and tomographic laser-induced incandescence which have been demonstrated for combustion diagnostics.

  2. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them.This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  3. Method of Automatic Ontology Mapping through Machine Learning and Logic Mining

    Institute of Scientific and Technical Information of China (English)

    王英林

    2004-01-01

    Ontology mapping is the bottleneck of handling conflicts among heterogeneous ontologies and of implementing reconfiguration or interoperability of legacy systems. We proposed an ontology mapping method by using machine learning, type constraints and logic mining techniques. This method is able to find concept correspondences through instances and the result is optimized by using an error function; it is able to find attribute correspondence between two equivalent concepts and the mapping accuracy is enhanced by combining together instances learning, type constraints and the logic relations that are imbedded in instances; moreover, it solves the most common kind of categorization conflicts. We then proposed a merging algorithm to generate the shared ontology and proposed a reconfigurable architecture for interoperation based on multi agents. The legacy systems are encapsulated as information agents to participate in the integration system. Finally we give a simplified case study.

  4. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  5. Cosmic String Detection with Tree-Based Machine Learning

    Science.gov (United States)

    Vafaei Sadr, A.; Farhang, M.; Movahed, S. M. S.; Bassett, B.; Kunz, M.

    2018-05-01

    We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of the processed CMB maps that boost cosmic string detectability. Our proposed classifiers, after training, give results similar to or better than claimed detectability levels from other methods for string tension, Gμ. They can make 3σ detection of strings with Gμ ≳ 2.1 × 10-10 for noise-free, 0.9΄-resolution CMB observations. The minimum detectable tension increases to Gμ ≳ 3.0 × 10-8 for a more realistic, CMB S4-like (II) strategy, improving over previous results.

  6. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    Science.gov (United States)

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is overwhelmingly turning into a bottleneck for the effectiveness of large biological data. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach of data analysis using microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties were investigated using a finely selected panel of 11 microsatellite markers. We organized data in two '4-targeted' and '16-targeted' experiments. A strategy of assaying different machine based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed microsatellite markers with the highest differentiating capacity and proved efficiency for our method of clustering olive accessions to reflect upon their regions of origin. A distinguished highlight of this study was the discovery of the best combination of markers for better differentiating of populations via machine learning models, which can be exploited to distinguish among other biological populations.

  7. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  8. Machine learning applications in genetics and genomics.

    Science.gov (United States)

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  9. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    Science.gov (United States)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra..

  10. Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning.

    Science.gov (United States)

    Gorban, A N; Mirkes, E M; Zinovyev, A

    2016-12-01

    Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore, a lot of recent applications in machine learning exploited properties of non-quadratic error functionals based on L 1 norm or even sub-linear potentials corresponding to quasinorms L p (0application of min-plus algebra. The approach can be applied in most of existing machine learning methods, including methods of data approximation and regularized and sparse regression, leading to the improvement in the computational cost/accuracy trade-off. We demonstrate that on synthetic and real-life datasets PQSQ-based machine learning methods achieve orders of magnitude faster computational performance than the corresponding state-of-the-art methods, having similar or better approximation accuracy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Research on intelligent machine self-perception method based on LSTM

    Science.gov (United States)

    Wang, Qiang; Cheng, Tao

    2018-05-01

    In this paper, we use the advantages of LSTM in feature extraction and processing high-dimensional and complex nonlinear data, and apply it to the autonomous perception of intelligent machines. Compared with the traditional multi-layer neural network, this model has memory, can handle time series information of any length. Since the multi-physical domain signals of processing machines have a certain timing relationship, and there is a contextual relationship between states and states, using this deep learning method to realize the self-perception of intelligent processing machines has strong versatility and adaptability. The experiment results show that the method proposed in this paper can obviously improve the sensing accuracy under various working conditions of the intelligent machine, and also shows that the algorithm can well support the intelligent processing machine to realize self-perception.

  12. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning litterature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  13. A multi-label learning based kernel automatic recommendation method for support vector machine.

    Science.gov (United States)

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is very important and critical when classifying a new problem with Support Vector Machine. So far, more attention has been paid on constructing new kernels and choosing suitable parameter values for a specific kernel function, but less on kernel selection. Furthermore, most of current kernel selection methods focus on seeking a best kernel with the highest classification accuracy via cross-validation, they are time consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.

  14. A machine learning approach to the accurate prediction of monitor units for a compact proton machine.

    Science.gov (United States)

    Sun, Baozhou; Lam, Dao; Yang, Deshan; Grantham, Kevin; Zhang, Tiezhi; Mutic, Sasa; Zhao, Tianyu

    2018-05-01

    Clinical treatment planning systems for proton therapy currently do not calculate monitor units (MUs) in passive scatter proton therapy due to the complexity of the beam delivery systems. Physical phantom measurements are commonly employed to determine the field-specific output factors (OFs) but are often subject to limited machine time, measurement uncertainties and intensive labor. In this study, a machine learning-based approach was developed to predict output (cGy/MU) and derive MUs, incorporating the dependencies on gantry angle and field size for a single-room proton therapy system. The goal of this study was to develop a secondary check tool for OF measurements and eventually eliminate patient-specific OF measurements. The OFs of 1754 fields previously measured in a water phantom with calibrated ionization chambers and electrometers for patient-specific fields with various range and modulation width combinations for 23 options were included in this study. The training data sets for machine learning models in three different methods (Random Forest, XGBoost and Cubist) included 1431 (~81%) OFs. Ten-fold cross-validation was used to prevent "overfitting" and to validate each model. The remaining 323 (~19%) OFs were used to test the trained models. The difference between the measured and predicted values from machine learning models was analyzed. Model prediction accuracy was also compared with that of the semi-empirical model developed by Kooy (Phys. Med. Biol. 50, 2005). Additionally, gantry angle dependence of OFs was measured for three groups of options categorized on the selection of the second scatters. Field size dependence of OFs was investigated for the measurements with and without patient-specific apertures. All three machine learning methods showed higher accuracy than the semi-empirical model which shows considerably large discrepancy of up to 7.7% for the treatment fields with full range and full modulation width. The Cubist-based solution

  15. Fingerprint-Based Machine Learning Approach to Identify Potent and Selective 5-HT2BR Ligands

    Directory of Open Access Journals (Sweden)

    Krzysztof Rataj

    2018-05-01

    Full Text Available The identification of subtype-selective GPCR (G-protein coupled receptor ligands is a challenging task. In this study, we developed a computational protocol to find compounds with 5-HT2BR versus 5-HT1BR selectivity. Our approach employs the hierarchical combination of machine learning methods, docking, and multiple scoring methods. First, we applied machine learning tools to filter a large database of druglike compounds by the new Neighbouring Substructures Fingerprint (NSFP. This two-dimensional fingerprint contains information on the connectivity of the substructural features of a compound. Preselected subsets of the database were then subjected to docking calculations. The main indicators of compounds’ selectivity were their different interactions with the secondary binding pockets of both target proteins, while binding modes within the orthosteric binding pocket were preserved. The combined methodology of ligand-based and structure-based methods was validated prospectively, resulting in the identification of hits with nanomolar affinity and ten-fold to ten thousand-fold selectivities.

  16. Machine learning application in the life time of materials

    OpenAIRE

    Yu, Xiaojiao

    2017-01-01

    Materials design and development typically takes several decades from the initial discovery to commercialization with the traditional trial and error development approach. With the accumulation of data from both experimental and computational results, data based machine learning becomes an emerging field in materials discovery, design and property prediction. This manuscript reviews the history of materials science as a disciplinary the most common machine learning method used in materials sc...

  17. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features.

    Science.gov (United States)

    Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-07-18

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization.

  18. Improved Saturated Hydraulic Conductivity Pedotransfer Functions Using Machine Learning Methods

    Science.gov (United States)

    Araya, S. N.; Ghezzehei, T. A.

    2017-12-01

    Saturated hydraulic conductivity (Ks) is one of the fundamental hydraulic properties of soils. Its measurement, however, is cumbersome and instead pedotransfer functions (PTFs) are often used to estimate it. Despite a lot of progress over the years, generic PTFs that estimate hydraulic conductivity generally don't have a good performance. We develop significantly improved PTFs by applying state of the art machine learning techniques coupled with high-performance computing on a large database of over 20,000 soils—USKSAT and the Florida Soil Characterization databases. We compared the performance of four machine learning algorithms (k-nearest neighbors, gradient boosted model, support vector machine, and relevance vector machine) and evaluated the relative importance of several soil properties in explaining Ks. An attempt is also made to better account for soil structural properties; we evaluated the importance of variables derived from transformations of soil water retention characteristics and other soil properties. The gradient boosted models gave the best performance with root mean square errors less than 0.7 and mean errors in the order of 0.01 on a log scale of Ks [cm/h]. The effective particle size, D10, was found to be the single most important predictor. Other important predictors included percent clay, bulk density, organic carbon percent, coefficient of uniformity and values derived from water retention characteristics. Model performances were consistently better for Ks values greater than 10 cm/h. This study maximizes the extraction of information from a large database to develop generic machine learning based PTFs to estimate Ks. The study also evaluates the importance of various soil properties and their transformations in explaining Ks.

  19. Machine learning in heart failure: ready for prime time.

    Science.gov (United States)

    Awan, Saqib Ejaz; Sohel, Ferdous; Sanfilippo, Frank Mario; Bennamoun, Mohammed; Dwivedi, Girish

    2018-03-01

    The aim of this review is to present an up-to-date overview of the application of machine learning methods in heart failure including diagnosis, classification, readmissions and medication adherence. Recent studies have shown that the application of machine learning techniques may have the potential to improve heart failure outcomes and management, including cost savings by improving existing diagnostic and treatment support systems. Recently developed deep learning methods are expected to yield even better performance than traditional machine learning techniques in performing complex tasks by learning the intricate patterns hidden in big medical data. The review summarizes the recent developments in the application of machine and deep learning methods in heart failure management.

  20. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development.

    Science.gov (United States)

    Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

    2015-01-01

    Virtual screening is an important step in early-phase of drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purpose. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, first, performances of twenty-three different machine learning algorithms are compared by ten different measures, then, ten best performing algorithms have been selected based on principal component and hierarchical cluster analysis results. Besides classification, this application has also ability to create heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.

  1. Machine learning in radiation oncology theory and applications

    CERN Document Server

    El Naqa, Issam; Murphy, Martin J

    2015-01-01

    ​This book provides a complete overview of the role of machine learning in radiation oncology and medical physics, covering basic theory, methods, and a variety of applications in medical physics and radiotherapy. An introductory section explains machine learning, reviews supervised and unsupervised learning methods, discusses performance evaluation, and summarizes potential applications in radiation oncology. Detailed individual sections are then devoted to the use of machine learning in quality assurance; computer-aided detection, including treatment planning and contouring; image-guided rad

  2. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules.

    Science.gov (United States)

    Vyas, Renu; Bapat, Sanket; Jain, Esha; Tambe, Sanjeev S; Karthikeyan, Muthukumarasamy; Kulkarni, Bhaskar D

    2015-01-01

    The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods for the diversity oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built using three types of inputs namely structure based descriptors, molecular fingerprints and therapeutic category for performing virtual screening. The activity and affinity descriptors of a set of inhibitors of four target classes DHFR, COX, LOX and NMDA have been utilized to train a total of six classifiers viz. Artificial Neural Network (ANN), k nearest neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree--(DT) and Random Forest--(RF). Among these classifiers, the ANN was found as the best classifier with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC), were used to build the ANN models for each dataset. A good accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors compared to pharmacophoric (67.82%) and toxicophoric (70.64%). The methodology was validated on the classical Ames mutagenecity dataset of 4337 molecules. To evaluate it further, selectivity and promiscuity of molecules from five drug classes viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic were studied. The TPC fingerprints computed for each category were able to capture the drug-class specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.

  3. Machine Learning Methods to Extract Documentation of Breast Cancer Symptoms From Electronic Health Records.

    Science.gov (United States)

    Forsyth, Alexander W; Barzilay, Regina; Hughes, Kevin S; Lui, Dickson; Lorenz, Karl A; Enzinger, Andrea; Tulsky, James A; Lindvall, Charlotta

    2018-02-27

    Clinicians document cancer patients' symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped. To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes. The data set included 103,564 sentences obtained from the electronic clinical notes of 2695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to predict words indicating an active symptom (positive label), absence of a symptom (negative label), or no symptom at all (neutral label). Sentences labeled by human coder were divided into training, validation, and test data sets. Final model performance was determined on 20% test data unused in model development or tuning. The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes. We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires further optimization to improve the performance, further model building may yield machine learning methods suitable to be deployed in routine clinical care, quality improvement, and research applications. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  4. Comparison of four machine learning methods for object-oriented change detection in high-resolution satellite imagery

    Science.gov (United States)

    Bai, Ting; Sun, Kaimin; Deng, Shiquan; Chen, Yan

    2018-03-01

    High resolution image change detection is one of the key technologies of remote sensing application, which is of great significance for resource survey, environmental monitoring, fine agriculture, military mapping and battlefield environment detection. In this paper, for high-resolution satellite imagery, Random Forest (RF), Support Vector Machine (SVM), Deep belief network (DBN), and Adaboost models were established to verify the possibility of different machine learning applications in change detection. In order to compare detection accuracy of four machine learning Method, we applied these four machine learning methods for two high-resolution images. The results shows that SVM has higher overall accuracy at small samples compared to RF, Adaboost, and DBN for binary and from-to change detection. With the increase in the number of samples, RF has higher overall accuracy compared to Adaboost, SVM and DBN.

  5. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre eAbraham

    2014-02-01

    Full Text Available Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  6. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  7. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predi......Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility...... of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We...... suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate...

  8. The Value Simulation-Based Learning Added to Machining Technology in Singapore

    Science.gov (United States)

    Fang, Linda; Tan, Hock Soon; Thwin, Mya Mya; Tan, Kim Cheng; Koh, Caroline

    2011-01-01

    This study seeks to understand the value simulation-based learning (SBL) added to the learning of Machining Technology in a 15-week core subject course offered to university students. The research questions were: (1) How did SBL enhance classroom learning? (2) How did SBL help participants in their test? (3) How did SBL prepare participants for…

  9. METAPHOR: Probability density estimation for machine learning based photometric redshifts

    Science.gov (United States)

    Amaro, V.; Cavuoti, S.; Brescia, M.; Vellucci, C.; Tortora, C.; Longo, G.

    2017-06-01

    We present METAPHOR (Machine-learning Estimation Tool for Accurate PHOtometric Redshifts), a method able to provide a reliable PDF for photometric galaxy redshifts estimated through empirical techniques. METAPHOR is a modular workflow, mainly based on the MLPQNA neural network as internal engine to derive photometric galaxy redshifts, but giving the possibility to easily replace MLPQNA with any other method to predict photo-z's and their PDF. We present here the results about a validation test of the workflow on the galaxies from SDSS-DR9, showing also the universality of the method by replacing MLPQNA with KNN and Random Forest models. The validation test include also a comparison with the PDF's derived from a traditional SED template fitting method (Le Phare).

  10. A Novel Machine Learning Strategy Based on Two-Dimensional Numerical Models in Financial Engineering

    Directory of Open Access Journals (Sweden)

    Qingzhen Xu

    2013-01-01

    Full Text Available Machine learning is the most commonly used technique to address larger and more complex tasks by analyzing the most relevant information already present in databases. In order to better predict the future trend of the index, this paper proposes a two-dimensional numerical model for machine learning to simulate major U.S. stock market index and uses a nonlinear implicit finite-difference method to find numerical solutions of the two-dimensional simulation model. The proposed machine learning method uses partial differential equations to predict the stock market and can be extensively used to accelerate large-scale data processing on the history database. The experimental results show that the proposed algorithm reduces the prediction error and improves forecasting precision.

  11. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach.

    Science.gov (United States)

    Pasupa, Kitsuchart; Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets-Maximum Unbiased Validation Dataset-which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6.

  12. Comparative analysis of expert and machine-learning methods for classification of body cavity effusions in companion animals.

    Science.gov (United States)

    Hotz, Christine S; Templeton, Steven J; Christopher, Mary M

    2005-03-01

    A rule-based expert system using CLIPS programming language was created to classify body cavity effusions as transudates, modified transudates, exudates, chylous, and hemorrhagic effusions. The diagnostic accuracy of the rule-based system was compared with that produced by 2 machine-learning methods: Rosetta, a rough sets algorithm and RIPPER, a rule-induction method. Results of 508 body cavity fluid analyses (canine, feline, equine) obtained from the University of California-Davis Veterinary Medical Teaching Hospital computerized patient database were used to test CLIPS and to test and train RIPPER and Rosetta. The CLIPS system, using 17 rules, achieved an accuracy of 93.5% compared with pathologist consensus diagnoses. Rosetta accurately classified 91% of effusions by using 5,479 rules. RIPPER achieved the greatest accuracy (95.5%) using only 10 rules. When the original rules of the CLIPS application were replaced with those of RIPPER, the accuracy rates were identical. These results suggest that both rule-based expert systems and machine-learning methods hold promise for the preliminary classification of body fluids in the clinical laboratory.

  13. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal and a majority of non-Higgs boson “background” events. No knowledge of particle physics was required for the challenge but skills in machine learning - the training of computers to recognise patterns in data – were essential. The Challenge, hosted by Ka...

  14. Status Checking System of Home Appliances using machine learning

    Directory of Open Access Journals (Sweden)

    Yoon Chi-Yurl

    2017-01-01

    Full Text Available This paper describes status checking system of home appliances based on machine learning, which can be applied to existing household appliances without networking function. Designed status checking system consists of sensor modules, a wireless communication module, cloud server, android application and a machine learning algorithm. The developed system applied to washing machine analyses and judges the four-kinds of appliance’s status such as staying, washing, rinsing and spin-drying. The measurements of sensor and transmission of sensing data are operated on an Arduino board and the data are transmitted to cloud server in real time. The collected data are parsed by an Android application and injected into the machine learning algorithm for learning the status of the appliances. The machine learning algorithm compares the stored learning data with collected real-time data from the appliances. Our results are expected to contribute as a base technology to design an automatic control system based on machine learning technology for household appliances in real-time.

  15. Data Mining and Machine Learning Methods for Dementia Research.

    Science.gov (United States)

    Li, Rui

    2018-01-01

    Patient data in clinical research often includes large amounts of structured information, such as neuroimaging data, neuropsychological test results, and demographic variables. Given the various sources of information, we can develop computerized methods that can be a great help to clinicians to discover hidden patterns in the data. The computerized methods often employ data mining and machine learning algorithms, lending themselves as the computer-aided diagnosis (CAD) tool that assists clinicians in making diagnostic decisions. In this chapter, we review state-of-the-art methods used in dementia research, and briefly introduce some recently proposed algorithms subsequently.

  16. Machine Learning.

    Science.gov (United States)

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  17. Less is more: regularization perspectives on large scale machine learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Deep learning based techniques provide a possible solution at the expanse of theoretical guidance and, especially, of computational requirements. It is then a key challenge for large scale machine learning to devise approaches guaranteed to be accurate and yet computationally efficient. In this talk, we will consider a regularization perspectives on machine learning appealing to classical ideas in linear algebra and inverse problems to scale-up dramatically nonparametric methods such as kernel methods, often dismissed because of prohibitive costs. Our analysis derives optimal theoretical guarantees while providing experimental results at par or out-performing state of the art approaches.

  18. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    Science.gov (United States)

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders ( n = 30) and typical development ( n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.

  19. A Hierarchical Approach Using Machine Learning Methods in Solar Photovoltaic Energy Production Forecasting

    OpenAIRE

    Zhaoxuan Li; SM Mahbobur Rahman; Rolando Vega; Bing Dong

    2016-01-01

    We evaluate and compare two common methods, artificial neural networks (ANN) and support vector regression (SVR), for predicting energy productions from a solar photovoltaic (PV) system in Florida 15 min, 1 h and 24 h ahead of time. A hierarchical approach is proposed based on the machine learning algorithms tested. The production data used in this work corresponds to 15 min averaged power measurements collected from 2014. The accuracy of the model is determined using computing error statisti...

  20. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Science.gov (United States)

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  1. Diagnosis of Dementia by Machine learning methods in Epidemiological studies: a pilot exploratory study from south India.

    Science.gov (United States)

    Bhagyashree, Sheshadri Iyengar Raghavan; Nagaraj, Kiran; Prince, Martin; Fall, Caroline H D; Krishna, Murali

    2018-01-01

    There are limited data on the use of artificial intelligence methods for the diagnosis of dementia in epidemiological studies in low- and middle-income country (LMIC) settings. A culture and education fair battery of cognitive tests was developed and validated for population based studies in low- and middle-income countries including India by the 10/66 Dementia Research Group. We explored the machine learning methods based on the 10/66 battery of cognitive tests for the diagnosis of dementia based in a birth cohort study in South India. The data sets for 466 men and women for this study were obtained from the on-going Mysore Studies of Natal effect of Health and Ageing (MYNAH), in south India. The data sets included: demographics, performance on the 10/66 cognitive function tests, the 10/66 diagnosis of mental disorders and population based normative data for the 10/66 battery of cognitive function tests. Diagnosis of dementia from the rule based approach was compared against the 10/66 diagnosis of dementia. We have applied machine learning techniques to identify minimal number of the 10/66 cognitive function tests required for diagnosing dementia and derived an algorithm to improve the accuracy of dementia diagnosis. Of 466 subjects, 27 had 10/66 diagnosis of dementia, 19 of whom were correctly identified as having dementia by Jrip classification with 100% accuracy. This pilot exploratory study indicates that machine learning methods can help identify community dwelling older adults with 10/66 criterion diagnosis of dementia with good accuracy in a LMIC setting such as India. This should reduce the duration of the diagnostic assessment and make the process easier and quicker for clinicians, patients and will be useful for 'case' ascertainment in population based epidemiological studies.

  2. Can machine learning explain human learning?

    NARCIS (Netherlands)

    Vahdat, M.; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

    2016-01-01

    Learning Analytics (LA) has a major interest in exploring and understanding the learning process of humans and, for this purpose, benefits from both Cognitive Science, which studies how humans learn, and Machine Learning, which studies how algorithms learn from data. Usually, Machine Learning is

  3. Predicting the dissolution kinetics of silicate glasses using machine learning

    Science.gov (United States)

    Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu

    2018-05-01

    Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.

  4. Machine learning in laboratory medicine: waiting for the flood?

    Science.gov (United States)

    Cabitza, Federico; Banfi, Giuseppe

    2018-03-28

    This review focuses on machine learning and on how methods and models combining data analytics and artificial intelligence have been applied to laboratory medicine so far. Although still in its infancy, the potential for applying machine learning to laboratory data for both diagnostic and prognostic purposes deserves more attention by the readership of this journal, as well as by physician-scientists who will want to take advantage of this new computer-based support in pathology and laboratory medicine.

  5. PredPsych: A toolbox for predictive machine learning based approach in experimental psychology research

    OpenAIRE

    Cavallo, Andrea; Becchio, Cristina; Koul, Atesh

    2016-01-01

    Recent years have seen an increased interest in machine learning based predictive methods for analysing quantitative behavioural data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible software framework. The goal of this work was to build an open-source toolbox – “PredPsych” – that could make these methods readily available to all psychologists. PredPsych is a...

  6. Android Used in The Learning Innovation Atwood Machines on Lagrange Mechanics Methods

    Directory of Open Access Journals (Sweden)

    Shabrina Shabrina

    2017-12-01

    Full Text Available Android is one of the smartphone operating system platforms that is now widely developed in learning media. Android allows the learning process to be more flexible and not oriented to be teacher center, but it allows to be student center. The Atwood machines is an experimental tool that is often used to observe mechanical laws in constantly accelerated motion which can also be described by the Lagrange mechanics methods. As an innovative and alternative learning activity, Atwood Android-based learning apps are running for two experimental variations, which are variations in load in cart and load masses that are hung. The experiment of load-carrier mass variation found that the larger load mass in the cart, the smaller the acceleration experienced by the system. Meanwhile, the experiment on the variation of the loaded mass found that the larger the loaded mass, the greater the acceleration experienced by the system.

  7. Fast learning method for convolutional neural networks using extreme learning machine and its application to lane detection.

    Science.gov (United States)

    Kim, Jihun; Kim, Jonghong; Jang, Gil-Jin; Lee, Minho

    2017-03-01

    Deep learning has received significant attention recently as a promising solution to many problems in the area of artificial intelligence. Among several deep learning architectures, convolutional neural networks (CNNs) demonstrate superior performance when compared to other machine learning methods in the applications of object detection and recognition. We use a CNN for image enhancement and the detection of driving lanes on motorways. In general, the process of lane detection consists of edge extraction and line detection. A CNN can be used to enhance the input images before lane detection by excluding noise and obstacles that are irrelevant to the edge detection result. However, training conventional CNNs requires considerable computation and a big dataset. Therefore, we suggest a new learning algorithm for CNNs using an extreme learning machine (ELM). The ELM is a fast learning method used to calculate network weights between output and hidden layers in a single iteration and thus, can dramatically reduce learning time while producing accurate results with minimal training data. A conventional ELM can be applied to networks with a single hidden layer; as such, we propose a stacked ELM architecture in the CNN framework. Further, we modify the backpropagation algorithm to find the targets of hidden layers and effectively learn network weights while maintaining performance. Experimental results confirm that the proposed method is effective in reducing learning time and improving performance. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Research into Financial Position of Listed Companies following Classification via Extreme Learning Machine Based upon DE Optimization

    Directory of Open Access Journals (Sweden)

    Fu Yu

    2016-01-01

    Full Text Available By means of the model of extreme learning machine based upon DE optimization, this article particularly centers on the optimization thinking of such a model as well as its application effect in the field of listed company’s financial position classification. It proves that the improved extreme learning machine algorithm based upon DE optimization eclipses the traditional extreme learning machine algorithm following comparison. Meanwhile, this article also intends to introduce certain research thinking concerning extreme learning machine into the economics classification area so as to fulfill the purpose of computerizing the speedy but effective evaluation of massive financial statements of listed companies pertain to different classes

  9. A Review of Current Machine Learning Methods Used for Cancer Recurrence Modeling and Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Hemphill, Geralyn M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-27

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type has become a necessity in cancer research. A major challenge in cancer management is the classification of patients into appropriate risk groups for better treatment and follow-up. Such risk assessment is critically important in order to optimize the patient’s health and the use of medical resources, as well as to avoid cancer recurrence. This paper focuses on the application of machine learning methods for predicting the likelihood of a recurrence of cancer. It is not meant to be an extensive review of the literature on the subject of machine learning techniques for cancer recurrence modeling. Other recent papers have performed such a review, and I will rely heavily on the results and outcomes from these papers. The electronic databases that were used for this review include PubMed, Google, and Google Scholar. Query terms used include “cancer recurrence modeling”, “cancer recurrence and machine learning”, “cancer recurrence modeling and machine learning”, and “machine learning for cancer recurrence and prediction”. The most recent and most applicable papers to the topic of this review have been included in the references. It also includes a list of modeling and classification methods to predict cancer recurrence.

  10. Research into Financial Position of Listed Companies following Classification via Extreme Learning Machine Based upon DE Optimization

    OpenAIRE

    Fu Yu; Mu Jiong; Duan Xu Liang

    2016-01-01

    By means of the model of extreme learning machine based upon DE optimization, this article particularly centers on the optimization thinking of such a model as well as its application effect in the field of listed company’s financial position classification. It proves that the improved extreme learning machine algorithm based upon DE optimization eclipses the traditional extreme learning machine algorithm following comparison. Meanwhile, this article also intends to introduce certain research...

  11. Machine learning derived risk prediction of anorexia nervosa.

    Science.gov (United States)

    Guo, Yiran; Wei, Zhi; Keating, Brendan J; Hakonarson, Hakon

    2016-01-20

    Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics play an important role. In this study, we collected whole genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), Price Foundation Collaborative Group and the Children's Hospital of Philadelphia (CHOP), and applied machine learning methods for predicting AN disease risk. The prediction performance is measured by area under the receiver operating characteristic curve (AUC), indicating how well the model distinguishes cases from unaffected control subjects. Logistic regression model with the lasso penalty technique generated an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUC's of 0.691 and 0.623, respectively. Using different sample sizes, our results suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. To our knowledge, this is the first attempt to assess AN risk based on genome wide genotype level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting.

  12. An introduction to quantum machine learning

    Science.gov (United States)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  13. Survey of Machine Learning Methods for Database Security

    Science.gov (United States)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  14. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols......, learning from weak labels, and interpretation and evaluation of results....

  15. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  16. A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.

    Science.gov (United States)

    Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

    2018-06-12

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.

  17. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence, and has attracted considerable attentions. An extreme learning machine (ELM) is a single layer feed-forward neural network alike learning system, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of extreme learning machine (ELM) are its fast learning speed, trivial human intervene and high scalability.   This book contains some selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing China, October 15-17, 2013. This conference aims to bring together the researchers and practitioners of extreme learning machine from a variety of fields including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...

  18. NMF-Based Image Quality Assessment Using Extreme Learning Machine.

    Science.gov (United States)

    Wang, Shuigen; Deng, Chenwei; Lin, Weisi; Huang, Guang-Bin; Zhao, Baojun

    2017-01-01

    Numerous state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage process: distortion description followed by distortion effects pooling. As for the first stage, the distortion descriptors or measurements are expected to be effective representatives of human visual variations, while the second stage should well express the relationship among quality descriptors and the perceptual visual quality. However, most of the existing quality descriptors (e.g., luminance, contrast, and gradient) do not seem to be consistent with human perception, and the effects pooling is often done in ad-hoc ways. In this paper, we propose a novel full-reference IQA metric. It applies non-negative matrix factorization (NMF) to measure image degradations by making use of the parts-based representation of NMF. On the other hand, a new machine learning technique [extreme learning machine (ELM)] is employed to address the limitations of the existing pooling techniques. Compared with neural networks and support vector regression, ELM can achieve higher learning accuracy with faster learning speed. Extensive experimental results demonstrate that the proposed metric has better performance and lower computational complexity in comparison with the relevant state-of-the-art approaches.

  19. Finding protein sites using machine learning methods

    Directory of Open Access Journals (Sweden)

    Jaime Leonardo Bobadilla Molina

    2003-07-01

    Full Text Available The increasing amount of protein three-dimensional (3D structures determined by x-ray and NMR technologies as well as structures predicted by computational methods results in the need for automated methods to provide inital annotations. We have developed a new method for recognizing sites in three-dimensional protein structures. Our method is based on a previosly reported algorithm for creating descriptions of protein microenviroments using physical and chemical properties at multiple levels of detail. The recognition method takes three inputs: 1. A set of control nonsites that share some structural or functional role. 2. A set of control nonsites that lack this role. 3. A single query site. A support vector machine classifier is built using feature vectors where each component represents a property in a given volume. Validation against an independent test set shows that this recognition approach has high sensitivity and specificity. We also describe the results of scanning four calcium binding proteins (with the calcium removed using a three dimensional grid of probe points at 1.25 angstrom spacing. The system finds the sites in the proteins giving points at or near the blinding sites. Our results show that property based descriptions along with support vector machines can be used for recognizing protein sites in unannotated structures.

  20. Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

    Science.gov (United States)

    Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

    2017-12-01

    The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3

  1. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...

  2. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    Science.gov (United States)

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  3. Modeling Music Emotion Judgments Using Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Naresh N. Vempala

    2018-01-01

    Full Text Available Emotion judgments and five channels of physiological data were obtained from 60 participants listening to 60 music excerpts. Various machine learning (ML methods were used to model the emotion judgments inclusive of neural networks, linear regression, and random forests. Input for models of perceived emotion consisted of audio features extracted from the music recordings. Input for models of felt emotion consisted of physiological features extracted from the physiological recordings. Models were trained and interpreted with consideration of the classic debate in music emotion between cognitivists and emotivists. Our models supported a hybrid position wherein emotion judgments were influenced by a combination of perceived and felt emotions. In comparing the different ML approaches that were used for modeling, we conclude that neural networks were optimal, yielding models that were flexible as well as interpretable. Inspection of a committee machine, encompassing an ensemble of networks, revealed that arousal judgments were predominantly influenced by felt emotion, whereas valence judgments were predominantly influenced by perceived emotion.

  4. Learning About Climate and Atmospheric Models Through Machine Learning

    Science.gov (United States)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  5. A Machine Learning Regression scheme to design a FR-Image Quality Assessment Algorithm

    OpenAIRE

    Charrier , Christophe; Lezoray , Olivier; Lebrun , Gilles

    2012-01-01

    International audience; A crucial step in image compression is the evaluation of its performance, and more precisely available ways to measure the quality of compressed images. In this paper, a machine learning expert, providing a quality score is proposed. This quality measure is based on a learned classification process in order to respect that of human observers. The proposed method namely Machine Learning-based Image Quality Measurment (MLIQM) first classifies the quality using multi Supp...

  6. Applications of machine learning in cancer prediction and prognosis.

    Science.gov (United States)

    Cruz, Joseph A; Wishart, David S

    2007-02-11

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to "learn" from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression.

  7. A review of machine learning in obesity.

    Science.gov (United States)

    DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M

    2018-05-01

    Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to National Health and Nutrition Examination Survey to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.

  8. Ensemble Machine Learning Methods and Applications

    CERN Document Server

    Ma, Yunqian

    2012-01-01

    It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed “ensemble learning” by researchers in computational intelligence and machine learning, it is known to improve a decision system’s robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as “boosting” and “random forest” facilitate solutions to key computational issues such as face detection and are now being applied in areas as diverse as object trackingand bioinformatics.   Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs. At once a solid theoretical study and a practical guide, the volume is a windfall for r...

  9. Machine learning enhanced optical distance sensor

    Science.gov (United States)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized training model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces a training set and testing set distance measurement errors of learning. Applications for the proposed sensor include industrial scenario distance sensing where target material specific training models can be generated to realize low <1% measurement error distance measurements.

  10. Machine learning models in breast cancer survival prediction.

    Science.gov (United States)

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With the early diagnosis of breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of Breast cancer survival. We use a dataset with eight attributes that include the records of 900 patients in which 876 patients (97.3%) and 24 (2.7%) patients were females and males respectively. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-cross fold technique were used with the proposed model for the prediction of breast cancer survival. The performance of machine learning techniques were evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, Trees Random Forest (TRF) technique showed better results in comparison to other techniques (NB, 1NN, AD, SVM and RBFN, MLP). The accuracy, sensitivity and the area under ROC curve of TRF are 96%, 96%, 93%, respectively. However, 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of

  11. Source localization in an ocean waveguide using supervised machine learning.

    Science.gov (United States)

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  12. Machine learning methods for clinical forms analysis in mental health.

    Science.gov (United States)

    Strauss, John; Peguero, Arturo Martinez; Hirst, Graeme

    2013-01-01

    In preparation for a clinical information system implementation, the Centre for Addiction and Mental Health (CAMH) Clinical Information Transformation project completed multiple preparation steps. An automated process was desired to supplement the onerous task of manual analysis of clinical forms. We used natural language processing (NLP) and machine learning (ML) methods for a series of 266 separate clinical forms. For the investigation, documents were represented by feature vectors. We used four ML algorithms for our examination of the forms: cluster analysis, k-nearest neigh-bours (kNN), decision trees and support vector machines (SVM). Parameters for each algorithm were optimized. SVM had the best performance with a precision of 64.6%. Though we did not find any method sufficiently accurate for practical use, to our knowledge this approach to forms has not been used previously in mental health.

  13. Revisit of Machine Learning Supported Biological and Biomedical Studies.

    Science.gov (United States)

    Yu, Xiang-Tian; Wang, Lu; Zeng, Tao

    2018-01-01

    Generally, machine learning includes many in silico methods to transform the principles underlying natural phenomenon to human understanding information, which aim to save human labor, to assist human judge, and to create human knowledge. It should have wide application potential in biological and biomedical studies, especially in the era of big biological data. To look through the application of machine learning along with biological development, this review provides wide cases to introduce the selection of machine learning methods in different practice scenarios involved in the whole biological and biomedical study cycle and further discusses the machine learning strategies for analyzing omics data in some cutting-edge biological studies. Finally, the notes on new challenges for machine learning due to small-sample high-dimension are summarized from the key points of sample unbalance, white box, and causality.

  14. Machine Learning Methods for Prediction of CDK-Inhibitors

    Science.gov (United States)

    Ramana, Jayashree; Gupta, Dinesh

    2010-01-01

    Progression through the cell cycle involves the coordinated activities of a suite of cyclin/cyclin-dependent kinase (CDK) complexes. The activities of the complexes are regulated by CDK inhibitors (CDKIs). Apart from its role as cell cycle regulators, CDKIs are involved in apoptosis, transcriptional regulation, cell fate determination, cell migration and cytoskeletal dynamics. As the complexes perform crucial and diverse functions, these are important drug targets for tumour and stem cell therapeutic interventions. However, CDKIs are represented by proteins with considerable sequence heterogeneity and may fail to be identified by simple similarity search methods. In this work we have evaluated and developed machine learning methods for identification of CDKIs. We used different compositional features and evolutionary information in the form of PSSMs, from CDKIs and non-CDKIs for generating SVM and ANN classifiers. In the first stage, both the ANN and SVM models were evaluated using Leave-One-Out Cross-Validation and in the second stage these were tested on independent data sets. The PSSM-based SVM model emerged as the best classifier in both the stages and is publicly available through a user-friendly web interface at http://bioinfo.icgeb.res.in/cdkipred. PMID:20967128

  15. Machine learning based Intelligent cognitive network using fog computing

    Science.gov (United States)

    Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

    2017-05-01

    In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send the periodic signal summary which is much smaller than the original signal to the cloud so that the overall system spectrum source allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens the system security by reducing the communication burden of the communications network.

  16. Application of machine-learning methods to solid-state chemistry: ferromagnetism in transition metal alloys

    International Nuclear Information System (INIS)

    Landrum, G.A.Gregory A.; Genin, Hugh

    2003-01-01

    Machine-learning methods are a collection of techniques for building predictive models from experimental data. The algorithms are problem-independent: the chemistry and physics of the problem being studied are contained in the descriptors used to represent the known data. The application of a variety of machine-learning methods to the prediction of ferromagnetism in ordered and disordered transition metal alloys is presented. Applying a decision tree algorithm to build a predictive model for ordered phases results in a model that is 100% accurate. The same algorithm achieves 99% accuracy when trained on a data set containing both ordered and disordered phases. Details of the descriptor sets for both applications are also presented

  17. Accelerated Monte Carlo system reliability analysis through machine-learning-based surrogate models of network connectivity

    International Nuclear Information System (INIS)

    Stern, R.E.; Song, J.; Work, D.B.

    2017-01-01

    The two-terminal reliability problem in system reliability analysis is known to be computationally intractable for large infrastructure graphs. Monte Carlo techniques can estimate the probability of a disconnection between two points in a network by selecting a representative sample of network component failure realizations and determining the source-terminal connectivity of each realization. To reduce the runtime required for the Monte Carlo approximation, this article proposes an approximate framework in which the connectivity check of each sample is estimated using a machine-learning-based classifier. The framework is implemented using both a support vector machine (SVM) and a logistic regression based surrogate model. Numerical experiments are performed on the California gas distribution network using the epicenter and magnitude of the 1989 Loma Prieta earthquake as well as randomly-generated earthquakes. It is shown that the SVM and logistic regression surrogate models are able to predict network connectivity with accuracies of 99% for both methods, and are 1–2 orders of magnitude faster than using a Monte Carlo method with an exact connectivity check. - Highlights: • Surrogate models of network connectivity are developed by machine-learning algorithms. • Developed surrogate models can reduce the runtime required for Monte Carlo simulations. • Support vector machine and logistic regressions are employed to develop surrogate models. • Numerical example of California gas distribution network demonstrate the proposed approach. • The developed models have accuracies 99%, and are 1–2 orders of magnitude faster than MCS.

  18. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    OpenAIRE

    Zekić-Sušac, Marijana; Pfeifer, Sanja; Šarlija, Nataša

    2014-01-01

    Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART ...

  19. A machine learning model with human cognitive biases capable of learning from small and biased datasets.

    Science.gov (United States)

    Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro

    2018-05-09

    Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.

  20. How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach.

    Science.gov (United States)

    Ichikawa, Daisuke; Saito, Toki; Ujita, Waka; Oyama, Hiroshi

    2016-12-01

    Our purpose was to develop a new machine-learning approach (a virtual health check-up) toward identification of those at high risk of hyperuricemia. Applying the system to general health check-ups is expected to reduce medical costs compared with administering an additional test. Data were collected during annual health check-ups performed in Japan between 2011 and 2013 (inclusive). We prepared training and test datasets from the health check-up data to build prediction models; these were composed of 43,524 and 17,789 persons, respectively. Gradient-boosting decision tree (GBDT), random forest (RF), and logistic regression (LR) approaches were trained using the training dataset and were then used to predict hyperuricemia in the test dataset. Undersampling was applied to build the prediction models to deal with the imbalanced class dataset. The results showed that the RF and GBDT approaches afforded the best performances in terms of sensitivity and specificity, respectively. The area under the curve (AUC) values of the models, which reflected the total discriminative ability of the classification, were 0.796 [95% confidence interval (CI): 0.766-0.825] for the GBDT, 0.784 [95% CI: 0.752-0.815] for the RF, and 0.785 [95% CI: 0.752-0.819] for the LR approaches. No significant differences were observed between pairs of each approach. Small changes occurred in the AUCs after applying undersampling to build the models. We developed a virtual health check-up that predicted the development of hyperuricemia using machine-learning methods. The GBDT, RF, and LR methods had similar predictive capability. Undersampling did not remarkably improve predictive power. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Probabilistic models and machine learning in structural bioinformatics

    DEFF Research Database (Denmark)

    Hamelryck, Thomas

    2009-01-01

    . Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis...

  2. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide that covers all of the need to know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks.Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or

  3. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  4. Peak detection method evaluation for ion mobility spectrometry by using machine learning approaches.

    Science.gov (United States)

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna; Baumbach, Jörg Ingo; Rahmann, Sven; Baumbach, Jan

    2013-04-16

    Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, different steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME).We manually generated Metabolites 2013, 3 278 a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the power of the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.

  5. Identifying Green Infrastructure from Social Media and Crowdsourcing- An Image Based Machine-Learning Approach.

    Science.gov (United States)

    Rai, A.; Minsker, B. S.

    2016-12-01

    In this work we introduce a novel dataset GRID: GReen Infrastructure Detection Dataset and a framework for identifying urban green storm water infrastructure (GI) designs (wetlands/ponds, urban trees, and rain gardens/bioswales) from social media and satellite aerial images using computer vision and machine learning methods. Along with the hydrologic benefits of GI, such as reducing runoff volumes and urban heat islands, GI also provides important socio-economic benefits such as stress recovery and community cohesion. However, GI is installed by many different parties and cities typically do not know where GI is located, making study of its impacts or siting new GI difficult. We use object recognition learning methods (template matching, sliding window approach, and Random Hough Forest method) and supervised machine learning algorithms (e.g., support vector machines) as initial screening approaches to detect potential GI sites, which can then be investigated in more detail using on-site surveys. Training data were collected from GPS locations of Flickr and Instagram image postings and Amazon Mechanical Turk identification of each GI type. Sliding window method outperformed other methods and achieved an average F measure, which is combined metric for precision and recall performance measure of 0.78.

  6. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  7. Machine learning and computer vision approaches for phenotypic profiling.

    Science.gov (United States)

    Grys, Ben T; Lo, Dara S; Sahin, Nil; Kraus, Oren Z; Morris, Quaid; Boone, Charles; Andrews, Brenda J

    2017-01-02

    With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. © 2017 Grys et al.

  8. MoleculeNet: a benchmark for molecular machine learning.

    Science.gov (United States)

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay

    2018-01-14

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

  9. Recent Advances in Predictive (Machine) Learning

    Energy Technology Data Exchange (ETDEWEB)

    Friedman, J

    2004-01-24

    Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originated many years ago at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field. These are support vector machines and boosted decision trees. This paper provides an introduction to these two new methods tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.

  10. Can Machines Learn Respiratory Virus Epidemiology?: A Comparative Study of Likelihood-Free Methods for the Estimation of Epidemiological Dynamics

    Directory of Open Access Journals (Sweden)

    Heidi L. Tessmer

    2018-03-01

    Full Text Available To estimate and predict the transmission dynamics of respiratory viruses, the estimation of the basic reproduction number, R0, is essential. Recently, approximate Bayesian computation methods have been used as likelihood free methods to estimate epidemiological model parameters, particularly R0. In this paper, we explore various machine learning approaches, the multi-layer perceptron, convolutional neural network, and long-short term memory, to learn and estimate the parameters. Further, we compare the accuracy of the estimates and time requirements for machine learning and the approximate Bayesian computation methods on both simulated and real-world epidemiological data from outbreaks of influenza A(H1N1pdm09, mumps, and measles. We find that the machine learning approaches can be verified and tested faster than the approximate Bayesian computation method, but that the approximate Bayesian computation method is more robust across different datasets.

  11. Machine learning-based kinetic modeling: a robust and reproducible solution for quantitative analysis of dynamic PET data.

    Science.gov (United States)

    Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia

    2017-05-07

    A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.

  12. Machine learning-based kinetic modeling: a robust and reproducible solution for quantitative analysis of dynamic PET data

    Science.gov (United States)

    Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia

    2017-05-01

    A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.

  13. Process signal selection method to improve the impact mitigation of sensor broken for diagnosis using machine learning

    International Nuclear Information System (INIS)

    Minowa, Hirotsugu; Gofuku, Akio

    2014-01-01

    Accidents of industrial plants cause large loss on human, economic, social credibility. In recent, studies of diagnostic methods using techniques of machine learning are expected to detect early and correctly abnormality occurred in a plant. However, the general diagnostic machines are generated generally to require all process signals (hereafter, signals) for plant diagnosis. Thus if trouble occurs such as process sensor is broken, the diagnostic machine cannot diagnose or may decrease diagnostic performance. Therefore, we propose an important process signal selection method to improve impact mitigation without reducing the diagnostic performance by reducing the adverse effect of noises on multi-agent diagnostic system. The advantage of our method is the general-purpose property that allows to be applied to various supervised machine learning and to set the various parameters to decide termination of search. The experiment evaluation revealed that diagnostic machines generated by our method using SVM improved the impact mitigation and did not reduce performance about the diagnostic accuracy, the velocity of diagnosis, predictions of plant state near accident occurrence, in comparison with the basic diagnostic machine which diagnoses by using all signals. This paper reports our proposed method and the results evaluated which our method was applied to the simulated abnormal of the fast-breeder reactor Monju. (author)

  14. DNS Tunneling Detection Method Based on Multilabel Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Ahmed Almusawi

    2018-01-01

    Full Text Available DNS tunneling is a method used by malicious users who intend to bypass the firewall to send or receive commands and data. This has a significant impact on revealing or releasing classified information. Several researchers have examined the use of machine learning in terms of detecting DNS tunneling. However, these studies have treated the problem of DNS tunneling as a binary classification where the class label is either legitimate or tunnel. In fact, there are different types of DNS tunneling such as FTP-DNS tunneling, HTTP-DNS tunneling, HTTPS-DNS tunneling, and POP3-DNS tunneling. Therefore, there is a vital demand to not only detect the DNS tunneling but rather classify such tunnel. This study aims to propose a multilabel support vector machine in order to detect and classify the DNS tunneling. The proposed method has been evaluated using a benchmark dataset that contains numerous DNS queries and is compared with a multilabel Bayesian classifier based on the number of corrected classified DNS tunneling instances. Experimental results demonstrate the efficacy of the proposed SVM classification method by obtaining an f-measure of 0.80.

  15. An active role for machine learning in drug development

    Science.gov (United States)

    Murphy, Robert F.

    2014-01-01

    Due to the complexity of biological systems, cutting-edge machine-learning methods will be critical for future drug development. In particular, machine-vision methods to extract detailed information from imaging assays and active-learning methods to guide experimentation will be required to overcome the dimensionality problem in drug development. PMID:21587249

  16. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    Science.gov (United States)

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

  17. Prediction of Student Dropout in E-Learning Program Through the Use of Machine Learning Method

    Directory of Open Access Journals (Sweden)

    Mingjie Tan

    2015-02-01

    Full Text Available The high rate of dropout is a serious problem in E-learning program. Thus it has received extensive concern from the education administrators and researchers. Predicting the potential dropout students is a workable solution to prevent dropout. Based on the analysis of related literature, this study selected student’s personal characteristic and academic performance as input attributions. Prediction models were developed using Artificial Neural Network (ANN, Decision Tree (DT and Bayesian Networks (BNs. A large sample of 62375 students was utilized in the procedures of model training and testing. The results of each model were presented in confusion matrix, and analyzed by calculating the rates of accuracy, precision, recall, and F-measure. The results suggested all of the three machine learning methods were effective in student dropout prediction, and DT presented a better performance. Finally, some suggestions were made for considerable future research.

  18. Robust Matching Pursuit Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Zejian Yuan

    2018-01-01

    Full Text Available Extreme learning machine (ELM is a popular learning algorithm for single hidden layer feedforward networks (SLFNs. It was originally proposed with the inspiration from biological learning and has attracted massive attentions due to its adaptability to various tasks with a fast learning ability and efficient computation cost. As an effective sparse representation method, orthogonal matching pursuit (OMP method can be embedded into ELM to overcome the singularity problem and improve the stability. Usually OMP recovers a sparse vector by minimizing a least squares (LS loss, which is efficient for Gaussian distributed data, but may suffer performance deterioration in presence of non-Gaussian data. To address this problem, a robust matching pursuit method based on a novel kernel risk-sensitive loss (in short KRSLMP is first proposed in this paper. The KRSLMP is then applied to ELM to solve the sparse output weight vector, and the new method named the KRSLMP-ELM is developed for SLFN learning. Experimental results on synthetic and real-world data sets confirm the effectiveness and superiority of the proposed method.

  19. Machine learning of molecular properties: Locality and active learning

    Science.gov (United States)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, the machine learning techniques have shown great potent1ial in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on another hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach the chemical accuracy and also show large errors for the so-called outliers—the out-of-sample molecules, not well-represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  20. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

    Science.gov (United States)

    Ain, Qurrat Ul; Aleksandrova, Antoniya; Roessler, Florian D; Ballester, Pedro J

    2015-01-01

    Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405-424. doi: 10.1002/wcms.1225 For further resources related to this article, please visit the WIREs website.

  1. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning has been used to develop decision-support tools applied in medical imaging informatics. In order to take advantages of both conventional and deep learning approach, this study aims to investigate feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, initially computed 44 image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LLP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applying to this dataset, LLP-generated feature vector reduced the number of features from 44 to 4. Using a leave-onecase-out validation method, area under ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p breast cancer detected in the next subsequent mammography screening.

  2. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    Science.gov (United States)

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.

  3. Combining Formal Logic and Machine Learning for Sentiment Analysis

    DEFF Research Database (Denmark)

    Petersen, Niklas Christoffer; Villadsen, Jørgen

    2014-01-01

    This paper presents a formal logical method for deep structural analysis of the syntactical properties of texts using machine learning techniques for efficient syntactical tagging. To evaluate the method it is used for entity level sentiment analysis as an alternative to pure machine learning...

  4. Gaussian processes for machine learning.

    Science.gov (United States)

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations.13,78,31 The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.

  5. Bayesian and Classical Machine Learning Methods: A Comparison for Tree Species Classification with LiDAR Waveform Signatures

    Directory of Open Access Journals (Sweden)

    Tan Zhou

    2017-12-01

    Full Text Available A plethora of information contained in full-waveform (FW Light Detection and Ranging (LiDAR data offers prospects for characterizing vegetation structures. This study aims to investigate the capacity of FW LiDAR data alone for tree species identification through the integration of waveform metrics with machine learning methods and Bayesian inference. Specifically, we first conducted automatic tree segmentation based on the waveform-based canopy height model (CHM using three approaches including TreeVaW, watershed algorithms and the combination of TreeVaW and watershed (TW algorithms. Subsequently, the Random forests (RF and Conditional inference forests (CF models were employed to identify important tree-level waveform metrics derived from three distinct sources, such as raw waveforms, composite waveforms, the waveform-based point cloud and the combined variables from these three sources. Further, we discriminated tree (gray pine, blue oak, interior live oak and shrub species through the RF, CF and Bayesian multinomial logistic regression (BMLR using important waveform metrics identified in this study. Results of the tree segmentation demonstrated that the TW algorithms outperformed other algorithms for delineating individual tree crowns. The CF model overcomes waveform metrics selection bias caused by the RF model which favors correlated metrics and enhances the accuracy of subsequent classification. We also found that composite waveforms are more informative than raw waveforms and waveform-based point cloud for characterizing tree species in our study area. Both classical machine learning methods (the RF and CF and the BMLR generated satisfactory average overall accuracy (74% for the RF, 77% for the CF and 81% for the BMLR and the BMLR slightly outperformed the other two methods. However, these three methods suffered from low individual classification accuracy for the blue oak which is prone to being misclassified as the interior live oak due

  6. Applications of Machine Learning in Cancer Prediction and Prognosis

    Directory of Open Access Journals (Sweden)

    Joseph A. Cruz

    2006-01-01

    Full Text Available Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such artificial neural networks (ANNs instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25% improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression.

  7. Investigating the impact of a LEGO(TM)-based, engineering-oriented curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines

    Science.gov (United States)

    Marulcu, Ismail

    This mixed method study examined the impact of a LEGO-based, engineering-oriented curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines. This study takes a social constructivist theoretical stance that science learning involves learning scientific concepts and their relations to each other. From this perspective, students are active participants, and they construct their conceptual understanding through the guidance of their teacher. With the goal of better understanding the use of engineering education materials in classrooms the National Academy of Engineering and National Research Council in the book "Engineering in K-12 Education" conducted an in-depth review of the potential benefits of including engineering in K--12 schools as (a) improved learning and achievement in science and mathematics, (b) increased awareness of engineering and the work of engineers, (c) understanding of and the ability to engage in engineering design, (d) interest in pursuing engineering as a career, and (e) increased technological literacy (Katehi, Pearson, & Feder, 2009). However, they also noted a lack of reliable data and rigorous research to support these assertions. Data sources included identical written tests and interviews, classroom observations and videos, teacher interviews, and classroom artifacts. To investigate the impact of the design-based simple machines curriculum compared to the scientific inquiry-based simple machines curriculum on student learning outcomes, I compared the control and the experimental groups' scores on the tests and interviews by using ANCOVA. To analyze and characterize the classroom observation videotapes, I used Jordan and Henderson's (1995) method and divide them into episodes. My analyses revealed that the design-based Design a People Mover: Simple Machines unit was, if not better, as successful as the inquiry-based FOSS Levers and Pulleys unit in terms of students' content learning. I also

  8. Outsmarting neural networks: an alternative paradigm for machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably and Approximately Correct (PAC) framework. We show that, under certain conditions, one can reduce the three problems above to the regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria, but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs to solve the regression estimation. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  9. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.......Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA....

  10. A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli.

    Science.gov (United States)

    Habibi, Narjeskhatoon; Mohd Hashim, Siti Z; Norouzi, Alireza; Samian, Mohammed Razip

    2014-05-08

    Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both biopharmaceutical and research arena in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low level of production and frequent fail due to opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. In certain experimental circumstances, including temperature, expression host, etc., protein solubility is a feature eventually defined by its sequence. Until now, numerous methods based on machine learning are proposed to predict the solubility of protein merely from its amino acid sequence. In spite of the 20 years of research on the matter, no comprehensive review is available on the published methods. This paper presents an extensive review of the existing models to predict protein solubility in Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion on the models is provided at the end. This study aims to investigate extensively the machine learning based methods to predict recombinant protein solubility, so as to offer a general as well as a detailed understanding for researches in the field. Some of the models present acceptable prediction performances and convenient user interfaces. These models can be considered as valuable tools to predict recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.

  11. Online learning control using adaptive critic designs with sparse kernel machines.

    Science.gov (United States)

    Xu, Xin; Hou, Zhongsheng; Lian, Chuanqiang; He, Haibo

    2013-05-01

    In the past decade, adaptive critic designs (ACDs), including heuristic dynamic programming (HDP), dual heuristic programming (DHP), and their action-dependent ones, have been widely studied to realize online learning control of dynamical systems. However, because neural networks with manually designed features are commonly used to deal with continuous state and action spaces, the generalization capability and learning efficiency of previous ACDs still need to be improved. In this paper, a novel framework of ACDs with sparse kernel machines is presented by integrating kernel methods into the critic of ACDs. To improve the generalization capability as well as the computational efficiency of kernel machines, a sparsification method based on the approximately linear dependence analysis is used. Using the sparse kernel machines, two kernel-based ACD algorithms, that is, kernel HDP (KHDP) and kernel DHP (KDHP), are proposed and their performance is analyzed both theoretically and empirically. Because of the representation learning and generalization capability of sparse kernel machines, KHDP and KDHP can obtain much better performance than previous HDP and DHP with manually designed neural networks. Simulation and experimental results of two nonlinear control problems, that is, a continuous-action inverted pendulum problem and a ball and plate control problem, demonstrate the effectiveness of the proposed kernel ACD methods.

  12. An improved segmentation-based HMM learning method for Condition-based Maintenance

    International Nuclear Information System (INIS)

    Liu, T; Lemeire, J; Cartella, F; Meganck, S

    2012-01-01

    In the domain of condition-based maintenance (CBM), persistence of machine states is a valid assumption. Based on this assumption, we present an improved Hidden Markov Model (HMM) learning algorithm for the assessment of equipment states. By a good estimation of initial parameters, more accurate learning can be achieved than by regular HMM learning methods which start with randomly chosen initial parameters. It is also better in avoiding getting trapped in local maxima. The data is segmented with a change-point analysis method which uses a combination of cumulative sum charts (CUSUM) and bootstrapping techniques. The method determines a confidence level that a state change happens. After the data is segmented, in order to label and combine the segments corresponding to the same states, a clustering technique is used based on a low-pass filter or root mean square (RMS) values of the features. The segments with their labelled hidden state are taken as 'evidence' to estimate the parameters of an HMM. Then, the estimated parameters are served as initial parameters for the traditional Baum-Welch (BW) learning algorithms, which are used to improve the parameters and train the model. Experiments on simulated and real data demonstrate that both performance and convergence speed is improved.

  13. An Improved Bacterial-Foraging Optimization-Based Machine Learning Framework for Predicting the Severity of Somatization Disorder

    Directory of Open Access Journals (Sweden)

    Xinen Lv

    2018-02-01

    Full Text Available It is of great clinical significance to establish an accurate intelligent model to diagnose the somatization disorder of community correctional personnel. In this study, a novel machine learning framework is proposed to predict the severity of somatization disorder in community correction personnel. The core of this framework is to adopt the improved bacterial foraging optimization (IBFO to optimize two key parameters (penalty coefficient and the kernel width of a kernel extreme learning machine (KELM and build an IBFO-based KELM (IBFO-KELM for the diagnosis of somatization disorder patients. The main innovation point of the IBFO-KELM model is the introduction of opposition-based learning strategies in traditional bacteria foraging optimization, which increases the diversity of bacterial species, keeps a uniform distribution of individuals of initial population, and improves the convergence rate of the BFO optimization process as well as the probability of escaping from the local optimal solution. In order to verify the effectiveness of the method proposed in this study, a 10-fold cross-validation method based on data from a symptom self-assessment scale (SCL-90 is used to make comparison among IBFO-KELM, BFO-KELM (model based on the original bacterial foraging optimization model, GA-KELM (model based on genetic algorithm, PSO-KELM (model based on particle swarm optimization algorithm and Grid-KELM (model based on grid search method. The experimental results show that the proposed IBFO-KELM prediction model has better performance than other methods in terms of classification accuracy, Matthews correlation coefficient (MCC, sensitivity and specificity. It can distinguish very well between severe somatization disorder and mild somatization and assist the psychological doctor with clinical diagnosis.

  14. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data.

    Science.gov (United States)

    Pesesky, Mitchell W; Hussain, Tahir; Wallace, Meghan; Patel, Sanket; Andleeb, Saadia; Burnham, Carey-Ann D; Dantas, Gautam

    2016-01-01

    The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitates initial use of empiric (frequently broad-spectrum) antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0 and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance factors and

  15. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data

    Directory of Open Access Journals (Sweden)

    Mitchell Pesesky

    2016-11-01

    Full Text Available The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitate initial use of empiric (frequently broad-spectrum antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0% and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance

  16. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    Science.gov (United States)

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  17. Predictive Power of Machine Learning for Optimizing Solar Water Heater Performance: The Potential Application of High-Throughput Screening

    Directory of Open Access Journals (Sweden)

    Hao Li

    2017-01-01

    Full Text Available Predicting the performance of solar water heater (SWH is challenging due to the complexity of the system. Fortunately, knowledge-based machine learning can provide a fast and precise prediction method for SWH performance. With the predictive power of machine learning models, we can further solve a more challenging question: how to cost-effectively design a high-performance SWH? Here, we summarize our recent studies and propose a general framework of SWH design using a machine learning-based high-throughput screening (HTS method. Design of water-in-glass evacuated tube solar water heater (WGET-SWH is selected as a case study to show the potential application of machine learning-based HTS to the design and optimization of solar energy systems.

  18. rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning

    Directory of Open Access Journals (Sweden)

    Miron B. Kursa

    2014-11-01

    Full Text Available Random ferns is a very simple yet powerful classification method originally introduced for specific computer vision tasks. In this paper, I show that this algorithm may be considered as a constrained decision tree ensemble and use this interpretation to introduce a series of modifications which enable the use of random ferns in general machine learning problems. Moreover, I extend the method with an internal error approximation and an attribute importance measure based on corresponding features of the random forest algorithm. I also present the R package rFerns containing an efficient implementation of this modified version of random ferns.

  19. Exploring machine-learning-based control plane intrusion detection techniques in software defined optical networks

    Science.gov (United States)

    Zhang, Huibin; Wang, Yuqiao; Chen, Haoran; Zhao, Yongli; Zhang, Jie

    2017-12-01

    In software defined optical networks (SDON), the centralized control plane may encounter numerous intrusion threatens which compromise the security level of provisioned services. In this paper, the issue of control plane security is studied and two machine-learning-based control plane intrusion detection techniques are proposed for SDON with properly selected features such as bandwidth, route length, etc. We validate the feasibility and efficiency of the proposed techniques by simulations. Results show an accuracy of 83% for intrusion detection can be achieved with the proposed machine-learning-based control plane intrusion detection techniques.

  20. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  1. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  2. A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling.

    Science.gov (United States)

    Leger, Stefan; Zwanenburg, Alex; Pilz, Karoline; Lohaus, Fabian; Linge, Annett; Zöphel, Klaus; Kotzerke, Jörg; Schreiber, Andreas; Tinhofer, Inge; Budach, Volker; Sak, Ali; Stuschke, Martin; Balermpas, Panagiotis; Rödel, Claus; Ganswindt, Ute; Belka, Claus; Pigorsch, Steffi; Combs, Stephanie E; Mönnich, David; Zips, Daniel; Krause, Mechthild; Baumann, Michael; Troost, Esther G C; Löck, Steffen; Richter, Christian

    2017-10-16

    Radiomics applies machine learning algorithms to quantitative imaging data to characterise the tumour phenotype and predict clinical outcome. For the development of radiomics risk models, a variety of different algorithms is available and it is not clear which one gives optimal results. Therefore, we assessed the performance of 11 machine learning algorithms combined with 12 feature selection methods by the concordance index (C-Index), to predict loco-regional tumour control (LRC) and overall survival for patients with head and neck squamous cell carcinoma. The considered algorithms are able to deal with continuous time-to-event survival data. Feature selection and model building were performed on a multicentre cohort (213 patients) and validated using an independent cohort (80 patients). We found several combinations of machine learning algorithms and feature selection methods which achieve similar results, e.g. C-Index = 0.71 and BT-COX: C-Index = 0.70 in combination with Spearman feature selection. Using the best performing models, patients were stratified into groups of low and high risk of recurrence. Significant differences in LRC were obtained between both groups on the validation cohort. Based on the presented analysis, we identified a subset of algorithms which should be considered in future radiomics studies to develop stable and clinically relevant predictive models for time-to-event endpoints.

  3. Machine learning methods for the classification of gliomas: Initial results using features extracted from MR spectroscopy.

    Science.gov (United States)

    Ranjith, G; Parvathy, R; Vikas, V; Chandrasekharan, Kesavadas; Nair, Suresh

    2015-04-01

    With the advent of new imaging modalities, radiologists are faced with handling increasing volumes of data for diagnosis and treatment planning. The use of automated and intelligent systems is becoming essential in such a scenario. Machine learning, a branch of artificial intelligence, is increasingly being used in medical image analysis applications such as image segmentation, registration and computer-aided diagnosis and detection. Histopathological analysis is currently the gold standard for classification of brain tumors. The use of machine learning algorithms along with extraction of relevant features from magnetic resonance imaging (MRI) holds promise of replacing conventional invasive methods of tumor classification. The aim of the study is to classify gliomas into benign and malignant types using MRI data. Retrospective data from 28 patients who were diagnosed with glioma were used for the analysis. WHO Grade II (low-grade astrocytoma) was classified as benign while Grade III (anaplastic astrocytoma) and Grade IV (glioblastoma multiforme) were classified as malignant. Features were extracted from MR spectroscopy. The classification was done using four machine learning algorithms: multilayer perceptrons, support vector machine, random forest and locally weighted learning. Three of the four machine learning algorithms gave an area under ROC curve in excess of 0.80. Random forest gave the best performance in terms of AUC (0.911) while sensitivity was best for locally weighted learning (86.1%). The performance of different machine learning algorithms in the classification of gliomas is promising. An even better performance may be expected by integrating features extracted from other MR sequences. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  4. Solving a Higgs optimization problem with quantum annealing for machine learning.

    Science.gov (United States)

    Mott, Alex; Job, Joshua; Vlimant, Jean-Roch; Lidar, Daniel; Spiropulu, Maria

    2017-10-18

    The discovery of Higgs-boson decays in a background of standard-model processes was assisted by machine learning methods. The classifiers used to separate signals such as these from background are trained using highly unerring but not completely perfect simulations of the physical processes involved, often resulting in incorrect labelling of background processes or signals (label noise) and systematic errors. Here we use quantum and classical annealing (probabilistic techniques for approximating the global maximum or minimum of a given function) to solve a Higgs-signal-versus-background machine learning optimization problem, mapped to a problem of finding the ground state of a corresponding Ising spin model. We build a set of weak classifiers based on the kinematic observables of the Higgs decay photons, which we then use to construct a strong classifier. This strong classifier is highly resilient against overtraining and against errors in the correlations of the physical observables in the training data. We show that the resulting quantum and classical annealing-based classifier systems perform comparably to the state-of-the-art machine learning methods that are currently used in particle physics. However, in contrast to these methods, the annealing-based classifiers are simple functions of directly interpretable experimental parameters with clear physical meaning. The annealer-trained classifiers use the excited states in the vicinity of the ground state and demonstrate some advantage over traditional machine learning methods for small training datasets. Given the relative simplicity of the algorithm and its robustness to error, this technique may find application in other areas of experimental particle physics, such as real-time decision making in event-selection problems and classification in neutrino physics.

  5. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at its earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing objectively the clinical decision-making process. This manuscript presents a review in multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be present. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allows creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionally (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  6. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning

    International Nuclear Information System (INIS)

    Zhu Xiaofeng; Ge Yaorong; Li Taoran; Thongphiew, Danthai; Yin Fangfang; Wu, Q Jackie

    2011-01-01

    Purpose: To ensure plan quality for adaptive IMRT of the prostate, we developed a quantitative evaluation tool using a machine learning approach. This tool generates dose volume histograms (DVHs) of organs-at-risk (OARs) based on prior plans as a reference, to be compared with the adaptive plan derived from fluence map deformation. Methods: Under the same configuration using seven-field 15 MV photon beams, DVHs of OARs (bladder and rectum) were estimated based on anatomical information of the patient and a model learned from a database of high quality prior plans. In this study, the anatomical information was characterized by the organ volumes and distance-to-target histogram (DTH). The database consists of 198 high quality prostate plans and was validated with 14 cases outside the training pool. Principal component analysis (PCA) was applied to DVHs and DTHs to quantify their salient features. Then, support vector regression (SVR) was implemented to establish the correlation between the features of the DVH and the anatomical information. Results: DVH/DTH curves could be characterized sufficiently just using only two or three truncated principal components, thus, patient anatomical information was quantified with reduced numbers of variables. The evaluation of the model using the test data set demonstrated its accuracy ∼80% in prediction and effectiveness in improving ART planning quality. Conclusions: An adaptive IMRT plan quality evaluation tool based on machine learning has been developed, which estimates OAR sparing and provides reference in evaluating ART.

  7. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning-a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  8. Machine-Learning Algorithms to Code Public Health Spending Accounts.

    Science.gov (United States)

    Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

    Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.

  9. Machine learning of the reactor core loading pattern critical parameters

    International Nuclear Information System (INIS)

    Trontl, K.; Pevec, D.; Smuc, T.

    2007-01-01

    The usual approach to loading pattern optimization involves high degree of engineering judgment, a set of heuristic rules, an optimization algorithm and a computer code used for evaluating proposed loading patterns. The speed of the optimization process is highly dependent on the computer code used for the evaluation. In this paper we investigate the applicability of a machine learning model which could be used for fast loading pattern evaluation. We employed a recently introduced machine learning technique, Support Vector Regression (SVR), which has a strong theoretical background in statistical learning theory. Superior empirical performance of the method has been reported on difficult regression problems in different fields of science and technology. SVR is a data driven, kernel based, nonlinear modelling paradigm, in which model parameters are automatically determined by solving a quadratic optimization problem. The main objective of the work reported in this paper was to evaluate the possibility of applying SVR method for reactor core loading pattern modelling. The starting set of experimental data for training and testing of the machine learning algorithm was obtained using a two-dimensional diffusion theory reactor physics computer code. We illustrate the performance of the solution and discuss its applicability, i.e., complexity, speed and accuracy, with a projection to a more realistic scenario involving machine learning from the results of more accurate and time consuming three-dimensional core modelling code. (author)

  10. Intelligent fault diagnosis of photovoltaic arrays based on optimized kernel extreme learning machine and I-V characteristics

    International Nuclear Information System (INIS)

    Chen, Zhicong; Wu, Lijun; Cheng, Shuying; Lin, Peijie; Wu, Yue; Lin, Wencheng

    2017-01-01

    Highlights: •An improved Simulink based modeling method is proposed for PV modules and arrays. •Key points of I-V curves and PV model parameters are used as the feature variables. •Kernel extreme learning machine (KELM) is explored for PV arrays fault diagnosis. •The parameters of KELM algorithm are optimized by the Nelder-Mead simplex method. •The optimized KELM fault diagnosis model achieves high accuracy and reliability. -- Abstract: Fault diagnosis of photovoltaic (PV) arrays is important for improving the reliability, efficiency and safety of PV power stations, because the PV arrays usually operate in harsh outdoor environment and tend to suffer various faults. Due to the nonlinear output characteristics and varying operating environment of PV arrays, many machine learning based fault diagnosis methods have been proposed. However, there still exist some issues: fault diagnosis performance is still limited due to insufficient monitored information; fault diagnosis models are not efficient to be trained and updated; labeled fault data samples are hard to obtain by field experiments. To address these issues, this paper makes contribution in the following three aspects: (1) based on the key points and model parameters extracted from monitored I-V characteristic curves and environment condition, an effective and efficient feature vector of seven dimensions is proposed as the input of the fault diagnosis model; (2) the emerging kernel based extreme learning machine (KELM), which features extremely fast learning speed and good generalization performance, is utilized to automatically establish the fault diagnosis model. Moreover, the Nelder-Mead Simplex (NMS) optimization method is employed to optimize the KELM parameters which affect the classification performance; (3) an improved accurate Simulink based PV modeling approach is proposed for a laboratory PV array to facilitate the fault simulation and data sample acquisition. Intensive fault experiments are

  11. Quasilinear Extreme Learning Machine Model Based Internal Model Control for Nonlinear Process

    Directory of Open Access Journals (Sweden)

    Dazi Li

    2015-01-01

    Full Text Available A new strategy for internal model control (IMC is proposed using a regression algorithm of quasilinear model with extreme learning machine (QL-ELM. Aimed at the chemical process with nonlinearity, the learning process of the internal model and inverse model is derived. The proposed QL-ELM is constructed as a linear ARX model with a complicated nonlinear coefficient. It shows some good approximation ability and fast convergence. The complicated coefficients are separated into two parts. The linear part is determined by recursive least square (RLS, while the nonlinear part is identified through extreme learning machine. The parameters of linear part and the output weights of ELM are estimated iteratively. The proposed internal model control is applied to CSTR process. The effectiveness and accuracy of the proposed method are extensively verified through numerical results.

  12. Statistical and machine learning approaches for network analysis

    CERN Document Server

    Dehmer, Matthias

    2012-01-01

    Explore the multidisciplinary nature of complex networks through machine learning techniques Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by internation

  13. Machine Learning of Fault Friction

    Science.gov (United States)

    Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.

    2017-12-01

    We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025

  14. A Machine Learning-based Rainfall System for GPM Dual-frequency Radar

    Science.gov (United States)

    Tan, H.; Chandrasekar, V.; Chen, H.

    2017-12-01

    Precipitation measurement produced by the Global Precipitation Measurement (GPM) Dual-frequency Precipitation Radar (DPR) plays an important role in researching the water circle and forecasting extreme weather event. Compare with its predecessor - Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR), GRM DPR measures precipitation in two different frequencies (i.e., Ku and Ka band), which can provide detailed information on the microphysical properties of precipitation particles, quantify particle size distribution and quantitatively measure light rain and falling snow. This paper presents a novel Machine Learning system for ground-based and space borne radar rainfall estimation. The system first trains ground radar data for rainfall estimation using rainfall measurements from gauges and subsequently uses the ground radar based rainfall estimates to train GPM DPR data in order to get space based rainfall product. Therein, data alignment between space DPR and ground radar is conducted using the methodology proposed by Bolen and Chandrasekar (2013), which can minimize the effects of potential geometric distortion of GPM DPR observations. For demonstration purposes, rainfall measurements from three rain gauge networks near Melbourne, Florida, are used for training and validation purposes. These three gauge networks, which are located in Kennedy Space Center (KSC), South Florida Water Management District (SFL), and St. Johns Water Management District (STJ), include 33, 46, and 99 rain gauge stations, respectively. Collocated ground radar observations from the National Weather Service (NWS) Weather Surveillance Radar - 1988 Doppler (WSR-88D) in Melbourne (i.e., KMLB radar) are trained with the gauge measurements. The trained model is then used to derive KMLB radar based rainfall product, which is used to train GPM DPR data collected from coincident overpasses events. The machine learning based rainfall product is compared against the GPM standard products

  15. Sparse Machine Learning Methods for Understanding Large Text Corpora

    Data.gov (United States)

    National Aeronautics and Space Administration — Sparse machine learning has recently emerged as powerful tool to obtain models of high-dimensional data with high degree of interpretability, at low computational...

  16. Two-Dimensional Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Bo Jia

    2015-01-01

    (BP networks. However, like many other methods, ELM is originally proposed to handle vector pattern while nonvector patterns in real applications need to be explored, such as image data. We propose the two-dimensional extreme learning machine (2DELM based on the very natural idea to deal with matrix data directly. Unlike original ELM which handles vectors, 2DELM take the matrices as input features without vectorization. Empirical studies on several real image datasets show the efficiency and effectiveness of the algorithm.

  17. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  18. Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches

    National Research Council Canada - National Science Library

    Qi, Yuan

    2000-01-01

    In this thesis, we propose two new machine learning schemes, a subband-based Independent Component Analysis scheme and a hybrid Independent Component Analysis/Support Vector Machine scheme, and apply...

  19. A review on machine learning principles for multi-view biological data integration.

    Science.gov (United States)

    Li, Yifeng; Wu, Fang-Xiang; Ngom, Alioune

    2018-03-01

    Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

  20. Machine-based mapping of innovation portfolios

    NARCIS (Netherlands)

    de Visser, Matthias; Miao, Shengfa; Englebienne, Gwenn; Sools, Anna Maria; Visscher, Klaasjan

    2017-01-01

    Machine learning techniques show a great promise for improving innovation portfolio management. In this paper we experiment with different methods to classify innovation projects of a high-tech firm as either explorative or exploitative, and compare the results with a manual, theory-based mapping of

  1. Support vector machine learning-based fMRI data group analysis.

    Science.gov (United States)

    Wang, Ze; Childress, Anna R; Wang, Jiongjiong; Detre, John A

    2007-07-15

    To explore the multivariate nature of fMRI data and to consider the inter-subject brain response discrepancies, a multivariate and brain response model-free method is fundamentally required. Two such methods are presented in this paper by integrating a machine learning algorithm, the support vector machine (SVM), and the random effect model. Without any brain response modeling, SVM was used to extract a whole brain spatial discriminance map (SDM), representing the brain response difference between the contrasted experimental conditions. Population inference was then obtained through the random effect analysis (RFX) or permutation testing (PMU) on the individual subjects' SDMs. Applied to arterial spin labeling (ASL) perfusion fMRI data, SDM RFX yielded lower false-positive rates in the null hypothesis test and higher detection sensitivity for synthetic activations with varying cluster size and activation strengths, compared to the univariate general linear model (GLM)-based RFX. For a sensory-motor ASL fMRI study, both SDM RFX and SDM PMU yielded similar activation patterns to GLM RFX and GLM PMU, respectively, but with higher t values and cluster extensions at the same significance level. Capitalizing on the absence of temporal noise correlation in ASL data, this study also incorporated PMU in the individual-level GLM and SVM analyses accompanied by group-level analysis through RFX or group-level PMU. Providing inferences on the probability of being activated or deactivated at each voxel, these individual-level PMU-based group analysis methods can be used to threshold the analysis results of GLM RFX, SDM RFX or SDM PMU.

  2. Machine learning modelling for predicting soil liquefaction susceptibility

    Directory of Open Access Journals (Sweden)

    P. Samui

    2011-01-01

    Full Text Available This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN based on multi-layer perceptions (MLP that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N160] and cyclic stress ratio (CSR. Further, an attempt has been made to simplify the models, requiring only the two parameters [(N160 and peck ground acceleration (amax/g], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.

  3. 2nd Machine Learning School for High Energy Physics

    CERN Document Server

    2016-01-01

    The Second Machine Learning summer school organized by Yandex School of Data Analysis and Laboratory of Methods for Big Data Analysis of National Research University Higher School of Economics will be held in Lund, Sweden from 20 to 26 June 2016. It is hosted by Lund University. The school is intended to cover the relatively young area of data analysis and computational research that has started to emerge in High Energy Physics (HEP). It is known by several names including “Multivariate Analysis”, “Neural Networks”, “Classification/Clusterization techniques”. In more generic terms, these techniques belong to the field of “Machine Learning”, which is an area that is based on research performed in Statistics and has received a lot of attention from the Data Science community. There are plenty of essential problems in High energy Physics that can be solved using Machine Learning methods. These vary from online data filtering and reconstruction to offline data analysis. Students of the school w...

  4. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology.

  5. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    International Nuclear Information System (INIS)

    Kanemoto, Shigeru; Watanabe, Masaya; Yusa, Noritaka

    2014-01-01

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology

  6. Using machine learning to accelerate sampling-based inversion

    Science.gov (United States)

    Valentine, A. P.; Sambridge, M.

    2017-12-01

    In most cases, a complete solution to a geophysical inverse problem (including robust understanding of the uncertainties associated with the result) requires a sampling-based approach. However, the computational burden is high, and proves intractable for many problems of interest. There is therefore considerable value in developing techniques that can accelerate sampling procedures.The main computational cost lies in evaluation of the forward operator (e.g. calculation of synthetic seismograms) for each candidate model. Modern machine learning techniques-such as Gaussian Processes-offer a route for constructing a computationally-cheap approximation to this calculation, which can replace the accurate solution during sampling. Importantly, the accuracy of the approximation can be refined as inversion proceeds, to ensure high-quality results.In this presentation, we describe and demonstrate this approach-which can be seen as an extension of popular current methods, such as the Neighbourhood Algorithm, and bridges the gap between prior- and posterior-sampling frameworks.

  7. A Machine Learning Approach to Test Data Generation

    DEFF Research Database (Denmark)

    Christiansen, Henning; Dahmcke, Christina Mackeprang

    2007-01-01

    been tested, and a more thorough statistical foundation is required. We propose to use logic-statistical modelling methods for machine-learning for analyzing existing and manually marked up data, integrated with the generation of new, artificial data. More specifically, we suggest to use the PRISM...... system developed by Sato and Kameya. Based on logic programming extended with random variables and parameter learning, PRISM appears as a powerful modelling environment, which subsumes HMMs and a wide range of other methods, all embedded in a declarative language. We illustrate these principles here...

  8. An ensemble machine learning approach to predict survival in breast cancer.

    Science.gov (United States)

    Djebbari, Amira; Liu, Ziying; Phan, Sieu; Famili, Fazel

    2008-01-01

    Current breast cancer predictive signatures are not unique. Can we use this fact to our advantage to improve prediction? From the machine learning perspective, it is well known that combining multiple classifiers can improve classification performance. We propose an ensemble machine learning approach which consists of choosing feature subsets and learning predictive models from them. We then combine models based on certain model fusion criteria and we also introduce a tuning parameter to control sensitivity. Our method significantly improves classification performance with a particular emphasis on sensitivity which is critical to avoid misclassifying poor prognosis patients as good prognosis.

  9. Artificial Mangrove Species Mapping Using Pléiades-1: An Evaluation of Pixel-Based and Object-Based Classifications with Selected Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Dezhi Wang

    2018-02-01

    Full Text Available In the dwindling natural mangrove today, mangrove reforestation projects are conducted worldwide to prevent further losses. Due to monoculture and the low survival rate of artificial mangroves, it is necessary to pay attention to mapping and monitoring them dynamically. Remote sensing techniques have been widely used to map mangrove forests due to their capacity for large-scale, accurate, efficient, and repetitive monitoring. This study evaluated the capability of a 0.5-m Pléiades-1 in classifying artificial mangrove species using both pixel-based and object-based classification schemes. For comparison, three machine learning algorithms—decision tree (DT, support vector machine (SVM, and random forest (RF—were used as the classifiers in the pixel-based and object-based classification procedure. The results showed that both the pixel-based and object-based approaches could recognize the major discriminations between the four major artificial mangrove species. However, the object-based method had a better overall accuracy than the pixel-based method on average. For pixel-based image analysis, SVM produced the highest overall accuracy (79.63%; for object-based image analysis, RF could achieve the highest overall accuracy (82.40%, and it was also the best machine learning algorithm for classifying artificial mangroves. The patches produced by object-based image analysis approaches presented a more generalized appearance and could contiguously depict mangrove species communities. When the same machine learning algorithms were compared by McNemar’s test, a statistically significant difference in overall classification accuracy between the pixel-based and object-based classifications only existed in the RF algorithm. Regarding species, monoculture and dominant mangrove species Sonneratia apetala group 1 (SA1 as well as partly mixed and regular shape mangrove species Hibiscus tiliaceus (HT could well be identified. However, for complex and easily

  10. Automated Bug Assignment: Ensemble-based Machine Learning in Large Scale Industrial Contexts

    OpenAIRE

    Jonsson, Leif; Borg, Markus; Broman, David; Sandahl, Kristian; Eldh, Sigrid; Runeson, Per

    2016-01-01

    Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learni...

  11. Student Modeling and Machine Learning

    OpenAIRE

    Sison , Raymund; Shimura , Masamichi

    1998-01-01

    After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine le...

  12. Automatic microseismic event picking via unsupervised machine learning

    Science.gov (United States)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.

  13. Machine learning approaches: from theory to application in schizophrenia.

    Science.gov (United States)

    Veronese, Elisa; Castellani, Umberto; Peruzzo, Denis; Bellani, Marcella; Brambilla, Paolo

    2013-01-01

    In recent years, machine learning approaches have been successfully applied for analysis of neuroimaging data, to help in the context of disease diagnosis. We provide, in this paper, an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls, comparing them in terms of accuracy results with other recently published studies. First we give a description of the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, as an outcome of the comparison of the results obtained applying the described different techniques, conclusions are drawn in order to understand how much automatic classification approaches can be considered a useful tool in understanding the biological underpinnings of schizophrenia. We then conclude by discussing the main implications achievable by the application of these methods into clinical practice.

  14. Machine Learning Approaches: From Theory to Application in Schizophrenia

    Directory of Open Access Journals (Sweden)

    Elisa Veronese

    2013-01-01

    Full Text Available In recent years, machine learning approaches have been successfully applied for analysis of neuroimaging data, to help in the context of disease diagnosis. We provide, in this paper, an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls, comparing them in terms of accuracy results with other recently published studies. First we give a description of the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, as an outcome of the comparison of the results obtained applying the described different techniques, conclusions are drawn in order to understand how much automatic classification approaches can be considered a useful tool in understanding the biological underpinnings of schizophrenia. We then conclude by discussing the main implications achievable by the application of these methods into clinical practice.

  15. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    Science.gov (United States)

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-12-01

    Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records make applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality.Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.

  16. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

    Science.gov (United States)

    Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao

    2017-11-01

    Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Automatic Detection of Acromegaly From Facial Photographs Using Machine Learning Methods.

    Science.gov (United States)

    Kong, Xiangyi; Gong, Shun; Su, Lijuan; Howard, Newton; Kong, Yanguo

    2018-01-01

    Automatic early detection of acromegaly is theoretically possible from facial photographs, which can lessen the prevalence and increase the cure probability. In this study, several popular machine learning algorithms were used to train a retrospective development dataset consisting of 527 acromegaly patients and 596 normal subjects. We firstly used OpenCV to detect the face bounding rectangle box, and then cropped and resized it to the same pixel dimensions. From the detected faces, locations of facial landmarks which were the potential clinical indicators were extracted. Frontalization was then adopted to synthesize frontal facing views to improve the performance. Several popular machine learning methods including LM, KNN, SVM, RT, CNN, and EM were used to automatically identify acromegaly from the detected facial photographs, extracted facial landmarks, and synthesized frontal faces. The trained models were evaluated using a separate dataset, of which half were diagnosed as acromegaly by growth hormone suppression test. The best result of our proposed methods showed a PPV of 96%, a NPV of 95%, a sensitivity of 96% and a specificity of 96%. Artificial intelligence can automatically early detect acromegaly with a high sensitivity and specificity. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  19. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

    Science.gov (United States)

    Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C

    2017-12-01

    The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. Our study shows that a supervised

  20. Creativity in Machine Learning

    OpenAIRE

    Thoma, Martin

    2016-01-01

    Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; it is not a trivial combination of the data which was fed into the machine learning system. The obtained results come in multiple forms: As images, as text and as audio. This paper gives a high level overview of how they are created and gives some examples. It is meant to be a summary of the current work and give people who are new to machine learning some starting points.

  1. Bidirectional extreme learning machine for regression problem and its learning effectiveness.

    Science.gov (United States)

    Yang, Yimin; Wang, Yaonan; Yuan, Xiaofang

    2012-09-01

    It is clear that the learning effectiveness and learning speed of neural networks are in general far slower than required, which has been a major bottleneck for many applications. Recently, a simple and efficient learning method, referred to as extreme learning machine (ELM), was proposed by Huang , which has shown that, compared to some conventional methods, the training time of neural networks can be reduced by a thousand times. However, one of the open problems in ELM research is whether the number of hidden nodes can be further reduced without affecting learning effectiveness. This brief proposes a new learning algorithm, called bidirectional extreme learning machine (B-ELM), in which some hidden nodes are not randomly selected. In theory, this algorithm tends to reduce network output error to 0 at an extremely early learning stage. Furthermore, we find a relationship between the network output error and the network output weights in the proposed B-ELM. Simulation results demonstrate that the proposed method can be tens to hundreds of times faster than other incremental ELM algorithms.

  2. Contemporary machine learning: techniques for practitioners in the physical sciences

    Science.gov (United States)

    Spears, Brian

    2017-10-01

    Machine learning is the science of using computers to find relationships in data without explicitly knowing or programming those relationships in advance. Often without realizing it, we employ machine learning every day as we use our phones or drive our cars. Over the last few years, machine learning has found increasingly broad application in the physical sciences. This most often involves building a model relationship between a dependent, measurable output and an associated set of controllable, but complicated, independent inputs. The methods are applicable both to experimental observations and to databases of simulated output from large, detailed numerical simulations. In this tutorial, we will present an overview of current tools and techniques in machine learning - a jumping-off point for researchers interested in using machine learning to advance their work. We will discuss supervised learning techniques for modeling complicated functions, beginning with familiar regression schemes, then advancing to more sophisticated decision trees, modern neural networks, and deep learning methods. Next, we will cover unsupervised learning and techniques for reducing the dimensionality of input spaces and for clustering data. We'll show example applications from both magnetic and inertial confinement fusion. Along the way, we will describe methods for practitioners to help ensure that their models generalize from their training data to as-yet-unseen test data. We will finally point out some limitations to modern machine learning and speculate on some ways that practitioners from the physical sciences may be particularly suited to help. This work was performed by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  3. Parsimonious Wavelet Kernel Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Wang Qin

    2015-11-01

    Full Text Available In this study, a parsimonious scheme for wavelet kernel extreme learning machine (named PWKELM was introduced by combining wavelet theory and a parsimonious algorithm into kernel extreme learning machine (KELM. In the wavelet analysis, bases that were localized in time and frequency to represent various signals effectively were used. Wavelet kernel extreme learning machine (WELM maximized its capability to capture the essential features in “frequency-rich” signals. The proposed parsimonious algorithm also incorporated significant wavelet kernel functions via iteration in virtue of Householder matrix, thus producing a sparse solution that eased the computational burden and improved numerical stability. The experimental results achieved from the synthetic dataset and a gas furnace instance demonstrated that the proposed PWKELM is efficient and feasible in terms of improving generalization accuracy and real time performance.

  4. Introduction to Machine Learning: Class Notes 67577

    OpenAIRE

    Shashua, Amnon

    2009-01-01

    Introduction to Machine learning covering Statistical Inference (Bayes, EM, ML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering), and PAC learning (the Formal model, VC dimension, Double Sampling theorem).

  5. A Machine Learning Framework for Plan Payment Risk Adjustment.

    Science.gov (United States)

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.

  6. Machine Learning for Security

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques identifying failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  7. A machine learning-based framework to identify type 2 diabetes through electronic health records.

    Science.gov (United States)

    Zheng, Tao; Xie, Wei; Xu, Liling; He, Xiaoying; Zhang, Ya; You, Mingrong; Yang, Gong; Chen, You

    2017-01-01

    To discover diverse genotype-phenotype associations affiliated with Type 2 Diabetes Mellitus (T2DM) via genome-wide association study (GWAS) and phenome-wide association study (PheWAS), more cases (T2DM subjects) and controls (subjects without T2DM) are required to be identified (e.g., via Electronic Health Records (EHR)). However, existing expert based identification algorithms often suffer in a low recall rate and could miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop a semi-automated framework based on machine learning as a pilot study to liberalize filtering criteria to improve recall rate with a keeping of low false positive rate. We propose a data informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely-used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. Our framework was conducted on 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from 23,281 diabetes related cohort retrieved from a regional distributed EHR repository ranging from 2012 to 2014. We apply top-performing machine learning algorithms on the engineered features. We benchmark and contrast the accuracy, precision, AUC, sensitivity and specificity of classification models against the state-of-the-art expert algorithm for identification of T2DM subjects. Our results indicate that the framework achieved high identification performances (∼0.98 in average AUC), which are much higher than the state-of-the-art algorithm (0.71 in AUC). Expert algorithm-based identification of T2DM subjects from EHR is often hampered by the high missing rates due to their conservative selection criteria. Our framework leverages machine learning and feature

  8. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  9. In Silico Prediction of Chemicals Binding to Aromatase with Machine Learning Methods.

    Science.gov (United States)

    Du, Hanwen; Cai, Yingchun; Yang, Hongbin; Zhang, Hongxiao; Xue, Yuhan; Liu, Guixia; Tang, Yun; Li, Weihua

    2017-05-15

    Environmental chemicals may affect endocrine systems through multiple mechanisms, one of which is via effects on aromatase (also known as CYP19A1), an enzyme critical for maintaining the normal balance of estrogens and androgens in the body. Therefore, rapid and efficient identification of aromatase-related endocrine disrupting chemicals (EDCs) is important for toxicology and environment risk assessment. In this study, on the basis of the Tox21 10K compound library, in silico classification models for predicting aromatase binders/nonbinders were constructed by machine learning methods. To improve the prediction ability of the models, a combined classifier (CC) strategy that combines different independent machine learning methods was adopted. Performances of the models were measured by test and external validation sets containing 1336 and 216 chemicals, respectively. The best model was obtained with the MACCS (Molecular Access System) fingerprint and CC method, which exhibited an accuracy of 0.84 for the test set and 0.91 for the external validation set. Additionally, several representative substructures for characterizing aromatase binders, such as ketone, lactone, and nitrogen-containing derivatives, were identified using information gain and substructure frequency analysis. Our study provided a systematic assessment of chemicals binding to aromatase. The built models can be helpful to rapidly identify potential EDCs targeting aromatase.

  10. Using multivariate machine learning methods and structural MRI to classify childhood onset schizophrenia and healthy controls

    Directory of Open Access Journals (Sweden)

    Deanna eGreenstein

    2012-06-01

    Full Text Available Introduction: Multivariate machine learning methods can be used to classify groups of schizophrenia patients and controls using structural magnetic resonance imaging (MRI. However, machine learning methods to date have not been extended beyond classification and contemporaneously applied in a meaningful way to clinical measures. We hypothesized that brain measures would classify groups, and that increased likelihood of being classified as a patient using regional brain measures would be positively related to illness severity, developmental delays and genetic risk. Methods: Using 74 anatomic brain MRI sub regions and Random Forest, we classified 98 COS patients and 99 age, sex, and ethnicity-matched healthy controls. We also used Random Forest to determine the likelihood of being classified as a schizophrenia patient based on MRI measures. We then explored relationships between brain-based probability of illness and symptoms, premorbid development, and presence of copy number variation associated with schizophrenia. Results: Brain regions jointly classified COS and control groups with 73.7% accuracy. Greater brain-based probability of illness was associated with worse functioning (p= 0.0004 and fewer developmental delays (p=0.02. Presence of copy number variation (CNV was associated with lower probability of being classified as schizophrenia (p=0.001. The regions that were most important in classifying groups included left temporal lobes, bilateral dorsolateral prefrontal regions, and left medial parietal lobes. Conclusions: Schizophrenia and control groups can be well classified using Random Forest and anatomic brain measures, and brain-based probability of illness has a positive relationship with illness severity and a negative relationship with developmental delays/problems and CNV-based risk.

  11. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  12. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Louppe, Gilles; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu

    2012-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings....

  13. Prostate Cancer Probability Prediction By Machine Learning Technique.

    Science.gov (United States)

    Jović, Srđan; Miljković, Milica; Ivanović, Miljan; Šaranović, Milena; Arsić, Milena

    2017-11-26

    The main goal of the study was to explore possibility of prostate cancer prediction by machine learning techniques. In order to improve the survival probability of the prostate cancer patients it is essential to make suitable prediction models of the prostate cancer. If one make relevant prediction of the prostate cancer it is easy to create suitable treatment based on the prediction results. Machine learning techniques are the most common techniques for the creation of the predictive models. Therefore in this study several machine techniques were applied and compared. The obtained results were analyzed and discussed. It was concluded that the machine learning techniques could be used for the relevant prediction of prostate cancer.

  14. Comparative Analysis of Automatic Exudate Detection between Machine Learning and Traditional Approaches

    Science.gov (United States)

    Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas

    To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.

  15. Comparison of four statistical and machine learning methods for crash severity prediction.

    Science.gov (United States)

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance in overall and in more severe crashes. RF and SVM had the next two sufficient performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Machine Learning methods in fitting first-principles total energies for substitutionally disordered solid

    Science.gov (United States)

    Gao, Qin; Yao, Sanxi; Widom, Michael

    2015-03-01

    Density functional theory (DFT) provides an accurate and first-principles description of solid structures and total energies. However, it is highly time-consuming to calculate structures with hundreds of atoms in the unit cell and almost not possible to calculate thousands of atoms. We apply and adapt machine learning algorithms, including compressive sensing, support vector regression and artificial neural networks to fit the DFT total energies of substitutionally disordered boron carbide. The nonparametric kernel method is also included in our models. Our fitted total energy model reproduces the DFT energies with prediction error of around 1 meV/atom. The assumptions of these machine learning models and applications of the fitted total energies will also be discussed. Financial support from McWilliams Fellowship and the ONR-MURI under the Grant No. N00014-11-1-0678 is gratefully acknowledged.

  17. Markerless gating for lung cancer radiotherapy based on machine learning techniques

    International Nuclear Information System (INIS)

    Lin Tong; Li Ruijiang; Tang Xiaoli; Jiang, Steve B; Dy, Jennifer G

    2009-01-01

    In lung cancer radiotherapy, radiation to a mobile target can be delivered by respiratory gating, for which we need to know whether the target is inside or outside a predefined gating window at any time point during the treatment. This can be achieved by tracking one or more fiducial markers implanted inside or near the target, either fluoroscopically or electromagnetically. However, the clinical implementation of marker tracking is limited for lung cancer radiotherapy mainly due to the risk of pneumothorax. Therefore, gating without implanted fiducial markers is a promising clinical direction. We have developed several template-matching methods for fluoroscopic marker-less gating. Recently, we have modeled the gating problem as a binary pattern classification problem, in which principal component analysis (PCA) and support vector machine (SVM) are combined to perform the classification task. Following the same framework, we investigated different combinations of dimensionality reduction techniques (PCA and four nonlinear manifold learning methods) and two machine learning classification methods (artificial neural networks-ANN and SVM). Performance was evaluated on ten fluoroscopic image sequences of nine lung cancer patients. We found that among all combinations of dimensionality reduction techniques and classification methods, PCA combined with either ANN or SVM achieved a better performance than the other nonlinear manifold learning methods. ANN when combined with PCA achieves a better performance than SVM in terms of classification accuracy and recall rate, although the target coverage is similar for the two classification methods. Furthermore, the running time for both ANN and SVM with PCA is within tolerance for real-time applications. Overall, ANN combined with PCA is a better candidate than other combinations we investigated in this work for real-time gated radiotherapy.

  18. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    Science.gov (United States)

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-02-01

    For a drug, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article at first simply introduced the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article was put on the recent progress of predictive models built for various toxicities. Available databases and web servers were also provided. Though the methods and models are very helpful for drug design, there are still some challenges and limitations to be improved for drug safety assessment in the future.

  19. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    Directory of Open Access Journals (Sweden)

    Hongbin Yang

    2018-02-01

    Full Text Available During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article at first simply introduced the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article was put on the recent progress of predictive models built for various toxicities. Available databases and web servers were also provided. Though the methods and models are very helpful for drug design, there are still some challenges and limitations to be improved for drug safety assessment in the future.

  20. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts.

    Science.gov (United States)

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-01-01

    During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article at first simply introduced the computational methods used in prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article was put on the recent progress of predictive models built for various toxicities. Available databases and web servers were also provided. Though the methods and models are very helpful for drug design, there are still some challenges and limitations to be improved for drug safety assessment in the future.

  1. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    Science.gov (United States)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product that facilitates cut-out magnetograms of solar active regions (AR) in the Sun in near-realtime (NRT), taken over a five-year interval (2012 - 2016). Our approach utilizes a set of thirteen predictors, which are not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude {>} M1 and {>} C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for {>} M1 flare prediction with probability threshold 15% and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for {>} C1 flare prediction with probability threshold 35%.

  2. Building an asynchronous web-based tool for machine learning classification.

    Science.gov (United States)

    Weber, Griffin; Vinterbo, Staal; Ohno-Machado, Lucila

    2002-01-01

    Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

  3. Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals.

    Science.gov (United States)

    Pereira, Florbela; Xiao, Kaixia; Latino, Diogo A R S; Wu, Chengcheng; Zhang, Qingyou; Aires-de-Sousa, Joao

    2017-01-23

    Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds achieving mean absolute error (MAE) up to 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Inclusion of the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of estimations (reducing the MAE in >30%).

  4. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that the INSPIRE is developing and integrating in order to automatize as much as possible content selection and curation in a subject based repository.

  5. A Machine Learning Method for the Prediction of Receptor Activation in the Simulation of Synapses

    Science.gov (United States)

    Montes, Jesus; Gomez, Elena; Merchán-Pérez, Angel; DeFelipe, Javier; Peña, Jose-Maria

    2013-01-01

    Chemical synaptic transmission involves the release of a neurotransmitter that diffuses in the extracellular space and interacts with specific receptors located on the postsynaptic membrane. Computer simulation approaches provide fundamental tools for exploring various aspects of the synaptic transmission under different conditions. In particular, Monte Carlo methods can track the stochastic movements of neurotransmitter molecules and their interactions with other discrete molecules, the receptors. However, these methods are computationally expensive, even when used with simplified models, preventing their use in large-scale and multi-scale simulations of complex neuronal systems that may involve large numbers of synaptic connections. We have developed a machine-learning based method that can accurately predict relevant aspects of the behavior of synapses, such as the percentage of open synaptic receptors as a function of time since the release of the neurotransmitter, with considerably lower computational cost compared with the conventional Monte Carlo alternative. The method is designed to learn patterns and general principles from a corpus of previously generated Monte Carlo simulations of synapses covering a wide range of structural and functional characteristics. These patterns are later used as a predictive model of the behavior of synapses under different conditions without the need for additional computationally expensive Monte Carlo simulations. This is performed in five stages: data sampling, fold creation, machine learning, validation and curve fitting. The resulting procedure is accurate, automatic, and it is general enough to predict synapse behavior under experimental conditions that are different to the ones it has been trained on. Since our method efficiently reproduces the results that can be obtained with Monte Carlo simulations at a considerably lower computational cost, it is suitable for the simulation of high numbers of synapses and it is

  6. A machine learning method for the prediction of receptor activation in the simulation of synapses.

    Directory of Open Access Journals (Sweden)

    Jesus Montes

    Full Text Available Chemical synaptic transmission involves the release of a neurotransmitter that diffuses in the extracellular space and interacts with specific receptors located on the postsynaptic membrane. Computer simulation approaches provide fundamental tools for exploring various aspects of the synaptic transmission under different conditions. In particular, Monte Carlo methods can track the stochastic movements of neurotransmitter molecules and their interactions with other discrete molecules, the receptors. However, these methods are computationally expensive, even when used with simplified models, preventing their use in large-scale and multi-scale simulations of complex neuronal systems that may involve large numbers of synaptic connections. We have developed a machine-learning based method that can accurately predict relevant aspects of the behavior of synapses, such as the percentage of open synaptic receptors as a function of time since the release of the neurotransmitter, with considerably lower computational cost compared with the conventional Monte Carlo alternative. The method is designed to learn patterns and general principles from a corpus of previously generated Monte Carlo simulations of synapses covering a wide range of structural and functional characteristics. These patterns are later used as a predictive model of the behavior of synapses under different conditions without the need for additional computationally expensive Monte Carlo simulations. This is performed in five stages: data sampling, fold creation, machine learning, validation and curve fitting. The resulting procedure is accurate, automatic, and it is general enough to predict synapse behavior under experimental conditions that are different to the ones it has been trained on. Since our method efficiently reproduces the results that can be obtained with Monte Carlo simulations at a considerably lower computational cost, it is suitable for the simulation of high numbers of

  7. Estimating the complexity of 3D structural models using machine learning methods

    Science.gov (United States)

    Mejía-Herrera, Pablo; Kakurina, Maria; Royer, Jean-Jacques

    2016-04-01

    Quantifying the complexity of 3D geological structural models can play a major role in natural resources exploration surveys, for predicting environmental hazards or for forecasting fossil resources. This paper proposes a structural complexity index which can be used to help in defining the degree of effort necessary to build a 3D model for a given degree of confidence, and also to identify locations where addition efforts are required to meet a given acceptable risk of uncertainty. In this work, it is considered that the structural complexity index can be estimated using machine learning methods on raw geo-data. More precisely, the metrics for measuring the complexity can be approximated as the difficulty degree associated to the prediction of the geological objects distribution calculated based on partial information on the actual structural distribution of materials. The proposed methodology is tested on a set of 3D synthetic structural models for which the degree of effort during their building is assessed using various parameters (such as number of faults, number of part in a surface object, number of borders, ...), the rank of geological elements contained in each model, and, finally, their level of deformation (folding and faulting). The results show how the estimated complexity in a 3D model can be approximated by the quantity of partial data necessaries to simulated at a given precision the actual 3D model without error using machine learning algorithms.

  8. Identifying Structural Flow Defects in Disordered Solids Using Machine-Learning Methods

    Science.gov (United States)

    Cubuk, E. D.; Schoenholz, S. S.; Rieser, J. M.; Malone, B. D.; Rottler, J.; Durian, D. J.; Kaxiras, E.; Liu, A. J.

    2015-03-01

    We use machine-learning methods on local structure to identify flow defects—or particles susceptible to rearrangement—in jammed and glassy systems. We apply this method successfully to two very different systems: a two-dimensional experimental realization of a granular pillar under compression and a Lennard-Jones glass in both two and three dimensions above and below its glass transition temperature. We also identify characteristics of flow defects that differentiate them from the rest of the sample. Our results show it is possible to discern subtle structural features responsible for heterogeneous dynamics observed across a broad range of disordered materials.

  9. An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.

    Science.gov (United States)

    Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein

    2017-12-22

    The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.

  10. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional methods of data analysis have difficulty to identify outliers and patterns in big data and data with multiple exposure / outcome variables and analysis-rules for surveys and questionnaires, currently common methods of data collection, are, essentially, missing. Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning and the current 100 page cookbook should be helpful to that aim. It covers in a condensed form the subjects reviewed in the 750 page three volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (ed. by Springer, Heidelberg, Germany, 2013) and was written as a hand-hold presentation and must-read publication. It was written not only to investigators and students in the fields, but also to jaded clinicians new to the methods and lacking time to read the entire textbooks. General purposes ...

  11. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    Science.gov (United States)

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  12. Web-based newborn screening system for metabolic diseases: machine learning versus clinicians.

    Science.gov (United States)

    Chen, Wei-Hsin; Hsieh, Sheau-Ling; Hsu, Kai-Ping; Chen, Han-Ping; Su, Xing-Yu; Tseng, Yi-Ju; Chien, Yin-Hsiu; Hwu, Wuh-Liang; Lai, Feipei

    2013-05-23

    A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in year 2011 were used as

  13. A comparison of machine learning and Bayesian modelling for molecular serotyping.

    Science.gov (United States)

    Newton, Richard; Wernisch, Lorenz

    2017-08-11

    Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only few samples available, a model driven approach was the only option. In the meanwhile, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes; due to the large number of possible combinations. Most of the available training data comprises samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single serotype arrays. With the enhanced training set the machine learning algorithms out perform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best performing implementation was a combination of the results of the Bayesian Model and the Gradient Boosting Machine. As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological

  14. Transportation Mode Detection Based on Permutation Entropy and Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2015-01-01

    Full Text Available With the increasing prevalence of GPS devices and mobile phones, transportation mode detection based on GPS data has been a hot topic in GPS trajectory data analysis. Transportation modes such as walking, driving, bus, and taxi denote an important characteristic of the mobile user. Longitude, latitude, speed, acceleration, and direction are usually used as features in transportation mode detection. In this paper, first, we explore the possibility of using Permutation Entropy (PE of speed, a measure of complexity and uncertainty of GPS trajectory segment, as a feature for transportation mode detection. Second, we employ Extreme Learning Machine (ELM to distinguish GPS trajectory segments of different transportation. Finally, to evaluate the performance of the proposed method, we make experiments on GeoLife dataset. Experiments results show that we can get more than 50% accuracy when only using PE as a feature to characterize trajectory sequence. PE can indeed be effectively used to detect transportation mode from GPS trajectory. The proposed method has much better accuracy and faster running time than the methods based on the other features and SVM classifier.

  15. Acceleration of saddle-point searches with machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, Andrew A., E-mail: andrew-peterson@brown.edu [School of Engineering, Brown University, Providence, Rhode Island 02912 (United States)

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  16. Acceleration of saddle-point searches with machine learning

    International Nuclear Information System (INIS)

    Peterson, Andrew A.

    2016-01-01

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  17. Acceleration of saddle-point searches with machine learning.

    Science.gov (United States)

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  18. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data.

    Science.gov (United States)

    Glaab, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M; Krasnogor, Natalio

    2012-01-01

    Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

  19. Machine learning strategies for systems with invariance properties

    Science.gov (United States)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.

  20. Research on Three-dimensional Motion History Image Model and Extreme Learning Machine for Human Body Movement Trajectory Recognition

    Directory of Open Access Journals (Sweden)

    Zheng Chang

    2015-01-01

    Full Text Available Based on the traditional machine vision recognition technology and traditional artificial neural networks about body movement trajectory, this paper finds out the shortcomings of the traditional recognition technology. By combining the invariant moments of the three-dimensional motion history image (computed as the eigenvector of body movements and the extreme learning machine (constructed as the classification artificial neural network of body movements, the paper applies the method to the machine vision of the body movement trajectory. In detail, the paper gives a detailed introduction about the algorithm and realization scheme of the body movement trajectory recognition based on the three-dimensional motion history image and the extreme learning machine. Finally, by comparing with the results of the recognition experiments, it attempts to verify that the method of body movement trajectory recognition technology based on the three-dimensional motion history image and extreme learning machine has a more accurate recognition rate and better robustness.

  1. Discovering mammography-based machine learning classifiers for breast cancer diagnosis.

    Science.gov (United States)

    Ramos-Pollán, Raúl; Guevara-López, Miguel Angel; Suárez-Ortega, Cesar; Díaz-Herrero, Guillermo; Franco-Valiente, Jose Miguel; Rubio-Del-Solar, Manuel; González-de-Posada, Naimy; Vaz, Mario Augusto Pires; Loureiro, Joana; Ramos, Isabel

    2012-08-01

    This work explores the design of mammography-based machine learning classifiers (MLC) and proposes a new method to build MLC for breast cancer diagnosis. We massively evaluated MLC configurations to classify features vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnosis. Previously, appropriate combinations of image processing and normalization techniques were applied to reduce image artifacts and increase mammograms details. The method can be used under different data acquisition circumstances and exploits computer clusters to select well performing MLC configurations. We evaluated 286 cases extracted from the repository owned by HSJ-FMUP, where specialized radiologists segmented regions on CC and/or MLO images (biopsies provided the golden standard). Around 20,000 MLC configurations were evaluated, obtaining classifiers achieving an area under the ROC curve of 0.996 when combining features vectors extracted from CC and MLO views of the same case.

  2. Geologic Carbon Sequestration Leakage Detection: A Physics-Guided Machine Learning Approach

    Science.gov (United States)

    Lin, Y.; Harp, D. R.; Chen, B.; Pawar, R.

    2017-12-01

    One of the risks of large-scale geologic carbon sequestration is the potential migration of fluids out of the storage formations. Accurate and fast detection of this fluids migration is not only important but also challenging, due to the large subsurface uncertainty and complex governing physics. Traditional leakage detection and monitoring techniques rely on geophysical observations including pressure. However, the resulting accuracy of these methods is limited because of indirect information they provide requiring expert interpretation, therefore yielding in-accurate estimates of leakage rates and locations. In this work, we develop a novel machine-learning technique based on support vector regression to effectively and efficiently predict the leakage locations and leakage rates based on limited number of pressure observations. Compared to the conventional data-driven approaches, which can be usually seem as a "black box" procedure, we develop a physics-guided machine learning method to incorporate the governing physics into the learning procedure. To validate the performance of our proposed leakage detection method, we employ our method to both 2D and 3D synthetic subsurface models. Our novel CO2 leakage detection method has shown high detection accuracy in the example problems.

  3. The ATLAS Higgs machine learning challenge

    CERN Document Server

    Davey, W; The ATLAS collaboration; Rousseau, D; Cowan, G; Kegl, B; Germain-Renaud, C; Guyon, I

    2014-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 90's with Artificial Neural Net for example, more recently with Boosted Decision Trees, Random Forest etc... Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, Data Scientists are developing new Machine Learning algorithms to extract sense from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, data scientists have advanced algorithms: the goal of the HiggsML project is to bring the two together by a “challenge”: participants from all over the world and any scientific background can compete online ( https://www.kaggle.com/c/higgs-boson ) to obtain the best Higgs to tau tau signal significance on a set of ATLAS full simulated Monte Carlo signal and background. Winners with the best scores will receive money prizes ; authors of the best method (most usable) will be invited t...

  4. Machine learning-based quantitative texture analysis of CT images of small renal masses: Differentiation of angiomyolipoma without visible fat from renal cell carcinoma.

    Science.gov (United States)

    Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei

    2018-04-01

    To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.

  5. Employing Machine-Learning Methods to Study Young Stellar Objects

    Science.gov (United States)

    Moore, Nicholas

    2018-01-01

    Vast amounts of data exist in the astronomical data archives, and yet a large number of sources remain unclassified. We developed a multi-wavelength pipeline to classify infrared sources. The pipeline uses supervised machine learning methods to classify objects into the appropriate categories. The program is fed data that is already classified to train it, and is then applied to unknown catalogues. The primary use for such a pipeline is the rapid classification and cataloging of data that would take a much longer time to classify otherwise. While our primary goal is to study young stellar objects (YSOs), the applications extend beyond the scope of this project. We present preliminary results from our analysis and discuss future applications.

  6. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    Science.gov (United States)

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  7. Application of Machine Learning Techniques in Aquaculture

    OpenAIRE

    Rahman, Akhlaqur; Tasnim, Sumaira

    2014-01-01

    In this paper we present applications of different machine learning algorithms in aquaculture. Machine learning algorithms learn models from historical data. In aquaculture historical data are obtained from farm practices, yields, and environmental data sources. Associations between these different variables can be obtained by applying machine learning algorithms to historical data. In this paper we present applications of different machine learning algorithms in aquaculture applications.

  8. Estimation of Alpine Skier Posture Using Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Bojan Nemec

    2014-10-01

    Full Text Available High precision Global Navigation Satellite System (GNSS measurements are becoming more and more popular in alpine skiing due to the relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements that are defined with the antenna placed typically behind the skier’s neck. A key issue is how to estimate other more relevant parameters of the skier’s body, like the center of mass (COM and ski trajectories. Previously, these parameters were estimated by modeling the skier’s body with an inverted-pendulum model that oversimplified the skier’s body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and skis trajectories based on a more faithful approximation of the skier’s body with nine degrees-of-freedom. The first method utilizes a well-established approach of artificial neural networks, while the second method is based on a state-of-the-art statistical generalization method. Both methods were evaluated using the reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our results outperform the results of commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques in biomechanical measurements of alpine skiing.

  9. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Full Text Available Abstract Background Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1, suggesting that redundancy is stable over long evolutionary periods. Conclusions Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  10. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images.

    Science.gov (United States)

    Wang, Hongkai; Zhou, Zongwei; Li, Yingci; Chen, Zhonghua; Lu, Peiou; Wang, Wenzhi; Liu, Wanyu; Yu, Lijuan

    2017-12-01

    This study aimed to compare one state-of-the-art deep learning method and four classical machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer (NSCLC) from 18 F-FDG PET/CT images. Another objective was to compare the discriminative power of the recently popular PET/CT texture features with the widely used diagnostic features such as tumor size, CT value, SUV, image contrast, and intensity standard deviation. The four classical machine learning methods included random forests, support vector machines, adaptive boosting, and artificial neural network. The deep learning method was the convolutional neural networks (CNN). The five methods were evaluated using 1397 lymph nodes collected from PET/CT images of 168 patients, with corresponding pathology analysis results as gold standard. The comparison was conducted using 10 times 10-fold cross-validation based on the criterion of sensitivity, specificity, accuracy (ACC), and area under the ROC curve (AUC). For each classical method, different input features were compared to select the optimal feature set. Based on the optimal feature set, the classical methods were compared with CNN, as well as with human doctors from our institute. For the classical methods, the diagnostic features resulted in 81~85% ACC and 0.87~0.92 AUC, which were significantly higher than the results of texture features. CNN's sensitivity, specificity, ACC, and AUC were 84, 88, 86, and 0.91, respectively. There was no significant difference between the results of CNN and the best classical method. The sensitivity, specificity, and ACC of human doctors were 73, 90, and 82, respectively. All the five machine learning methods had higher sensitivities but lower specificities than human doctors. The present study shows that the performance of CNN is not significantly different from the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images

  11. Autonomous Scanning Probe Microscopy in Situ Tip Conditioning through Machine Learning.

    Science.gov (United States)

    Rashidi, Mohammad; Wolkow, Robert A

    2018-05-23

    Atomic-scale characterization and manipulation with scanning probe microscopy rely upon the use of an atomically sharp probe. Here we present automated methods based on machine learning to automatically detect and recondition the quality of the probe of a scanning tunneling microscope. As a model system, we employ these techniques on the technologically relevant hydrogen-terminated silicon surface, training the network to recognize abnormalities in the appearance of surface dangling bonds. Of the machine learning methods tested, a convolutional neural network yielded the greatest accuracy, achieving a positive identification of degraded tips in 97% of the test cases. By using multiple points of comparison and majority voting, the accuracy of the method is improved beyond 99%.

  12. Machine learning molecular dynamics for the simulation of infrared spectra.

    Science.gov (United States)

    Gastegger, Michael; Behler, Jörg; Marquetand, Philipp

    2017-10-01

    Machine learning has emerged as an invaluable tool in many research areas. In the present work, we harness this power to predict highly accurate molecular infrared spectra with unprecedented computational efficiency. To account for vibrational anharmonic and dynamical effects - typically neglected by conventional quantum chemistry approaches - we base our machine learning strategy on ab initio molecular dynamics simulations. While these simulations are usually extremely time consuming even for small molecules, we overcome these limitations by leveraging the power of a variety of machine learning techniques, not only accelerating simulations by several orders of magnitude, but also greatly extending the size of systems that can be treated. To this end, we develop a molecular dipole moment model based on environment dependent neural network charges and combine it with the neural network potential approach of Behler and Parrinello. Contrary to the prevalent big data philosophy, we are able to obtain very accurate machine learning models for the prediction of infrared spectra based on only a few hundreds of electronic structure reference points. This is made possible through the use of molecular forces during neural network potential training and the introduction of a fully automated sampling scheme. We demonstrate the power of our machine learning approach by applying it to model the infrared spectra of a methanol molecule, n -alkanes containing up to 200 atoms and the protonated alanine tripeptide, which at the same time represents the first application of machine learning techniques to simulate the dynamics of a peptide. In all of these case studies we find an excellent agreement between the infrared spectra predicted via machine learning models and the respective theoretical and experimental spectra.

  13. TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning

    OpenAIRE

    Tang, Yuan

    2016-01-01

    TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to simplify the process of creating, configuring, training, evaluating, and experimenting a machine learning model. TF.Learn integrates a wide range of state-of-art machine learning algorithms built on top of TensorFlow's low level APIs for small to large-scale supervised and unsupervised problems. This module focuses on bringing machine learning t...

  14. BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

    Directory of Open Access Journals (Sweden)

    F. Pirotti

    2016-06-01

    Full Text Available Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels for testing and validating subsets. The classes used are the following: (i urban (ii sowable areas (iii water (iv tree plantations (v grasslands. Validation is carried out using three different approaches: (i using pixels from the training dataset (train, (ii using pixels from the training dataset and applying cross-validation with the k-fold method (kfold and (iii using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train, the whole control dataset (full and with k-fold cross-validation (kfold with ten folds. Results from validation of predictions of the whole dataset (full show the

  15. An Evaluation of Machine Learning Methods to Detect Malicious SCADA Communications

    Energy Technology Data Exchange (ETDEWEB)

    Beaver, Justin M [ORNL; Borges, Raymond Charles [ORNL; Buckner, Mark A [ORNL

    2013-01-01

    Critical infrastructure Supervisory Control and Data Acquisition (SCADA) systems were designed to operate on closed, proprietary networks where a malicious insider posed the greatest threat potential. The centralization of control and the movement towards open systems and standards has improved the efficiency of industrial control, but has also exposed legacy SCADA systems to security threats that they were not designed to mitigate. This work explores the viability of machine learning methods in detecting the new threat scenarios of command and data injection. Similar to network intrusion detection systems in the cyber security domain, the command and control communications in a critical infrastructure setting are monitored, and vetted against examples of benign and malicious command traffic, in order to identify potential attack events. Multiple learning methods are evaluated using a dataset of Remote Terminal Unit communications, which included both normal operations and instances of command and data injection attack scenarios.

  16. Serum levels of chemical elements in esophageal squamous cell carcinoma in Anyang, China: a case-control study based on machine learning methods.

    Science.gov (United States)

    Lin, Tong; Liu, Tiebing; Lin, Yucheng; Zhang, Chaoting; Yan, Lailai; Chen, Zhongxue; He, Zhonghu; Wang, Jingyu

    2017-09-24

    Esophageal squamous cell carcinoma (ESCC) is the predominant form of esophageal carcinoma with extremely aggressive nature and low survival rate. The risk factors for ESCC in the high-incidence areas of China remain unclear. We used machine learning methods to investigate whether there was an association between the alterations of serum levels of certain chemical elements and ESCC. Primary healthcare unit in Anyang city, Henan Province of China. 100 patients with ESCC and 100 healthy controls matched for age, sex and region were included. Primary outcome was the classification accuracy. Secondary outcome was the p Value of the t-test or rank-sum test. Both traditional statistical methods of t-test and rank-sum test and fashionable machine learning approaches were employed. Random Forest achieves the best accuracy of 98.38% on the original feature vectors (without dimensionality reduction), and support vector machine outperforms other classifiers by yielding accuracy of 96.56% on embedding spaces (with dimensionality reduction). All six classifiers can achieve accuracies more than 90% based on the single most important element Sr. The other two elements with distinctive difference are S and P, providing accuracies around 80%. More than half of chemical elements were found to be significantly different between patients with ESCC and the controls. These results suggest clear differences between patients with ESCC and controls, implying some potential promising applications in diagnosis, prognosis, pharmacy and nutrition of ESCC. However, the results should be interpreted with caution due to the retrospective design nature, limited sample size and the lack of several potential confounding factors (including obesity, nutritional status, and fruit and vegetable consumption and potential regional carcinogen contacts). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted

  17. Radiomic Machine Learning Classifiers for Prognostic Biomarkers of Head & Neck Cancer

    Directory of Open Access Journals (Sweden)

    Chintan eParmar

    2015-12-01

    Full Text Available Introduction: Radiomics extracts and mines large number of medical imaging features in a non-invasive and cost-effective way. The underlying assumption of radiomics is that these imaging features quantify phenotypic characteristics of entire tumor. In order to enhance applicability of radiomics in clinical oncology, highly accurate and reliable machine learning approaches are required. In this radiomic study, thirteen feature selection methods and eleven machine learning classification methods were evaluated in terms of their performance and stability for predicting overall survival in head and neck cancer patients. Methods: Two independent head and neck cancer cohorts were investigated. Training cohort HN1 consisted 101 HNSCC patients. Cohort HN2 (n=95 was used for validation. A total of 440 radiomic features were extracted from the segmented tumor regions in CT images. Feature selection and classification methods were compared using an unbiased evaluation framework. Results: We observed that the three feature selection methods MRMR (AUC = 0.69, Stability = 0.66, MIFS (AUC = 0.66, Stability = 0.69, and CIFE (AUC = 0.68, Stability = 0.7 had high prognostic performance and stability. The three classifiers BY (AUC = 0.67, RSD = 11.28, RF (AUC = 0.61, RSD = 7.36, and NN (AUC = 0.62, RSD = 10.52 also showed high prognostic performance and stability. Analysis investigating performance variability indicated that the choice of classification method is the major factor driving the performance variation (29.02% of total variance. Conclusions: Our study identified prognostic and reliable machine learning methods for the prediction of overall survival of head and neck cancer patients. Identification of optimal machine-learning methods for radiomics based prognostic analyses could broaden the scope of radiomics in precision oncology and cancer care.

  18. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    Science.gov (United States)

    S K, Somasundaram; P, Alli

    2017-11-09

    The main complication of diabetes is Diabetic retinopathy (DR), retinal vascular disease and it leads to the blindness. Regular screening for early DR disease detection is considered as an intensive labor and resource oriented task. Therefore, automatic detection of DR diseases is performed only by using the computational technique is the great solution. An automatic method is more reliable to determine the presence of an abnormality in Fundus images (FI) but, the classification process is poorly performed. Recently, few research works have been designed for analyzing texture discrimination capacity in FI to distinguish the healthy images. However, the feature extraction (FE) process was not performed well, due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection using Machine Learning and Ensemble Classification method, called, Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed. The ML-BEC method comprises of two stages. The first stage in ML-BEC method comprises extraction of the candidate objects from Retinal Images (RI). The candidate objects or the features for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying Machine Learning technique called, t-distributed Stochastic Neighbor Embedding (t-SNE). Besides, t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence among two distributions regarding the locations of the points on the map. The second stage comprises of application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection

  19. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection......* Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection...

  20. An introduction to machine learning with Scikit-Learn

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.

  1. Machine learning and data science in soft materials engineering

    Science.gov (United States)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  2. Machine learning and data science in soft materials engineering.

    Science.gov (United States)

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  3. Using Machine Learning for Land Suitability Classification

    African Journals Online (AJOL)

    User

    West African Journal of Applied Ecology, vol. ... evidence for the utility of machine learning methods in land suitability classification especially MCS methods. ... Artificial intelligence tools. ..... Numerical values of index for the various classes.

  4. Machine Learning Techniques for Optical Performance Monitoring from Directly Detected PDM-QAM Signals

    DEFF Research Database (Denmark)

    Thrane, Jakob; Wass, Jesper; Piels, Molly

    2017-01-01

    Linear signal processing algorithms are effective in dealing with linear transmission channel and linear signal detection, while the nonlinear signal processing algorithms, from the machine learning community, are effective in dealing with nonlinear transmission channel and nonlinear signal...... detection. In this paper, a brief overview of the various machine learning methods and their application in optical communication is presented and discussed. Moreover, supervised machine learning methods, such as neural networks and support vector machine, are experimentally demonstrated for in-band optical...

  5. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. The book provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity and the depth and breath of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  6. Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods.

    Science.gov (United States)

    Luo, Gang; Stone, Bryan L; Johnson, Michael D; Tarczy-Hornoch, Peter; Wilcox, Adam B; Mooney, Sean D; Sheng, Xiaoming; Haug, Peter J; Nkoy, Flory L

    2017-08-29

    To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets. This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care

  7. Machine learning systems

    Energy Technology Data Exchange (ETDEWEB)

    Forsyth, R

    1984-05-01

    With the dramatic rise of expert systems has come a renewed interest in the fuel that drives them-knowledge. For it is specialist knowledge which gives expert systems their power. But extracting knowledge from human experts in symbolic form has proved arduous and labour-intensive. So the idea of machine learning is enjoying a renaissance. Machine learning is any automatic improvement in the performance of a computer system over time, as a result of experience. Thus a learning algorithm seeks to do one or more of the following: cover a wider range of problems, deliver more accurate solutions, obtain answers more cheaply, and simplify codified knowledge. 6 references.

  8. Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression.

    Science.gov (United States)

    Nouretdinov, Ilia; Costafreda, Sergi G; Gammerman, Alexander; Chervonenkis, Alexey; Vovk, Vladimir; Vapnik, Vladimir; Fu, Cynthia H Y

    2011-05-15

    There is rapidly accumulating evidence that the application of machine learning classification to neuroimaging measurements may be valuable for the development of diagnostic and prognostic prediction tools in psychiatry. However, current methods do not produce a measure of the reliability of the predictions. Knowing the risk of the error associated with a given prediction is essential for the development of neuroimaging-based clinical tools. We propose a general probabilistic classification method to produce measures of confidence for magnetic resonance imaging (MRI) data. We describe the application of transductive conformal predictor (TCP) to MRI images. TCP generates the most likely prediction and a valid measure of confidence, as well as the set of all possible predictions for a given confidence level. We present the theoretical motivation for TCP, and we have applied TCP to structural and functional MRI data in patients and healthy controls to investigate diagnostic and prognostic prediction in depression. We verify that TCP predictions are as accurate as those obtained with more standard machine learning methods, such as support vector machine, while providing the additional benefit of a valid measure of confidence for each prediction. Copyright © 2010 Elsevier Inc. All rights reserved.

  9. Development of E-Learning Materials for Machining Safety Education

    Science.gov (United States)

    Nakazawa, Tsuyoshi; Mita, Sumiyoshi; Matsubara, Masaaki; Takashima, Takeo; Tanaka, Koichi; Izawa, Satoru; Kawamura, Takashi

    We developed two e-learning materials for Manufacturing Practice safety education: movie learning materials and hazard-detection learning materials. Using these video and sound media, students can learn how to operate machines safely with movie learning materials, which raise the effectiveness of preparation and review for manufacturing practice. Using these materials, students can realize safety operation well. Students can apply knowledge learned in lectures to the detection of hazards and use study methods for hazard detection during machine operation using the hazard-detection learning materials. Particularly, the hazard-detection learning materials raise students‧ safety consciousness and increase students‧ comprehension of knowledge from lectures and comprehension of operations during Manufacturing Practice.

  10. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  11. Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Ying Yin

    2016-05-01

    Full Text Available Multi-instance multi-label learning is a learning framework, where every object is represented by a bag of instances and associated with multiple labels simultaneously. The existing degeneration strategy-based methods often suffer from some common drawbacks: (1 the user-specific parameter for the number of clusters may incur the effective problem; (2 SVM may bring a high computational cost when utilized as the classifier builder. In this paper, we propose an algorithm, namely multi-instance multi-label (MIML-extreme learning machine (ELM, to address the problems. To our best knowledge, we are the first to utilize ELM in the MIML problem and to conduct the comparison of ELM and SVM on MIML. Extensive experiments have been conducted on real datasets and synthetic datasets. The results show that MIMLELM tends to achieve better generalization performance at a higher learning speed.

  12. Opinion Mining in Latvian Text Using Semantic Polarity Analysis and Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Gatis Špats

    2016-07-01

    Full Text Available In this paper we demonstrate approaches for opinion mining in Latvian text. Authors have applied, combined and extended results of several previous studies and public resources to perform opinion mining in Latvian text using two approaches, namely, semantic polarity analysis and machine learning. One of the most significant constraints that make application of opinion mining for written content classification in Latvian text challenging is the limited publicly available text corpora for classifier training. We have joined several sources and created a publically available extended lexicon. Our results are comparable to or outperform current achievements in opinion mining in Latvian. Experiments show that lexicon-based methods provide more accurate opinion mining than the application of Naive Bayes machine learning classifier on Latvian tweets. Methods used during this study could be further extended using human annotators, unsupervised machine learning and bootstrapping to create larger corpora of classified text.

  13. The use of machine learning and nonlinear statistical tools for ADME prediction.

    Science.gov (United States)

    Sakiyama, Yojiro

    2009-02-01

    Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning as well as nonlinear statistical tools has been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it would be a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machine displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.

  14. Optimal interference code based on machine learning

    Science.gov (United States)

    Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

    2016-10-01

    In this paper, we analyze the characteristics of pseudo-random code, by the case of m sequence. Depending on the description of coding theory, we introduce the jamming methods. We simulate the interference effect or probability model by the means of MATLAB to consolidate. In accordance with the length of decoding time the adversary spends, we find out the optimal formula and optimal coefficients based on machine learning, then we get the new optimal interference code. First, when it comes to the phase of recognition, this study judges the effect of interference by the way of simulating the length of time over the decoding period of laser seeker. Then, we use laser active deception jamming simulate interference process in the tracking phase in the next block. In this study we choose the method of laser active deception jamming. In order to improve the performance of the interference, this paper simulates the model by MATLAB software. We find out the least number of pulse intervals which must be received, then we can make the conclusion that the precise interval number of the laser pointer for m sequence encoding. In order to find the shortest space, we make the choice of the greatest common divisor method. Then, combining with the coding regularity that has been found before, we restore pulse interval of pseudo-random code, which has been already received. Finally, we can control the time period of laser interference, get the optimal interference code, and also increase the probability of interference as well.

  15. Mutual learning in a tree parity machine and its application to cryptography

    International Nuclear Information System (INIS)

    Rosen-Zvi, Michal; Klein, Einat; Kanter, Ido; Kinzel, Wolfgang

    2002-01-01

    Mutual learning of a pair of tree parity machines with continuous and discrete weight vectors is studied analytically. The analysis is based on a mapping procedure that maps the mutual learning in tree parity machines onto mutual learning in noisy perceptrons. The stationary solution of the mutual learning in the case of continuous tree parity machines depends on the learning rate where a phase transition from partial to full synchronization is observed. In the discrete case the learning process is based on a finite increment and a full synchronized state is achieved in a finite number of steps. The synchronization of discrete parity machines is introduced in order to construct an ephemeral key-exchange protocol. The dynamic learning of a third tree parity machine (an attacker) that tries to imitate one of the two machines while the two still update their weight vectors is also analyzed. In particular, the synchronization times of the naive attacker and the flipping attacker recently introduced in Ref. 9 are analyzed. All analytical results are found to be in good agreement with simulation results

  16. Geometrical methods in learning theory

    International Nuclear Information System (INIS)

    Burdet, G.; Combe, Ph.; Nencka, H.

    2001-01-01

    The methods of information theory provide natural approaches to learning algorithms in the case of stochastic formal neural networks. Most of the classical techniques are based on some extremization principle. A geometrical interpretation of the associated algorithms provides a powerful tool for understanding the learning process and its stability and offers a framework for discussing possible new learning rules. An illustration is given using sequential and parallel learning in the Boltzmann machine

  17. Prediction of Human Drug Targets and Their Interactions Using Machine Learning Methods: Current and Future Perspectives.

    Science.gov (United States)

    Nath, Abhigyan; Kumari, Priyanka; Chaube, Radha

    2018-01-01

    Identification of drug targets and drug target interactions are important steps in the drug-discovery pipeline. Successful computational prediction methods can reduce the cost and time demanded by the experimental methods. Knowledge of putative drug targets and their interactions can be very useful for drug repurposing. Supervised machine learning methods have been very useful in drug target prediction and in prediction of drug target interactions. Here, we describe the details for developing prediction models using supervised learning techniques for human drug target prediction and their interactions.

  18. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data.

    Directory of Open Access Journals (Sweden)

    Enrico Glaab

    Full Text Available Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

  19. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (DeltaDeltaG) and denaturant (DeltaDeltaG(H2O)) denaturations, as well as mutant thermal stability (DeltaT(m)), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and it's ordered six nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  20. A cross-benchmark comparison of 87 learning to rank methods

    NARCIS (Netherlands)

    Tax, N.; Bockting, S.; Hiemstra, D.

    2015-01-01

    Learning to rank is an increasingly important scientific field that comprises the use of machine learning for the ranking task. New learning to rank methods are generally evaluated on benchmark test collections. However, comparison of learning to rank methods based on evaluation results is hindered

  1. Development of a Late-Life Dementia Prediction Index with Supervised Machine Learning in the Population-Based CAIDE Study.

    Science.gov (United States)

    Pekkala, Timo; Hall, Anette; Lötjönen, Jyrki; Mattila, Jussi; Soininen, Hilkka; Ngandu, Tiia; Laatikainen, Tiina; Kivipelto, Miia; Solomon, Alina

    2017-01-01

    This study aimed to develop a late-life dementia prediction model using a novel validated supervised machine learning method, the Disease State Index (DSI), in the Finnish population-based CAIDE study. The CAIDE study was based on previous population-based midlife surveys. CAIDE participants were re-examined twice in late-life, and the first late-life re-examination was used as baseline for the present study. The main study population included 709 cognitively normal subjects at first re-examination who returned to the second re-examination up to 10 years later (incident dementia n = 39). An extended population (n = 1009, incident dementia 151) included non-participants/non-survivors (national registers data). DSI was used to develop a dementia index based on first re-examination assessments. Performance in predicting dementia was assessed as area under the ROC curve (AUC). AUCs for DSI were 0.79 and 0.75 for main and extended populations. Included predictors were cognition, vascular factors, age, subjective memory complaints, and APOE genotype. The supervised machine learning method performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. DSI could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions.

  2. Sensor Data Air Pollution Prediction by Machine Learning Methods

    Czech Academy of Sciences Publication Activity Database

    Vidnerová, Petra; Neruda, Roman

    submitted 25. 1. (2018) ISSN 1530-437X R&D Projects: GA ČR GA15-18108S Grant - others:GA MŠk(CZ) LM2015042 Institutional support: RVO:67985807 Keywords : machine learning * sensors * air pollution * deep neural networks * regularization networks Subject RIV: IN - Informatics, Computer Science Impact factor: 2.512, year: 2016

  3. A Machine Learning Application Based in Random Forest for Integrating Mass Spectrometry-Based Metabolomic Data: A Simple Screening Method for Patients With Zika Virus

    Directory of Open Access Journals (Sweden)

    Carlos Fernando Odir Rodrigues Melo

    2018-04-01

    Full Text Available Recent Zika outbreaks in South America, accompanied by unexpectedly severe clinical complications have brought much interest in fast and reliable screening methods for ZIKV (Zika virus identification. Reverse-transcriptase polymerase chain reaction (RT-PCR is currently the method of choice to detect ZIKV in biological samples. This approach, nonetheless, demands a considerable amount of time and resources such as kits and reagents that, in endemic areas, may result in a substantial financial burden over affected individuals and health services veering away from RT-PCR analysis. This study presents a powerful combination of high-resolution mass spectrometry and a machine-learning prediction model for data analysis to assess the existence of ZIKV infection across a series of patients that bear similar symptomatic conditions, but not necessarily are infected with the disease. By using mass spectrometric data that are inputted with the developed decision-making algorithm, we were able to provide a set of features that work as a “fingerprint” for this specific pathophysiological condition, even after the acute phase of infection. Since both mass spectrometry and machine learning approaches are well-established and have largely utilized tools within their respective fields, this combination of methods emerges as a distinct alternative for clinical applications, providing a diagnostic screening—faster and more accurate—with improved cost-effectiveness when compared to existing technologies.

  4. A Machine Learning Application Based in Random Forest for Integrating Mass Spectrometry-Based Metabolomic Data: A Simple Screening Method for Patients With Zika Virus.

    Science.gov (United States)

    Melo, Carlos Fernando Odir Rodrigues; Navarro, Luiz Claudio; de Oliveira, Diogo Noin; Guerreiro, Tatiane Melina; Lima, Estela de Oliveira; Delafiori, Jeany; Dabaja, Mohamed Ziad; Ribeiro, Marta da Silva; de Menezes, Maico; Rodrigues, Rafael Gustavo Martins; Morishita, Karen Noda; Esteves, Cibele Zanardi; de Amorim, Aline Lopes Lucas; Aoyagui, Caroline Tiemi; Parise, Pierina Lorencini; Milanez, Guilherme Paier; do Nascimento, Gabriela Mansano; Ribas Freitas, André Ricardo; Angerami, Rodrigo; Costa, Fábio Trindade Maranhão; Arns, Clarice Weis; Resende, Mariangela Ribeiro; Amaral, Eliana; Junior, Renato Passini; Ribeiro-do-Valle, Carolina C; Milanez, Helaine; Moretti, Maria Luiza; Proenca-Modena, Jose Luiz; Avila, Sandra; Rocha, Anderson; Catharino, Rodrigo Ramos

    2018-01-01

    Recent Zika outbreaks in South America, accompanied by unexpectedly severe clinical complications have brought much interest in fast and reliable screening methods for ZIKV (Zika virus) identification. Reverse-transcriptase polymerase chain reaction (RT-PCR) is currently the method of choice to detect ZIKV in biological samples. This approach, nonetheless, demands a considerable amount of time and resources such as kits and reagents that, in endemic areas, may result in a substantial financial burden over affected individuals and health services veering away from RT-PCR analysis. This study presents a powerful combination of high-resolution mass spectrometry and a machine-learning prediction model for data analysis to assess the existence of ZIKV infection across a series of patients that bear similar symptomatic conditions, but not necessarily are infected with the disease. By using mass spectrometric data that are inputted with the developed decision-making algorithm, we were able to provide a set of features that work as a "fingerprint" for this specific pathophysiological condition, even after the acute phase of infection. Since both mass spectrometry and machine learning approaches are well-established and have largely utilized tools within their respective fields, this combination of methods emerges as a distinct alternative for clinical applications, providing a diagnostic screening-faster and more accurate-with improved cost-effectiveness when compared to existing technologies.

  5. Mixed Integer Linear Programming based machine learning approach identifies regulators of telomerase in yeast.

    Science.gov (United States)

    Poos, Alexandra M; Maicher, André; Dieckmann, Anna K; Oswald, Marcus; Eils, Roland; Kupiec, Martin; Luke, Brian; König, Rainer

    2016-06-02

    Understanding telomere length maintenance mechanisms is central in cancer biology as their dysregulation is one of the hallmarks for immortalization of cancer cells. Important for this well-balanced control is the transcriptional regulation of the telomerase genes. We integrated Mixed Integer Linear Programming models into a comparative machine learning based approach to identify regulatory interactions that best explain the discrepancy of telomerase transcript levels in yeast mutants with deleted regulators showing aberrant telomere length, when compared to mutants with normal telomere length. We uncover novel regulators of telomerase expression, several of which affect histone levels or modifications. In particular, our results point to the transcription factors Sum1, Hst1 and Srb2 as being important for the regulation of EST1 transcription, and we validated the effect of Sum1 experimentally. We compiled our machine learning method leading to a user friendly package for R which can straightforwardly be applied to similar problems integrating gene regulator binding information and expression profiles of samples of e.g. different phenotypes, diseases or treatments. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Machine learning in autistic spectrum disorder behavioral research: A review and ways forward.

    Science.gov (United States)

    Thabtah, Fadi

    2018-02-13

    Autistic Spectrum Disorder (ASD) is a mental disorder that retards acquisition of linguistic, communication, cognitive, and social skills and abilities. Despite being diagnosed with ASD, some individuals exhibit outstanding scholastic, non-academic, and artistic capabilities, in such cases posing a challenging task for scientists to provide answers. In the last few years, ASD has been investigated by social and computational intelligence scientists utilizing advanced technologies such as machine learning to improve diagnostic timing, precision, and quality. Machine learning is a multidisciplinary research topic that employs intelligent techniques to discover useful concealed patterns, which are utilized in prediction to improve decision making. Machine learning techniques such as support vector machines, decision trees, logistic regressions, and others, have been applied to datasets related to autism in order to construct predictive models. These models claim to enhance the ability of clinicians to provide robust diagnoses and prognoses of ASD. However, studies concerning the use of machine learning in ASD diagnosis and treatment suffer from conceptual, implementation, and data issues such as the way diagnostic codes are used, the type of feature selection employed, the evaluation measures chosen, and class imbalances in data among others. A more serious claim in recent studies is the development of a new method for ASD diagnoses based on machine learning. This article critically analyses these recent investigative studies on autism, not only articulating the aforementioned issues in these studies but also recommending paths forward that enhance machine learning use in ASD with respect to conceptualization, implementation, and data. Future studies concerning machine learning in autism research are greatly benefitted by such proposals.

  7. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography.

    Science.gov (United States)

    Narula, Sukrit; Shameer, Khader; Salem Omar, Alaa Mabrouk; Dudley, Joel T; Sengupta, Partho P

    2016-11-29

    Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH). Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation. Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM ands. ATH (IG = 0.24) followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p 13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e', and strain. Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience. Copyright © 2016 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  8. TH-CD-206-05: Machine-Learning Based Segmentation of Organs at Risks for Head and Neck Radiotherapy Planning

    International Nuclear Information System (INIS)

    Ibragimov, B; Pernus, F; Strojan, P; Xing, L

    2016-01-01

    Purpose: Accurate and efficient delineation of tumor target and organs-at-risks is essential for the success of radiotherapy. In reality, despite of decades of intense research efforts, auto-segmentation has not yet become clinical practice. In this study, we present, for the first time, a deep learning-based classification algorithm for autonomous segmentation in head and neck (HaN) treatment planning. Methods: Fifteen HN datasets of CT, MR and PET images with manual annotation of organs-at-risk (OARs) including spinal cord, brainstem, optic nerves, chiasm, eyes, mandible, tongue, parotid glands were collected and saved in a library of plans. We also have ten super-resolution MR images of the tongue area, where the genioglossus and inferior longitudinalis tongue muscles are defined as organs of interest. We applied the concepts of random forest- and deep learning-based object classification for automated image annotation with the aim of using machine learning to facilitate head and neck radiotherapy planning process. In this new paradigm of segmentation, random forests were used for landmark-assisted segmentation of super-resolution MR images. Alternatively to auto-segmentation with random forest-based landmark detection, deep convolutional neural networks were developed for voxel-wise segmentation of OARs in single and multi-modal images. The network consisted of three pairs of convolution and pooing layer, one RuLU layer and a softmax layer. Results: We present a comprehensive study on using machine learning concepts for auto-segmentation of OARs and tongue muscles for the HaN radiotherapy planning. An accuracy of 81.8% in terms of Dice coefficient was achieved for segmentation of genioglossus and inferior longitudinalis tongue muscles. Preliminary results of OARs regimentation also indicate that deep-learning afforded an unprecedented opportunities to improve the accuracy and robustness of radiotherapy planning. Conclusion: A novel machine learning framework

  9. TH-CD-206-05: Machine-Learning Based Segmentation of Organs at Risks for Head and Neck Radiotherapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Ibragimov, B [Stanford University, Stanford, CA (United States); Pernus, F [University of Ljubljana, Ljubljana (Slovenia); Strojan, P; Xing, L [Institute of Oncology, Ljubljana (Slovenia)

    2016-06-15

    Purpose: Accurate and efficient delineation of tumor target and organs-at-risks is essential for the success of radiotherapy. In reality, despite of decades of intense research efforts, auto-segmentation has not yet become clinical practice. In this study, we present, for the first time, a deep learning-based classification algorithm for autonomous segmentation in head and neck (HaN) treatment planning. Methods: Fifteen HN datasets of CT, MR and PET images with manual annotation of organs-at-risk (OARs) including spinal cord, brainstem, optic nerves, chiasm, eyes, mandible, tongue, parotid glands were collected and saved in a library of plans. We also have ten super-resolution MR images of the tongue area, where the genioglossus and inferior longitudinalis tongue muscles are defined as organs of interest. We applied the concepts of random forest- and deep learning-based object classification for automated image annotation with the aim of using machine learning to facilitate head and neck radiotherapy planning process. In this new paradigm of segmentation, random forests were used for landmark-assisted segmentation of super-resolution MR images. Alternatively to auto-segmentation with random forest-based landmark detection, deep convolutional neural networks were developed for voxel-wise segmentation of OARs in single and multi-modal images. The network consisted of three pairs of convolution and pooing layer, one RuLU layer and a softmax layer. Results: We present a comprehensive study on using machine learning concepts for auto-segmentation of OARs and tongue muscles for the HaN radiotherapy planning. An accuracy of 81.8% in terms of Dice coefficient was achieved for segmentation of genioglossus and inferior longitudinalis tongue muscles. Preliminary results of OARs regimentation also indicate that deep-learning afforded an unprecedented opportunities to improve the accuracy and robustness of radiotherapy planning. Conclusion: A novel machine learning framework

  10. Modeling Geomagnetic Variations using a Machine Learning Framework

    Science.gov (United States)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  11. Feasibility study of stain-free classification of cell apoptosis based on diffraction imaging flow cytometry and supervised machine learning techniques.

    Science.gov (United States)

    Feng, Jingwen; Feng, Tong; Yang, Chengwen; Wang, Wei; Sa, Yu; Feng, Yuanming

    2018-06-01

    This study was to explore the feasibility of prediction and classification of cells in different stages of apoptosis with a stain-free method based on diffraction images and supervised machine learning. Apoptosis was induced in human chronic myelogenous leukemia K562 cells by cis-platinum (DDP). A newly developed technique of polarization diffraction imaging flow cytometry (p-DIFC) was performed to acquire diffraction images of the cells in three different statuses (viable, early apoptotic and late apoptotic/necrotic) after cell separation through fluorescence activated cell sorting with Annexin V-PE and SYTOX® Green double staining. The texture features of the diffraction images were extracted with in-house software based on the Gray-level co-occurrence matrix algorithm to generate datasets for cell classification with supervised machine learning method. Therefore, this new method has been verified in hydrogen peroxide induced apoptosis model of HL-60. Results show that accuracy of higher than 90% was achieved respectively in independent test datasets from each cell type based on logistic regression with ridge estimators, which indicated that p-DIFC system has a great potential in predicting and classifying cells in different stages of apoptosis.

  12. A Machine Learning Based Intrusion Impact Analysis Scheme for Clouds

    Directory of Open Access Journals (Sweden)

    Junaid Arshad

    2012-01-01

    Full Text Available Clouds represent a major paradigm shift, inspiring the contemporary approach to computing. They present fascinating opportunities to address dynamic user requirements with the provision of on demand expandable computing infrastructures. However, Clouds introduce novel security challenges which need to be addressed to facilitate widespread adoption. This paper is focused on one such challenge - intrusion impact analysis. In particular, we highlight the significance of intrusion impact analysis for the overall security of Clouds. Additionally, we present a machine learning based scheme to address this challenge in accordance with the specific requirements of Clouds for intrusion impact analysis. We also present rigorous evaluation performed to assess the effectiveness and feasibility of the proposed method to address this challenge for Clouds. The evaluation results demonstrate high degree of effectiveness to correctly determine the impact of an intrusion along with significant reduction with respect to the intrusion response time.

  13. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  14. VATE: VAlidation of high TEchnology based on large database analysis by learning machine

    NARCIS (Netherlands)

    Meldolesi, E; Van Soest, J; Alitto, A R; Autorino, R; Dinapoli, N; Dekker, A; Gambacorta, M A; Gatta, R; Tagliaferri, L; Damiani, A; Valentini, V

    2014-01-01

    The interaction between implementation of new technologies and different outcomes can allow a broad range of researches to be expanded. The purpose of this paper is to introduce the VAlidation of high TEchnology based on large database analysis by learning machine (VATE) project that aims to combine

  15. Discrimination of plant root zone water status in greenhouse production based on phenotyping and machine learning techniques

    OpenAIRE

    Guo, Doudou; Juan, Jiaxiang; Chang, Liying; Zhang, Jingjin; Huang, Danfeng

    2017-01-01

    Plant-based sensing on water stress can provide sensitive and direct reference for precision irrigation system in greenhouse. However, plant information acquisition, interpretation, and systematical application remain insufficient. This study developed a discrimination method for plant root zone water status in greenhouse by integrating phenotyping and machine learning techniques. Pakchoi plants were used and treated by three root zone moisture levels, 40%, 60%, and 80% relative water content...

  16. Quantum Machine Learning

    Science.gov (United States)

    Biswas, Rupak

    2018-01-01

    Quantum computing promises an unprecedented ability to solve intractable problems by harnessing quantum mechanical effects such as tunneling, superposition, and entanglement. The Quantum Artificial Intelligence Laboratory (QuAIL) at NASA Ames Research Center is the space agency's primary facility for conducting research and development in quantum information sciences. QuAIL conducts fundamental research in quantum physics but also explores how best to exploit and apply this disruptive technology to enable NASA missions in aeronautics, Earth and space sciences, and space exploration. At the same time, machine learning has become a major focus in computer science and captured the imagination of the public as a panacea to myriad big data problems. In this talk, we will discuss how classical machine learning can take advantage of quantum computing to significantly improve its effectiveness. Although we illustrate this concept on a quantum annealer, other quantum platforms could be used as well. If explored fully and implemented efficiently, quantum machine learning could greatly accelerate a wide range of tasks leading to new technologies and discoveries that will significantly change the way we solve real-world problems.

  17. A Framework for Final Drive Simultaneous Failure Diagnosis Based on Fuzzy Entropy and Sparse Bayesian Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Qing Ye

    2015-01-01

    Full Text Available This research proposes a novel framework of final drive simultaneous failure diagnosis containing feature extraction, training paired diagnostic models, generating decision threshold, and recognizing simultaneous failure modes. In feature extraction module, adopt wavelet package transform and fuzzy entropy to reduce noise interference and extract representative features of failure mode. Use single failure sample to construct probability classifiers based on paired sparse Bayesian extreme learning machine which is trained only by single failure modes and have high generalization and sparsity of sparse Bayesian learning approach. To generate optimal decision threshold which can convert probability output obtained from classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes and Grid search method which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machine and probability neural networks, experiment results based on F1-measure value verify that the diagnostic accuracy and efficiency of the proposed framework which are crucial for simultaneous failure diagnosis are superior to the existing approach.

  18. Machine Learning-based Intelligent Formal Reasoning and Proving System

    Science.gov (United States)

    Chen, Shengqing; Huang, Xiaojian; Fang, Jiaze; Liang, Jia

    2018-03-01

    The reasoning system can be used in many fields. How to improve reasoning efficiency is the core of the design of system. Through the formal description of formal proof and the regular matching algorithm, after introducing the machine learning algorithm, the system of intelligent formal reasoning and verification has high efficiency. The experimental results show that the system can verify the correctness of propositional logic reasoning and reuse the propositional logical reasoning results, so as to obtain the implicit knowledge in the knowledge base and provide the basic reasoning model for the construction of intelligent system.

  19. A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.

    Science.gov (United States)

    Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne

    2018-05-01

    Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present a - to our knowledge - first machine learning based approach for the Catwalk system, which comprises a step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross validation we demonstrate that with our method we can reliable separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions and can even forecast the brain region, where the intervention was applied. We provide an in-depth analysis of the features involved into our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our works shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, improve with more data arriving and to make predictions for single animals in future studies.

  20. Machine Learning Optimization of Evolvable Artificial Cells

    DEFF Research Database (Denmark)

    Caschera, F.; Rasmussen, S.; Hanczyc, M.

    2011-01-01

    can be explored. A machine learning approach (Evo-DoE) could be applied to explore this experimental space and define optimal interactions according to a specific fitness function. Herein an implementation of an evolutionary design of experiments to optimize chemical and biochemical systems based...... on a machine learning process is presented. The optimization proceeds over generations of experiments in iterative loop until optimal compositions are discovered. The fitness function is experimentally measured every time the loop is closed. Two examples of complex systems, namely a liposomal drug formulation...

  1. Spike sorting based upon machine learning algorithms (SOMA).

    Science.gov (United States)

    Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F

    2007-02-15

    We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb, and from the sheep infero-temporal cortex, and using simulated data. The SOMA sofware is available at http://www.sussex.ac.uk/Users/pmh20/spikes.

  2. Recent Advances in Conotoxin Classification by Using Machine Learning Methods.

    Science.gov (United States)

    Dao, Fu-Ying; Yang, Hui; Su, Zhen-Dong; Yang, Wuritu; Wu, Yun; Hui, Ding; Chen, Wei; Tang, Hua; Lin, Hao

    2017-06-25

    Conotoxins are disulfide-rich small peptides, which are invaluable peptides that target ion channel and neuronal receptors. Conotoxins have been demonstrated as potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are also ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research as well. Thus, the accurate identification of conotoxin types will provide key clues for the biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire the structure and function information by using biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we reviewed the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark dataset; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides the basis for in-depth study of conotoxins and drug therapy research.

  3. Distinguishing butchery cut marks from crocodile bite marks through machine learning methods.

    Science.gov (United States)

    Domínguez-Rodrigo, Manuel; Baquedano, Enrique

    2018-04-10

    All models of evolution of human behaviour depend on the correct identification and interpretation of bone surface modifications (BSM) on archaeofaunal assemblages. Crucial evolutionary features, such as the origin of stone tool use, meat-eating, food-sharing, cooperation and sociality can only be addressed through confident identification and interpretation of BSM, and more specifically, cut marks. Recently, it has been argued that linear marks with the same properties as cut marks can be created by crocodiles, thereby questioning whether secure cut mark identifications can be made in the Early Pleistocene fossil record. Powerful classification methods based on multivariate statistics and machine learning (ML) algorithms have previously successfully discriminated cut marks from most other potentially confounding BSM. However, crocodile-made marks were marginal to or played no role in these comparative analyses. Here, for the first time, we apply state-of-the-art ML methods on crocodile linear BSM and experimental butchery cut marks, showing that the combination of multivariate taphonomy and ML methods provides accurate identification of BSM, including cut and crocodile bite marks. This enables empirically-supported hominin behavioural modelling, provided that these methods are applied to fossil assemblages.

  4. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  5. Comparison of Machine Learning methods for incipient motion in gravel bed rivers

    Science.gov (United States)

    Valyrakis, Manousos

    2013-04-01

    Soil erosion and sediment transport of natural gravel bed streams are important processes which affect both the morphology as well as the ecology of earth's surface. For gravel bed rivers at near incipient flow conditions, particle entrainment dynamics are highly intermittent. This contribution reviews the use of modern Machine Learning (ML) methods implemented for short term prediction of entrainment instances of individual grains exposed in fully developed near boundary turbulent flows. Results obtained by network architectures of variable complexity based on two different ML methods namely the Artificial Neural Network (ANN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) are compared in terms of different error and performance indices, computational efficiency and complexity as well as predictive accuracy and forecast ability. Different model architectures are trained and tested with experimental time series obtained from mobile particle flume experiments. The experimental setup consists of a Laser Doppler Velocimeter (LDV) and a laser optics system, which acquire data for the instantaneous flow and particle response respectively, synchronously. The first is used to record the flow velocity components directly upstream of the test particle, while the later tracks the particle's displacements. The lengthy experimental data sets (millions of data points) are split into the training and validation subsets used to perform the corresponding learning and testing of the models. It is demonstrated that the ANFIS hybrid model, which is based on neural learning and fuzzy inference principles, better predicts the critical flow conditions above which sediment transport is initiated. In addition, it is illustrated that empirical knowledge can be extracted, validating the theoretical assumption that particle ejections occur due to energetic turbulent flow events. Such a tool may find application in management and regulation of stream flows downstream of dams for stream

  6. Coupling Matched Molecular Pairs with Machine Learning for Virtual Compound Optimization.

    Science.gov (United States)

    Turk, Samo; Merget, Benjamin; Rippmann, Friedrich; Fulle, Simone

    2017-12-26

    Matched molecular pair (MMP) analyses are widely used in compound optimization projects to gain insights into structure-activity relationships (SAR). The analysis is traditionally done via statistical methods but can also be employed together with machine learning (ML) approaches to extrapolate to novel compounds. The here introduced MMP/ML method combines a fragment-based MMP implementation with different machine learning methods to obtain automated SAR decomposition and prediction. To test the prediction capabilities and model transferability, two different compound optimization scenarios were designed: (1) "new fragments" which occurs when exploring new fragments for a defined compound series and (2) "new static core and transformations" which resembles for instance the identification of a new compound series. Very good results were achieved by all employed machine learning methods especially for the new fragments case, but overall deep neural network models performed best, allowing reliable predictions also for the new static core and transformations scenario, where comprehensive SAR knowledge of the compound series is missing. Furthermore, we show that models trained on all available data have a higher generalizability compared to models trained on focused series and can extend beyond chemical space covered in the training data. Thus, coupling MMP with deep neural networks provides a promising approach to make high quality predictions on various data sets and in different compound optimization scenarios.

  7. Classification of carcinogenic and mutagenic properties using machine learning method

    DEFF Research Database (Denmark)

    Moorthy, N. S.Hari Narayana; Kumar, Surendra; Poongavanam, Vasanthanathan

    2017-01-01

    An accurate calculation of carcinogenicity of chemicals became a serious challenge for the health assessment authority around the globe because of not only increased cost for experiments but also various ethical issues exist using animal models. In this study, we provide machine learning...

  8. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  9. Machine Learning in Medicine

    Science.gov (United States)

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  10. Machine learning techniques for optical communication system optimization

    DEFF Research Database (Denmark)

    Zibar, Darko; Wass, Jesper; Thrane, Jakob

    In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction.......In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction....

  11. Machine learning for radioxenon event classification for the Comprehensive Nuclear-Test-Ban Treaty

    Energy Technology Data Exchange (ETDEWEB)

    Stocki, Trevor J., E-mail: trevor_stocki@hc-sc.gc.c [Radiation Protection Bureau, 775 Brookfield Road, A.L. 6302D1, Ottawa, ON, K1A 1C1 (Canada); Li, Guichong; Japkowicz, Nathalie [School of Information Technology and Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, ON, K1N 6N5 (Canada); Ungar, R. Kurt [Radiation Protection Bureau, 775 Brookfield Road, A.L. 6302D1, Ottawa, ON, K1A 1C1 (Canada)

    2010-01-15

    A method of weapon detection for the Comprehensive nuclear-Test-Ban-Treaty (CTBT) consists of monitoring the amount of radioxenon in the atmosphere by measuring and sampling the activity concentration of {sup 131m}Xe, {sup 133}Xe, {sup 133m}Xe, and {sup 135}Xe by radionuclide monitoring. Several explosion samples were simulated based on real data since the measured data of this type is quite rare. These data sets consisted of different circumstances of a nuclear explosion, and are used as training data sets to establish an effective classification model employing state-of-the-art technologies in machine learning. A study was conducted involving classic induction algorithms in machine learning including Naive Bayes, Neural Networks, Decision Trees, k-Nearest Neighbors, and Support Vector Machines, that revealed that they can successfully be used in this practical application. In particular, our studies show that many induction algorithms in machine learning outperform a simple linear discriminator when a signal is found in a high radioxenon background environment.

  12. Machine learning for radioxenon event classification for the Comprehensive Nuclear-Test-Ban Treaty

    International Nuclear Information System (INIS)

    Stocki, Trevor J.; Li, Guichong; Japkowicz, Nathalie; Ungar, R. Kurt

    2010-01-01

    A method of weapon detection for the Comprehensive nuclear-Test-Ban-Treaty (CTBT) consists of monitoring the amount of radioxenon in the atmosphere by measuring and sampling the activity concentration of 131m Xe, 133 Xe, 133m Xe, and 135 Xe by radionuclide monitoring. Several explosion samples were simulated based on real data since the measured data of this type is quite rare. These data sets consisted of different circumstances of a nuclear explosion, and are used as training data sets to establish an effective classification model employing state-of-the-art technologies in machine learning. A study was conducted involving classic induction algorithms in machine learning including Naive Bayes, Neural Networks, Decision Trees, k-Nearest Neighbors, and Support Vector Machines, that revealed that they can successfully be used in this practical application. In particular, our studies show that many induction algorithms in machine learning outperform a simple linear discriminator when a signal is found in a high radioxenon background environment.

  13. Machine vision systems using machine learning for industrial product inspection

    Science.gov (United States)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF), and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we will present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Board (PCB) and Vacuum Florescent Displaying (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its displaying patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies and the PCB inspection system is the process of being deployed in a manufacturing plant.

  14. What is the machine learning?

    Science.gov (United States)

    Chang, Spencer; Cohen, Timothy; Ostdiek, Bryan

    2018-03-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables—aided by physical intuition—that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus nonlinear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

  15. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances

    Directory of Open Access Journals (Sweden)

    Abut F

    2015-08-01

    Full Text Available Fatih Abut, Mehmet Fatih AkayDepartment of Computer Engineering, Çukurova University, Adana, TurkeyAbstract: Maximal oxygen uptake (VO2max indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance

  16. Using Machine Learning in Adversarial Environments.

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Warren Leon [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-02-01

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approaches only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers reaction, with the goal of computing optimal moving target defense. One important challenge is to construct a realistic model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.

  17. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    Science.gov (United States)

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Improved Extreme Learning Machine and Its Application in Image Quality Assessment

    Directory of Open Access Journals (Sweden)

    Li Mao

    2014-01-01

    Full Text Available Extreme learning machine (ELM is a new class of single-hidden layer feedforward neural network (SLFN, which is simple in theory and fast in implementation. Zong et al. propose a weighted extreme learning machine for learning data with imbalanced class distribution, which maintains the advantages from original ELM. However, the current reported ELM and its improved version are only based on the empirical risk minimization principle, which may suffer from overfitting. To solve the overfitting troubles, in this paper, we incorporate the structural risk minimization principle into the (weighted ELM, and propose a modified (weighted extreme learning machine (M-ELM and M-WELM. Experimental results show that our proposed M-WELM outperforms the current reported extreme learning machine algorithm in image quality assessment.

  19. A Hierarchical Approach Using Machine Learning Methods in Solar Photovoltaic Energy Production Forecasting

    Directory of Open Access Journals (Sweden)

    Zhaoxuan Li

    2016-01-01

    Full Text Available We evaluate and compare two common methods, artificial neural networks (ANN and support vector regression (SVR, for predicting energy productions from a solar photovoltaic (PV system in Florida 15 min, 1 h and 24 h ahead of time. A hierarchical approach is proposed based on the machine learning algorithms tested. The production data used in this work corresponds to 15 min averaged power measurements collected from 2014. The accuracy of the model is determined using computing error statistics such as mean bias error (MBE, mean absolute error (MAE, root mean square error (RMSE, relative MBE (rMBE, mean percentage error (MPE and relative RMSE (rRMSE. This work provides findings on how forecasts from individual inverters will improve the total solar power generation forecast of the PV system.

  20. Computerization of Hungarian reforestation manual with machine learning methods

    Science.gov (United States)

    Czimber, Kornél; Gálos, Borbála; Mátyás, Csaba; Bidló, András; Gribovszki, Zoltán

    2017-04-01

    Hungarian forests are highly sensitive to the changing climate, especially to the available precipitation amount. Over the past two decades several drought damages were observed for tree species which are in the lower xeric limit of their distribution. From year to year these affected forest stands become more difficult to reforest with the same native species because these are not able to adapt to the increasing probability of droughts. The climate related parameter set of the Hungarian forest stand database needs updates. Air humidity that was formerly used to define the forest climate zones is not measured anymore and its value based on climate model outputs is highly uncertain. The aim was to develop a novel computerized and objective method to describe the species-specific climate conditions that is essential for survival, growth and optimal production of the forest ecosystems. The method is expected to project the species spatial distribution until 2100 on the basis of regional climate model simulations. Until now, Hungarian forest managers have been using a carefully edited spreadsheet for reforestation purposes. Applying binding regulations this spreadsheet prescribes the stand-forming and admixed tree species and their expected growth rate for each forest site types. We are going to present a new machine learning based method to replace the former spreadsheet. We took into great consideration of various methods, such as maximum likelihood, Bayesian networks, Fuzzy logic. The method calculates distributions, setups classification, which can be validated and modified by experts if necessary. Projected climate change conditions makes necessary to include into this system an additional climate zone that does not exist in our region now, as well as new options for potential tree species. In addition to or instead of the existing ones, the influence of further limiting parameters (climatic extremes, soil water retention) are also investigated. Results will be

  1. Comparison of machine learning and semi-quantification algorithms for (I123)FP-CIT classification: the beginning of the end for semi-quantification?

    Science.gov (United States)

    Taylor, Jonathan Christopher; Fenner, John Wesley

    2017-11-29

    Semi-quantification methods are well established in the clinic for assisted reporting of (I123) Ioflupane images. Arguably, these are limited diagnostic tools. Recent research has demonstrated the potential for improved classification performance offered by machine learning algorithms. A direct comparison between methods is required to establish whether a move towards widespread clinical adoption of machine learning algorithms is justified. This study compared three machine learning algorithms with that of a range of semi-quantification methods, using the Parkinson's Progression Markers Initiative (PPMI) research database and a locally derived clinical database for validation. Machine learning algorithms were based on support vector machine classifiers with three different sets of features: Voxel intensities Principal components of image voxel intensities Striatal binding radios from the putamen and caudate. Semi-quantification methods were based on striatal binding ratios (SBRs) from both putamina, with and without consideration of the caudates. Normal limits for the SBRs were defined through four different methods: Minimum of age-matched controls Mean minus 1/1.5/2 standard deviations from age-matched controls Linear regression of normal patient data against age (minus 1/1.5/2 standard errors) Selection of the optimum operating point on the receiver operator characteristic curve from normal and abnormal training data Each machine learning and semi-quantification technique was evaluated with stratified, nested 10-fold cross-validation, repeated 10 times. The mean accuracy of the semi-quantitative methods for classification of local data into Parkinsonian and non-Parkinsonian groups varied from 0.78 to 0.87, contrasting with 0.89 to 0.95 for classifying PPMI data into healthy controls and Parkinson's disease groups. The machine learning algorithms gave mean accuracies between 0.88 to 0.92 and 0.95 to 0.97 for local and PPMI data respectively. Classification

  2. Classification of Strawberry Fruit Shape by Machine Learning

    Science.gov (United States)

    Ishikawa, T.; Hayashi, A.; Nagamatsu, S.; Kyutoku, Y.; Dan, I.; Wada, T.; Oku, K.; Saeki, Y.; Uto, T.; Tanabata, T.; Isobe, S.; Kochi, N.

    2018-05-01

    Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sampler patterns of the nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning methods to other plant species.

  3. Machine learning-based patient specific prompt-gamma dose monitoring in proton therapy

    Science.gov (United States)

    Gueth, P.; Dauvergne, D.; Freud, N.; Létang, J. M.; Ray, C.; Testa, E.; Sarrut, D.

    2013-07-01

    Online dose monitoring in proton therapy is currently being investigated with prompt-gamma (PG) devices. PG emission was shown to be correlated with dose deposition. This relationship is mostly unknown under real conditions. We propose a machine learning approach based on simulations to create optimized treatment-specific classifiers that detect discrepancies between planned and delivered dose. Simulations were performed with the Monte-Carlo platform Gate/Geant4 for a spot-scanning proton therapy treatment and a PG camera prototype currently under investigation. The method first builds a learning set of perturbed situations corresponding to a range of patient translation. This set is then used to train a combined classifier using distal falloff and registered correlation measures. Classifier performances were evaluated using receiver operating characteristic curves and maximum associated specificity and sensitivity. A leave-one-out study showed that it is possible to detect discrepancies of 5 mm with specificity and sensitivity of 85% whereas using only distal falloff decreases the sensitivity down to 77% on the same data set. The proposed method could help to evaluate performance and to optimize the design of PG monitoring devices. It is generic: other learning sets of deviations, other measures and other types of classifiers could be studied to potentially reach better performance. At the moment, the main limitation lies in the computation time needed to perform the simulations.

  4. Machine learning-based patient specific prompt-gamma dose monitoring in proton therapy

    International Nuclear Information System (INIS)

    Gueth, P; Freud, N; Létang, J M; Sarrut, D; Dauvergne, D; Ray, C; Testa, E

    2013-01-01

    Online dose monitoring in proton therapy is currently being investigated with prompt-gamma (PG) devices. PG emission was shown to be correlated with dose deposition. This relationship is mostly unknown under real conditions. We propose a machine learning approach based on simulations to create optimized treatment-specific classifiers that detect discrepancies between planned and delivered dose. Simulations were performed with the Monte-Carlo platform Gate/Geant4 for a spot-scanning proton therapy treatment and a PG camera prototype currently under investigation. The method first builds a learning set of perturbed situations corresponding to a range of patient translation. This set is then used to train a combined classifier using distal falloff and registered correlation measures. Classifier performances were evaluated using receiver operating characteristic curves and maximum associated specificity and sensitivity. A leave-one-out study showed that it is possible to detect discrepancies of 5 mm with specificity and sensitivity of 85% whereas using only distal falloff decreases the sensitivity down to 77% on the same data set. The proposed method could help to evaluate performance and to optimize the design of PG monitoring devices. It is generic: other learning sets of deviations, other measures and other types of classifiers could be studied to potentially reach better performance. At the moment, the main limitation lies in the computation time needed to perform the simulations. (paper)

  5. Differentiation of Enhancing Glioma and Primary Central Nervous System Lymphoma by Texture-Based Machine Learning.

    Science.gov (United States)

    Alcaide-Leon, P; Dufort, P; Geraldo, A F; Alshafai, L; Maralani, P J; Spears, J; Bharatha, A

    2017-06-01

    Accurate preoperative differentiation of primary central nervous system lymphoma and enhancing glioma is essential to avoid unnecessary neurosurgical resection in patients with primary central nervous system lymphoma. The purpose of the study was to evaluate the diagnostic performance of a machine-learning algorithm by using texture analysis of contrast-enhanced T1-weighted images for differentiation of primary central nervous system lymphoma and enhancing glioma. Seventy-one adult patients with enhancing gliomas and 35 adult patients with primary central nervous system lymphomas were included. The tumors were manually contoured on contrast-enhanced T1WI, and the resulting volumes of interest were mined for textural features and subjected to a support vector machine-based machine-learning protocol. Three readers classified the tumors independently on contrast-enhanced T1WI. Areas under the receiver operating characteristic curves were estimated for each reader and for the support vector machine classifier. A noninferiority test for diagnostic accuracy based on paired areas under the receiver operating characteristic curve was performed with a noninferiority margin of 0.15. The mean areas under the receiver operating characteristic curve were 0.877 (95% CI, 0.798-0.955) for the support vector machine classifier; 0.878 (95% CI, 0.807-0.949) for reader 1; 0.899 (95% CI, 0.833-0.966) for reader 2; and 0.845 (95% CI, 0.757-0.933) for reader 3. The mean area under the receiver operating characteristic curve of the support vector machine classifier was significantly noninferior to the mean area under the curve of reader 1 ( P = .021), reader 2 ( P = .035), and reader 3 ( P = .007). Support vector machine classification based on textural features of contrast-enhanced T1WI is noninferior to expert human evaluation in the differentiation of primary central nervous system lymphoma and enhancing glioma. © 2017 by American Journal of Neuroradiology.

  6. Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes.

    Science.gov (United States)

    Wang, Yuanjia; Chen, Tianle; Zeng, Donglin

    2016-01-01

    Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for different at risk populations at observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating covariate-specific hazard function from population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze two real world biomedical study data where we use clinical markers and neuroimaging biomarkers to predict age-at-onset of a disease, and demonstrate superiority of SVHM in distinguishing high risk versus low risk subjects.

  7. Designing “Theory of Machines and Mechanisms” course on Project Based Learning approach

    DEFF Research Database (Denmark)

    Shinde, Vikas

    2013-01-01

    by the industry and the learning outcomes specified by the National Board of Accreditation (NBA), India; this course is restructured on Project Based Learning approach. A mini project is designed to suit course objectives. An objective of this paper is to discuss the rationale of this course design......Theory of Machines and Mechanisms course is one of the essential courses of Mechanical Engineering undergraduate curriculum practiced at Indian Institute. Previously, this course was taught by traditional instruction based pedagogy. In order to achieve profession specific skills demanded...... and the process followed to design a project which meets diverse objectives....

  8. Figure of merit for macrouniformity based on image quality ruler evaluation and machine learning framework

    Science.gov (United States)

    Wang, Weibao; Overall, Gary; Riggs, Travis; Silveston-Keith, Rebecca; Whitney, Julie; Chiu, George; Allebach, Jan P.

    2013-01-01

    Assessment of macro-uniformity is a capability that is important for the development and manufacture of printer products. Our goal is to develop a metric that will predict macro-uniformity, as judged by human subjects, by scanning and analyzing printed pages. We consider two different machine learning frameworks for the metric: linear regression and the support vector machine. We have implemented the image quality ruler, based on the recommendations of the INCITS W1.1 macro-uniformity team. Using 12 subjects at Purdue University and 20 subjects at Lexmark, evenly balanced with respect to gender, we conducted subjective evaluations with a set of 35 uniform b/w prints from seven different printers with five levels of tint coverage. Our results suggest that the image quality ruler method provides a reliable means to assess macro-uniformity. We then defined and implemented separate features to measure graininess, mottle, large area variation, jitter, and large-scale non-uniformity. The algorithms that we used are largely based on ISO image quality standards. Finally, we used these features computed for a set of test pages and the subjects' image quality ruler assessments of these pages to train the two different predictors - one based on linear regression and the other based on the support vector machine (SVM). Using five-fold cross-validation, we confirmed the efficacy of our predictor.

  9. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents Applications in Recommender Systems which are making recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the books approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  10. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  11. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  12. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction

    Science.gov (United States)

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring the tutoring to the robot’s performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot’s bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  13. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    Science.gov (United States)

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance.

  14. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    Directory of Open Access Journals (Sweden)

    Joachim de Greeff

    Full Text Available Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference; the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance.

  15. BELM: Bayesian extreme learning machine.

    Science.gov (United States)

    Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J

    2011-03-01

    The theory of extreme learning machine (ELM) has become very popular on the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need of applying methods that are computationally intensive, e.g., bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM in several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. Achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of probability of model overfitting, and use of a priori knowledge.

  16. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  17. A heuristic method for simulating open-data of arbitrary complexity that can be used to compare and evaluate machine learning methods.

    Science.gov (United States)

    Moore, Jason H; Shestov, Maksim; Schmitt, Peter; Olson, Randal S

    2018-01-01

    A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to easily access real data for this purpose. Genomics has in some examples taken a leading role in the open data effort starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods it is not sufficient. This is because it is not possible to know what the ground truth is in real data. This must be accompanied by simulated data where that balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data that specifically allows discrimination between different machine learning methods.

  18. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study.

    Science.gov (United States)

    Lyles, Courtney Rees; Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-06-10

    It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals' opinions and the decision-making process. We conducted a sequential mixed methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month time period surrounding a change in Pap smear screening guidelines. We applied statistical methodologies, known as sparse machine learning algorithms, to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms "cervical cancer," "Pap smear," and "Pap test" were analyzed during: (1) January 1-March 13, 2012, and (2) March 14-June 30, 2012. Topic modeling was used to discern the most common topics from each time period, and determine the singular value criterion for each topic. The results were then qualitatively coded from top 10 relevant topics to determine the efficiency of clustering method in grouping distinct ideas, and how the discussion differed before vs. after the change in guidelines . This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, "New Screening Guidelines for Cervical Cancer"), and many topics in both time periods were addressing basic screening promotion and education (eg, "It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test.") It was demonstrated that machine learning

  19. Neural architecture design based on extreme learning machine.

    Science.gov (United States)

    Bueno-Crespo, Andrés; García-Laencina, Pedro J; Sancho-Gómez, José-Luis

    2013-12-01

    Selection of the optimal neural architecture to solve a pattern classification problem entails to choose the relevant input units, the number of hidden neurons and its corresponding interconnection weights. This problem has been widely studied in many research works but their solutions usually involve excessive computational cost in most of the problems and they do not provide a unique solution. This paper proposes a new technique to efficiently design the MultiLayer Perceptron (MLP) architecture for classification using the Extreme Learning Machine (ELM) algorithm. The proposed method provides a high generalization capability and a unique solution for the architecture design. Moreover, the selected final network only retains those input connections that are relevant for the classification task. Experimental results show these advantages. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Automatically explaining machine learning prediction results: a demonstration on type 2 diabetes risk prediction.

    Science.gov (United States)

    Luo, Gang

    2016-01-01

    Predictive modeling is a key component of solutions to many healthcare problems. Among all predictive modeling approaches, machine learning methods often achieve the highest prediction accuracy, but suffer from a long-standing open problem precluding their widespread use in healthcare. Most machine learning models give no explanation for their prediction results, whereas interpretability is essential for a predictive model to be adopted in typical healthcare settings. This paper presents the first complete method for automatically explaining results for any machine learning predictive model without degrading accuracy. We did a computer coding implementation of the method. Using the electronic medical record data set from the Practice Fusion diabetes classification competition containing patient records from all 50 states in the United States, we demonstrated the method on predicting type 2 diabetes diagnosis within the next year. For the champion machine learning model of the competition, our method explained prediction results for 87.4 % of patients who were correctly predicted by the model to have type 2 diabetes diagnosis within the next year. Our demonstration showed the feasibility of automatically explaining results for any machine learning predictive model without degrading accuracy.

  1. Advances in independent component analysis and learning machines

    CERN Document Server

    Bingham, Ella; Laaksonen, Jorma; Lampinen, Jouko

    2015-01-01

    In honour of Professor Erkki Oja, one of the pioneers of Independent Component Analysis (ICA), this book reviews key advances in the theory and application of ICA, as well as its influence on signal processing, pattern recognition, machine learning, and data mining. Examples of topics which have developed from the advances of ICA, which are covered in the book are: A unifying probabilistic model for PCA and ICA Optimization methods for matrix decompositions Insights into the FastICA algorithmUnsupervised deep learning Machine vision and image retrieval A review of developments in the t

  2. Using Machine Learning to Advance Personality Assessment and Theory.

    Science.gov (United States)

    Bleidorn, Wiebke; Hopwood, Christopher James

    2018-05-01

    Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have focused on the associations between social media and other digital records with established personality measures. The goal of this article is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation, and provide recommendations for how to use machine learning to advance our understanding of personality.

  3. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down task driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse grained global features and a larger set of low...

  4. A Novel Application of Machine Learning Methods to Model Microcontroller Upset Due to Intentional Electromagnetic Interference

    Science.gov (United States)

    Bilalic, Rusmir

    A novel application of support vector machines (SVMs), artificial neural networks (ANNs), and Gaussian processes (GPs) for machine learning (GPML) to model microcontroller unit (MCU) upset due to intentional electromagnetic interference (IEMI) is presented. In this approach, an MCU performs a counting operation (0-7) while electromagnetic interference in the form of a radio frequency (RF) pulse is direct-injected into the MCU clock line. Injection times with respect to the clock signal are the clock low, clock rising edge, clock high, and the clock falling edge periods in the clock window during which the MCU is performing initialization and executing the counting procedure. The intent is to cause disruption in the counting operation and model the probability of effect (PoE) using machine learning tools. Five experiments were executed as part of this research, each of which contained a set of 38,300 training points and 38,300 test points, for a total of 383,000 total points with the following experiment variables: injection times with respect to the clock signal, injected RF power, injected RF pulse width, and injected RF frequency. For the 191,500 training points, the average training error was 12.47%, while for the 191,500 test points the average test error was 14.85%, meaning that on average, the machine was able to predict MCU upset with an 85.15% accuracy. Leaving out the results for the worst-performing model (SVM with a linear kernel), the test prediction accuracy for the remaining machines is almost 89%. All three machine learning methods (ANNs, SVMs, and GPML) showed excellent and consistent results in their ability to model and predict the PoE on an MCU due to IEMI. The GP approach performed best during training with a 7.43% average training error, while the ANN technique was most accurate during the test with a 10.80% error.

  5. Machine Learning and Deep Learning Models to Predict Runoff Water Quantity and Quality

    Science.gov (United States)

    Bradford, S. A.; Liang, J.; Li, W.; Murata, T.; Simunek, J.

    2017-12-01

    Contaminants can be rapidly transported at the soil surface by runoff to surface water bodies. Physically-based models, which are based on the mathematical description of main hydrological processes, are key tools for predicting surface water impairment. Along with physically-based models, data-driven models are becoming increasingly popular for describing the behavior of hydrological and water resources systems since these models can be used to complement or even replace physically based-models. In this presentation we propose a new data-driven model as an alternative to a physically-based overland flow and transport model. First, we have developed a physically-based numerical model to simulate overland flow and contaminant transport (the HYDRUS-1D overland flow module). A large number of numerical simulations were carried out to develop a database containing information about the impact of various input parameters (weather patterns, surface topography, vegetation, soil conditions, contaminants, and best management practices) on runoff water quantity and quality outputs. This database was used to train data-driven models. Three different methods (Neural Networks, Support Vector Machines, and Recurrence Neural Networks) were explored to prepare input- output functional relations. Results demonstrate the ability and limitations of machine learning and deep learning models to predict runoff water quantity and quality.

  6. MLnet report: training in Europe on machine learning

    OpenAIRE

    Ellebrecht, Mario; Morik, Katharina

    1999-01-01

    Machine learning techniques offer opportunities for a variety of applications and the theory of machine learning investigates problems that are of interest for other fields of computer science (e.g., complexity theory, logic programming, pattern recognition). However, the impacts of machine learning can only be recognized by those who know the techniques and are able to apply them. Hence, teaching machine learning is necessary before this field can diversify computer science. In order ...

  7. Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy.

    Science.gov (United States)

    Tian, Yuling; Zhang, Hongxian

    2016-01-01

    For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic-there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions.

  8. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    Directory of Open Access Journals (Sweden)

    Liu Qingzhong

    2011-12-01

    Full Text Available Abstract Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

  9. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  10. Inverse analysis of turbidites by machine learning

    Science.gov (United States)

    Naruse, H.; Nakao, K.

    2017-12-01

    This study aims to propose a method to estimate paleo-hydraulic conditions of turbidity currents from ancient turbidites by using machine-learning technique. In this method, numerical simulation was repeated under various initial conditions, which produces a data set of characteristic features of turbidites. Then, this data set of turbidites is used for supervised training of a deep-learning neural network (NN). Quantities of characteristic features of turbidites in the training data set are given to input nodes of NN, and output nodes are expected to provide the estimates of initial condition of the turbidity current. The optimization of weight coefficients of NN is then conducted to reduce root-mean-square of the difference between the true conditions and the output values of NN. The empirical relationship with numerical results and the initial conditions is explored in this method, and the discovered relationship is used for inversion of turbidity currents. This machine learning can potentially produce NN that estimates paleo-hydraulic conditions from data of ancient turbidites. We produced a preliminary implementation of this methodology. A forward model based on 1D shallow-water equations with a correction of density-stratification effect was employed. This model calculates a behavior of a surge-like turbidity current transporting mixed-size sediment, and outputs spatial distribution of volume per unit area of each grain-size class on the uniform slope. Grain-size distribution was discretized 3 classes. Numerical simulation was repeated 1000 times, and thus 1000 beds of turbidites were used as the training data for NN that has 21000 input nodes and 5 output nodes with two hidden-layers. After the machine learning finished, independent simulations were conducted 200 times in order to evaluate the performance of NN. As a result of this test, the initial conditions of validation data were successfully reconstructed by NN. The estimated values show very small

  11. A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.

    Science.gov (United States)

    Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei

    2017-09-21

    In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.

  12. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    Science.gov (United States)

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  13. Statistical and Machine Learning forecasting methods: Concerns and ways forward.

    Science.gov (United States)

    Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.

  14. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vectors Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such a

  15. Thutmose - Investigation of Machine Learning-Based Intrusion Detection Systems

    Science.gov (United States)

    2016-06-01

    monitoring. This analyzed payload is within the application layer of the OSI model . The analysis tries to establish whether or not the payload is...24 3.2.5 Model Drift Experiments...ADVERSARIAL ENVIRONMENTS (SPIE DSS 2014) .................................................. 58 APPENDIX C - EVALUATING MODEL DRIFT IN MACHINE LEARNING

  16. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM...

  17. Integrated Method for Personal Thermal Comfort Assessment and Optimization through Users' Feedback, IoT and Machine Learning: A Case Study †.

    Science.gov (United States)

    Salamone, Francesco; Belussi, Lorenzo; Currò, Cristian; Danza, Ludovico; Ghellere, Matteo; Guazzi, Giulia; Lenzi, Bruno; Megale, Valentino; Meroni, Italo

    2018-05-17

    Thermal comfort has become a topic issue in building performance assessment as well as energy efficiency. Three methods are mainly recognized for its assessment. Two of them based on standardized methodologies, face the problem by considering the indoor environment in steady-state conditions (PMV and PPD) and users as active subjects whose thermal perception is influenced by outdoor climatic conditions (adaptive approach). The latter method is the starting point to investigate thermal comfort from an overall perspective by considering endogenous variables besides the traditional physical and environmental ones. Following this perspective, the paper describes the results of an in-field investigation of thermal conditions through the use of nearable and wearable solutions, parametric models and machine learning techniques. The aim of the research is the exploration of the reliability of IoT-based solutions combined with advanced algorithms, in order to create a replicable framework for the assessment and improvement of user thermal satisfaction. For this purpose, an experimental test in real offices was carried out involving eight workers. Parametric models are applied for the assessment of thermal comfort; IoT solutions are used to monitor the environmental variables and the users' parameters; the machine learning CART method allows to predict the users' profile and the thermal comfort perception respect to the indoor environment.

  18. Spectral methods in machine learning and new strategies for very large datasets

    Science.gov (United States)

    Belabbas, Mohamed-Ali; Wolfe, Patrick J.

    2009-01-01

    Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here 2 new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods. PMID:19129490

  19. Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.

    Directory of Open Access Journals (Sweden)

    Rubén Armañanzas

    Full Text Available Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE. Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery.

  20. Distributed Extreme Learning Machine for Nonlinear Learning over Network

    Directory of Open Access Journals (Sweden)

    Songyan Huang

    2015-02-01

    Full Text Available Distributed data collection and analysis over a network are ubiquitous, especially over a wireless sensor network (WSN. To our knowledge, the data model used in most of the distributed algorithms is linear. However, in real applications, the linearity of systems is not always guaranteed. In nonlinear cases, the single hidden layer feedforward neural network (SLFN with radial basis function (RBF hidden neurons has the ability to approximate any continuous functions and, thus, may be used as the nonlinear learning system. However, confined by the communication cost, using the distributed version of the conventional algorithms to train the neural network directly is usually prohibited. Fortunately, based on the theorems provided in the extreme learning machine (ELM literature, we only need to compute the output weights of the SLFN. Computing the output weights itself is a linear learning problem, although the input-output mapping of the overall SLFN is still nonlinear. Using the distributed algorithmto cooperatively compute the output weights of the SLFN, we obtain a distributed extreme learning machine (dELM for nonlinear learning in this paper. This dELM is applied to the regression problem and classification problem to demonstrate its effectiveness and advantages.