WorldWideScience

Sample records for machine learning algorithm

  1. Machine Learning an algorithmic perspective

    CERN Document Server

    Marsland, Stephen

    2009-01-01

    Traditional books on machine learning can be divided into two groups - those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.Theory Backed up by Practical ExamplesThe book covers neural networks, graphical models, reinforcement le

  2. Parallelization of TMVA Machine Learning Algorithms

    CERN Document Server

    Hajili, Mammad

    2017-01-01

    This report reflects my work on Parallelization of TMVA Machine Learning Algorithms integrated to ROOT Data Analysis Framework during summer internship at CERN. The report consists of 4 impor- tant part - data set used in training and validation, algorithms that multiprocessing applied on them, parallelization techniques and re- sults of execution time changes due to number of workers.

  3. A distributed algorithm for machine learning

    Science.gov (United States)

    Chen, Shihong

    2018-04-01

    This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.

  4. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology.

  5. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    International Nuclear Information System (INIS)

    Kanemoto, Shigeru; Watanabe, Masaya; Yusa, Noritaka

    2014-01-01

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology

  6. Research on machine learning framework based on random forest algorithm

    Science.gov (United States)

    Ren, Qiong; Cheng, Hui; Han, Hai

    2017-03-01

    With the continuous development of machine learning, industry and academia have released a lot of machine learning frameworks based on distributed computing platform, and have been widely used. However, the existing framework of machine learning is limited by the limitations of machine learning algorithm itself, such as the choice of parameters and the interference of noises, the high using threshold and so on. This paper introduces the research background of machine learning framework, and combined with the commonly used random forest algorithm in machine learning classification algorithm, puts forward the research objectives and content, proposes an improved adaptive random forest algorithm (referred to as ARF), and on the basis of ARF, designs and implements the machine learning framework.

  7. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence, and has attracted considerable attentions. An extreme learning machine (ELM) is a single layer feed-forward neural network alike learning system, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of extreme learning machine (ELM) are its fast learning speed, trivial human intervene and high scalability.   This book contains some selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing China, October 15-17, 2013. This conference aims to bring together the researchers and practitioners of extreme learning machine from a variety of fields including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...

  8. Interactive Algorithms for Unsupervised Machine Learning

    Science.gov (United States)

    2015-06-01

    in Neural Information Processing Systems, 2013. 14 [3] Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi. On combinato- rial...Myung Jin Choi, Vincent Y F Tan , Animashree Anandkumar, and Alan S Willsky. Learn- ing Latent Tree Graphical Models. Journal of Machine Learning

  9. Relevance as a metric for evaluating machine learning algorithms

    NARCIS (Netherlands)

    Kota Gopalakrishna, A.; Ozcelebi, T.; Liotta, A.; Lukkien, J.J.

    2013-01-01

    In machine learning, the choice of a learning algorithm that is suitable for the application domain is critical. The performance metric used to compare different algorithms must also reflect the concerns of users in the application domain under consideration. In this work, we propose a novel

  10. Machine Learning in Production Systems Design Using Genetic Algorithms

    OpenAIRE

    Abu Qudeiri Jaber; Yamamoto Hidehiko Rizauddin Ramli

    2008-01-01

    To create a solution for a specific problem in machine learning, the solution is constructed from the data or by use a search method. Genetic algorithms are a model of machine learning that can be used to find nearest optimal solution. While the great advantage of genetic algorithms is the fact that they find a solution through evolution, this is also the biggest disadvantage. Evolution is inductive, in nature life does not evolve towards a good solution but it evolves aw...

  11. Randomized Algorithms for Scalable Machine Learning

    OpenAIRE

    Kleiner, Ariel Jacob

    2012-01-01

    Many existing procedures in machine learning and statistics are computationally intractable in the setting of large-scale data. As a result, the advent of rapidly increasing dataset sizes, which should be a boon yielding improved statistical performance, instead severely blunts the usefulness of a variety of existing inferential methods. In this work, we use randomness to ameliorate this lack of scalability by reducing complex, computationally difficult inferential problems to larger sets o...

  12. Component Pin Recognition Using Algorithms Based on Machine Learning

    Science.gov (United States)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  13. A strategy for quantum algorithm design assisted by machine learning

    International Nuclear Information System (INIS)

    Bang, Jeongho; Lee, Jinhyoung; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin

    2014-01-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum–classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch–Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method. (paper)

  14. A strategy for quantum algorithm design assisted by machine learning

    Science.gov (United States)

    Bang, Jeongho; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin; Lee, Jinhyoung

    2014-07-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum-classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch-Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method.

  15. Machine-Learning Algorithms to Code Public Health Spending Accounts.

    Science.gov (United States)

    Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

    Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.

  16. Automated Essay Grading using Machine Learning Algorithm

    Science.gov (United States)

    Ramalingam, V. V.; Pandian, A.; Chetry, Prateek; Nigam, Himanshu

    2018-04-01

    Essays are paramount for of assessing the academic excellence along with linking the different ideas with the ability to recall but are notably time consuming when they are assessed manually. Manual grading takes significant amount of evaluator’s time and hence it is an expensive process. Automated grading if proven effective will not only reduce the time for assessment but comparing it with human scores will also make the score realistic. The project aims to develop an automated essay assessment system by use of machine learning techniques by classifying a corpus of textual entities into small number of discrete categories, corresponding to possible grades. Linear regression technique will be utilized for training the model along with making the use of various other classifications and clustering techniques. We intend to train classifiers on the training set, make it go through the downloaded dataset, and then measure performance our dataset by comparing the obtained values with the dataset values. We have implemented our model using java.

  17. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, in geodynamical studies there are many properties from mineral physics, geochemistry, and petrology in these numerical models. Machine learning, as a computational statistic-related technique and a subfield of artificial intelligence, has rapidly emerged recently in many fields of sciences and engineering. We focus here on the application of supervised machine learning (SML) algorithms in predictions of mantle flow processes. Specifically, we emphasize on estimating mantle properties by employing machine learning techniques in solving an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at midmantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used in characterizing mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms for putting constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.

  18. Machine learning algorithms for datasets popularity prediction

    CERN Document Server

    Kancys, Kipras

    2016-01-01

    This report represents continued study where ML algorithms were used to predict databases popularity. Three topics were covered. First of all, there was a discrepancy between old and new meta-data collection procedures, so a reason for that had to be found. Secondly, different parameters were analysed and dropped to make algorithms perform better. And third, it was decided to move modelling part on Spark.

  19. Four Machine Learning Algorithms for Biometrics Fusion: A Comparative Study

    Directory of Open Access Journals (Sweden)

    I. G. Damousis

    2012-01-01

    Full Text Available We examine the efficiency of four machine learning algorithms for the fusion of several biometrics modalities to create a multimodal biometrics security system. The algorithms examined are Gaussian Mixture Models (GMMs, Artificial Neural Networks (ANNs, Fuzzy Expert Systems (FESs, and Support Vector Machines (SVMs. The fusion of biometrics leads to security systems that exhibit higher recognition rates and lower false alarms compared to unimodal biometric security systems. Supervised learning was carried out using a number of patterns from a well-known benchmark biometrics database, and the validation/testing took place with patterns from the same database which were not included in the training dataset. The comparison of the algorithms reveals that the biometrics fusion system is superior to the original unimodal systems and also other fusion schemes found in the literature.

  20. Towards the compression of parton densities through machine learning algorithms

    CERN Document Server

    Carrazza, Stefano

    2016-01-01

    One of the most fascinating challenges in the context of parton density function (PDF) is the determination of the best combined PDF uncertainty from individual PDF sets. Since 2014 multiple methodologies have been developed to achieve this goal. In this proceedings we first summarize the strategy adopted by the PDF4LHC15 recommendation and then, we discuss about a new approach to Monte Carlo PDF compression based on clustering through machine learning algorithms.

  1. MACHINE LEARNING METHODS IN DIGITAL AGRICULTURE: ALGORITHMS AND CASES

    Directory of Open Access Journals (Sweden)

    Aleksandr Vasilyevich Koshkarov

    2018-05-01

    Full Text Available Ensuring food security is a major challenge in many countries. With a growing global population, the issues of improving the efficiency of agriculture have become most relevant. Farmers are looking for new ways to increase yields, and governments of different countries are developing new programs to support agriculture. This contributes to a more active implementation of digital technologies in agriculture, helping farmers to make better decisions, increase yields and take care of the environment. The central point is the collection and analysis of data. In the industry of agriculture, data can be collected from different sources and may contain useful patterns that identify potential problems or opportunities. Data should be analyzed using machine learning algorithms to extract useful insights. Such methods of precision farming allow the farmer to monitor individual parts of the field, optimize the consumption of water and chemicals, and identify problems quickly. Purpose: to make an overview of the machine learning algorithms used for data analysis in agriculture. Methodology: an overview of the relevant literature; a survey of farmers. Results: relevant algorithms of machine learning for the analysis of data in agriculture at various levels were identified: soil analysis (soil assessment, soil classification, soil fertility predictions, weather forecast (simulation of climate change, temperature and precipitation prediction, and analysis of vegetation (weed identification, vegetation classification, plant disease identification, crop forecasting. Practical implications: agriculture, crop production.

  2. Genetic algorithm enhanced by machine learning in dynamic aperture optimization

    Science.gov (United States)

    Li, Yongjun; Cheng, Weixing; Yu, Li Hua; Rainer, Robert

    2018-05-01

    With the aid of machine learning techniques, the genetic algorithm has been enhanced and applied to the multi-objective optimization problem presented by the dynamic aperture of the National Synchrotron Light Source II (NSLS-II) Storage Ring. During the evolution processes employed by the genetic algorithm, the population is classified into different clusters in the search space. The clusters with top average fitness are given "elite" status. Intervention on the population is implemented by repopulating some potentially competitive candidates based on the experience learned from the accumulated data. These candidates replace randomly selected candidates among the original data pool. The average fitness of the population is therefore improved while diversity is not lost. Maintaining diversity ensures that the optimization is global rather than local. The quality of the population increases and produces more competitive descendants accelerating the evolution process significantly. When identifying the distribution of optimal candidates, they appear to be located in isolated islands within the search space. Some of these optimal candidates have been experimentally confirmed at the NSLS-II storage ring. The machine learning techniques that exploit the genetic algorithm can also be used in other population-based optimization problems such as particle swarm algorithm.

  3. Predicting Smoking Status Using Machine Learning Algorithms and Statistical Analysis

    Directory of Open Access Journals (Sweden)

    Charles Frank

    2018-03-01

    Full Text Available Smoking has been proven to negatively affect health in a multitude of ways. As of 2009, smoking has been considered the leading cause of preventable morbidity and mortality in the United States, continuing to plague the country’s overall health. This study aims to investigate the viability and effectiveness of some machine learning algorithms for predicting the smoking status of patients based on their blood tests and vital readings results. The analysis of this study is divided into two parts: In part 1, we use One-way ANOVA analysis with SAS tool to show the statistically significant difference in blood test readings between smokers and non-smokers. The results show that the difference in INR, which measures the effectiveness of anticoagulants, was significant in favor of non-smokers which further confirms the health risks associated with smoking. In part 2, we use five machine learning algorithms: Naïve Bayes, MLP, Logistic regression classifier, J48 and Decision Table to predict the smoking status of patients. To compare the effectiveness of these algorithms we use: Precision, Recall, F-measure and Accuracy measures. The results show that the Logistic algorithm outperformed the four other algorithms with Precision, Recall, F-Measure, and Accuracy of 83%, 83.4%, 83.2%, 83.44%, respectively.

  4. Comparison of machine learning algorithms for detecting coral reef

    Directory of Open Access Journals (Sweden)

    Eduardo Tusa

    2014-09-01

    Full Text Available (Received: 2014/07/31 - Accepted: 2014/09/23This work focuses on developing a fast coral reef detector, which is used for an autonomous underwater vehicle, AUV. A fast detection secures the AUV stabilization respect to an area of reef as fast as possible, and prevents devastating collisions. We use the algorithm of Purser et al. (2009 because of its precision. This detector has two parts: feature extraction that uses Gabor Wavelet filters, and feature classification that uses machine learning based on Neural Networks. Due to the extensive time of the Neural Networks, we exchange for a classification algorithm based on Decision Trees. We use a database of 621 images of coral reef in Belize (110 images for training and 511 images for testing. We implement the bank of Gabor Wavelets filters using C++ and the OpenCV library. We compare the accuracy and running time of 9 machine learning algorithms, whose result was the selection of the Decision Trees algorithm. Our coral detector performs 70ms of running time in comparison to 22s executed by the algorithm of Purser et al. (2009.

  5. Behavioral Modeling for Mental Health using Machine Learning Algorithms.

    Science.gov (United States)

    Srividya, M; Mohanavalli, S; Bhalaji, N

    2018-04-03

    Mental health is an indicator of emotional, psychological and social well-being of an individual. It determines how an individual thinks, feels and handle situations. Positive mental health helps one to work productively and realize their full potential. Mental health is important at every stage of life, from childhood and adolescence through adulthood. Many factors contribute to mental health problems which lead to mental illness like stress, social anxiety, depression, obsessive compulsive disorder, drug addiction, and personality disorders. It is becoming increasingly important to determine the onset of the mental illness to maintain proper life balance. The nature of machine learning algorithms and Artificial Intelligence (AI) can be fully harnessed for predicting the onset of mental illness. Such applications when implemented in real time will benefit the society by serving as a monitoring tool for individuals with deviant behavior. This research work proposes to apply various machine learning algorithms such as support vector machines, decision trees, naïve bayes classifier, K-nearest neighbor classifier and logistic regression to identify state of mental health in a target group. The responses obtained from the target group for the designed questionnaire were first subject to unsupervised learning techniques. The labels obtained as a result of clustering were validated by computing the Mean Opinion Score. These cluster labels were then used to build classifiers to predict the mental health of an individual. Population from various groups like high school students, college students and working professionals were considered as target groups. The research presents an analysis of applying the aforementioned machine learning algorithms on the target groups and also suggests directions for future work.

  6. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil; Harrou, Fouzi; Houacine, Amrane; Sun, Ying

    2017-01-01

    Fall incidents are considered as the leading cause of disability and even mortality among older adults. To address this problem, fall detection and prevention fields receive a lot of intention over the past years and attracted many researcher efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated to these most widely utilized algorithms is conducted on two fall detection databases namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state of the art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.

  7. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil

    2017-01-05

    Fall incidents are considered as the leading cause of disability and even mortality among older adults. To address this problem, fall detection and prevention fields receive a lot of intention over the past years and attracted many researcher efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated to these most widely utilized algorithms is conducted on two fall detection databases namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state of the art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.

  8. Alignment of Custom Standards by Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Adela Sirbu

    2010-09-01

    Full Text Available Building an efficient model for automatic alignment of terminologies would bring a significant improvement to the information retrieval process. We have developed and compared two machine learning based algorithms whose aim is to align 2 custom standards built on a 3 level taxonomy, using kNN and SVM classifiers that work on a vector representation consisting of several similarity measures. The weights utilized by the kNN were optimized with an evolutionary algorithm, while the SVM classifier's hyper-parameters were optimized with a grid search algorithm. The database used for train was semi automatically obtained by using the Coma++ tool. The performance of our aligners is shown by the results obtained on the test set.

  9. A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

    OpenAIRE

    Hayden Wimmer; Loreen Powell

    2014-01-01

    While research has been conducted in machine learning algorithms and in privacy preserving in data mining (PPDM), a gap in the literature exists which combines the aforementioned areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Baye...

  10. Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

    Science.gov (United States)

    Yadav, B.; Hatfield, K.

    2017-12-01

    We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized

  11. Machine learning based global particle indentification algorithms at LHCb experiment

    CERN Multimedia

    Derkach, Denis; Likhomanenko, Tatiana; Rogozhnikov, Aleksei; Ratnikov, Fedor

    2017-01-01

    One of the most important aspects of data processing at LHC experiments is the particle identification (PID) algorithm. In LHCb, several different sub-detector systems provide PID information: the Ring Imaging CHerenkov (RICH) detector, the hadronic and electromagnetic calorimeters, and the muon chambers. To improve charged particle identification, several neural networks including a deep architecture and gradient boosting have been applied to data. These new approaches provide higher identification efficiencies than existing implementations for all charged particle types. It is also necessary to achieve a flat dependency between efficiencies and spectator variables such as particle momentum, in order to reduce systematic uncertainties during later stages of data analysis. For this purpose, "flat” algorithms that guarantee the flatness property for efficiencies have also been developed. This talk presents this new approach based on machine learning and its performance.

  12. Overlay improvements using a real time machine learning algorithm

    Science.gov (United States)

    Schmitt-Weaver, Emil; Kubis, Michael; Henke, Wolfgang; Slotboom, Daan; Hoogenboom, Tom; Mulkens, Jan; Coogans, Martyn; ten Berge, Peter; Verkleij, Dick; van de Mast, Frank

    2014-04-01

    While semiconductor manufacturing is moving towards the 14nm node using immersion lithography, the overlay requirements are tightened to below 5nm. Next to improvements in the immersion scanner platform, enhancements in the overlay optimization and process control are needed to enable these low overlay numbers. Whereas conventional overlay control methods address wafer and lot variation autonomously with wafer pre exposure alignment metrology and post exposure overlay metrology, we see a need to reduce these variations by correlating more of the TWINSCAN system's sensor data directly to the post exposure YieldStar metrology in time. In this paper we will present the results of a study on applying a real time control algorithm based on machine learning technology. Machine learning methods use context and TWINSCAN system sensor data paired with post exposure YieldStar metrology to recognize generic behavior and train the control system to anticipate on this generic behavior. Specific for this study, the data concerns immersion scanner context, sensor data and on-wafer measured overlay data. By making the link between the scanner data and the wafer data we are able to establish a real time relationship. The result is an inline controller that accounts for small changes in scanner hardware performance in time while picking up subtle lot to lot and wafer to wafer deviations introduced by wafer processing.

  13. Effective and efficient optics inspection approach using machine learning algorithms

    International Nuclear Information System (INIS)

    Abdulla, G.; Kegelmeyer, L.; Liao, Z.; Carr, W.

    2010-01-01

    The Final Optics Damage Inspection (FODI) system automatically acquires and utilizes the Optics Inspection (OI) system to analyze images of the final optics at the National Ignition Facility (NIF). During each inspection cycle up to 1000 images acquired by FODI are examined by OI to identify and track damage sites on the optics. The process of tracking growing damage sites on the surface of an optic can be made more effective by identifying and removing signals associated with debris or reflections. The manual process to filter these false sites is daunting and time consuming. In this paper we discuss the use of machine learning tools and data mining techniques to help with this task. We describe the process to prepare a data set that can be used for training and identifying hardware reflections in the image data. In order to collect training data, the images are first automatically acquired and analyzed with existing software and then relevant features such as spatial, physical and luminosity measures are extracted for each site. A subset of these sites is 'truthed' or manually assigned a class to create training data. A supervised classification algorithm is used to test if the features can predict the class membership of new sites. A suite of self-configuring machine learning tools called 'Avatar Tools' is applied to classify all sites. To verify, we used 10-fold cross correlation and found the accuracy was above 99%. This substantially reduces the number of false alarms that would otherwise be sent for more extensive investigation.

  14. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  15. Spike sorting based upon machine learning algorithms (SOMA).

    Science.gov (United States)

    Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F

    2007-02-15

    We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb, and from the sheep infero-temporal cortex, and using simulated data. The SOMA sofware is available at http://www.sussex.ac.uk/Users/pmh20/spikes.

  16. Modeling the Swift Bat Trigger Algorithm with Machine Learning

    Science.gov (United States)

    Graff, Philip B.; Lien, Amy Y.; Baker, John G.; Sakamoto, Takanori

    2016-01-01

    To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift burst alert telescope (BAT). This study considers the problem of modeling the Swift / BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien et al. is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of greater than or equal to 97 percent (less than or equal to 3 percent error), which is a significant improvement on a cut in GRB flux, which has an accuracy of 89.6 percent (10.4 percent error). These models are then used to measure the detection efficiency of Swift as a function of redshift z, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of n (sub 0) approaching 0.48 (sup plus 0.41) (sub minus 0.23) per cubic gigaparsecs per year with power-law indices of n (sub 1) approaching 1.7 (sup plus 0.6) (sub minus 0.5) and n (sub 2) approaching minus 5.9 (sup plus 5.7) (sub minus 0.1) for GRBs above and below a break point of z (redshift) (sub 1) approaching 6.8 (sup plus 2.8) (sub minus 3.2). This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting.

  17. Modeling the Swift BAT Trigger Algorithm with Machine Learning

    Science.gov (United States)

    Graff, Philip B.; Lien, Amy Y.; Baker, John G.; Sakamoto, Takanori

    2015-01-01

    To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift burst alert telescope (BAT). This study considers the problem of modeling the Swift BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien et al. (2014) is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of approximately greater than 97% (approximately less than 3% error), which is a significant improvement on a cut in GRB flux which has an accuracy of 89:6% (10:4% error). These models are then used to measure the detection efficiency of Swift as a function of redshift z, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of eta(sub 0) approximately 0.48(+0.41/-0.23) Gpc(exp -3) yr(exp -1) with power-law indices of eta(sub 1) approximately 1.7(+0.6/-0.5) and eta(sub 2) approximately -5.9(+5.7/-0.1) for GRBs above and below a break point of z(sub 1) approximately 6.8(+2.8/-3.2). This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting. The code used in this is analysis is publicly available online.

  18. Two-Stage Electricity Demand Modeling Using Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Krzysztof Gajowniczek

    2017-10-01

    Full Text Available Forecasting of electricity demand has become one of the most important areas of research in the electric power industry, as it is a critical component of cost-efficient power system management and planning. In this context, accurate and robust load forecasting is supposed to play a key role in reducing generation costs, and deals with the reliability of the power system. However, due to demand peaks in the power system, forecasts are inaccurate and prone to high numbers of errors. In this paper, our contributions comprise a proposed data-mining scheme for demand modeling through peak detection, as well as the use of this information to feed the forecasting system. For this purpose, we have taken a different approach from that of time series forecasting, representing it as a two-stage pattern recognition problem. We have developed a peak classification model followed by a forecasting model to estimate an aggregated demand volume. We have utilized a set of machine learning algorithms to benefit from both accurate detection of the peaks and precise forecasts, as applied to the Polish power system. The key finding is that the algorithms can detect 96.3% of electricity peaks (load value equal to or above the 99th percentile of the load distribution and deliver accurate forecasts, with mean absolute percentage error (MAPE of 3.10% and resistant mean absolute percentage error (r-MAPE of 2.70% for the 24 h forecasting horizon.

  19. How the machine ‘thinks’: Understanding opacity in machine learning algorithms

    Directory of Open Access Journals (Sweden)

    Jenna Burrell

    2016-01-01

    Full Text Available This article considers the issue of opacity as a problem for socially consequential mechanisms of classification and ranking, such as spam filters, credit card fraud detection, search engines, news trends, market segmentation and advertising, insurance or loan qualification, and credit scoring. These mechanisms of classification all frequently rely on computational algorithms, and in many cases on machine learning algorithms to do this work. In this article, I draw a distinction between three forms of opacity: (1 opacity as intentional corporate or state secrecy, (2 opacity as technical illiteracy, and (3 an opacity that arises from the characteristics of machine learning algorithms and the scale required to apply them usefully. The analysis in this article gets inside the algorithms themselves. I cite existing literatures in computer science, known industry practices (as they are publicly presented, and do some testing and manipulation of code as a form of lightweight code audit. I argue that recognizing the distinct forms of opacity that may be coming into play in a given application is a key to determining which of a variety of technical and non-technical solutions could help to prevent harm.

  20. Machine Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  1. Developing robust arsenic awareness prediction models using machine learning algorithms.

    Science.gov (United States)

    Singh, Sushant K; Taylor, Robert W; Rahman, Mohammad Mahmudur; Pradhan, Biswajeet

    2018-04-01

    Arsenic awareness plays a vital role in ensuring the sustainability of arsenic mitigation technologies. Thus far, however, few studies have dealt with the sustainability of such technologies and its associated socioeconomic dimensions. As a result, arsenic awareness prediction has not yet been fully conceptualized. Accordingly, this study evaluated arsenic awareness among arsenic-affected communities in rural India, using a structured questionnaire to record socioeconomic, demographic, and other sociobehavioral factors with an eye to assessing their association with and influence on arsenic awareness. First a logistic regression model was applied and its results compared with those produced by six state-of-the-art machine-learning algorithms (Support Vector Machine [SVM], Kernel-SVM, Decision Tree [DT], k-Nearest Neighbor [k-NN], Naïve Bayes [NB], and Random Forests [RF]) as measured by their accuracy at predicting arsenic awareness. Most (63%) of the surveyed population was found to be arsenic-aware. Significant arsenic awareness predictors were divided into three types: (1) socioeconomic factors: caste, education level, and occupation; (2) water and sanitation behavior factors: number of family members involved in water collection, distance traveled and time spent for water collection, places for defecation, and materials used for handwashing after defecation; and (3) social capital and trust factors: presence of anganwadi and people's trust in other community members, NGOs, and private agencies. Moreover, individuals' having higher social network positively contributed to arsenic awareness in the communities. Results indicated that both the SVM and the RF algorithms outperformed at overall prediction of arsenic awareness-a nonlinear classification problem. Lower-caste, less educated, and unemployed members of the population were found to be the most vulnerable, requiring immediate arsenic mitigation. To this end, local social institutions and NGOs could play a

  2. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality.

    Science.gov (United States)

    Braithwaite, Scott R; Giraud-Carrier, Christophe; West, Josh; Barnes, Michael D; Hanson, Carl Lee

    2016-05-16

    One of the leading causes of death in the United States (US) is suicide and new methods of assessment are needed to track its risk in real time. Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Our findings show that people who are at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identify the clinically significant suicidal rate in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Machine learning algorithms are efficient in differentiating people who are at a suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data.

  3. Experimental analysis of the performance of machine learning algorithms in the classification of navigation accident records

    Directory of Open Access Journals (Sweden)

    REIS, M V. S. de A.

    2017-06-01

    Full Text Available This paper aims to evaluate the use of machine learning techniques in a database of marine accidents. We analyzed and evaluated the main causes and types of marine accidents in the Northern Fluminense region. For this, machine learning techniques were used. The study showed that the modeling can be done in a satisfactory manner using different configurations of classification algorithms, varying the activation functions and training parameters. The SMO (Sequential Minimal Optimization algorithm showed the best performance result.

  4. Prediction of Employee Turnover in Organizations using Machine Learning Algorithms

    OpenAIRE

    Rohit Punnoose; Pankaj Ajit

    2016-01-01

    Employee turnover has been identified as a key issue for organizations because of its adverse impact on work place productivity and long term growth strategies. To solve this problem, organizations use machine learning techniques to predict employee turnover. Accurate predictions enable organizations to take action for retention or succession planning of employees. However, the data for this modeling problem comes from HR Information Systems (HRIS); these are typically under-funded compared t...

  5. Machine learning algorithms to classify spinal muscular atrophy subtypes.

    Science.gov (United States)

    Srivastava, Tuhin; Darras, Basil T; Wu, Jim S; Rutkove, Seward B

    2012-07-24

    The development of better biomarkers for disease assessment remains an ongoing effort across the spectrum of neurologic illnesses. One approach for refining biomarkers is based on the concept of machine learning, in which individual, unrelated biomarkers are simultaneously evaluated. In this cross-sectional study, we assess the possibility of using machine learning, incorporating both quantitative muscle ultrasound (QMU) and electrical impedance myography (EIM) data, for classification of muscles affected by spinal muscular atrophy (SMA). Twenty-one normal subjects, 15 subjects with SMA type 2, and 10 subjects with SMA type 3 underwent EIM and QMU measurements of unilateral biceps, wrist extensors, quadriceps, and tibialis anterior. EIM and QMU parameters were then applied in combination using a support vector machine (SVM), a type of machine learning, in an attempt to accurately categorize 165 individual muscles. For all 3 classification problems, normal vs SMA, normal vs SMA 3, and SMA 2 vs SMA 3, use of SVM provided the greatest accuracy in discrimination, surpassing both EIM and QMU individually. For example, the accuracy, as measured by the receiver operating characteristic area under the curve (ROC-AUC) for the SVM discriminating SMA 2 muscles from SMA 3 muscles was 0.928; in comparison, the ROC-AUCs for EIM and QMU parameters alone were only 0.877 (p < 0.05) and 0.627 (p < 0.05), respectively. Combining EIM and QMU data categorizes individual SMA-affected muscles with very high accuracy. Further investigation of this approach for classifying and for following the progression of neuromuscular illness is warranted.

  6. CAT-PUMA: CME Arrival Time Prediction Using Machine learning Algorithms

    Science.gov (United States)

    Liu, Jiajia; Ye, Yudong; Shen, Chenglong; Wang, Yuming; Erdélyi, Robert

    2018-04-01

    CAT-PUMA (CME Arrival Time Prediction Using Machine learning Algorithms) quickly and accurately predicts the arrival of Coronal Mass Ejections (CMEs) of CME arrival time. The software was trained via detailed analysis of CME features and solar wind parameters using 182 previously observed geo-effective partial-/full-halo CMEs and uses algorithms of the Support Vector Machine (SVM) to make its predictions, which can be made within minutes of providing the necessary input parameters of a CME.

  7. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    Science.gov (United States)

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

    2015-01-01

    Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (pmachine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273

  8. Application of Machine Learning Algorithms for the Query Performance Prediction

    Directory of Open Access Journals (Sweden)

    MILICEVIC, M.

    2015-08-01

    Full Text Available This paper analyzes the relationship between the system load/throughput and the query response time in a real Online transaction processing (OLTP system environment. Although OLTP systems are characterized by short transactions, which normally entail high availability and consistent short response times, the need for operational reporting may jeopardize these objectives. We suggest a new approach to performance prediction for concurrent database workloads, based on the system state vector which consists of 36 attributes. There is no bias to the importance of certain attributes, but the machine learning methods are used to determine which attributes better describe the behavior of the particular database server and how to model that system. During the learning phase, the system's profile is created using multiple reference queries, which are selected to represent frequent business processes. The possibility of the accurate response time prediction may be a foundation for automated decision-making for database (DB query scheduling. Possible applications of the proposed method include adaptive resource allocation, quality of service (QoS management or real-time dynamic query scheduling (e.g. estimation of the optimal moment for a complex query execution.

  9. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography.

    Science.gov (United States)

    Narula, Sukrit; Shameer, Khader; Salem Omar, Alaa Mabrouk; Dudley, Joel T; Sengupta, Partho P

    2016-11-29

    Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH). Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation. Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM ands. ATH (IG = 0.24) followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p 13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e', and strain. Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience. Copyright © 2016 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  10. Learning Algorithm of Boltzmann Machine Based on Spatial Monte Carlo Integration Method

    Directory of Open Access Journals (Sweden)

    Muneki Yasuda

    2018-04-01

    Full Text Available The machine learning techniques for Markov random fields are fundamental in various fields involving pattern recognition, image processing, sparse modeling, and earth science, and a Boltzmann machine is one of the most important models in Markov random fields. However, the inference and learning problems in the Boltzmann machine are NP-hard. The investigation of an effective learning algorithm for the Boltzmann machine is one of the most important challenges in the field of statistical machine learning. In this paper, we study Boltzmann machine learning based on the (first-order spatial Monte Carlo integration method, referred to as the 1-SMCI learning method, which was proposed in the author’s previous paper. In the first part of this paper, we compare the method with the maximum pseudo-likelihood estimation (MPLE method using a theoretical and a numerical approaches, and show the 1-SMCI learning method is more effective than the MPLE. In the latter part, we compare the 1-SMCI learning method with other effective methods, ratio matching and minimum probability flow, using a numerical experiment, and show the 1-SMCI learning method outperforms them.

  11. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    Science.gov (United States)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms in the context of large-sized hyperspectral images classification from the theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to the support vector machines. To give the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as the computational cost and robustness against the noise are also discussed.

  12. Using machine learning algorithms to guide rehabilitation planning for home care clients.

    Science.gov (United States)

    Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul

    2007-12-20

    Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms - Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) - to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN. > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms achieved superior predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide development of improved clinical protocols.

  13. Optimisation of a machine learning algorithm in human locomotion using principal component and discriminant function analyses.

    Science.gov (United States)

    Bisele, Maria; Bencsik, Martin; Lewis, Martin G C; Barnett, Cleveland T

    2017-01-01

    Assessment methods in human locomotion often involve the description of normalised graphical profiles and/or the extraction of discrete variables. Whilst useful, these approaches may not represent the full complexity of gait data. Multivariate statistical methods, such as Principal Component Analysis (PCA) and Discriminant Function Analysis (DFA), have been adopted since they have the potential to overcome these data handling issues. The aim of the current study was to develop and optimise a specific machine learning algorithm for processing human locomotion data. Twenty participants ran at a self-selected speed across a 15m runway in barefoot and shod conditions. Ground reaction forces (BW) and kinematics were measured at 1000 Hz and 100 Hz, respectively from which joint angles (°), joint moments (N.m.kg-1) and joint powers (W.kg-1) for the hip, knee and ankle joints were calculated in all three anatomical planes. Using PCA and DFA, power spectra of the kinematic and kinetic variables were used as a training database for the development of a machine learning algorithm. All possible combinations of 10 out of 20 participants were explored to find the iteration of individuals that would optimise the machine learning algorithm. The results showed that the algorithm was able to successfully predict whether a participant ran shod or barefoot in 93.5% of cases. To the authors' knowledge, this is the first study to optimise the development of a machine learning algorithm.

  14. Four wind speed multi-step forecasting models using extreme learning machines and signal decomposing algorithms

    International Nuclear Information System (INIS)

    Liu, Hui; Tian, Hong-qi; Li, Yan-fei

    2015-01-01

    Highlights: • A hybrid architecture is proposed for the wind speed forecasting. • Four algorithms are used for the wind speed multi-scale decomposition. • The extreme learning machines are employed for the wind speed forecasting. • All the proposed hybrid models can generate the accurate results. - Abstract: Realization of accurate wind speed forecasting is important to guarantee the safety of wind power utilization. In this paper, a new hybrid forecasting architecture is proposed to realize the wind speed accurate forecasting. In this architecture, four different hybrid models are presented by combining four signal decomposing algorithms (e.g., Wavelet Decomposition/Wavelet Packet Decomposition/Empirical Mode Decomposition/Fast Ensemble Empirical Mode Decomposition) and Extreme Learning Machines. The originality of the study is to investigate the promoted percentages of the Extreme Learning Machines by those mainstream signal decomposing algorithms in the multiple step wind speed forecasting. The results of two forecasting experiments indicate that: (1) the method of Extreme Learning Machines is suitable for the wind speed forecasting; (2) by utilizing the decomposing algorithms, all the proposed hybrid algorithms have better performance than the single Extreme Learning Machines; (3) in the comparisons of the decomposing algorithms in the proposed hybrid architecture, the Fast Ensemble Empirical Mode Decomposition has the best performance in the three-step forecasting results while the Wavelet Packet Decomposition has the best performance in the one and two step forecasting results. At the same time, the Wavelet Packet Decomposition and the Fast Ensemble Empirical Mode Decomposition are better than the Wavelet Decomposition and the Empirical Mode Decomposition in all the step predictions, respectively; and (4) the proposed algorithms are effective in the wind speed accurate predictions

  15. New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems

    Directory of Open Access Journals (Sweden)

    Xiguang Li

    2017-01-01

    Full Text Available Inspired by the behavior of dandelion sowing, a new novel swarm intelligence algorithm, namely, dandelion algorithm (DA, is proposed for global optimization of complex functions in this paper. In DA, the dandelion population will be divided into two subpopulations, and different subpopulations will undergo different sowing behaviors. Moreover, another sowing method is designed to jump out of local optimum. In order to demonstrate the validation of DA, we compare the proposed algorithm with other existing algorithms, including bat algorithm, particle swarm optimization, and enhanced fireworks algorithm. Simulations show that the proposed algorithm seems much superior to other algorithms. At the same time, the proposed algorithm can be applied to optimize extreme learning machine (ELM for biomedical classification problems, and the effect is considerable. At last, we use different fusion methods to form different fusion classifiers, and the fusion classifiers can achieve higher accuracy and better stability to some extent.

  16. Performance Evaluation of Machine Learning Algorithms for Urban Pattern Recognition from Multi-spectral Satellite Images

    Directory of Open Access Journals (Sweden)

    Marc Wieland

    2014-03-01

    Full Text Available In this study, a classification and performance evaluation framework for the recognition of urban patterns in medium (Landsat ETM, TM and MSS and very high resolution (WorldView-2, Quickbird, Ikonos multi-spectral satellite images is presented. The study aims at exploring the potential of machine learning algorithms in the context of an object-based image analysis and to thoroughly test the algorithm’s performance under varying conditions to optimize their usage for urban pattern recognition tasks. Four classification algorithms, Normal Bayes, K Nearest Neighbors, Random Trees and Support Vector Machines, which represent different concepts in machine learning (probabilistic, nearest neighbor, tree-based, function-based, have been selected and implemented on a free and open-source basis. Particular focus is given to assess the generalization ability of machine learning algorithms and the transferability of trained learning machines between different image types and image scenes. Moreover, the influence of the number and choice of training data, the influence of the size and composition of the feature vector and the effect of image segmentation on the classification accuracy is evaluated.

  17. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    importance in the projection (VIP) information of the DPLS method. The power of the gene selection method and the proposed supervised hierarchical clustering method is illustrated on a three microarray data sets of leukemia, breast, and colon cancer. Supervised machine learning algorithms thus enable...

  18. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  19. Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

    Science.gov (United States)

    Gaur, Pallavi; Chaturvedi, Anoop

    2017-07-22

    The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.

  20. Machine Learning Algorithms for $b$-Jet Tagging at the ATLAS Experiment

    CERN Document Server

    Paganini, Michela; The ATLAS collaboration

    2017-01-01

    The separation of $b$-quark initiated jets from those coming from lighter quark flavors ($b$-tagging) is a fundamental tool for the ATLAS physics program at the CERN Large Hadron Collider. The most powerful $b$-tagging algorithms combine information from low-level taggers, exploiting reconstructed track and vertex information, into machine learning classifiers. The potential of modern deep learning techniques is explored using simulated events, and compared to that achievable from more traditional classifiers such as boosted decision trees.

  1. ASSESSMENT OF PERFORMANCES OF VARIOUS MACHINE LEARNING ALGORITHMS DURING AUTOMATED EVALUATION OF DESCRIPTIVE ANSWERS

    Directory of Open Access Journals (Sweden)

    C. Sunil Kumar

    2014-07-01

    Full Text Available Automation of descriptive answers evaluation is the need of the hour because of the huge increase in the number of students enrolling each year in educational institutions and the limited staff available to spare their time for evaluations. In this paper, we use a machine learning workbench called LightSIDE to accomplish auto evaluation and scoring of descriptive answers. We attempted to identify the best supervised machine learning algorithm given a limited training set sample size scenario. We evaluated performances of Bayes, SVM, Logistic Regression, Random forests, Decision stump and Decision trees algorithms. We confirmed SVM as best performing algorithm based on quantitative measurements across accuracy, kappa, training speed and prediction accuracy with supplied test set.

  2. Behavioral Profiling of Scada Network Traffic Using Machine Learning Algorithms

    Science.gov (United States)

    2014-03-27

    Acquisition ( SCADA ) System Overview SCADA systems control and monitor processes for water distribution, oil and natural gas pipelines , electrical...the desire for remote control and monitoring of industrial processes. The ability to identify SCADA devices on a mixed traffic network with zero...optimal attribute subset, while maintaining the desired TPR of .99 for SCADA network traffic. The attributes and ML algorithms chosen for

  3. Energy-efficient algorithm for classification of states of wireless sensor network using machine learning methods

    Science.gov (United States)

    Yuldashev, M. N.; Vlasov, A. I.; Novikov, A. N.

    2018-05-01

    This paper focuses on the development of an energy-efficient algorithm for classification of states of a wireless sensor network using machine learning methods. The proposed algorithm reduces energy consumption by: 1) elimination of monitoring of parameters that do not affect the state of the sensor network, 2) reduction of communication sessions over the network (the data are transmitted only if their values can affect the state of the sensor network). The studies of the proposed algorithm have shown that at classification accuracy close to 100%, the number of communication sessions can be reduced by 80%.

  4. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...

  5. Evaluation of machine learning algorithms for improved risk assessment for Down's syndrome.

    Science.gov (United States)

    Koivu, Aki; Korpimäki, Teemu; Kivelä, Petri; Pahikkala, Tapio; Sairanen, Mikko

    2018-05-04

    Prenatal screening generates a great amount of data that is used for predicting risk of various disorders. Prenatal risk assessment is based on multiple clinical variables and overall performance is defined by how well the risk algorithm is optimized for the population in question. This article evaluates machine learning algorithms to improve performance of first trimester screening of Down syndrome. Machine learning algorithms pose an adaptive alternative to develop better risk assessment models using the existing clinical variables. Two real-world data sets were used to experiment with multiple classification algorithms. Implemented models were tested with a third, real-world, data set and performance was compared to a predicate method, a commercial risk assessment software. Best performing deep neural network model gave an area under the curve of 0.96 and detection rate of 78% with 1% false positive rate with the test data. Support vector machine model gave area under the curve of 0.95 and detection rate of 61% with 1% false positive rate with the same test data. When compared with the predicate method, the best support vector machine model was slightly inferior, but an optimized deep neural network model was able to give higher detection rates with same false positive rate or similar detection rate but with markedly lower false positive rate. This finding could further improve the first trimester screening for Down syndrome, by using existing clinical variables and a large training data derived from a specific population. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

    Science.gov (United States)

    Karim, Mohammad Ehsanul; Pang, Menglan; Platt, Robert W

    2018-03-01

    The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. Utilizing patient's health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of postmyocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator, and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of machine learning and high-dimensional propensity score algorithms generally perform slightly better than both in terms of mean squared error, when a bias-based analysis is used.

  7. A Comparison Study of Machine Learning Based Algorithms for Fatigue Crack Growth Calculation.

    Science.gov (United States)

    Wang, Hongxun; Zhang, Weifang; Sun, Fuqiang; Zhang, Wei

    2017-05-18

    The relationships between the fatigue crack growth rate ( d a / d N ) and stress intensity factor range ( Δ K ) are not always linear even in the Paris region. The stress ratio effects on fatigue crack growth rate are diverse in different materials. However, most existing fatigue crack growth models cannot handle these nonlinearities appropriately. The machine learning method provides a flexible approach to the modeling of fatigue crack growth because of its excellent nonlinear approximation and multivariable learning ability. In this paper, a fatigue crack growth calculation method is proposed based on three different machine learning algorithms (MLAs): extreme learning machine (ELM), radial basis function network (RBFN) and genetic algorithms optimized back propagation network (GABP). The MLA based method is validated using testing data of different materials. The three MLAs are compared with each other as well as the classical two-parameter model ( K * approach). The results show that the predictions of MLAs are superior to those of K * approach in accuracy and effectiveness, and the ELM based algorithms show overall the best agreement with the experimental data out of the three MLAs, for its global optimization and extrapolation ability.

  8. Robust total energy demand estimation with a hybrid Variable Neighborhood Search – Extreme Learning Machine algorithm

    International Nuclear Information System (INIS)

    Sánchez-Oro, J.; Duarte, A.; Salcedo-Sanz, S.

    2016-01-01

    Highlights: • The total energy demand in Spain is estimated with a Variable Neighborhood algorithm. • Socio-economic variables are used, and one year ahead prediction horizon is considered. • Improvement of the prediction with an Extreme Learning Machine network is considered. • Experiments are carried out in real data for the case of Spain. - Abstract: Energy demand prediction is an important problem whose solution is evaluated by policy makers in order to take key decisions affecting the economy of a country. A number of previous approaches to improve the quality of this estimation have been proposed in the last decade, the majority of them applying different machine learning techniques. In this paper, the performance of a robust hybrid approach, composed of a Variable Neighborhood Search algorithm and a new class of neural network called Extreme Learning Machine, is discussed. The Variable Neighborhood Search algorithm is focused on obtaining the most relevant features among the set of initial ones, by including an exponential prediction model. While previous approaches consider that the number of macroeconomic variables used for prediction is a parameter of the algorithm (i.e., it is fixed a priori), the proposed Variable Neighborhood Search method optimizes both: the number of variables and the best ones. After this first step of feature selection, an Extreme Learning Machine network is applied to obtain the final energy demand prediction. Experiments in a real case of energy demand estimation in Spain show the excellent performance of the proposed approach. In particular, the whole method obtains an estimation of the energy demand with an error lower than 2%, even when considering the crisis years, which are a real challenge.

  9. Anomaly detection in wide area network mesh using two machine learning anomaly detection algorithms

    OpenAIRE

    Zhang, James; Vukotic, Ilija; Gardner, Robert

    2018-01-01

    Anomaly detection is the practice of identifying items or events that do not conform to an expected behavior or do not correlate with other items in a dataset. It has previously been applied to areas such as intrusion detection, system health monitoring, and fraud detection in credit card transactions. In this paper, we describe a new method for detecting anomalous behavior over network performance data, gathered by perfSONAR, using two machine learning algorithms: Boosted Decision Trees (BDT...

  10. Solar Flare Prediction Model with Three Machine-learning Algorithms using Ultraviolet Brightening and Vector Magnetograms

    Science.gov (United States)

    Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M.

    2017-02-01

    We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010-2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ˜60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.

  11. Solar Flare Prediction Model with Three Machine-learning Algorithms using Ultraviolet Brightening and Vector Magnetograms

    Energy Technology Data Exchange (ETDEWEB)

    Nishizuka, N.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M. [Applied Electromagnetic Research Institute, National Institute of Information and Communications Technology, 4-2-1, Nukui-Kitamachi, Koganei, Tokyo 184-8795 (Japan); Sugiura, K., E-mail: nishizuka.naoto@nict.go.jp [Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology (Japan)

    2017-02-01

    We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite . We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.

  12. Solar Flare Prediction Model with Three Machine-learning Algorithms using Ultraviolet Brightening and Vector Magnetograms

    International Nuclear Information System (INIS)

    Nishizuka, N.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M.; Sugiura, K.

    2017-01-01

    We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite . We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.

  13. Machine learning algorithm accurately detects fMRI signature of vulnerability to major depression

    OpenAIRE

    Sato, Jo?o R.; Moll, Jorge; Green, Sophie; Deakin, John F.W.; Thomaz, Carlos E.; Zahn, Roland

    2015-01-01

    Standard functional magnetic resonance imaging (fMRI) analyses cannot assess the potential of a neuroimaging signature as a biomarker to predict individual vulnerability to major depression (MD). Here, we use machine learning for the first time to address this question. Using a recently identified neural signature of guilt-selective functional disconnection, the classification algorithm was able to distinguish remitted MD from control participants with 78.3% accuracy. This demonstrates the hi...

  14. Machine Learning.

    Science.gov (United States)

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  15. Supervised machine learning algorithms to diagnose stress for vehicle drivers based on physiological sensor signals.

    Science.gov (United States)

    Barua, Shaibal; Begum, Shahina; Ahmed, Mobyen Uddin

    2015-01-01

    Machine learning algorithms play an important role in computer science research. Recent advancement in sensor data collection in clinical sciences lead to a complex, heterogeneous data processing, and analysis for patient diagnosis and prognosis. Diagnosis and treatment of patients based on manual analysis of these sensor data are difficult and time consuming. Therefore, development of Knowledge-based systems to support clinicians in decision-making is important. However, it is necessary to perform experimental work to compare performances of different machine learning methods to help to select appropriate method for a specific characteristic of data sets. This paper compares classification performance of three popular machine learning methods i.e., case-based reasoning, neutral networks and support vector machine to diagnose stress of vehicle drivers using finger temperature and heart rate variability. The experimental results show that case-based reasoning outperforms other two methods in terms of classification accuracy. Case-based reasoning has achieved 80% and 86% accuracy to classify stress using finger temperature and heart rate variability. On contrary, both neural network and support vector machine have achieved less than 80% accuracy by using both physiological signals.

  16. Machine Learning Algorithms for $b$-Jet Tagging at the ATLAS Experiment

    CERN Document Server

    Paganini, Michela; The ATLAS collaboration

    2017-01-01

    The separation of b-quark initiated jets from those coming from lighter quark flavours (b-tagging) is a fundamental tool for the ATLAS physics program at the CERN Large Hadron Collider. The most powerful b-tagging algorithms combine information from low-level taggers exploiting reconstructed track and vertex information using a multivariate classifier. The potential of modern Machine Learning techniques such as Recurrent Neural Networks and Deep Learning is explored using simulated events, and compared to that achievable from more traditional classifiers such as boosted decision trees.

  17. Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier Stokes uncertainty

    Science.gov (United States)

    Ling, J.; Templeton, J.

    2015-08-01

    Reynolds Averaged Navier Stokes (RANS) models are widely used in industry to predict fluid flows, despite their acknowledged deficiencies. Not only do RANS models often produce inaccurate flow predictions, but there are very limited diagnostics available to assess RANS accuracy for a given flow configuration. If experimental or higher fidelity simulation results are not available for RANS validation, there is no reliable method to evaluate RANS accuracy. This paper explores the potential of utilizing machine learning algorithms to identify regions of high RANS uncertainty. Three different machine learning algorithms were evaluated: support vector machines, Adaboost decision trees, and random forests. The algorithms were trained on a database of canonical flow configurations for which validated direct numerical simulation or large eddy simulation results were available, and were used to classify RANS results on a point-by-point basis as having either high or low uncertainty, based on the breakdown of specific RANS modeling assumptions. Classifiers were developed for three different basic RANS eddy viscosity model assumptions: the isotropy of the eddy viscosity, the linearity of the Boussinesq hypothesis, and the non-negativity of the eddy viscosity. It is shown that these classifiers are able to generalize to flows substantially different from those on which they were trained. Feature selection techniques, model evaluation, and extrapolation detection are discussed in the context of turbulence modeling applications.

  18. Machine-Learning Research

    OpenAIRE

    Dietterich, Thomas G.

    1997-01-01

    Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

  19. Auto-SEIA: simultaneous optimization of image processing and machine learning algorithms

    Science.gov (United States)

    Negro Maggio, Valentina; Iocchi, Luca

    2015-02-01

    Object classification from images is an important task for machine vision and it is a crucial ingredient for many computer vision applications, ranging from security and surveillance to marketing. Image based object classification techniques properly integrate image processing and machine learning (i.e., classification) procedures. In this paper we present a system for automatic simultaneous optimization of algorithms and parameters for object classification from images. More specifically, the proposed system is able to process a dataset of labelled images and to return a best configuration of image processing and classification algorithms and of their parameters with respect to the accuracy of classification. Experiments with real public datasets are used to demonstrate the effectiveness of the developed system.

  20. AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

    International Nuclear Information System (INIS)

    Farrell, Sean A.; Murphy, Tara; Lo, Kitty K.

    2015-01-01

    In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.

  1. Separation of pulsar signals from noise using supervised machine learning algorithms

    Science.gov (United States)

    Bethapudi, S.; Desai, S.

    2018-04-01

    We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for the cross-validation of the SPINN-based machine learning engine, obtained from the reprocessing of the HTRU-S survey data (Morello et al., 2014). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with high-class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean for both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find that the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in Morello et al. (2014), for the same recall value.

  2. Bioinformatics algorithm based on a parallel implementation of a machine learning approach using transducers

    International Nuclear Information System (INIS)

    Roche-Lima, Abiel; Thulasiram, Ruppa K

    2012-01-01

    Finite automata, in which each transition is augmented with an output label in addition to the familiar input label, are considered finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed to pairwise alignments of DNA and protein sequences; as well as to develop kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation. It is calculated by using techniques, such as pair-database creation, normalization (with Maximum-Likelihood normalization) and parameters optimization (with Expectation-Maximization - EM). These techniques are intrinsically costly for computation, even worse when are applied to bioinformatics, because the databases sizes are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Indeed, several experiences were developed using the parallel and sequential algorithm on Westgrid (specifically, on the Breeze cluster). As results, we obtain that our parallel algorithm is scalable, because execution times are reduced considerably when the data size parameter is increased. Another experience is developed by changing precision parameter. In this case, we obtain smaller execution times using the parallel algorithm. Finally, number of threads used to execute the parallel algorithm on the Breezy cluster is changed. In this last experience, we obtain as result that speedup is considerably increased when more threads are used; however there is a convergence for number of threads equal to or greater than 16.

  3. Comparison of four machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    Science.gov (United States)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2016-03-01

    Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.

  4. Classifying spatially heterogeneous wetland communities using machine learning algorithms and spectral and textural features.

    Science.gov (United States)

    Szantoi, Zoltan; Escobedo, Francisco J; Abd-Elrahman, Amr; Pearlstine, Leonard; Dewitt, Bon; Smith, Scot

    2015-05-01

    Mapping of wetlands (marsh vs. swamp vs. upland) is a common remote sensing application.Yet, discriminating between similar freshwater communities such as graminoid/sedge fromremotely sensed imagery is more difficult. Most of this activity has been performed using medium to low resolution imagery. There are only a few studies using highspatial resolutionimagery and machine learning image classification algorithms for mapping heterogeneouswetland plantcommunities. This study addresses this void by analyzing whether machine learning classifierssuch as decisiontrees (DT) and artificial neural networks (ANN) can accurately classify graminoid/sedgecommunities usinghigh resolution aerial imagery and image texture data in the Everglades National Park, Florida.In addition tospectral bands, the normalized difference vegetation index, and first- and second-order texturefeatures derivedfrom the near-infrared band were analyzed. Classifier accuracies were assessed using confusiontablesand the calculated kappa coefficients of the resulting maps. The results indicated that an ANN(multilayerperceptron based on backpropagation) algorithm produced a statistically significantly higheraccuracy(82.04%) than the DT (QUEST) algorithm (80.48%) or the maximum likelihood (80.56%)classifier (αtexture features.

  5. Development of a Machine Learning Algorithm for the Surveillance of Autism Spectrum Disorder.

    Directory of Open Access Journals (Sweden)

    Matthew J Maenner

    Full Text Available The Autism and Developmental Disabilities Monitoring (ADDM Network conducts population-based surveillance of autism spectrum disorder (ASD among 8-year old children in multiple US sites. To classify ASD, trained clinicians review developmental evaluations collected from multiple health and education sources to determine whether the child meets the ASD surveillance case criteria. The number of evaluations collected has dramatically increased since the year 2000, challenging the resources and timeliness of the surveillance system. We developed and evaluated a machine learning approach to classify case status in ADDM using words and phrases contained in children's developmental evaluations. We trained a random forest classifier using data from the 2008 Georgia ADDM site which included 1,162 children with 5,396 evaluations (601 children met ADDM ASD criteria using standard ADDM methods. The classifier used the words and phrases from the evaluations to predict ASD case status. We evaluated its performance on the 2010 Georgia ADDM surveillance data (1,450 children with 9,811 evaluations; 754 children met ADDM ASD criteria. We also estimated ASD prevalence using predictions from the classification algorithm. Overall, the machine learning approach predicted ASD case statuses that were 86.5% concordant with the clinician-determined case statuses (84.0% sensitivity, 89.4% predictive value positive. The area under the resulting receiver-operating characteristic curve was 0.932. Algorithm-derived ASD "prevalence" was 1.46% compared to the published (clinician-determined estimate of 1.55%. Using only the text contained in developmental evaluations, a machine learning algorithm was able to discriminate between children that do and do not meet ASD surveillance criteria at one surveillance site.

  6. Machine learning algorithm accurately detects fMRI signature of vulnerability to major depression.

    Science.gov (United States)

    Sato, João R; Moll, Jorge; Green, Sophie; Deakin, John F W; Thomaz, Carlos E; Zahn, Roland

    2015-08-30

    Standard functional magnetic resonance imaging (fMRI) analyses cannot assess the potential of a neuroimaging signature as a biomarker to predict individual vulnerability to major depression (MD). Here, we use machine learning for the first time to address this question. Using a recently identified neural signature of guilt-selective functional disconnection, the classification algorithm was able to distinguish remitted MD from control participants with 78.3% accuracy. This demonstrates the high potential of our fMRI signature as a biomarker of MD vulnerability. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.

  7. Machine learning algorithms for meteorological event classification in the coastal area using in-situ data

    Science.gov (United States)

    Sokolov, Anton; Gengembre, Cyril; Dmitriev, Egor; Delbarre, Hervé

    2017-04-01

    The problem is considered of classification of local atmospheric meteorological events in the coastal area such as sea breezes, fogs and storms. The in-situ meteorological data as wind speed and direction, temperature, humidity and turbulence are used as predictors. Local atmospheric events of 2013-2014 were analysed manually to train classification algorithms in the coastal area of English Channel in Dunkirk (France). Then, ultrasonic anemometer data and LIDAR wind profiler data were used as predictors. A few algorithms were applied to determine meteorological events by local data such as a decision tree, the nearest neighbour classifier, a support vector machine. The comparison of classification algorithms was carried out, the most important predictors for each event type were determined. It was shown that in more than 80 percent of the cases machine learning algorithms detect the meteorological class correctly. We expect that this methodology could be applied also to classify events by climatological in-situ data or by modelling data. It allows estimating frequencies of each event in perspective of climate change.

  8. Predicting the Occurrence of Haze Events in Southeast Asia using Machine Learning Algorithms

    Science.gov (United States)

    Lee, H. H.; Chulakadabba, A.; Tonks, A.; Yang, Z.; Wang, C.

    2017-12-01

    Severe local- and regional-scale air pollution episodes typically originate from 1) high emissions of air pollutants, 2) poor dispersion conditions, and 3) trans-boundary pollutant transport. Biomass burning activities have become more frequent in Southeast Asia, especially in Sumatra, Borneo, and the mainland Southeast. Trans-boundary transport of biomass burning aerosols often lead to air quality problems in the region. Furthermore, particulate pollutants from human activities besides biomass burning also play an important role in the air quality of Southeast Asia. Singapore, for example, has a dynamic industrial sector including chemical, electric and metallurgic industries, and is the region's major petroleum-refining center. In addition, natural gas and oil power plants, waste incinerators, active port traffic, and a major regional airport further complicate Singapore's air quality issues. In this study, we compare five Machine Learning algorithms: k-Nearest Neighbors, Linear Support Vector Machine, Decision Tree, Random Forest and Artificial Neural Network, to identify haze patterns and determine variable importance. The algorithms were trained using local atmospheric data (i.e. months, atmospheric conditions, wind direction and relative humidity) from three observation stations in Singapore (Changi, Seletar and Paya Labar). We find that the algorithms reveal the associations in data within and between the stations, and provide in-depth interpretation of the haze sources. The algorithms also allow us to predict the probability of haze episodes in Singapore and to determine the correlation between this probability and atmospheric conditions.

  9. Machine learning algorithms for mode-of-action classification in toxicity assessment.

    Science.gov (United States)

    Zhang, Yile; Wong, Yau Shu; Deng, Jian; Anton, Cristina; Gabos, Stephan; Zhang, Weiping; Huang, Dorothy Yu; Jin, Can

    2016-01-01

    Real Time Cell Analysis (RTCA) technology is used to monitor cellular changes continuously over the entire exposure period. Combining with different testing concentrations, the profiles have potential in probing the mode of action (MOA) of the testing substances. In this paper, we present machine learning approaches for MOA assessment. Computational tools based on artificial neural network (ANN) and support vector machine (SVM) are developed to analyze the time-concentration response curves (TCRCs) of human cell lines responding to tested chemicals. The techniques are capable of learning data from given TCRCs with known MOA information and then making MOA classification for the unknown toxicity. A novel data processing step based on wavelet transform is introduced to extract important features from the original TCRC data. From the dose response curves, time interval leading to higher classification success rate can be selected as input to enhance the performance of the machine learning algorithm. This is particularly helpful when handling cases with limited and imbalanced data. The validation of the proposed method is demonstrated by the supervised learning algorithm applied to the exposure data of HepG2 cell line to 63 chemicals with 11 concentrations in each test case. Classification success rate in the range of 85 to 95 % are obtained using SVM for MOA classification with two clusters to cases up to four clusters. Wavelet transform is capable of capturing important features of TCRCs for MOA classification. The proposed SVM scheme incorporated with wavelet transform has a great potential for large scale MOA classification and high-through output chemical screening.

  10. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.

  11. Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms.

    Science.gov (United States)

    Shahinfar, Saleh; Page, David; Guenther, Jerry; Cabrera, Victor; Fricke, Paul; Weigel, Kent

    2014-02-01

    When making the decision about whether or not to breed a given cow, knowledge about the expected outcome would have an economic impact on profitability of the breeding program and net income of the farm. The outcome of each breeding can be affected by many management and physiological features that vary between farms and interact with each other. Hence, the ability of machine learning algorithms to accommodate complex relationships in the data and missing values for explanatory variables makes these algorithms well suited for investigation of reproduction performance in dairy cattle. The objective of this study was to develop a user-friendly and intuitive on-farm tool to help farmers make reproduction management decisions. Several different machine learning algorithms were applied to predict the insemination outcomes of individual cows based on phenotypic and genotypic data. Data from 26 dairy farms in the Alta Genetics (Watertown, WI) Advantage Progeny Testing Program were used, representing a 10-yr period from 2000 to 2010. Health, reproduction, and production data were extracted from on-farm dairy management software, and estimated breeding values were downloaded from the US Department of Agriculture Agricultural Research Service Animal Improvement Programs Laboratory (Beltsville, MD) database. The edited data set consisted of 129,245 breeding records from primiparous Holstein cows and 195,128 breeding records from multiparous Holstein cows. Each data point in the final data set included 23 and 25 explanatory variables and 1 binary outcome for of 0.756 ± 0.005 and 0.736 ± 0.005 for primiparous and multiparous cows, respectively. The naïve Bayes algorithm, Bayesian network, and decision tree algorithms showed somewhat poorer classification performance. An information-based variable selection procedure identified herd average conception rate, incidence of ketosis, number of previous (failed) inseminations, days in milk at breeding, and mastitis as the most

  12. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    Science.gov (United States)

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  13. Classification and authentication of unknown water samples using machine learning algorithms.

    Science.gov (United States)

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes the development of water sample classification and authentication, in real life which is based on machine learning algorithms. The proposed techniques used experimental measurements from a pulse voltametry method which is based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongue include arrays of solid state ion sensors, transducers even of different types, data collectors and data analysis tools, all oriented to the classification of liquid samples and authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for 6 numbers of different ISI (Bureau of Indian standard) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell) was the data source for developing two types of machine learning algorithms like classification and regression. A water data set consisting of 6 numbers of sample classes containing 4402 numbers of features were considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A proposed partial least squares (PLS) based classifier, which was dedicated as well; to authenticate a specific category of water sample evolved out as an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system emancipated an overall encouraging authentication percentage accuracy with their excellent performances for the aforesaid categories of water samples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.

  14. Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches

    National Research Council Canada - National Science Library

    Qi, Yuan

    2000-01-01

    In this thesis, we propose two new machine learning schemes, a subband-based Independent Component Analysis scheme and a hybrid Independent Component Analysis/Support Vector Machine scheme, and apply...

  15. Machine learning based cloud mask algorithm driven by radiative transfer modeling

    Science.gov (United States)

    Chen, N.; Li, W.; Tanikawa, T.; Hori, M.; Shimada, R.; Stamnes, K. H.

    2017-12-01

    Cloud detection is a critically important first step required to derive many satellite data products. Traditional threshold based cloud mask algorithms require a complicated design process and fine tuning for each sensor, and have difficulty over snow/ice covered areas. With the advance of computational power and machine learning techniques, we have developed a new algorithm based on a neural network classifier driven by extensive radiative transfer modeling. Statistical validation results obtained by using collocated CALIOP and MODIS data show that its performance is consistent over different ecosystems and significantly better than the MODIS Cloud Mask (MOD35 C6) during the winter seasons over mid-latitude snow covered areas. Simulations using a reduced number of satellite channels also show satisfactory results, indicating its flexibility to be configured for different sensors.

  16. Development of Predictive QSAR Models of 4-Thiazolidinones Antitrypanosomal Activity using Modern Machine Learning Algorithms.

    Science.gov (United States)

    Kryshchyshyn, Anna; Devinyak, Oleg; Kaminskyy, Danylo; Grellier, Philippe; Lesyk, Roman

    2017-11-14

    This paper presents novel QSAR models for the prediction of antitrypanosomal activity among thiazolidines and related heterocycles. The performance of four machine learning algorithms: Random Forest regression, Stochastic gradient boosting, Multivariate adaptive regression splines and Gaussian processes regression have been studied in order to reach better levels of predictivity. The results for Random Forest and Gaussian processes regression are comparable and outperform other studied methods. The preliminary descriptor selection with Boruta method improved the outcome of machine learning methods. The two novel QSAR-models developed with Random Forest and Gaussian processes regression algorithms have good predictive ability, which was proved by the external evaluation of the test set with corresponding Q 2 ext =0.812 and Q 2 ext =0.830. The obtained models can be used further for in silico screening of virtual libraries in the same chemical domain in order to find new antitrypanosomal agents. Thorough analysis of descriptors influence in the QSAR models and interpretation of their chemical meaning allows to highlight a number of structure-activity relationships. The presence of phenyl rings with electron-withdrawing atoms or groups in para-position, increased number of aromatic rings, high branching but short chains, high HOMO energy, and the introduction of 1-substituted 2-indolyl fragment into the molecular structure have been recognized as trypanocidal activity prerequisites. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Short-Term Solar Forecasting Performance of Popular Machine Learning Algorithms: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Florita, Anthony R [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Elgindy, Tarek [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Hodge, Brian S [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Dobbs, Alex [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2017-10-03

    A framework for assessing the performance of short-term solar forecasting is presented in conjunction with a range of numerical results using global horizontal irradiation (GHI) from the open-source Surface Radiation Budget (SURFRAD) data network. A suite of popular machine learning algorithms is compared according to a set of statistically distinct metrics and benchmarked against the persistence-of-cloudiness forecast and a cloud motion forecast. Results show significant improvement compared to the benchmarks with trade-offs among the machine learning algorithms depending on the desired error metric. Training inputs include time series observations of GHI for a history of years, historical weather and atmospheric measurements, and corresponding date and time stamps such that training sensitivities might be inferred. Prediction outputs are GHI forecasts for 1, 2, 3, and 4 hours ahead of the issue time, and they are made for every month of the year for 7 locations. Photovoltaic power and energy outputs can then be made using the solar forecasts to better understand power system impacts.

  18. Open source machine-learning algorithms for the prediction of optimal cancer drug therapies.

    Science.gov (United States)

    Huang, Cai; Mezencev, Roman; McDonald, John F; Vannberg, Fredrik

    2017-01-01

    Precision medicine is a rapidly growing area of modern medical science and open source machine-learning codes promise to be a critical component for the successful development of standardized and automated analysis of patient data. One important goal of precision cancer medicine is the accurate prediction of optimal drug therapies from the genomic profiles of individual patient tumors. We introduce here an open source software platform that employs a highly versatile support vector machine (SVM) algorithm combined with a standard recursive feature elimination (RFE) approach to predict personalized drug responses from gene expression profiles. Drug specific models were built using gene expression and drug response data from the National Cancer Institute panel of 60 human cancer cell lines (NCI-60). The models are highly accurate in predicting the drug responsiveness of a variety of cancer cell lines including those comprising the recent NCI-DREAM Challenge. We demonstrate that predictive accuracy is optimized when the learning dataset utilizes all probe-set expression values from a diversity of cancer cell types without pre-filtering for genes generally considered to be "drivers" of cancer onset/progression. Application of our models to publically available ovarian cancer (OC) patient gene expression datasets generated predictions consistent with observed responses previously reported in the literature. By making our algorithm "open source", we hope to facilitate its testing in a variety of cancer types and contexts leading to community-driven improvements and refinements in subsequent applications.

  19. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Ricardo Andres Pizarro

    2016-12-01

    Full Text Available High-resolution three-dimensional magnetic resonance imaging (3D-MRI is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM algorithm in the quality assessment of structural brain images, using global and region of interest (ROI automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  20. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm.

    Science.gov (United States)

    Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S

    2016-01-01

    High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  1. Open source machine-learning algorithms for the prediction of optimal cancer drug therapies.

    Directory of Open Access Journals (Sweden)

    Cai Huang

    Full Text Available Precision medicine is a rapidly growing area of modern medical science and open source machine-learning codes promise to be a critical component for the successful development of standardized and automated analysis of patient data. One important goal of precision cancer medicine is the accurate prediction of optimal drug therapies from the genomic profiles of individual patient tumors. We introduce here an open source software platform that employs a highly versatile support vector machine (SVM algorithm combined with a standard recursive feature elimination (RFE approach to predict personalized drug responses from gene expression profiles. Drug specific models were built using gene expression and drug response data from the National Cancer Institute panel of 60 human cancer cell lines (NCI-60. The models are highly accurate in predicting the drug responsiveness of a variety of cancer cell lines including those comprising the recent NCI-DREAM Challenge. We demonstrate that predictive accuracy is optimized when the learning dataset utilizes all probe-set expression values from a diversity of cancer cell types without pre-filtering for genes generally considered to be "drivers" of cancer onset/progression. Application of our models to publically available ovarian cancer (OC patient gene expression datasets generated predictions consistent with observed responses previously reported in the literature. By making our algorithm "open source", we hope to facilitate its testing in a variety of cancer types and contexts leading to community-driven improvements and refinements in subsequent applications.

  2. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning-a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  3. An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms.

    Science.gov (United States)

    Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L

    2013-12-01

    The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  4. Can machine learning explain human learning?

    NARCIS (Netherlands)

    Vahdat, M.; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

    2016-01-01

    Learning Analytics (LA) has a major interest in exploring and understanding the learning process of humans and, for this purpose, benefits from both Cognitive Science, which studies how humans learn, and Machine Learning, which studies how algorithms learn from data. Usually, Machine Learning is

  5. Machine learning algorithms for the creation of clinical healthcare enterprise systems

    Science.gov (United States)

    Mandal, Indrajit

    2017-10-01

    Clinical recommender systems are increasingly becoming popular for improving modern healthcare systems. Enterprise systems are persuasively used for creating effective nurse care plans to provide nurse training, clinical recommendations and clinical quality control. A novel design of a reliable clinical recommender system based on multiple classifier system (MCS) is implemented. A hybrid machine learning (ML) ensemble based on random subspace method and random forest is presented. The performance accuracy and robustness of proposed enterprise architecture are quantitatively estimated to be above 99% and 97%, respectively (above 95% confidence interval). The study then extends to experimental analysis of the clinical recommender system with respect to the noisy data environment. The ranking of items in nurse care plan is demonstrated using machine learning algorithms (MLAs) to overcome the drawback of the traditional association rule method. The promising experimental results are compared against the sate-of-the-art approaches to highlight the advancement in recommendation technology. The proposed recommender system is experimentally validated using five benchmark clinical data to reinforce the research findings.

  6. Field tests and machine learning approaches for refining algorithms and correlations of driver's model parameters.

    Science.gov (United States)

    Tango, Fabio; Minin, Luca; Tesauri, Francesco; Montanari, Roberto

    2010-03-01

    This paper describes the field tests on a driving simulator carried out to validate the algorithms and the correlations of dynamic parameters, specifically driving task demand and drivers' distraction, able to predict drivers' intentions. These parameters belong to the driver's model developed by AIDE (Adaptive Integrated Driver-vehicle InterfacE) European Integrated Project. Drivers' behavioural data have been collected from the simulator tests to model and validate these parameters using machine learning techniques, specifically the adaptive neuro fuzzy inference systems (ANFIS) and the artificial neural network (ANN). Two models of task demand and distraction have been developed, one for each adopted technique. The paper provides an overview of the driver's model, the description of the task demand and distraction modelling and the tests conducted for the validation of these parameters. A test comparing predicted and expected outcomes of the modelled parameters for each machine learning technique has been carried out: for distraction, in particular, promising results (low prediction errors) have been obtained by adopting an artificial neural network.

  7. Developing Novel Machine Learning Algorithms to Improve Sedentary Assessment for Youth Health Enhancement.

    Science.gov (United States)

    Golla, Gowtham Kumar; Carlson, Jordan A; Huan, Jun; Kerr, Jacqueline; Mitchell, Tarrah; Borner, Kelsey

    2016-10-01

    Sedentary behavior of youth is an important determinant of health. However, better measures are needed to improve understanding of this relationship and the mechanisms at play, as well as to evaluate health promotion interventions. Wearable accelerometers are considered as the standard for assessing physical activity in research, but do not perform well for assessing posture (i.e., sitting vs. standing), a critical component of sedentary behavior. The machine learning algorithms that we propose for assessing sedentary behavior will allow us to re-examine existing accelerometer data to better understand the association between sedentary time and health in various populations. We collected two datasets, a laboratory-controlled dataset and a free-living dataset. We trained machine learning classifiers separately on each dataset and compared performance across datasets. The classifiers predict five postures: sit, stand, sit-stand, stand-sit, and stand\\walk. We compared a manually constructed Hidden Markov model (HMM) with an automated HMM from existing software. The manually constructed HMM gave more F1-Macro score on both datasets.

  8. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Full Text Available Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects from a given dataset. The techniques for detecting outliers have a lot of applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for the CB outlier detection (DACB. On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. At last, the effectiveness and efficiency of the proposed approaches are verified through a plenty of simulation experiments.

  9. A Machine Learning Regression scheme to design a FR-Image Quality Assessment Algorithm

    OpenAIRE

    Charrier , Christophe; Lezoray , Olivier; Lebrun , Gilles

    2012-01-01

    International audience; A crucial step in image compression is the evaluation of its performance, and more precisely available ways to measure the quality of compressed images. In this paper, a machine learning expert, providing a quality score is proposed. This quality measure is based on a learned classification process in order to respect that of human observers. The proposed method namely Machine Learning-based Image Quality Measurment (MLIQM) first classifies the quality using multi Supp...

  10. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper

    Science.gov (United States)

    Luo, Gang

    2017-01-01

    For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022

  11. Identification of Differentially Expressed Genes between Original Breast Cancer and Xenograft Using Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Deling Wang

    2018-03-01

    Full Text Available Breast cancer is one of the most common malignancies in women. Patient-derived tumor xenograft (PDX model is a cutting-edge approach for drug research on breast cancer. However, PDX still exhibits differences from original human tumors, thereby challenging the molecular understanding of tumorigenesis. In particular, gene expression changes after tissues are transplanted from human to mouse model. In this study, we propose a novel computational method by incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS, random forest (RF, and rough set-based rule learning, to identify genes with significant expression differences between PDX and original human tumors. First, 831 breast tumors, including 657 PDX and 174 human tumors, were collected. Based on MCFS and RF, 32 genes were then identified to be informative for the prediction of PDX and human tumors and can be used to construct a prediction model. The prediction model exhibits a Matthews coefficient correlation value of 0.777. Seven interpretable interactions within the informative gene were detected based on the rough set-based rule learning. Furthermore, the seven interpretable interactions can be well supported by previous experimental studies. Our study not only presents a method for identifying informative genes with differential expression but also provides insights into the mechanism through which gene expression changes after being transplanted from human tumor into mouse model. This work would be helpful for research and drug development for breast cancer.

  12. Identification of Differentially Expressed Genes between Original Breast Cancer and Xenograft Using Machine Learning Algorithms.

    Science.gov (United States)

    Wang, Deling; Li, Jia-Rui; Zhang, Yu-Hang; Chen, Lei; Huang, Tao; Cai, Yu-Dong

    2018-03-12

    Breast cancer is one of the most common malignancies in women. Patient-derived tumor xenograft (PDX) model is a cutting-edge approach for drug research on breast cancer. However, PDX still exhibits differences from original human tumors, thereby challenging the molecular understanding of tumorigenesis. In particular, gene expression changes after tissues are transplanted from human to mouse model. In this study, we propose a novel computational method by incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), random forest (RF), and rough set-based rule learning, to identify genes with significant expression differences between PDX and original human tumors. First, 831 breast tumors, including 657 PDX and 174 human tumors, were collected. Based on MCFS and RF, 32 genes were then identified to be informative for the prediction of PDX and human tumors and can be used to construct a prediction model. The prediction model exhibits a Matthews coefficient correlation value of 0.777. Seven interpretable interactions within the informative gene were detected based on the rough set-based rule learning. Furthermore, the seven interpretable interactions can be well supported by previous experimental studies. Our study not only presents a method for identifying informative genes with differential expression but also provides insights into the mechanism through which gene expression changes after being transplanted from human tumor into mouse model. This work would be helpful for research and drug development for breast cancer.

  13. Application of Machine Learning Techniques in Aquaculture

    OpenAIRE

    Rahman, Akhlaqur; Tasnim, Sumaira

    2014-01-01

    In this paper we present applications of different machine learning algorithms in aquaculture. Machine learning algorithms learn models from historical data. In aquaculture historical data are obtained from farm practices, yields, and environmental data sources. Associations between these different variables can be obtained by applying machine learning algorithms to historical data. In this paper we present applications of different machine learning algorithms in aquaculture applications.

  14. A New Tool for CME Arrival Time Prediction using Machine Learning Algorithms: CAT-PUMA

    Science.gov (United States)

    Liu, Jiajia; Ye, Yudong; Shen, Chenglong; Wang, Yuming; Erdélyi, Robert

    2018-03-01

    Coronal mass ejections (CMEs) are arguably the most violent eruptions in the solar system. CMEs can cause severe disturbances in interplanetary space and can even affect human activities in many aspects, causing damage to infrastructure and loss of revenue. Fast and accurate prediction of CME arrival time is vital to minimize the disruption that CMEs may cause when interacting with geospace. In this paper, we propose a new approach for partial-/full halo CME Arrival Time Prediction Using Machine learning Algorithms (CAT-PUMA). Via detailed analysis of the CME features and solar-wind parameters, we build a prediction engine taking advantage of 182 previously observed geo-effective partial-/full halo CMEs and using algorithms of the Support Vector Machine. We demonstrate that CAT-PUMA is accurate and fast. In particular, predictions made after applying CAT-PUMA to a test set unknown to the engine show a mean absolute prediction error of ∼5.9 hr within the CME arrival time, with 54% of the predictions having absolute errors less than 5.9 hr. Comparisons with other models reveal that CAT-PUMA has a more accurate prediction for 77% of the events investigated that can be carried out very quickly, i.e., within minutes of providing the necessary input parameters of a CME. A practical guide containing the CAT-PUMA engine and the source code of two examples are available in the Appendix, allowing the community to perform their own applications for prediction using CAT-PUMA.

  15. Application of Machine Learning Algorithms to the Study of Noise Artifacts in Gravitational-Wave Data

    Science.gov (United States)

    Biswas, Rahul; Blackburn, Lindy L.; Cao, Junwei; Essick, Reed; Hodge, Kari Alison; Katsavounidis, Erotokritos; Kim, Kyungmin; Young-Min, Kim; Le Bigot, Eric-Olivier; Lee, Chang-Hwan; hide

    2014-01-01

    The sensitivity of searches for astrophysical transients in data from the Laser Interferometer Gravitationalwave Observatory (LIGO) is generally limited by the presence of transient, non-Gaussian noise artifacts, which occur at a high-enough rate such that accidental coincidence across multiple detectors is non-negligible. Furthermore, non-Gaussian noise artifacts typically dominate over the background contributed from stationary noise. These "glitches" can easily be confused for transient gravitational-wave signals, and their robust identification and removal will help any search for astrophysical gravitational-waves. We apply Machine Learning Algorithms (MLAs) to the problem, using data from auxiliary channels within the LIGO detectors that monitor degrees of freedom unaffected by astrophysical signals. Terrestrial noise sources may manifest characteristic disturbances in these auxiliary channels, inducing non-trivial correlations with glitches in the gravitational-wave data. The number of auxiliary-channel parameters describing these disturbances may also be extremely large; high dimensionality is an area where MLAs are particularly well-suited. We demonstrate the feasibility and applicability of three very different MLAs: Artificial Neural Networks, Support Vector Machines, and Random Forests. These classifiers identify and remove a substantial fraction of the glitches present in two very different data sets: four weeks of LIGO's fourth science run and one week of LIGO's sixth science run. We observe that all three algorithms agree on which events are glitches to within 10% for the sixth science run data, and support this by showing that the different optimization criteria used by each classifier generate the same decision surface, based on a likelihood-ratio statistic. Furthermore, we find that all classifiers obtain similar limiting performance, suggesting that most of the useful information currently contained in the auxiliary channel parameters we extract

  16. Applications of machine-learning algorithms for infrared colour selection of Galactic Wolf-Rayet stars

    Science.gov (United States)

    Morello, Giuseppe; Morris, P. W.; Van Dyk, S. D.; Marston, A. P.; Mauerhan, J. C.

    2018-01-01

    We have investigated and applied machine-learning algorithms for infrared colour selection of Galactic Wolf-Rayet (WR) candidates. Objects taken from the Spitzer Galactic Legacy Infrared Midplane Survey Extraordinaire (GLIMPSE) catalogue of the infrared objects in the Galactic plane can be classified into different stellar populations based on the colours inferred from their broad-band photometric magnitudes [J, H and Ks from 2 Micron All Sky Survey (2MASS), and the four Spitzer/IRAC bands]. The algorithms tested in this pilot study are variants of the k-nearest neighbours approach, which is ideal for exploratory studies of classification problems where interrelations between variables and classes are complicated. The aims of this study are (1) to provide an automated tool to select reliable WR candidates and potentially other classes of objects, (2) to measure the efficiency of infrared colour selection at performing these tasks and (3) to lay the groundwork for statistically inferring the total number of WR stars in our Galaxy. We report the performance results obtained over a set of known objects and selected candidates for which we have carried out follow-up spectroscopic observations, and confirm the discovery of four new WR stars.

  17. Water quality of Danube Delta systems: ecological status and prediction using machine-learning algorithms.

    Science.gov (United States)

    Stoica, C; Camejo, J; Banciu, A; Nita-Lazar, M; Paun, I; Cristofor, S; Pacheco, O R; Guevara, M

    2016-01-01

    Environmental issues have a worldwide impact on water bodies, including the Danube Delta, the largest European wetland. The Water Framework Directive (2000/60/EC) implementation operates toward solving environmental issues from European and national level. As a consequence, the water quality and the biocenosis structure was altered, especially the composition of the macro invertebrate community which is closely related to habitat and substrate heterogeneity. This study aims to assess the ecological status of Southern Branch of the Danube Delta, Saint Gheorghe, using benthic fauna and a computational method as an alternative for monitoring the water quality in real time. The analysis of spatial and temporal variability of unicriterial and multicriterial indices were used to assess the current status of aquatic systems. In addition, chemical status was characterized. Coliform bacteria and several chemical parameters were used to feed machine-learning (ML) algorithms to simulate a real-time classification method. Overall, the assessment of the water bodies indicated a moderate ecological status based on the biological quality elements or a good ecological status based on chemical and ML algorithms criteria.

  18. Application of machine learning algorithms to the study of noise artifacts in gravitational-wave data

    Science.gov (United States)

    Biswas, Rahul; Blackburn, Lindy; Cao, Junwei; Essick, Reed; Hodge, Kari Alison; Katsavounidis, Erotokritos; Kim, Kyungmin; Kim, Young-Min; Le Bigot, Eric-Olivier; Lee, Chang-Hwan; Oh, John J.; Oh, Sang Hoon; Son, Edwin J.; Tao, Ye; Vaulin, Ruslan; Wang, Xiaoge

    2013-09-01

    The sensitivity of searches for astrophysical transients in data from the Laser Interferometer Gravitational-wave Observatory (LIGO) is generally limited by the presence of transient, non-Gaussian noise artifacts, which occur at a high enough rate such that accidental coincidence across multiple detectors is non-negligible. These “glitches” can easily be mistaken for transient gravitational-wave signals, and their robust identification and removal will help any search for astrophysical gravitational waves. We apply machine-learning algorithms (MLAs) to the problem, using data from auxiliary channels within the LIGO detectors that monitor degrees of freedom unaffected by astrophysical signals. Noise sources may produce artifacts in these auxiliary channels as well as the gravitational-wave channel. The number of auxiliary-channel parameters describing these disturbances may also be extremely large; high dimensionality is an area where MLAs are particularly well suited. We demonstrate the feasibility and applicability of three different MLAs: artificial neural networks, support vector machines, and random forests. These classifiers identify and remove a substantial fraction of the glitches present in two different data sets: four weeks of LIGO’s fourth science run and one week of LIGO’s sixth science run. We observe that all three algorithms agree on which events are glitches to within 10% for the sixth-science-run data, and support this by showing that the different optimization criteria used by each classifier generate the same decision surface, based on a likelihood-ratio statistic. Furthermore, we find that all classifiers obtain similar performance to the benchmark algorithm, the ordered veto list, which is optimized to detect pairwise correlations between transients in LIGO auxiliary channels and glitches in the gravitational-wave data. This suggests that most of the useful information currently extracted from the auxiliary channels is already described

  19. Short communication: Prediction of retention pay-off using a machine learning algorithm.

    Science.gov (United States)

    Shahinfar, Saleh; Kalantari, Afshin S; Cabrera, Victor; Weigel, Kent

    2014-05-01

    Replacement decisions have a major effect on dairy farm profitability. Dynamic programming (DP) has been widely studied to find the optimal replacement policies in dairy cattle. However, DP models are computationally intensive and might not be practical for daily decision making. Hence, the ability of applying machine learning on a prerun DP model to provide fast and accurate predictions of nonlinear and intercorrelated variables makes it an ideal methodology. Milk class (1 to 5), lactation number (1 to 9), month in milk (1 to 20), and month of pregnancy (0 to 9) were used to describe all cows in a herd in a DP model. Twenty-seven scenarios based on all combinations of 3 levels (base, 20% above, and 20% below) of milk production, milk price, and replacement cost were solved with the DP model, resulting in a data set of 122,716 records, each with a calculated retention pay-off (RPO). Then, a machine learning model tree algorithm was used to mimic the evaluated RPO with DP. The correlation coefficient factor was used to observe the concordance of RPO evaluated by DP and RPO predicted by the model tree. The obtained correlation coefficient was 0.991, with a corresponding value of 0.11 for relative absolute error. At least 100 instances were required per model constraint, resulting in 204 total equations (models). When these models were used for binary classification of positive and negative RPO, error rates were 1% false negatives and 9% false positives. Applying this trained model from simulated data for prediction of RPO for 102 actual replacement records from the University of Wisconsin-Madison dairy herd resulted in a 0.994 correlation with 0.10 relative absolute error rate. Overall results showed that model tree has a potential to be used in conjunction with DP to assist farmers in their replacement decisions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  20. Assessing the Performance of a Machine Learning Algorithm in Identifying Bubbles in Dust Emission

    Science.gov (United States)

    Xu, Duo; Offner, Stella S. R.

    2017-12-01

    Stellar feedback created by radiation and winds from massive stars plays a significant role in both physical and chemical evolution of molecular clouds. This energy and momentum leaves an identifiable signature (“bubbles”) that affects the dynamics and structure of the cloud. Most bubble searches are performed “by eye,” which is usually time-consuming, subjective, and difficult to calibrate. Automatic classifications based on machine learning make it possible to perform systematic, quantifiable, and repeatable searches for bubbles. We employ a previously developed machine learning algorithm, Brut, and quantitatively evaluate its performance in identifying bubbles using synthetic dust observations. We adopt magnetohydrodynamics simulations, which model stellar winds launching within turbulent molecular clouds, as an input to generate synthetic images. We use a publicly available three-dimensional dust continuum Monte Carlo radiative transfer code, HYPERION, to generate synthetic images of bubbles in three Spitzer bands (4.5, 8, and 24 μm). We designate half of our synthetic bubbles as a training set, which we use to train Brut along with citizen-science data from the Milky Way Project (MWP). We then assess Brut’s accuracy using the remaining synthetic observations. We find that Brut’s performance after retraining increases significantly, and it is able to identify yellow bubbles, which are likely associated with B-type stars. Brut continues to perform well on previously identified high-score bubbles, and over 10% of the MWP bubbles are reclassified as high-confidence bubbles, which were previously marginal or ambiguous detections in the MWP data. We also investigate the influence of the size of the training set, dust model, evolutionary stage, and background noise on bubble identification.

  1. Machine learning systems

    Energy Technology Data Exchange (ETDEWEB)

    Forsyth, R

    1984-05-01

    With the dramatic rise of expert systems has come a renewed interest in the fuel that drives them-knowledge. For it is specialist knowledge which gives expert systems their power. But extracting knowledge from human experts in symbolic form has proved arduous and labour-intensive. So the idea of machine learning is enjoying a renaissance. Machine learning is any automatic improvement in the performance of a computer system over time, as a result of experience. Thus a learning algorithm seeks to do one or more of the following: cover a wider range of problems, deliver more accurate solutions, obtain answers more cheaply, and simplify codified knowledge. 6 references.

  2. Machine Learning Algorithms For Predicting the Instability Timescales of Compact Planetary Systems

    Science.gov (United States)

    Tamayo, Daniel; Ali-Dib, Mohamad; Cloutier, Ryan; Huang, Chelsea; Van Laerhoven, Christa L.; Leblanc, Rejean; Menou, Kristen; Murray, Norman; Obertas, Alysa; Paradise, Adiv; Petrovich, Cristobal; Rachkov, Aleksandar; Rein, Hanno; Silburt, Ari; Tacik, Nick; Valencia, Diana

    2016-10-01

    The Kepler mission has uncovered hundreds of compact multi-planet systems. The dynamical pathways to instability in these compact systems and their associated timescales are not well understood theoretically. However, long-term stability is often used as a constraint to narrow down the space of orbital solutions from the transit data. This requires a large suite of N-body integrations that can each take several weeks to complete. This computational bottleneck is therefore an important limitation in our ability to characterize compact multi-planet systems.From suites of numerical simulations, previous studies have fit simple scaling relations between the instability timescale and various system parameters. However, the numerically simulated systems can deviate strongly from these empirical fits.We present a new approach to the problem using machine learning algorithms that have enjoyed success across a broad range of high-dimensional industry applications. In particular, we have generated large training sets of direct N-body integrations of synthetic compact planetary systems to train several regression models (support vector machine, gradient boost) that predict the instability timescale. We find that ensembling these models predicts the instability timescale of planetary systems better than previous approaches using the simple scaling relations mentioned above.Finally, we will discuss how these models provide a powerful tool for not only understanding the current Kepler multi-planet sample, but also for characterizing and shaping the radial-velocity follow-up strategies of multi-planet systems from the upcoming Transiting Exoplanet Survey Satellite (TESS) mission, given its shorter observation baselines.

  3. In vitro molecular machine learning algorithm via symmetric internal loops of DNA.

    Science.gov (United States)

    Lee, Ji-Hoon; Lee, Seung Hwan; Baek, Christina; Chun, Hyosun; Ryu, Je-Hwan; Kim, Jin-Woo; Deaton, Russell; Zhang, Byoung-Tak

    2017-08-01

    Programmable biomolecules, such as DNA strands, deoxyribozymes, and restriction enzymes, have been used to solve computational problems, construct large-scale logic circuits, and program simple molecular games. Although studies have shown the potential of molecular computing, the capability of computational learning with DNA molecules, i.e., molecular machine learning, has yet to be experimentally verified. Here, we present a novel molecular learning in vitro model in which symmetric internal loops of double-stranded DNA are exploited to measure the differences between training instances, thus enabling the molecules to learn from small errors. The model was evaluated on a data set of twenty dialogue sentences obtained from the television shows Friends and Prison Break. The wet DNA-computing experiments confirmed that the molecular learning machine was able to generalize the dialogue patterns of each show and successfully identify the show from which the sentences originated. The molecular machine learning model described here opens the way for solving machine learning problems in computer science and biology using in vitro molecular computing with the data encoded in DNA molecules. Copyright © 2017. Published by Elsevier B.V.

  4. A Pathological Brain Detection System based on Extreme Learning Machine Optimized by Bat Algorithm.

    Science.gov (United States)

    Lu, Siyuan; Qiu, Xin; Shi, Jianping; Li, Na; Lu, Zhi-Hai; Chen, Peng; Yang, Meng-Meng; Liu, Fang-Yuan; Jia, Wen-Juan; Zhang, Yudong

    2017-01-01

    It is beneficial to classify brain images as healthy or pathological automatically, because 3D brain images can generate so much information which is time consuming and tedious for manual analysis. Among various 3D brain imaging techniques, magnetic resonance (MR) imaging is the most suitable for brain, and it is now widely applied in hospitals, because it is helpful in the four ways of diagnosis, prognosis, pre-surgical, and postsurgical procedures. There are automatic detection methods; however they suffer from low accuracy. Therefore, we proposed a novel approach which employed 2D discrete wavelet transform (DWT), and calculated the entropies of the subbands as features. Then, a bat algorithm optimized extreme learning machine (BA-ELM) was trained to identify pathological brains from healthy controls. A 10x10-fold cross validation was performed to evaluate the out-of-sample performance. The method achieved a sensitivity of 99.04%, a specificity of 93.89%, and an overall accuracy of 98.33% over 132 MR brain images. The experimental results suggest that the proposed approach is accurate and robust in pathological brain detection. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  5. A Scalable Neuro-inspired Robot Controller Integrating a Machine Learning Algorithm and a Spiking Cerebellar-like Network

    DEFF Research Database (Denmark)

    Baira Ojeda, Ismael; Tolu, Silvia; Lund, Henrik Hautop

    2017-01-01

    Combining Fable robot, a modular robot, with a neuroinspired controller, we present the proof of principle of a system that can scale to several neurally controlled compliant modules. The motor control and learning of a robot module are carried out by a Unit Learning Machine (ULM) that embeds...... the Locally Weighted Projection Regression algorithm (LWPR) and a spiking cerebellar-like microcircuit. The LWPR guarantees both an optimized representation of the input space and the learning of the dynamic internal model (IM) of the robot. However, the cerebellar-like sub-circuit integrates LWPR input...

  6. Genetic Algorithms for Optimization of Machine-learning Models and their Applications in Bioinformatics

    KAUST Repository

    Magana-Mora, Arturo

    2017-04-29

    Machine-learning (ML) techniques have been widely applied to solve different problems in biology. However, biological data are large and complex, which often result in extremely intricate ML models. Frequently, these models may have a poor performance or may be computationally unfeasible. This study presents a set of novel computational methods and focuses on the application of genetic algorithms (GAs) for the simplification and optimization of ML models and their applications to biological problems. The dissertation addresses the following three challenges. The first is to develop a generalizable classification methodology able to systematically derive competitive models despite the complexity and nature of the data. Although several algorithms for the induction of classification models have been proposed, the algorithms are data dependent. Consequently, we developed OmniGA, a novel and generalizable framework that uses different classification models in a treeXlike decision structure, along with a parallel GA for the optimization of the OmniGA structure. Results show that OmniGA consistently outperformed existing commonly used classification models. The second challenge is the prediction of translation initiation sites (TIS) in plants genomic DNA. We performed a statistical analysis of the genomic DNA and proposed a new set of discriminant features for this problem. We developed a wrapper method based on GAs for selecting an optimal feature subset, which, in conjunction with a classification model, produced the most accurate framework for the recognition of TIS in plants. Finally, results demonstrate that despite the evolutionary distance between different plants, our approach successfully identified conserved genomic elements that may serve as the starting point for the development of a generic model for prediction of TIS in eukaryotic organisms. Finally, the third challenge is the accurate prediction of polyadenylation signals in human genomic DNA. To achieve

  7. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  8. Machine Learning in Medicine

    Science.gov (United States)

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  9. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study.

    Science.gov (United States)

    Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow

    2017-01-01

    Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. The best models were created using artificial neural networks and logistic regression. -These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. Most of the predictive models produced similar results, and demonstrated the feasibility of identifying individuals with highest probability of having undiagnosed diabetes, through easily-obtained clinical data.

  10. An evaluation of scanpath-comparison and machine-learning classification algorithms used to study the dynamics of analogy making.

    Science.gov (United States)

    French, Robert M; Glady, Yannick; Thibaut, Jean-Pierre

    2017-08-01

    In recent years, eyetracking has begun to be used to study the dynamics of analogy making. Numerous scanpath-comparison algorithms and machine-learning techniques are available that can be applied to the raw eyetracking data. We show how scanpath-comparison algorithms, combined with multidimensional scaling and a classification algorithm, can be used to resolve an outstanding question in analogy making-namely, whether or not children's and adults' strategies in solving analogy problems are different. (They are.) We show which of these scanpath-comparison algorithms is best suited to the kinds of analogy problems that have formed the basis of much analogy-making research over the years. Furthermore, we use machine-learning classification algorithms to examine the item-to-item saccade vectors making up these scanpaths. We show which of these algorithms best predicts, from very early on in a trial, on the basis of the frequency of various item-to-item saccades, whether a child or an adult is doing the problem. This type of analysis can also be used to predict, on the basis of the item-to-item saccade dynamics in the first third of a trial, whether or not a problem will be solved correctly.

  11. MODIS-Based Estimation of Terrestrial Latent Heat Flux over North America Using Three Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Xuanyu Wang

    2017-12-01

    Full Text Available Terrestrial latent heat flux (LE is a key component of the global terrestrial water, energy, and carbon exchanges. Accurate estimation of LE from moderate resolution imaging spectroradiometer (MODIS data remains a major challenge. In this study, we estimated the daily LE for different plant functional types (PFTs across North America using three machine learning algorithms: artificial neural network (ANN; support vector machines (SVM; and, multivariate adaptive regression spline (MARS driven by MODIS and Modern Era Retrospective Analysis for Research and Applications (MERRA meteorology data. These three predictive algorithms, which were trained and validated using observed LE over the period 2000–2007, all proved to be accurate. However, ANN outperformed the other two algorithms for the majority of the tested configurations for most PFTs and was the only method that arrived at 80% precision for LE estimation. We also applied three machine learning algorithms for MODIS data and MERRA meteorology to map the average annual terrestrial LE of North America during 2002–2004 using a spatial resolution of 0.05°, which proved to be useful for estimating the long-term LE over North America.

  12. Cognitive Machine-Learning Algorithm for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis From Restrictive Cardiomyopathy.

    Science.gov (United States)

    Sengupta, Partho P; Huang, Yen-Min; Bansal, Manish; Ashrafi, Ali; Fisher, Matt; Shameer, Khader; Gall, Walt; Dudley, Joel T

    2016-06-01

    Associating a patient's profile with the memories of prototypical patients built through previous repeat clinical experience is a key process in clinical judgment. We hypothesized that a similar process using a cognitive computing tool would be well suited for learning and recalling multidimensional attributes of speckle tracking echocardiography data sets derived from patients with known constrictive pericarditis and restrictive cardiomyopathy. Clinical and echocardiographic data of 50 patients with constrictive pericarditis and 44 with restrictive cardiomyopathy were used for developing an associative memory classifier-based machine-learning algorithm. The speckle tracking echocardiography data were normalized in reference to 47 controls with no structural heart disease, and the diagnostic area under the receiver operating characteristic curve of the associative memory classifier was evaluated for differentiating constrictive pericarditis from restrictive cardiomyopathy. Using only speckle tracking echocardiography variables, associative memory classifier achieved a diagnostic area under the curve of 89.2%, which improved to 96.2% with addition of 4 echocardiographic variables. In comparison, the area under the curve of early diastolic mitral annular velocity and left ventricular longitudinal strain were 82.1% and 63.7%, respectively. Furthermore, the associative memory classifier demonstrated greater accuracy and shorter learning curves than other machine-learning approaches, with accuracy asymptotically approaching 90% after a training fraction of 0.3 and remaining flat at higher training fractions. This study demonstrates feasibility of a cognitive machine-learning approach for learning and recalling patterns observed during echocardiographic evaluations. Incorporation of machine-learning algorithms in cardiac imaging may aid standardized assessments and support the quality of interpretations, particularly for novice readers with limited experience. © 2016

  13. A Cognitive Machine Learning Algorithm for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis from Restrictive Cardiomyopathy

    Science.gov (United States)

    Sengupta, Partho P.; Huang, Yen-Min; Bansal, Manish; Ashrafi, Ali; Fisher, Matt; Shameer, Khader; Gall, Walt; Dudley, Joel T

    2016-01-01

    Background Associating a patient’s profile with the memories of prototypical patients built through previous repeat clinical experience is a key process in clinical judgment. We hypothesized that a similar process using a cognitive computing tool would be well suited for learning and recalling multidimensional attributes of speckle tracking echocardiography (STE) data sets derived from patients with known constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Methods and Results Clinical and echocardiographic data of 50 patients with CP and 44 with RCM were used for developing an associative memory classifier (AMC) based machine learning algorithm. The STE data was normalized in reference to 47 controls with no structural heart disease, and the diagnostic area under the receiver operating characteristic curve (AUC) of the AMC was evaluated for differentiating CP from RCM. Using only STE variables, AMC achieved a diagnostic AUC of 89·2%, which improved to 96·2% with addition of 4 echocardiographic variables. In comparison, the AUC of early diastolic mitral annular velocity and left ventricular longitudinal strain were 82.1% and 63·7%, respectively. Furthermore, AMC demonstrated greater accuracy and shorter learning curves than other machine learning approaches with accuracy asymptotically approaching 90% after a training fraction of 0·3 and remaining flat at higher training fractions. Conclusions This study demonstrates feasibility of a cognitive machine learning approach for learning and recalling patterns observed during echocardiographic evaluations. Incorporation of machine learning algorithms in cardiac imaging may aid standardized assessments and support the quality of interpretations, particularly for novice readers with limited experience. PMID:27266599

  14. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-06-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of 0.8 cm2 and weighs only 180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved a

  15. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice

    2017-06-14

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved

  16. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Directory of Open Access Journals (Sweden)

    Ceylan Koydemir Hatice

    2017-06-01

    Full Text Available Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond

  17. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-01-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved

  18. Evaluation of Empirical and Machine Learning Algorithms for Estimation of Coastal Water Quality Parameters

    Directory of Open Access Journals (Sweden)

    Majid Nazeer

    2017-11-01

    Full Text Available Coastal waters are one of the most vulnerable resources that require effective monitoring programs. One of the key factors for effective coastal monitoring is the use of remote sensing technologies that significantly capture the spatiotemporal variability of coastal waters. Optical properties of coastal waters are strongly linked to components, such as colored dissolved organic matter (CDOM, chlorophyll-a (Chl-a, and suspended solids (SS concentrations, which are essential for the survival of a coastal ecosystem and usually independent of each other. Thus, developing effective remote sensing models to estimate these important water components based on optical properties of coastal waters is mandatory for a successful coastal monitoring program. This study attempted to evaluate the performance of empirical predictive models (EPM and neural networks (NN-based algorithms to estimate Chl-a and SS concentrations, in the coastal area of Hong Kong. Remotely-sensed data over a 13-year period was used to develop regional and local models to estimate Chl-a and SS over the entire Hong Kong waters and for each water class within the study area, respectively. The accuracy of regional models derived from EPM and NN in estimating Chl-a and SS was 83%, 93%, 78%, and 97%, respectively, whereas the accuracy of local models in estimating Chl-a and SS ranged from 60–94% and 81–94%, respectively. Both the regional and local NN models exhibited a higher performance than those models derived from empirical analysis. Thus, this study suggests using machine learning methods (i.e., NN for the more accurate and efficient routine monitoring of coastal water quality parameters (i.e., Chl-a and SS concentrations over the complex coastal area of Hong Kong and other similar coastal environments.

  19. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning has been used to develop decision-support tools applied in medical imaging informatics. In order to take advantages of both conventional and deep learning approach, this study aims to investigate feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, initially computed 44 image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LLP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applying to this dataset, LLP-generated feature vector reduced the number of features from 44 to 4. Using a leave-onecase-out validation method, area under ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p breast cancer detected in the next subsequent mammography screening.

  20. Online transfer learning with extreme learning machine

    Science.gov (United States)

    Yin, Haibo; Yang, Yun-an

    2017-05-01

    In this paper, we propose a new transfer learning algorithm for online training. The proposed algorithm, which is called Online Transfer Extreme Learning Machine (OTELM), is based on Online Sequential Extreme Learning Machine (OSELM) while it introduces Semi-Supervised Extreme Learning Machine (SSELM) to transfer knowledge from the source to the target domain. With the manifold regularization, SSELM picks out instances from the source domain that are less relevant to those in the target domain to initialize the online training, so as to improve the classification performance. Experimental results demonstrate that the proposed OTELM can effectively use instances in the source domain to enhance the learning performance.

  1. Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Hollingsworth, Alan B.; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qiu, Yuchen; Liu, Hong; Zheng, Bin

    2018-02-01

    In order to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, this study aims to investigate advantages of applying a machine learning approach embedded with a locally preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. A dataset involving negative mammograms acquired from 500 women was assembled. This dataset was divided into two age-matched classes of 250 high risk cases in which cancer was detected in the next subsequent mammography screening and 250 low risk cases, which remained negative. First, a computer-aided image processing scheme was applied to segment fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between left and right breasts. Next, a multi-feature fusion based machine learning classifier was built to predict the risk of cancer detection in the next mammography screening. A leave-one-case-out (LOCO) cross-validation method was applied to train and test the machine learning classifier embedded with a LLP algorithm, which generated a new operational vector with 4 features using a maximal variance approach in each LOCO process. Results showed a 9.7% increase in risk prediction accuracy when using this LPP-embedded machine learning approach. An increased trend of adjusted odds ratios was also detected in which odds ratios increased from 1.0 to 11.2. This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality, and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.

  2. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  3. Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment.

    Science.gov (United States)

    Li, Yang; Li, Guoqing; Wang, Zhenhao

    2015-01-01

    In order to overcome the problems of poor understandability of the pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of ELM and Ant-miner algorithm are respectively introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. And finally, a set of classification rules are obtained by IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted from an example sample set generated by the trained ELM-based transient stability assessment model by using IAM algorithm. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.

  4. SOLAR FLARE PREDICTION USING SDO/HMI VECTOR MAGNETIC FIELD DATA WITH A MACHINE-LEARNING ALGORITHM

    International Nuclear Information System (INIS)

    Bobra, M. G.; Couvidat, S.

    2015-01-01

    We attempt to forecast M- and X-class solar flares using a machine-learning algorithm, called support vector machine (SVM), and four years of data from the Solar Dynamics Observatory's Helioseismic and Magnetic Imager, the first instrument to continuously map the full-disk photospheric vector magnetic field from space. Most flare forecasting efforts described in the literature use either line-of-sight magnetograms or a relatively small number of ground-based vector magnetograms. This is the first time a large data set of vector magnetograms has been used to forecast solar flares. We build a catalog of flaring and non-flaring active regions sampled from a database of 2071 active regions, comprised of 1.5 million active region patches of vector magnetic field data, and characterize each active region by 25 parameters. We then train and test the machine-learning algorithm and we estimate its performances using forecast verification metrics with an emphasis on the true skill statistic (TSS). We obtain relatively high TSS scores and overall predictive abilities. We surmise that this is partly due to fine-tuning the SVM for this purpose and also to an advantageous set of features that can only be calculated from vector magnetic field data. We also apply a feature selection algorithm to determine which of our 25 features are useful for discriminating between flaring and non-flaring active regions and conclude that only a handful are needed for good predictive abilities

  5. Machine Learning via Mathematical Programming

    National Research Council Canada - National Science Library

    Mamgasarian, Olivi

    1999-01-01

    Mathematical programming approaches were applied to a variety of problems in machine learning in order to gain deeper understanding of the problems and to come up with new and more efficient computational algorithms...

  6. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

    Science.gov (United States)

    2017-01-01

    Background Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P.05). Conclusions Machine learning algorithms can classify open-text feedback

  7. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    Science.gov (United States)

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high

  8. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

  9. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  10. Use of a machine learning algorithm to classify expertise: analysis of hand motion patterns during a simulated surgical task.

    Science.gov (United States)

    Watson, Robert A

    2014-08-01

    To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.

  11. Machine learning and medical imaging

    CERN Document Server

    Shen, Dinggang; Sabuncu, Mert

    2016-01-01

    Machine Learning and Medical Imaging presents state-of- the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, a...

  12. A Machine-Learning Algorithm Toward Color Analysis for Chronic Liver Disease Classification, Employing Ultrasound Shear Wave Elastography.

    Science.gov (United States)

    Gatos, Ilias; Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitris; Theotokas, Ioannis; Zoumpoulis, Pavlos; Loupas, Thanasis; Hazle, John D; Kagadis, George C

    2017-09-01

    The purpose of the present study was to employ a computer-aided diagnosis system that classifies chronic liver disease (CLD) using ultrasound shear wave elastography (SWE) imaging, with a stiffness value-clustering and machine-learning algorithm. A clinical data set of 126 patients (56 healthy controls, 70 with CLD) was analyzed. First, an RGB-to-stiffness inverse mapping technique was employed. A five-cluster segmentation was then performed associating corresponding different-color regions with certain stiffness value ranges acquired from the SWE manufacturer-provided color bar. Subsequently, 35 features (7 for each cluster), indicative of physical characteristics existing within the SWE image, were extracted. A stepwise regression analysis toward feature reduction was used to derive a reduced feature subset that was fed into the support vector machine classification algorithm to classify CLD from healthy cases. The highest accuracy in classification of healthy to CLD subject discrimination from the support vector machine model was 87.3% with sensitivity and specificity values of 93.5% and 81.2%, respectively. Receiver operating characteristic curve analysis gave an area under the curve value of 0.87 (confidence interval: 0.77-0.92). A machine-learning algorithm that quantifies color information in terms of stiffness values from SWE images and discriminates CLD from healthy cases is introduced. New objective parameters and criteria for CLD diagnosis employing SWE images provided by the present study can be considered an important step toward color-based interpretation, and could assist radiologists' diagnostic performance on a daily basis after being installed in a PC and employed retrospectively, immediately after the examination. Copyright © 2017 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.

  13. Classification of caesarean section and normal vaginal deliveries using foetal heart rate signals and advanced machine learning algorithms.

    Science.gov (United States)

    Fergus, Paul; Hussain, Abir; Al-Jumeily, Dhiya; Huang, De-Shuang; Bouguila, Nizar

    2017-07-06

    Visual inspection of cardiotocography traces by obstetricians and midwives is the gold standard for monitoring the wellbeing of the foetus during antenatal care. However, inter- and intra-observer variability is high with only a 30% positive predictive value for the classification of pathological outcomes. This has a significant negative impact on the perinatal foetus and often results in cardio-pulmonary arrest, brain and vital organ damage, cerebral palsy, hearing, visual and cognitive defects and in severe cases, death. This paper shows that using machine learning and foetal heart rate signals provides direct information about the foetal state and helps to filter the subjective opinions of medical practitioners when used as a decision support tool. The primary aim is to provide a proof-of-concept that demonstrates how machine learning can be used to objectively determine when medical intervention, such as caesarean section, is required and help avoid preventable perinatal deaths. This is evidenced using an open dataset that comprises 506 controls (normal virginal deliveries) and 46 cases (caesarean due to pH ≤ 7.20-acidosis, n = 18; pH > 7.20 and pH machine-learning algorithms are trained, and validated, using binary classifier performance measures. The findings show that deep learning classification achieves sensitivity = 94%, specificity = 91%, Area under the curve = 99%, F-score = 100%, and mean square error = 1%. The results demonstrate that machine learning significantly improves the efficiency for the detection of caesarean section and normal vaginal deliveries using foetal heart rate signals compared with obstetrician and midwife predictions and systems reported in previous studies.

  14. Particle identification at LHCb: new calibration techniques and machine learning classification algorithms

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Particle identification (PID) plays a crucial role in LHCb analyses. Combining information from LHCb subdetectors allows one to distinguish between various species of long-lived charged and neutral particles. PID performance directly affects the sensitivity of most LHCb measurements. Advanced multivariate approaches are used at LHCb to obtain the best PID performance and control systematic uncertainties. This talk highlights recent developments in PID that use innovative machine learning techniques, as well as novel data-driven approaches which ensure that PID performance is well reproduced in simulation.

  15. Teaching High School Students Machine Learning Algorithms to Analyze Flood Risk Factors in River Deltas

    Science.gov (United States)

    Rose, R.; Aizenman, H.; Mei, E.; Choudhury, N.

    2013-12-01

    High School students interested in the STEM fields benefit most when actively participating, so I created a series of learning modules on how to analyze complex systems using machine-learning that give automated feedback to students. The automated feedbacks give timely responses that will encourage the students to continue testing and enhancing their programs. I have designed my modules to take the tactical learning approach in conveying the concepts behind correlation, linear regression, and vector distance based classification and clustering. On successful completion of these modules, students will learn how to calculate linear regression, Pearson's correlation, and apply classification and clustering techniques to a dataset. Working on these modules will allow the students to take back to the classroom what they've learned and then apply it to the Earth Science curriculum. During my research this summer, we applied these lessons to analyzing river deltas; we looked at trends in the different variables over time, looked for similarities in NDVI, precipitation, inundation, runoff and discharge, and attempted to predict floods based on the precipitation, waves mean, area of discharge, NDVI, and inundation.

  16. Predicting Post-Translational Modifications from Local Sequence Fragments Using Machine Learning Algorithms: Overview and Best Practices.

    Science.gov (United States)

    Tatjewski, Marcin; Kierczak, Marcin; Plewczynski, Dariusz

    2017-01-01

    Here, we present two perspectives on the task of predicting post translational modifications (PTMs) from local sequence fragments using machine learning algorithms. The first is the description of the fundamental steps required to construct a PTM predictor from the very beginning. These steps include data gathering, feature extraction, or machine-learning classifier selection. The second part of our work contains the detailed discussion of more advanced problems which are encountered in PTM prediction task. Probably the most challenging issues which we have covered here are: (1) how to address the training data class imbalance problem (we also present statistics describing the problem); (2) how to properly set up cross-validation folds with an approach which takes into account the homology of protein data records, to address this problem we present our folds-over-clusters algorithm; and (3) how to efficiently reach for new sources of learning features. Presented techniques and notes resulted from intense studies in the field, performed by our and other groups, and can be useful both for researchers beginning in the field of PTM prediction and for those who want to extend the repertoire of their research techniques.

  17. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    Science.gov (United States)

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is significant to comprehend the minimal gene sets of cell and discover potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning method was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus , which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research presents that features obtained by homology mapping uniquely can achieve quite great or even better results than those integrated features. Meanwhile, the work indicates that machine learning-based method can assign more efficient weight coefficients than using empirical formula based on biological knowledge.

  18. Parameterization of typhoon-induced ocean cooling using temperature equation and machine learning algorithms: an example of typhoon Soulik (2013)

    Science.gov (United States)

    Wei, Jun; Jiang, Guo-Qing; Liu, Xin

    2017-09-01

    This study proposed three algorithms that can potentially be used to provide sea surface temperature (SST) conditions for typhoon prediction models. Different from traditional data assimilation approaches, which provide prescribed initial/boundary conditions, our proposed algorithms aim to resolve a flow-dependent SST feedback between growing typhoons and oceans in the future time. Two of these algorithms are based on linear temperature equations (TE-based), and the other is based on an innovative technique involving machine learning (ML-based). The algorithms are then implemented into a Weather Research and Forecasting model for the simulation of typhoon to assess their effectiveness, and the results show significant improvement in simulated storm intensities by including ocean cooling feedback. The TE-based algorithm I considers wind-induced ocean vertical mixing and upwelling processes only, and thus obtained a synoptic and relatively smooth sea surface temperature cooling. The TE-based algorithm II incorporates not only typhoon winds but also ocean information, and thus resolves more cooling features. The ML-based algorithm is based on a neural network, consisting of multiple layers of input variables and neurons, and produces the best estimate of the cooling structure, in terms of its amplitude and position. Sensitivity analysis indicated that the typhoon-induced ocean cooling is a nonlinear process involving interactions of multiple atmospheric and oceanic variables. Therefore, with an appropriate selection of input variables and neuron sizes, the ML-based algorithm appears to be more efficient in prognosing the typhoon-induced ocean cooling and in predicting typhoon intensity than those algorithms based on linear regression methods.

  19. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  20. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Louppe, Gilles; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu

    2012-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings....

  1. Machine Learning Algorithms for prediction of regions of high Reynolds Averaged Navier Stokes Uncertainty

    Science.gov (United States)

    Mishra, Aashwin; Iaccarino, Gianluca

    2017-11-01

    In spite of their deficiencies, RANS models represent the workhorse for industrial investigations into turbulent flows. In this context, it is essential to provide diagnostic measures to assess the quality of RANS predictions. To this end, the primary step is to identify feature importances amongst massive sets of potentially descriptive and discriminative flow features. This aids the physical interpretability of the resultant discrepancy model and its extensibility to similar problems. Recent investigations have utilized approaches such as Random Forests, Support Vector Machines and the Least Absolute Shrinkage and Selection Operator for feature selection. With examples, we exhibit how such methods may not be suitable for turbulent flow datasets. The underlying rationale, such as the correlation bias and the required conditions for the success of penalized algorithms, are discussed with illustrative examples. Finally, we provide alternate approaches using convex combinations of regularized regression approaches and randomized sub-sampling in combination with feature selection algorithms, to infer model structure from data. This research was supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo).

  2. Structural Damage Detection using Frequency Response Function Index and Surrogate Model Based on Optimized Extreme Learning Machine Algorithm

    Directory of Open Access Journals (Sweden)

    R. Ghiasi

    2017-09-01

    Full Text Available Utilizing surrogate models based on artificial intelligence methods for detecting structural damages has attracted the attention of many researchers in recent decades. In this study, a new kernel based on Littlewood-Paley Wavelet (LPW is proposed for Extreme Learning Machine (ELM algorithm to improve the accuracy of detecting multiple damages in structural systems.  ELM is used as metamodel (surrogate model of exact finite element analysis of structures in order to efficiently reduce the computational cost through updating process. In the proposed two-step method, first a damage index, based on Frequency Response Function (FRF of the structure, is used to identify the location of damages. In the second step, the severity of damages in identified elements is detected using ELM. In order to evaluate the efficacy of ELM, the results obtained from the proposed kernel were compared with other kernels proposed for ELM as well as Least Square Support Vector Machine algorithm. The solved numerical problems indicated that ELM algorithm accuracy in detecting structural damages is increased drastically in case of using LPW kernel.

  3. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  4. Model-based machine learning.

    Science.gov (United States)

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

  5. Forecasting the Occurrence of Severe Haze Events in Asia using Machine Learning Algorithms

    Science.gov (United States)

    Wang, C.

    2017-12-01

    Particulate pollution has become a serious environmental issue of many Asian countries in recent decades, threatening human health and frequently causing low visibility or haze days that interrupt from working, outdoor, and school activities to air, road, and sea transportation. To ultimately prevent such severe haze to occur requires many difficult tasks to be accomplished, dealing with trade and negotiation, emission control, energy consumption, transportation, land and plantation management, among other, of all involved countries or parties. Whereas, before these difficult measures could finally take place, it would be more practical to reduce the economic loss by developing skills to predict the occurrence of such events in reasonable accuracy so that effective mitigation or adaptation measures could be implemented ahead of time. The "traditional" numerical models developed based on fluid dynamics and explicit or parameterized representations of physiochemical processes can be certainly used for this task. However, the significant and sophisticated spatiotemporal variabilities associated with these events, the propagation of numerical or parameterization errors through model integration, and the computational demand all pose serious challenges to the practice of using these models to accomplish this interdisciplinary task. On the other hand, large quantity of meteorological, hydrological, atmospheric aerosol and composition, and surface visibility data from in-situ observation, reanalysis, or satellite retrievals, have become available to the community. These data might still not sufficient for evaluating and improving certain important aspects of the "traditional" models. Nevertheless, it is likely that these data can already support the effort to develop alternative "task-oriented" and computationally efficient forecasting skill using deep machine learning technique to avoid directly dealing with the sophisticated interplays across multiple process layers. I

  6. Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Chien-Hung Huang

    2015-01-01

    Full Text Available Many proteins are known to be associated with cancer diseases. It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues’s method by employing the protein-protein interaction (PPI data, domain-domain interaction (DDI data, weighted domain frequency score (DFS, and cancer linker degree (CLD data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I using individual algorithm, (II combining algorithms, and (III combining the same classification types of algorithms. When compared with Aragues’s method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. It is anticipated that the current research could help understand disease mechanisms and diagnosis.

  7. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.

  8. Comparison of machine learning and semi-quantification algorithms for (I123)FP-CIT classification: the beginning of the end for semi-quantification?

    Science.gov (United States)

    Taylor, Jonathan Christopher; Fenner, John Wesley

    2017-11-29

    Semi-quantification methods are well established in the clinic for assisted reporting of (I123) Ioflupane images. Arguably, these are limited diagnostic tools. Recent research has demonstrated the potential for improved classification performance offered by machine learning algorithms. A direct comparison between methods is required to establish whether a move towards widespread clinical adoption of machine learning algorithms is justified. This study compared three machine learning algorithms with that of a range of semi-quantification methods, using the Parkinson's Progression Markers Initiative (PPMI) research database and a locally derived clinical database for validation. Machine learning algorithms were based on support vector machine classifiers with three different sets of features: Voxel intensities Principal components of image voxel intensities Striatal binding radios from the putamen and caudate. Semi-quantification methods were based on striatal binding ratios (SBRs) from both putamina, with and without consideration of the caudates. Normal limits for the SBRs were defined through four different methods: Minimum of age-matched controls Mean minus 1/1.5/2 standard deviations from age-matched controls Linear regression of normal patient data against age (minus 1/1.5/2 standard errors) Selection of the optimum operating point on the receiver operator characteristic curve from normal and abnormal training data Each machine learning and semi-quantification technique was evaluated with stratified, nested 10-fold cross-validation, repeated 10 times. The mean accuracy of the semi-quantitative methods for classification of local data into Parkinsonian and non-Parkinsonian groups varied from 0.78 to 0.87, contrasting with 0.89 to 0.95 for classifying PPMI data into healthy controls and Parkinson's disease groups. The machine learning algorithms gave mean accuracies between 0.88 to 0.92 and 0.95 to 0.97 for local and PPMI data respectively. Classification

  9. Multispectral imaging burn wound tissue classification system: a comparison of test accuracies between several common machine learning algorithms

    Science.gov (United States)

    Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.

    2016-03-01

    The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care

  10. Surface electromyography based muscle fatigue detection using high-resolution time-frequency methods and machine learning algorithms.

    Science.gov (United States)

    Karthick, P A; Ghosh, Diptasree Maitra; Ramakrishnan, S

    2018-02-01

    Surface electromyography (sEMG) based muscle fatigue research is widely preferred in sports science and occupational/rehabilitation studies due to its noninvasiveness. However, these signals are complex, multicomponent and highly nonstationary with large inter-subject variations, particularly during dynamic contractions. Hence, time-frequency based machine learning methodologies can improve the design of automated system for these signals. In this work, the analysis based on high-resolution time-frequency methods, namely, Stockwell transform (S-transform), B-distribution (BD) and extended modified B-distribution (EMBD) are proposed to differentiate the dynamic muscle nonfatigue and fatigue conditions. The nonfatigue and fatigue segments of sEMG signals recorded from the biceps brachii of 52 healthy volunteers are preprocessed and subjected to S-transform, BD and EMBD. Twelve features are extracted from each method and prominent features are selected using genetic algorithm (GA) and binary particle swarm optimization (BPSO). Five machine learning algorithms, namely, naïve Bayes, support vector machine (SVM) of polynomial and radial basis kernel, random forest and rotation forests are used for the classification. The results show that all the proposed time-frequency distributions (TFDs) are able to show the nonstationary variations of sEMG signals. Most of the features exhibit statistically significant difference in the muscle fatigue and nonfatigue conditions. The maximum number of features (66%) is reduced by GA and BPSO for EMBD and BD-TFD respectively. The combination of EMBD- polynomial kernel based SVM is found to be most accurate (91% accuracy) in classifying the conditions with the features selected using GA. The proposed methods are found to be capable of handling the nonstationary and multicomponent variations of sEMG signals recorded in dynamic fatiguing contractions. Particularly, the combination of EMBD- polynomial kernel based SVM could be used to

  11. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn.If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this the book for you. No previous experience with machine-learning algorithms is required.

  12. The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

    Directory of Open Access Journals (Sweden)

    Reza Samizade

    2018-06-01

    Full Text Available Classification of the cyber texts and comments into two categories of positive and negative sentiment among social media users is of high importance in the research are related to text mining. In this research, we applied supervised classification methods to classify Persian texts based on sentiment in cyber space. The result of this research is in a form of a system that can decide whether a comment which is published in cyber space such as social networks is considered positive or negative. The comments that are published in Persian movie and movie review websites from 1392 to 1395 are considered as the data set for this research. A part of these data are considered as training and others are considered as testing data. Prior to implementing the algorithms, pre-processing activities such as tokenizing, removing stop words, and n-germs process were applied on the texts. Naïve Bayes, Neural Networks and support vector machine were used for text classification in this study. Out of sample tests showed that there is no evidence indicating that the accuracy of SVM approach is statistically higher than Naïve Bayes or that the accuracy of Naïve Bayes is not statistically higher than NN approach. However, the researchers can conclude that the accuracy of the classification using SVM approach is statistically higher than the accuracy of NN approach in 5% confidence level.

  13. Application of Classification Algorithm of Machine Learning and Buffer Analysis in Torism Regional Planning

    Science.gov (United States)

    Zhang, T. H.; Ji, H. W.; Hu, Y.; Ye, Q.; Lin, Y.

    2018-04-01

    Remote Sensing (RS) and Geography Information System (GIS) technologies are widely used in ecological analysis and regional planning. With the advantages of large scale monitoring, combination of point and area, multiple time-phases and repeated observation, they are suitable for monitoring and analysis of environmental information in a large range. In this study, support vector machine (SVM) classification algorithm is used to monitor the land use and land cover change (LUCC), and then to perform the ecological evaluation for Chaohu lake tourism area quantitatively. The automatic classification and the quantitative spatial-temporal analysis for the Chaohu Lake basin are realized by the analysis of multi-temporal and multispectral satellite images, DEM data and slope information data. Furthermore, the ecological buffer zone analysis is also studied to set up the buffer width for each catchment area surrounding Chaohu Lake. The results of LUCC monitoring from 1992 to 2015 has shown obvious affections by human activities. Since the construction of the Chaohu Lake basin is in the crucial stage of the rapid development of urbanization, the application of RS and GIS technique can effectively provide scientific basis for land use planning, ecological management, environmental protection and tourism resources development in the Chaohu Lake Basin.

  14. Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment.

    Directory of Open Access Journals (Sweden)

    Yang Li

    Full Text Available In order to overcome the problems of poor understandability of the pattern recognition-based transient stability assessment (PRTSA methods, a new rule extraction method based on extreme learning machine (ELM and an improved Ant-miner (IAM algorithm is presented in this paper. First, the basic principles of ELM and Ant-miner algorithm are respectively introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. And finally, a set of classification rules are obtained by IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted from an example sample set generated by the trained ELM-based transient stability assessment model by using IAM algorithm. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.

  15. Monitoring mangrove biomass change in Vietnam using SPOT images and an object-based approach combined with machine learning algorithms

    Science.gov (United States)

    Pham, Lien T. H.; Brabyn, Lars

    2017-06-01

    Mangrove forests are well-known for their provision of ecosystem services and capacity to reduce carbon dioxide concentrations in the atmosphere. Mapping and quantifying mangrove biomass is useful for the effective management of these forests and maximizing their ecosystem service performance. The objectives of this research were to model, map, and analyse the biomass change between 2000 and 2011 of mangrove forests in the Cangio region in Vietnam. SPOT 4 and 5 images were used in conjunction with object-based image analysis and machine learning algorithms. The study area included natural and planted mangroves of diverse species. After image preparation, three different mangrove associations were identified using two levels of image segmentation followed by a Support Vector Machine classifier and a range of spectral, texture and GIS information for classification. The overall classification accuracy for the 2000 and 2011 images were 77.1% and 82.9%, respectively. Random Forest regression algorithms were then used for modelling and mapping biomass. The model that integrated spectral, vegetation association type, texture, and vegetation indices obtained the highest accuracy (R2adj = 0.73). Among the different variables, vegetation association type was the most important variable identified by the Random Forest model. Based on the biomass maps generated from the Random Forest, total biomass in the Cangio mangrove forest increased by 820,136 tons over this period, although this change varied between the three different mangrove associations.

  16. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide that covers all of the need to know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks.Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or

  17. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  18. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This  book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The  multidisciplinary nature of machine learning makes it a very fascinating and popular area for research.  The book is aiming at students, practitioners and researchers and captures the diversity and richness of the field of machine learning and intelligent systems.  Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems, spam classification including an extremely well-written survey on Bayesian networks shed light on the strengths and weaknesses of the methods. Practical studies yielding insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and on-line mining of non-stationary ...

  19. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales

    Directory of Open Access Journals (Sweden)

    Jihoon Oh

    2017-09-01

    Full Text Available Classification and prediction of suicide attempts in high-risk groups is important for preventing suicide. The purpose of this study was to investigate whether the information from multiple clinical scales has classification power for identifying actual suicide attempts. Patients with depression and anxiety disorders (N = 573 were included, and each participant completed 31 self-report psychiatric scales and questionnaires about their history of suicide attempts. We then trained an artificial neural network classifier with 41 variables (31 psychiatric scales and 10 sociodemographic elements and ranked the contribution of each variable for the classification of suicide attempts. To evaluate the clinical applicability of our model, we measured classification performance with top-ranked predictors. Our model had an overall accuracy of 93.7% in 1-month, 90.8% in 1-year, and 87.4% in lifetime suicide attempts detection. The area under the receiver operating characteristic curve (AUROC was the highest for 1-month suicide attempts detection (0.93, followed by lifetime (0.89, and 1-year detection (0.87. Among all variables, the Emotion Regulation Questionnaire had the highest contribution, and the positive and negative characteristics of the scales similarly contributed to classification performance. Performance on suicide attempts classification was largely maintained when we only used the top five ranked variables for training (AUROC; 1-month, 0.75, 1-year, 0.85, lifetime suicide attempts detection, 0.87. Our findings indicate that information from self-report clinical scales can be useful for the classification of suicide attempts. Based on the reliable performance of the top five predictors alone, this machine learning approach could help clinicians identify high-risk patients in clinical settings.

  20. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales.

    Science.gov (United States)

    Oh, Jihoon; Yun, Kyongsik; Hwang, Ji-Hyun; Chae, Jeong-Ho

    2017-01-01

    Classification and prediction of suicide attempts in high-risk groups is important for preventing suicide. The purpose of this study was to investigate whether the information from multiple clinical scales has classification power for identifying actual suicide attempts. Patients with depression and anxiety disorders ( N  = 573) were included, and each participant completed 31 self-report psychiatric scales and questionnaires about their history of suicide attempts. We then trained an artificial neural network classifier with 41 variables (31 psychiatric scales and 10 sociodemographic elements) and ranked the contribution of each variable for the classification of suicide attempts. To evaluate the clinical applicability of our model, we measured classification performance with top-ranked predictors. Our model had an overall accuracy of 93.7% in 1-month, 90.8% in 1-year, and 87.4% in lifetime suicide attempts detection. The area under the receiver operating characteristic curve (AUROC) was the highest for 1-month suicide attempts detection (0.93), followed by lifetime (0.89), and 1-year detection (0.87). Among all variables, the Emotion Regulation Questionnaire had the highest contribution, and the positive and negative characteristics of the scales similarly contributed to classification performance. Performance on suicide attempts classification was largely maintained when we only used the top five ranked variables for training (AUROC; 1-month, 0.75, 1-year, 0.85, lifetime suicide attempts detection, 0.87). Our findings indicate that information from self-report clinical scales can be useful for the classification of suicide attempts. Based on the reliable performance of the top five predictors alone, this machine learning approach could help clinicians identify high-risk patients in clinical settings.

  1. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artifical intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. Basic for various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  2. Introduction to machine learning

    OpenAIRE

    Baştanlar, Yalın; Özuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning app...

  3. An introduction to quantum machine learning

    OpenAIRE

    Schuld, M.; Sinayskiy, I.; Petruccione, F.

    2014-01-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum compute...

  4. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  5. Development and Evaluation of an Automated Machine Learning Algorithm for In-Hospital Mortality Risk Adjustment Among Critical Care Patients.

    Science.gov (United States)

    Delahanty, Ryan J; Kaufman, David; Jones, Spencer S

    2018-06-01

    Risk adjustment algorithms for ICU mortality are necessary for measuring and improving ICU performance. Existing risk adjustment algorithms are not widely adopted. Key barriers to adoption include licensing and implementation costs as well as labor costs associated with human-intensive data collection. Widespread adoption of electronic health records makes automated risk adjustment feasible. Using modern machine learning methods and open source tools, we developed and evaluated a retrospective risk adjustment algorithm for in-hospital mortality among ICU patients. The Risk of Inpatient Death score can be fully automated and is reliant upon data elements that are generated in the course of usual hospital processes. One hundred thirty-one ICUs in 53 hospitals operated by Tenet Healthcare. A cohort of 237,173 ICU patients discharged between January 2014 and December 2016. The data were randomly split into training (36 hospitals), and validation (17 hospitals) data sets. Feature selection and model training were carried out using the training set while the discrimination, calibration, and accuracy of the model were assessed in the validation data set. Model discrimination was evaluated based on the area under receiver operating characteristic curve; accuracy and calibration were assessed via adjusted Brier scores and visual analysis of calibration curves. Seventeen features, including a mix of clinical and administrative data elements, were retained in the final model. The Risk of Inpatient Death score demonstrated excellent discrimination (area under receiver operating characteristic curve = 0.94) and calibration (adjusted Brier score = 52.8%) in the validation dataset; these results compare favorably to the published performance statistics for the most commonly used mortality risk adjustment algorithms. Low adoption of ICU mortality risk adjustment algorithms impedes progress toward increasing the value of the healthcare delivered in ICUs. The Risk of Inpatient Death

  6. Analyzing the evolutionary mechanisms of the Air Transportation System-of-Systems using network theory and machine learning algorithms

    Science.gov (United States)

    Kotegawa, Tatsuya

    Complexity in the Air Transportation System (ATS) arises from the intermingling of many independent physical resources, operational paradigms, and stakeholder interests, as well as the dynamic variation of these interactions over time. Currently, trade-offs and cost benefit analyses of new ATS concepts are carried out on system-wide evaluation simulations driven by air traffic forecasts that assume fixed airline routes. However, this does not well reflect reality as airlines regularly add and remove routes. A airline service route network evolution model that projects route addition and removal was created and combined with state-of-the-art air traffic forecast methods to better reflect the dynamic properties of the ATS in system-wide simulations. Guided by a system-of-systems framework, network theory metrics and machine learning algorithms were applied to develop the route network evolution models based on patterns extracted from historical data. Constructing the route addition section of the model posed the greatest challenge due to the large pool of new link candidates compared to the actual number of routes historically added to the network. Of the models explored, algorithms based on logistic regression, random forests, and support vector machines showed best route addition and removal forecast accuracies at approximately 20% and 40%, respectively, when validated with historical data. The combination of network evolution models and a system-wide evaluation tool quantified the impact of airline route network evolution on air traffic delay. The expected delay minutes when considering network evolution increased approximately 5% for a forecasted schedule on 3/19/2020. Performance trade-off studies between several airline route network topologies from the perspectives of passenger travel efficiency, fuel burn, and robustness were also conducted to provide bounds that could serve as targets for ATS transformation efforts. The series of analysis revealed that high

  7. Identification of Forested Landslides Using LiDar Data, Object-based Image Analysis, and Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Xianju Li

    2015-07-01

    Full Text Available For identification of forested landslides, most studies focus on knowledge-based and pixel-based analysis (PBA of LiDar data, while few studies have examined (semi- automated methods and object-based image analysis (OBIA. Moreover, most of them are focused on soil-covered areas with gentle hillslopes. In bedrock-covered mountains with steep and rugged terrain, it is so difficult to identify landslides that there is currently no research on whether combining semi-automated methods and OBIA with only LiDar derivatives could be more effective. In this study, a semi-automatic object-based landslide identification approach was developed and implemented in a forested area, the Three Gorges of China. Comparisons of OBIA and PBA, two different machine learning algorithms and their respective sensitivity to feature selection (FS, were first investigated. Based on the classification result, the landslide inventory was finally obtained according to (1 inclusion of holes encircled by the landslide body; (2 removal of isolated segments, and (3 delineation of closed envelope curves for landslide objects by manual digitizing operation. The proposed method achieved the following: (1 the filter features of surface roughness were first applied for calculating object features, and proved useful; (2 FS improved classification accuracy and reduced features; (3 the random forest algorithm achieved higher accuracy and was less sensitive to FS than a support vector machine; (4 compared to PBA, OBIA was more sensitive to FS, remarkably reduced computing time, and depicted more contiguous terrain segments; (5 based on the classification result with an overall accuracy of 89.11% ± 0.03%, the obtained inventory map was consistent with the referenced landslide inventory map, with a position mismatch value of 9%. The outlined approach would be helpful for forested landslide identification in steep and rugged terrain.

  8. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  9. Classification of HTTP traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, Tahir; Pedersen, Jens Myrup

    2012-01-01

    streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguish various types of HTTP traffic based on the content: distributed among volunteers' machines and centralized running in the core of the network. We also assess the accuracy of the centralized classifier...

  10. Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

    Science.gov (United States)

    Ruske, Simon; Topping, David O.; Foot, Virginia E.; Kaye, Paul H.; Stanley, Warren R.; Crawford, Ian; Morse, Andrew P.; Gallagher, Martin W.

    2017-03-01

    Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen.This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification.For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks).The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol.Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67. 6 and 91. 1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was

  11. Automated detection and classification of cryptographic algorithms in binary programs through machine learning

    OpenAIRE

    Hosfelt, Diane Duros

    2015-01-01

    Threats from the internet, particularly malicious software (i.e., malware) often use cryptographic algorithms to disguise their actions and even to take control of a victim's system (as in the case of ransomware). Malware and other threats proliferate too quickly for the time-consuming traditional methods of binary analysis to be effective. By automating detection and classification of cryptographic algorithms, we can speed program analysis and more efficiently combat malware. This thesis wil...

  12. SU-G-201-09: Evaluation of a Novel Machine-Learning Algorithm for Permanent Prostate Brachytherapy Treatment Planning

    International Nuclear Information System (INIS)

    Nicolae, A; Lu, L; Morton, G; Chung, H; Helou, J; Al Hanaqta, M; Loblaw, A; Ravi, A; Heath, E

    2016-01-01

    Purpose: A novel, automated, algorithm for permanent prostate brachytherapy (PPB) treatment planning has been developed. The novel approach uses machine-learning (ML), a form of artificial intelligence, to substantially decrease planning time while simultaneously retaining the clinical intuition of plans created by radiation oncologists. This study seeks to compare the ML algorithm against expert-planned PPB plans to evaluate the equivalency of dosimetric and clinical plan quality. Methods: Plan features were computed from historical high-quality PPB treatments (N = 100) and stored in a relational database (RDB). The ML algorithm matched new PPB features to a highly similar case in the RDB; this initial plan configuration was then further optimized using a stochastic search algorithm. PPB pre-plans (N = 30) generated using the ML algorithm were compared to plan variants created by an expert dosimetrist (RT), and radiation oncologist (MD). Planning time and pre-plan dosimetry were evaluated using a one-way Student’s t-test and ANOVA, respectively (significance level = 0.05). Clinical implant quality was evaluated by expert PPB radiation oncologists as part of a qualitative study. Results: Average planning time was 0.44 ± 0.42 min compared to 17.88 ± 8.76 min for the ML algorithm and RT, respectively, a significant advantage [t(9), p = 0.01]. A post-hoc ANOVA [F(2,87) = 6.59, p = 0.002] using Tukey-Kramer criteria showed a significantly lower mean prostate V150% for the ML plans (52.9%) compared to the RT (57.3%), and MD (56.2%) plans. Preliminary qualitative study results indicate comparable clinical implant quality between RT and ML plans with a trend towards preference for ML plans. Conclusion: PPB pre-treatment plans highly comparable to those of an expert radiation oncologist can be created using a novel ML planning model. The use of an ML-based planning approach is expected to translate into improved PPB accessibility and plan uniformity.

  13. SU-G-201-09: Evaluation of a Novel Machine-Learning Algorithm for Permanent Prostate Brachytherapy Treatment Planning

    Energy Technology Data Exchange (ETDEWEB)

    Nicolae, A [Odette Cancer Centre, Sunnybrook Health Sciences Centre, Toronto, ON (Canada); Department of Physics, Ryerson University, Toronto, ON (Canada); Lu, L; Morton, G; Chung, H; Helou, J; Al Hanaqta, M; Loblaw, A; Ravi, A [Odette Cancer Centre, Sunnybrook Health Sciences Centre, Toronto, ON (Canada); Heath, E [Carleton Laboratory for Radiotherapy Physics, Carleton University, Ottawa, ON, CA (Canada)

    2016-06-15

    Purpose: A novel, automated, algorithm for permanent prostate brachytherapy (PPB) treatment planning has been developed. The novel approach uses machine-learning (ML), a form of artificial intelligence, to substantially decrease planning time while simultaneously retaining the clinical intuition of plans created by radiation oncologists. This study seeks to compare the ML algorithm against expert-planned PPB plans to evaluate the equivalency of dosimetric and clinical plan quality. Methods: Plan features were computed from historical high-quality PPB treatments (N = 100) and stored in a relational database (RDB). The ML algorithm matched new PPB features to a highly similar case in the RDB; this initial plan configuration was then further optimized using a stochastic search algorithm. PPB pre-plans (N = 30) generated using the ML algorithm were compared to plan variants created by an expert dosimetrist (RT), and radiation oncologist (MD). Planning time and pre-plan dosimetry were evaluated using a one-way Student’s t-test and ANOVA, respectively (significance level = 0.05). Clinical implant quality was evaluated by expert PPB radiation oncologists as part of a qualitative study. Results: Average planning time was 0.44 ± 0.42 min compared to 17.88 ± 8.76 min for the ML algorithm and RT, respectively, a significant advantage [t(9), p = 0.01]. A post-hoc ANOVA [F(2,87) = 6.59, p = 0.002] using Tukey-Kramer criteria showed a significantly lower mean prostate V150% for the ML plans (52.9%) compared to the RT (57.3%), and MD (56.2%) plans. Preliminary qualitative study results indicate comparable clinical implant quality between RT and ML plans with a trend towards preference for ML plans. Conclusion: PPB pre-treatment plans highly comparable to those of an expert radiation oncologist can be created using a novel ML planning model. The use of an ML-based planning approach is expected to translate into improved PPB accessibility and plan uniformity.

  14. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  15. Machine vision theory, algorithms, practicalities

    CERN Document Server

    Davies, E R

    2005-01-01

    In the last 40 years, machine vision has evolved into a mature field embracing a wide range of applications including surveillance, automated inspection, robot assembly, vehicle guidance, traffic monitoring and control, signature verification, biometric measurement, and analysis of remotely sensed images. While researchers and industry specialists continue to document their work in this area, it has become increasingly difficult for professionals and graduate students to understand the essential theory and practicalities well enough to design their own algorithms and systems. This book directl

  16. Evaluation of a Machine-Learning Algorithm for Treatment Planning in Prostate Low-Dose-Rate Brachytherapy

    Energy Technology Data Exchange (ETDEWEB)

    Nicolae, Alexandru [Department of Physics, Ryerson University, Toronto, Ontario (Canada); Department of Medical Physics, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Morton, Gerard; Chung, Hans; Loblaw, Andrew [Department of Radiation Oncology, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Jain, Suneil; Mitchell, Darren [Department of Clinical Oncology, The Northern Ireland Cancer Centre, Belfast City Hospital, Antrim, Northern Ireland (United Kingdom); Lu, Lin [Department of Radiation Therapy, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Helou, Joelle; Al-Hanaqta, Motasem [Department of Radiation Oncology, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada); Heath, Emily [Department of Physics, Carleton University, Ottawa, Ontario (Canada); Ravi, Ananth, E-mail: ananth.ravi@sunnybrook.ca [Department of Medical Physics, Odette Cancer Center, Sunnybrook Health Sciences Centre, Toronto, Ontario (Canada)

    2017-03-15

    Purpose: This work presents the application of a machine learning (ML) algorithm to automatically generate high-quality, prostate low-dose-rate (LDR) brachytherapy treatment plans. The ML algorithm can mimic characteristics of preoperative treatment plans deemed clinically acceptable by brachytherapists. The planning efficiency, dosimetry, and quality (as assessed by experts) of preoperative plans generated with an ML planning approach was retrospectively evaluated in this study. Methods and Materials: Preimplantation and postimplantation treatment plans were extracted from 100 high-quality LDR treatments and stored within a training database. The ML training algorithm matches similar features from a new LDR case to those within the training database to rapidly obtain an initial seed distribution; plans were then further fine-tuned using stochastic optimization. Preimplantation treatment plans generated by the ML algorithm were compared with brachytherapist (BT) treatment plans in terms of planning time (Wilcoxon rank sum, α = 0.05) and dosimetry (1-way analysis of variance, α = 0.05). Qualitative preimplantation plan quality was evaluated by expert LDR radiation oncologists using a Likert scale questionnaire. Results: The average planning time for the ML approach was 0.84 ± 0.57 minutes, compared with 17.88 ± 8.76 minutes for the expert planner (P=.020). Preimplantation plans were dosimetrically equivalent to the BT plans; the average prostate V150% was 4% lower for ML plans (P=.002), although the difference was not clinically significant. Respondents ranked the ML-generated plans as equivalent to expert BT treatment plans in terms of target coverage, normal tissue avoidance, implant confidence, and the need for plan modifications. Respondents had difficulty differentiating between plans generated by a human or those generated by the ML algorithm. Conclusions: Prostate LDR preimplantation treatment plans that have equivalent quality to plans created

  17. Evaluation of a Machine-Learning Algorithm for Treatment Planning in Prostate Low-Dose-Rate Brachytherapy

    International Nuclear Information System (INIS)

    Nicolae, Alexandru; Morton, Gerard; Chung, Hans; Loblaw, Andrew; Jain, Suneil; Mitchell, Darren; Lu, Lin; Helou, Joelle; Al-Hanaqta, Motasem; Heath, Emily; Ravi, Ananth

    2017-01-01

    Purpose: This work presents the application of a machine learning (ML) algorithm to automatically generate high-quality, prostate low-dose-rate (LDR) brachytherapy treatment plans. The ML algorithm can mimic characteristics of preoperative treatment plans deemed clinically acceptable by brachytherapists. The planning efficiency, dosimetry, and quality (as assessed by experts) of preoperative plans generated with an ML planning approach was retrospectively evaluated in this study. Methods and Materials: Preimplantation and postimplantation treatment plans were extracted from 100 high-quality LDR treatments and stored within a training database. The ML training algorithm matches similar features from a new LDR case to those within the training database to rapidly obtain an initial seed distribution; plans were then further fine-tuned using stochastic optimization. Preimplantation treatment plans generated by the ML algorithm were compared with brachytherapist (BT) treatment plans in terms of planning time (Wilcoxon rank sum, α = 0.05) and dosimetry (1-way analysis of variance, α = 0.05). Qualitative preimplantation plan quality was evaluated by expert LDR radiation oncologists using a Likert scale questionnaire. Results: The average planning time for the ML approach was 0.84 ± 0.57 minutes, compared with 17.88 ± 8.76 minutes for the expert planner (P=.020). Preimplantation plans were dosimetrically equivalent to the BT plans; the average prostate V150% was 4% lower for ML plans (P=.002), although the difference was not clinically significant. Respondents ranked the ML-generated plans as equivalent to expert BT treatment plans in terms of target coverage, normal tissue avoidance, implant confidence, and the need for plan modifications. Respondents had difficulty differentiating between plans generated by a human or those generated by the ML algorithm. Conclusions: Prostate LDR preimplantation treatment plans that have equivalent quality to plans created

  18. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  19. Machine learning and radiology.

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  20. Storage capacity of the Tilinglike Learning Algorithm

    International Nuclear Information System (INIS)

    Buhot, Arnaud; Gordon, Mirta B.

    2001-01-01

    The storage capacity of an incremental learning algorithm for the parity machine, the Tilinglike Learning Algorithm, is analytically determined in the limit of a large number of hidden perceptrons. Different learning rules for the simple perceptron are investigated. The usual Gardner-Derrida rule leads to a storage capacity close to the upper bound, which is independent of the learning algorithm considered

  1. Using Machine Learning to Predict Student Performance

    OpenAIRE

    Pojon, Murat

    2017-01-01

    This thesis examines the application of machine learning algorithms to predict whether a student will be successful or not. The specific focus of the thesis is the comparison of machine learning methods and feature engineering techniques in terms of how much they improve the prediction performance. Three different machine learning methods were used in this thesis. They are linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modification ...

  2. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  3. Spatiotemporal prediction of continuous daily PM2.5 concentrations across China using a spatially explicit machine learning algorithm

    Science.gov (United States)

    Zhan, Yu; Luo, Yuzhou; Deng, Xunfei; Chen, Huajin; Grieneisen, Michael L.; Shen, Xueyou; Zhu, Lizhong; Zhang, Minghua

    2017-04-01

    A high degree of uncertainty associated with the emission inventory for China tends to degrade the performance of chemical transport models in predicting PM2.5 concentrations especially on a daily basis. In this study a novel machine learning algorithm, Geographically-Weighted Gradient Boosting Machine (GW-GBM), was developed by improving GBM through building spatial smoothing kernels to weigh the loss function. This modification addressed the spatial nonstationarity of the relationships between PM2.5 concentrations and predictor variables such as aerosol optical depth (AOD) and meteorological conditions. GW-GBM also overcame the estimation bias of PM2.5 concentrations due to missing AOD retrievals, and thus potentially improved subsequent exposure analyses. GW-GBM showed good performance in predicting daily PM2.5 concentrations (R2 = 0.76, RMSE = 23.0 μg/m3) even with partially missing AOD data, which was better than the original GBM model (R2 = 0.71, RMSE = 25.3 μg/m3). On the basis of the continuous spatiotemporal prediction of PM2.5 concentrations, it was predicted that 95% of the population lived in areas where the estimated annual mean PM2.5 concentration was higher than 35 μg/m3, and 45% of the population was exposed to PM2.5 >75 μg/m3 for over 100 days in 2014. GW-GBM accurately predicted continuous daily PM2.5 concentrations in China for assessing acute human health effects.

  4. Multi–GPU Implementation of Machine Learning Algorithm using CUDA and OpenCL

    Directory of Open Access Journals (Sweden)

    Jan Masek

    2016-06-01

    Full Text Available Using modern Graphic Processing Units (GPUs becomes very useful for computing complex and time consuming processes. GPUs provide high–performance computation capabilities with a good price. This paper deals with a multi–GPU OpenCL and CUDA implementations of k–Nearest Neighbor (k–NN algorithm. This work compares performances of OpenCLand CUDA implementations where each of them is suitable for different number of used attributes. The proposed CUDA algorithm achieves acceleration up to 880x in comparison witha single thread CPU version. The common k-NN was modified to be faster when the lower number of k neighbors is set. The performance of algorithm was verified with two GPUs dual-core NVIDIA GeForce GTX 690 and CPU Intel Core i7 3770 with 4.1 GHz frequency. The results of speed up were measured for one GPU, two GPUs, three and four GPUs. We performed several tests with data sets containing up to 4 million elements with various number of attributes.

  5. Creativity in Machine Learning

    OpenAIRE

    Thoma, Martin

    2016-01-01

    Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; it is not a trivial combination of the data which was fed into the machine learning system. The obtained results come in multiple forms: As images, as text and as audio. This paper gives a high level overview of how they are created and gives some examples. It is meant to be a summary of the current work and give people who are new to machine learning some starting points.

  6. Drug repositioning for non-small cell lung cancer by using machine learning algorithms and topological graph theory.

    Science.gov (United States)

    Huang, Chien-Hung; Chang, Peter Mu-Hsin; Hsu, Chia-Wei; Huang, Chi-Ying F; Ng, Ka-Lok

    2016-01-11

    Non-small cell lung cancer (NSCLC) is one of the leading causes of death globally, and research into NSCLC has been accumulating steadily over several years. Drug repositioning is the current trend in the pharmaceutical industry for identifying potential new uses for existing drugs and accelerating the development process of drugs, as well as reducing side effects. This work integrates two approaches--machine learning algorithms and topological parameter-based classification--to develop a novel pipeline of drug repositioning to analyze four lung cancer microarray datasets, enriched biological processes, potential therapeutic drugs and targeted genes for NSCLC treatments. A total of 7 (8) and 11 (12) promising drugs (targeted genes) were discovered for treating early- and late-stage NSCLC, respectively. The effectiveness of these drugs is supported by the literature, experimentally determined in-vitro IC50 and clinical trials. This work provides better drug prediction accuracy than competitive research according to IC50 measurements. With the novel pipeline of drug repositioning, the discovery of enriched pathways and potential drugs related to NSCLC can provide insight into the key regulators of tumorigenesis and the treatment of NSCLC. Based on the verified effectiveness of the targeted drugs predicted by this pipeline, we suggest that our drug-finding pipeline is effective for repositioning drugs.

  7. PRGPred: A platform for prediction of domains of resistance gene analogue (RGA in Arecaceae developed using machine learning algorithms

    Directory of Open Access Journals (Sweden)

    MATHODIYIL S. MANJULA

    2015-12-01

    Full Text Available Plant disease resistance genes (R-genes are responsible for initiation of defense mechanism against various phytopathogens. The majority of plant R-genes are members of very large multi-gene families, which encode structurally related proteins containing nucleotide binding site domains (NBS and C-terminal leucine rich repeats (LRR. Other classes possess' an extracellular LRR domain, a transmembrane domain and sometimes, an intracellular serine/threonine kinase domain. R-proteins work in pathogen perception and/or the activation of conserved defense signaling networks. In the present study, sequences representing resistance gene analogues (RGAs of coconut, arecanut, oil palm and date palm were collected from NCBI, sorted based on domains and assembled into a database. The sequences were analyzed in PRINTS database to find out the conserved domains and their motifs present in the RGAs. Based on these domains, we have also developed a tool to predict the domains of palm R-genes using various machine learning algorithms. The model files were selected based on the performance of the best classifier in training and testing. All these information is stored and made available in the online ‘PRGpred' database and prediction tool.

  8. Integrated analysis of CFD data with K-means clustering algorithm and extreme learning machine for localized HVAC control

    International Nuclear Information System (INIS)

    Zhou, Hongming; Soh, Yeng Chai; Wu, Xiaoying

    2015-01-01

    Maintaining a desired comfort level while minimizing the total energy consumed is an interesting optimization problem in Heating, ventilating and air conditioning (HVAC) system control. This paper proposes a localized control strategy that uses Computational Fluid Dynamics (CFD) simulation results and K-means clustering algorithm to optimally partition an air-conditioned room into different zones. The temperature and air velocity results from CFD simulation are combined in two ways: 1) based on the relationship indicated in predicted mean vote (PMV) formula; 2) based on the relationship extracted from ASHRAE RP-884 database using extreme learning machine (ELM). Localized control can then be effected in which each of the zones can be treated individually and an optimal control strategy can be developed based on the partitioning result. - Highlights: • The paper provides a visual guideline for thermal comfort analysis. • CFD, K-means, PMV and ELM are used to analyze thermal conditions within a room. • Localized control strategy could be developed based on our clustering results

  9. Opinion versus practice regarding the use of rehabilitation services in home care: an investigation using machine learning algorithms.

    Science.gov (United States)

    Cheng, Lu; Zhu, Mu; Poss, Jeffrey W; Hirdes, John P; Glenny, Christine; Stolee, Paul

    2015-10-09

    Resources for home care rehabilitation are limited, and many home care clients who could benefit do not receive rehabilitation therapy. The interRAI Contact Assessment (CA) is a new screening instrument comprised of a subset of interRAI Home Care (HC) items, designed to be used as a preliminary assessment to identify which potential home care clients should be referred for a full assessment, or for services such as rehabilitation. We investigated which client characteristics are most relevant in predicting rehabilitation use in the full interRAI HC assessment. We applied two algorithms from machine learning and data mining - the LASSO and the random forest - to frequency matched interRAI HC and service utilization data for home care clients in Ontario, Canada. Analyses confirmed the importance of functional decline and mobility variables in targeting rehabilitation services, but suggested that other items in use as potential predictors may be less relevant. Six of the most highly ranked items related to ambulation. Diagnosis of cancer was highly associated with decreased rehabilitation use; however, cognitive status was not. Inconsistencies between variables considered important for classifying clients who need rehabilitation and those identified in this study based on use may indicate a discrepancy in the client characteristics considered relevant in theory versus actual practice.

  10. Search for gamma-ray emitting AGN among unidentified Fermi-LAT sources using machine learning algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Doert, Marlene [Technische Universitaet Dortmund (Germany); Ruhr-Universitaet Bochum (Germany); Einecke, Sabrina [Technische Universitaet Dortmund (Germany); Errando, Manel [Barnard College, Columbia University, New York City (United States)

    2015-07-01

    The second Fermi-LAT source catalog (2FGL) is the deepest all-sky survey of the gamma-ray sky currently available to the community. Out of the 1873 catalog sources, 576 remain unassociated. We present a search for active galactic nuclei (AGN) among these unassociated objects, which aims at a reduction of the number of unassociated gamma-ray sources and a more complete characterization of the population of gamma-ray emitting AGN. Our study uses two complimentary machine learning algorithms which are individually trained on the gamma-ray properties of associated 2FGL sources and thereafter applied to the unassociated sample. The intersection of the two methods yields a high-confidence sample of 231 AGN candidate sources. We estimate the performance of the classification by taking inherent differences between the samples of associated and unassociated 2FGL sources into account. A search for infra-red counterparts and first results from follow-up studies in the X-ray band using Swift satellite data for a subset of our AGN candidates are also presented.

  11. Advancing of Land Surface Temperature Retrieval Using Extreme Learning Machine and Spatio-Temporal Adaptive Data Fusion Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Bai

    2015-04-01

    Full Text Available As a critical variable to characterize the biophysical processes in ecological environment, and as a key indicator in the surface energy balance, evapotranspiration and urban heat islands, Land Surface Temperature (LST retrieved from Thermal Infra-Red (TIR images at both high temporal and spatial resolution is in urgent need. However, due to the limitations of the existing satellite sensors, there is no earth observation which can obtain TIR at detailed spatial- and temporal-resolution simultaneously. Thus, several attempts of image fusion by blending the TIR data from high temporal resolution sensor with data from high spatial resolution sensor have been studied. This paper presents a novel data fusion method by integrating image fusion and spatio-temporal fusion techniques, for deriving LST datasets at 30 m spatial resolution from daily MODIS image and Landsat ETM+ images. The Landsat ETM+ TIR data were firstly enhanced based on extreme learning machine (ELM algorithm using neural network regression model, from 60 m to 30 m resolution. Then, the MODIS LST and enhanced Landsat ETM+ TIR data were fused by Spatio-temporal Adaptive Data Fusion Algorithm for Temperature mapping (SADFAT in order to derive high resolution synthetic data. The synthetic images were evaluated for both testing and simulated satellite images. The average difference (AD and absolute average difference (AAD are smaller than 1.7 K, where the correlation coefficient (CC and root-mean-square error (RMSE are 0.755 and 1.824, respectively, showing that the proposed method enhances the spatial resolution of the predicted LST images and preserves the spectral information at the same time.

  12. Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the 'Extreme Learning Machine' Algorithm.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Full Text Available Recent advances in training deep (multi-layer architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the 'Extreme Learning Machine' (ELM approach, which also enables a very rapid training time (∼ 10 minutes. Adding distortions, as is common practise for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving less than 5.5% error rates on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random 'receptive field' sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance. Our close to state-of-the-art results for MNIST and NORB suggest that the ease of use and accuracy of the ELM algorithm for designing a single-hidden-layer neural network classifier should cause it to be given greater consideration either as a standalone method for simpler problems, or as the final classification stage in deep neural networks applied to more difficult problems.

  13. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    CSIR Research Space (South Africa)

    Adelabu, S

    2013-11-01

    Full Text Available in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples. We performed...

  14. Quantum Machine Learning

    OpenAIRE

    Romero García, Cristian

    2017-01-01

    [EN] In a world in which accessible information grows exponentially, the selection of the appropriate information turns out to be an extremely relevant problem. In this context, the idea of Machine Learning (ML), a subfield of Artificial Intelligence, emerged to face problems in data mining, pattern recognition, automatic prediction, among others. Quantum Machine Learning is an interdisciplinary research area combining quantum mechanics with methods of ML, in which quantum properties allow fo...

  15. An algorithm for the classification of mRNA patterns in eosinophilic esophagitis: Integration of machine learning.

    Science.gov (United States)

    Sallis, Benjamin F; Erkert, Lena; Moñino-Romero, Sherezade; Acar, Utkucan; Wu, Rina; Konnikova, Liza; Lexmond, Willem S; Hamilton, Matthew J; Dunn, W Augustine; Szepfalusi, Zsolt; Vanderhoof, Jon A; Snapper, Scott B; Turner, Jerrold R; Goldsmith, Jeffrey D; Spencer, Lisa A; Nurko, Samuel; Fiebiger, Edda

    2018-04-01

    Diagnostic evaluation of eosinophilic esophagitis (EoE) remains difficult, particularly the assessment of the patient's allergic status. This study sought to establish an automated medical algorithm to assist in the evaluation of EoE. Machine learning techniques were used to establish a diagnostic probability score for EoE, p(EoE), based on esophageal mRNA transcript patterns from biopsies of patients with EoE, gastroesophageal reflux disease and controls. Dimensionality reduction in the training set established weighted factors, which were confirmed by immunohistochemistry. Following weighted factor analysis, p(EoE) was determined by random forest classification. Accuracy was tested in an external test set, and predictive power was assessed with equivocal patients. Esophageal IgE production was quantified with epsilon germ line (IGHE) transcripts and correlated with serum IgE and the T h 2-type mRNA profile to establish an IGHE score for tissue allergy. In the primary analysis, a 3-class statistical model generated a p(EoE) score based on common characteristics of the inflammatory EoE profile. A p(EoE) ≥ 25 successfully identified EoE with high accuracy (sensitivity: 90.9%, specificity: 93.2%, area under the curve: 0.985) and improved diagnosis of equivocal cases by 84.6%. The p(EoE) changed in response to therapy. A secondary analysis loop in EoE patients defined an IGHE score of ≥37.5 for a patient subpopulation with increased esophageal allergic inflammation. The development of intelligent data analysis from a machine learning perspective provides exciting opportunities to improve diagnostic precision and improve patient care in EoE. The p(EoE) and the IGHE score are steps toward the development of decision trees to define EoE subpopulations and, consequently, will facilitate individualized therapy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  16. A Deep Machine Learning Algorithm to Optimize the Forecast of Atmospherics

    Science.gov (United States)

    Russell, A. M.; Alliss, R. J.; Felton, B. D.

    approach is validated using in-situ observations of clouds. All of the hybrid RF-WRF experiments demonstrated here significantly outperform the base WRF local low cloud cover forecasts in terms of the probability of detection and the overall bias. In particular, RF experiments that use only regional three-dimensional moisture predictors from the WRF model produce the highest accuracy when compared to RF experiments that use local surface predictors only or regional inversion predictors only. Furthermore, adding multiple types of WRF predictors and additional WRF predictors to the RF algorithm does not necessarily add more value in the resulting forecasts, indicating that it is better to have a small set of meaningful predictors than to have a vast set of indiscriminately-chosen predictors. This work also reveals that the WRF-based RF approach is highly sensitive to the time period over which the algorithm is trained and evaluated. Future work will focus on developing a similar WRF-based RF model for high cloud prediction and expanding the algorithm to two-dimensions horizontally.

  17. Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms.

    Science.gov (United States)

    Chen, Hongming; Carlsson, Lars; Eriksson, Mats; Varkonyi, Peter; Norinder, Ulf; Nilsson, Ingemar

    2013-06-24

    A novel methodology was developed to build Free-Wilson like local QSAR models by combining R-group signatures and the SVM algorithm. Unlike Free-Wilson analysis this method is able to make predictions for compounds with R-groups not present in a training set. Eleven public data sets were chosen as test cases for comparing the performance of our new method with several other traditional modeling strategies, including Free-Wilson analysis. Our results show that the R-group signature SVM models achieve better prediction accuracy compared with Free-Wilson analysis in general. Moreover, the predictions of R-group signature models are also comparable to the models using ECFP6 fingerprints and signatures for the whole compound. Most importantly, R-group contributions to the SVM model can be obtained by calculating the gradient for R-group signatures. For most of the studied data sets, a significant correlation with that of a corresponding Free-Wilson analysis is shown. These results suggest that the R-group contribution can be used to interpret bioactivity data and highlight that the R-group signature based SVM modeling method is as interpretable as Free-Wilson analysis. Hence the signature SVM model can be a useful modeling tool for any drug discovery project.

  18. Performance of Machine Learning Algorithms for Qualitative and Quantitative Prediction Drug Blockade of hERG1 channel.

    Science.gov (United States)

    Wacker, Soren; Noskov, Sergei Yu

    2018-05-01

    Drug-induced abnormal heart rhythm known as Torsades de Pointes (TdP) is a potential lethal ventricular tachycardia found in many patients. Even newly released anti-arrhythmic drugs, like ivabradine with HCN channel as a primary target, block the hERG potassium current in overlapping concentration interval. Promiscuous drug block to hERG channel may potentially lead to perturbation of the action potential duration (APD) and TdP, especially when with combined with polypharmacy and/or electrolyte disturbances. The example of novel anti-arrhythmic ivabradine illustrates clinically important and ongoing deficit in drug design and warrants for better screening methods. There is an urgent need to develop new approaches for rapid and accurate assessment of how drugs with complex interactions and multiple subcellular targets can predispose or protect from drug-induced TdP. One of the unexpected outcomes of compulsory hERG screening implemented in USA and European Union resulted in large datasets of IC 50 values for various molecules entering the market. The abundant data allows now to construct predictive machine-learning (ML) models. Novel ML algorithms and techniques promise better accuracy in determining IC 50 values of hERG blockade that is comparable or surpassing that of the earlier QSAR or molecular modeling technique. To test the performance of modern ML techniques, we have developed a computational platform integrating various workflows for quantitative structure activity relationship (QSAR) models using data from the ChEMBL database. To establish predictive powers of ML-based algorithms we computed IC 50 values for large dataset of molecules and compared it to automated patch clamp system for a large dataset of hERG blocking and non-blocking drugs, an industry gold standard in studies of cardiotoxicity. The optimal protocol with high sensitivity and predictive power is based on the novel eXtreme gradient boosting (XGBoost) algorithm. The ML-platform with XGBoost

  19. Assessment of genetic and nongenetic interactions for the prediction of depressive symptomatology: an analysis of the Wisconsin Longitudinal Study using machine learning algorithms.

    Science.gov (United States)

    Roetker, Nicholas S; Page, C David; Yonker, James A; Chang, Vicky; Roan, Carol L; Herd, Pamela; Hauser, Taissa S; Hauser, Robert M; Atwood, Craig S

    2013-10-01

    We examined depression within a multidimensional framework consisting of genetic, environmental, and sociobehavioral factors and, using machine learning algorithms, explored interactions among these factors that might better explain the etiology of depressive symptoms. We measured current depressive symptoms using the Center for Epidemiologic Studies Depression Scale (n = 6378 participants in the Wisconsin Longitudinal Study). Genetic factors were 78 single nucleotide polymorphisms (SNPs); environmental factors-13 stressful life events (SLEs), plus a composite proportion of SLEs index; and sociobehavioral factors-18 personality, intelligence, and other health or behavioral measures. We performed traditional SNP associations via logistic regression likelihood ratio testing and explored interactions with support vector machines and Bayesian networks. After correction for multiple testing, we found no significant single genotypic associations with depressive symptoms. Machine learning algorithms showed no evidence of interactions. Naïve Bayes produced the best models in both subsets and included only environmental and sociobehavioral factors. We found no single or interactive associations with genetic factors and depressive symptoms. Various environmental and sociobehavioral factors were more predictive of depressive symptoms, yet their impacts were independent of one another. A genome-wide analysis of genetic alterations using machine learning methodologies will provide a framework for identifying genetic-environmental-sociobehavioral interactions in depressive symptoms.

  20. Machine learning a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  1. An introduction to machine learning with Scikit-Learn

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.

  2. Hybrid Pareto artificial bee colony algorithm for multi-objective single machine group scheduling problem with sequence-dependent setup times and learning effects.

    Science.gov (United States)

    Yue, Lei; Guan, Zailin; Saif, Ullah; Zhang, Fei; Wang, Hao

    2016-01-01

    Group scheduling is significant for efficient and cost effective production system. However, there exist setup times between the groups, which require to decrease it by sequencing groups in an efficient way. Current research is focused on a sequence dependent group scheduling problem with an aim to minimize the makespan in addition to minimize the total weighted tardiness simultaneously. In most of the production scheduling problems, the processing time of jobs is assumed as fixed. However, the actual processing time of jobs may be reduced due to "learning effect". The integration of sequence dependent group scheduling problem with learning effects has been rarely considered in literature. Therefore, current research considers a single machine group scheduling problem with sequence dependent setup times and learning effects simultaneously. A novel hybrid Pareto artificial bee colony algorithm (HPABC) with some steps of genetic algorithm is proposed for current problem to get Pareto solutions. Furthermore, five different sizes of test problems (small, small medium, medium, large medium, large) are tested using proposed HPABC. Taguchi method is used to tune the effective parameters of the proposed HPABC for each problem category. The performance of HPABC is compared with three famous multi objective optimization algorithms, improved strength Pareto evolutionary algorithm (SPEA2), non-dominated sorting genetic algorithm II (NSGAII) and particle swarm optimization algorithm (PSO). Results indicate that HPABC outperforms SPEA2, NSGAII and PSO and gives better Pareto optimal solutions in terms of diversity and quality for almost all the instances of the different sizes of problems.

  3. Clojure for machine learning

    CERN Document Server

    Wali, Akhil

    2014-01-01

    A book that brings out the strengths of Clojure programming that have to facilitate machine learning. Each topic is described in substantial detail, and examples and libraries in Clojure are also demonstrated.This book is intended for Clojure developers who want to explore the area of machine learning. Basic understanding of the Clojure programming language is required, but thorough acquaintance with the standard Clojure library or any libraries are not required. Familiarity with theoretical concepts and notation of mathematics and statistics would be an added advantage.

  4. International Conference on Extreme Learning Machines 2014

    CERN Document Server

    Mao, Kezhi; Cambria, Erik; Man, Zhihong; Toh, Kar-Ann

    2015-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2014, which was held in Singapore, December 8-10, 2014. This conference brought together the researchers and practitioners of Extreme Learning Machine (ELM) from a variety of fields to promote research and development of “learning without iterative tuning”.  The book covers theories, algorithms and applications of ELM. It gives the readers a glance of the most recent advances of ELM.  

  5. International Conference on Extreme Learning Machine 2015

    CERN Document Server

    Mao, Kezhi; Wu, Jonathan; Lendasse, Amaury; ELM 2015; Theory, Algorithms and Applications (I); Theory, Algorithms and Applications (II)

    2016-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2015, which was held in Hangzhou, China, December 15-17, 2015. This conference brought together researchers and engineers to share and exchange R&D experience on both theoretical studies and practical applications of the Extreme Learning Machine (ELM) technique and brain learning. This book covers theories, algorithms ad applications of ELM. It gives readers a glance of the most recent advances of ELM. .

  6. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  7. Quantum machine learning for quantum anomaly detection

    Science.gov (United States)

    Liu, Nana; Rebentrost, Patrick

    2018-04-01

    Anomaly detection is used for identifying data that deviate from "normal" data patterns. Its usage on classical data finds diverse applications in many important areas such as finance, fraud detection, medical diagnoses, data cleaning, and surveillance. With the advent of quantum technologies, anomaly detection of quantum data, in the form of quantum states, may become an important component of quantum applications. Machine-learning algorithms are playing pivotal roles in anomaly detection using classical data. Two widely used algorithms are the kernel principal component analysis and the one-class support vector machine. We find corresponding quantum algorithms to detect anomalies in quantum states. We show that these two quantum algorithms can be performed using resources that are logarithmic in the dimensionality of quantum states. For pure quantum states, these resources can also be logarithmic in the number of quantum states used for training the machine-learning algorithm. This makes these algorithms potentially applicable to big quantum data applications.

  8. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them.This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  9. Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

    Science.gov (United States)

    Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

    2011-01-01

    Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433

  10. Machine Learning for Security

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques identifying failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  11. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focussed on building models. We nearly process all information we perceive to a model. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called Machine Learning. Many research is devoted to develop

  12. An introduction to quantum machine learning

    Science.gov (United States)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  13. Neuromorphic Deep Learning Machines

    OpenAIRE

    Neftci, E; Augustine, C; Paul, S; Detorakis, G

    2017-01-01

    An ongoing challenge in neuromorphic computing is to devise general and computationally efficient models of inference and learning which are compatible with the spatial and temporal constraints of the brain. One increasingly popular and successful approach is to take inspiration from inference and learning algorithms used in deep neural networks. However, the workhorse of deep learning, the gradient descent Back Propagation (BP) rule, often relies on the immediate availability of network-wide...

  14. What is the machine learning?

    Science.gov (United States)

    Chang, Spencer; Cohen, Timothy; Ostdiek, Bryan

    2018-03-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables—aided by physical intuition—that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus nonlinear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

  15. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection......* Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection...

  16. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal and a majority of non-Higgs boson “background” events. No knowledge of particle physics was required for the challenge but skills in machine learning - the training of computers to recognise patterns in data – were essential. The Challenge, hosted by Ka...

  17. Teraflop-scale Incremental Machine Learning

    OpenAIRE

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-u...

  18. Machine Learning for Flapping Wing Flight Control

    NARCIS (Netherlands)

    Goedhart, Menno; van Kampen, E.; Armanini, S.F.; de Visser, C.C.; Chu, Q.

    2018-01-01

    Flight control of Flapping Wing Micro Air Vehicles is challenging, because of their complex dynamics and variability due to manufacturing inconsistencies. Machine Learning algorithms can be used to tackle these challenges. A Policy Gradient algorithm is used to tune the gains of a

  19. Comparison of Different Machine Learning Algorithms for Lithological Mapping Using Remote Sensing Data and Morphological Features: A Case Study in Kurdistan Region, NE Iraq

    Science.gov (United States)

    Othman, Arsalan; Gloaguen, Richard

    2015-04-01

    Topographic effects and complex vegetation cover hinder lithology classification in mountain regions based not only in field, but also in reflectance remote sensing data. The area of interest "Bardi-Zard" is located in the NE of Iraq. It is part of the Zagros orogenic belt, where seven lithological units outcrop and is known for its chromite deposit. The aim of this study is to compare three machine learning algorithms (MLAs): Maximum Likelihood (ML), Support Vector Machines (SVM), and Random Forest (RF) in the context of a supervised lithology classification task using Advanced Space-borne Thermal Emission and Reflection radiometer (ASTER) satellite, its derived, spatial information (spatial coordinates) and geomorphic data. We emphasize the enhancement in remote sensing lithological mapping accuracy that arises from the integration of geomorphic features and spatial information (spatial coordinates) in classifications. This study identifies that RF is better than ML and SVM algorithms in almost the sixteen combination datasets, which were tested. The overall accuracy of the best dataset combination with the RF map for the all seven classes reach ~80% and the producer and user's accuracies are ~73.91% and 76.09% respectively while the kappa coefficient is ~0.76. TPI is more effective with SVM algorithm than an RF algorithm. This paper demonstrates that adding geomorphic indices such as TPI and spatial information in the dataset increases the lithological classification accuracy.

  20. A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    Directory of Open Access Journals (Sweden)

    Li Zhen

    2008-05-01

    Full Text Available Abstract Background Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML methods. Results The classification performance of Artificial Neural Networks (ANN, K-Nearest Neighbors (KNN, Linear Discriminant Analysis (LDA, Naïve Bayes (NB, Recursive Partitioning and Regression Trees (RPART, and Support Vector Machines (SVM in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5 were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA. Conclusion We have developed a novel simulation model to evaluate machine learning methods for the

  1. The potential for machine learning algorithms to improve and reduce the cost of 3-dimensional printing for surgical planning.

    Science.gov (United States)

    Huff, Trevor J; Ludwig, Parker E; Zuniga, Jorge M

    2018-05-01

    3D-printed anatomical models play an important role in medical and research settings. The recent successes of 3D anatomical models in healthcare have led many institutions to adopt the technology. However, there remain several issues that must be addressed before it can become more wide-spread. Of importance are the problems of cost and time of manufacturing. Machine learning (ML) could be utilized to solve these issues by streamlining the 3D modeling process through rapid medical image segmentation and improved patient selection and image acquisition. The current challenges, potential solutions, and future directions for ML and 3D anatomical modeling in healthcare are discussed. Areas covered: This review covers research articles in the field of machine learning as related to 3D anatomical modeling. Topics discussed include automated image segmentation, cost reduction, and related time constraints. Expert commentary: ML-based segmentation of medical images could potentially improve the process of 3D anatomical modeling. However, until more research is done to validate these technologies in clinical practice, their impact on patient outcomes will remain unknown. We have the necessary computational tools to tackle the problems discussed. The difficulty now lies in our ability to collect sufficient data.

  2. Student Modeling and Machine Learning

    OpenAIRE

    Sison , Raymund; Shimura , Masamichi

    1998-01-01

    After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine le...

  3. Quantum Machine Learning

    Science.gov (United States)

    Biswas, Rupak

    2018-01-01

    Quantum computing promises an unprecedented ability to solve intractable problems by harnessing quantum mechanical effects such as tunneling, superposition, and entanglement. The Quantum Artificial Intelligence Laboratory (QuAIL) at NASA Ames Research Center is the space agency's primary facility for conducting research and development in quantum information sciences. QuAIL conducts fundamental research in quantum physics but also explores how best to exploit and apply this disruptive technology to enable NASA missions in aeronautics, Earth and space sciences, and space exploration. At the same time, machine learning has become a major focus in computer science and captured the imagination of the public as a panacea to myriad big data problems. In this talk, we will discuss how classical machine learning can take advantage of quantum computing to significantly improve its effectiveness. Although we illustrate this concept on a quantum annealer, other quantum platforms could be used as well. If explored fully and implemented efficiently, quantum machine learning could greatly accelerate a wide range of tasks leading to new technologies and discoveries that will significantly change the way we solve real-world problems.

  4. Efficient tuning in supervised machine learning

    NARCIS (Netherlands)

    Koch, Patrick

    2013-01-01

    The tuning of learning algorithm parameters has become more and more important during the last years. With the fast growth of computational power and available memory databases have grown dramatically. This is very challenging for the tuning of parameters arising in machine learning, since the

  5. Hungarian contribution to the Global Soil Organic Carbon Map (GSOC17) using advanced machine learning algorithms and geostatistics

    Science.gov (United States)

    Szatmári, Gábor; Laborczi, Annamária; Takács, Katalin; Pásztor, László

    2017-04-01

    The knowledge about soil organic carbon (SOC) baselines and changes, and the detection of vulnerable hot spots for SOC losses and gains under climate change and changed land management is still fairly limited. Thus Global Soil Partnership (GSP) has been requested to develop a global SOC mapping campaign by 2017. GSPs concept builds on official national data sets, therefore, a bottom-up (country-driven) approach is pursued. The elaborated Hungarian methodology suits the general specifications of GSOC17 provided by GSP. The input data for GSOC17@HU mapping approach has involved legacy soil data bases, as well as proper environmental covariates related to the main soil forming factors, such as climate, organisms, relief and parent material. Nowadays, digital soil mapping (DSM) highly relies on the assumption that soil properties of interest can be modelled as a sum of a deterministic and stochastic component, which can be treated and modelled separately. We also adopted this assumption in our methodology. In practice, multiple regression techniques are commonly used to model the deterministic part. However, this global (and usually linear) models commonly oversimplify the often complex and non-linear relationship, which has a crucial effect on the resulted soil maps. Thus, we integrated machine learning algorithms (namely random forest and quantile regression forest) in the elaborated methodology, supposing then to be more suitable for the problem in hand. This approach has enable us to model the GSOC17 soil properties in that complex and non-linear forms as the soil itself. Furthermore, it has enable us to model and assess the uncertainty of the results, which is highly relevant in decision making. The applied methodology has used geostatistical approach to model the stochastic part of the spatial variability of the soil properties of interest. We created GSOC17@HU map with 1 km grid resolution according to the GSPs specifications. The map contributes to the GSPs

  6. Machine learning in jet physics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    High energy collider experiments produce several petabytes of data every year. Given the magnitude and complexity of the raw data, machine learning algorithms provide the best available platform to transform and analyse these data to obtain valuable insights to understand Standard Model and Beyond Standard Model theories. These collider experiments produce both quark and gluon initiated hadronic jets as the core components. Deep learning techniques enable us to classify quark/gluon jets through image recognition and help us to differentiate signals and backgrounds in Beyond Standard Model searches at LHC. We are currently working on quark/gluon jet classification and progressing in our studies to find the bias between event generators using domain adversarial neural networks (DANN). We also plan to investigate top tagging, weak supervision on mixed samples in high energy physics, utilizing transfer learning from simulated data to real experimental data.

  7. Algorithmic learning in a random world

    CERN Document Server

    Vovk, Vladimir; Shafer, Glenn

    2005-01-01

    A new scientific monograph developing significant new algorithmic foundations in machine learning theory. Researchers and postgraduates in CS, statistics, and A.I. will find the book an authoritative and formal presentation of some of the most promising theoretical developments in machine learning.

  8. MACHINE LEARNING TECHNIQUES USED IN BIG DATA

    Directory of Open Access Journals (Sweden)

    STEFANIA LOREDANA NITA

    2016-07-01

    Full Text Available The classical tools used in data analysis are not enough in order to benefit of all advantages of big data. The amount of information is too large for a complete investigation, and the possible connections and relations between data could be missed, because it is difficult or even impossible to verify all assumption over the information. Machine learning is a great solution in order to find concealed correlations or relationships between data, because it runs at scale machine and works very well with large data sets. The more data we have, the more the machine learning algorithm is useful, because it “learns” from the existing data and applies the found rules on new entries. In this paper, we present some machine learning algorithms and techniques used in big data.

  9. Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database.

    Science.gov (United States)

    Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee

    2017-07-01

    Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep-learning in healthcare have just gained attention recently. Deep learning, such as deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve a better performance comparing to other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The result shows that DNN and gradient boosting decision tree (GBDT) can result in similarly high prediction accuracies that are better compared to logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results by using lesser amounts of patient data when comparing to GBDT method.

  10. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  11. Automatic Classification of Sub-Techniques in Classical Cross-Country Skiing Using a Machine Learning Algorithm on Micro-Sensor Data

    Directory of Open Access Journals (Sweden)

    Ole Marius Hoel Rindal

    2017-12-01

    Full Text Available The automatic classification of sub-techniques in classical cross-country skiing provides unique possibilities for analyzing the biomechanical aspects of outdoor skiing. This is currently possible due to the miniaturization and flexibility of wearable inertial measurement units (IMUs that allow researchers to bring the laboratory to the field. In this study, we aimed to optimize the accuracy of the automatic classification of classical cross-country skiing sub-techniques by using two IMUs attached to the skier’s arm and chest together with a machine learning algorithm. The novelty of our approach is the reliable detection of individual cycles using a gyroscope on the skier’s arm, while a neural network machine learning algorithm robustly classifies each cycle to a sub-technique using sensor data from an accelerometer on the chest. In this study, 24 datasets from 10 different participants were separated into the categories training-, validation- and test-data. Overall, we achieved a classification accuracy of 93.9% on the test-data. Furthermore, we illustrate how an accurate classification of sub-techniques can be combined with data from standard sports equipment including position, altitude, speed and heart rate measuring systems. Combining this information has the potential to provide novel insight into physiological and biomechanical aspects valuable to coaches, athletes and researchers.

  12. A method for the evaluation of image quality according to the recognition effectiveness of objects in the optical remote sensing image using machine learning algorithm.

    Directory of Open Access Journals (Sweden)

    Tao Yuan

    Full Text Available Objective and effective image quality assessment (IQA is directly related to the application of optical remote sensing images (ORSI. In this study, a new IQA method of standardizing the target object recognition rate (ORR is presented to reflect quality. First, several quality degradation treatments with high-resolution ORSIs are implemented to model the ORSIs obtained in different imaging conditions; then, a machine learning algorithm is adopted for recognition experiments on a chosen target object to obtain ORRs; finally, a comparison with commonly used IQA indicators was performed to reveal their applicability and limitations. The results showed that the ORR of the original ORSI was calculated to be up to 81.95%, whereas the ORR ratios of the quality-degraded images to the original images were 65.52%, 64.58%, 71.21%, and 73.11%. The results show that these data can more accurately reflect the advantages and disadvantages of different images in object identification and information extraction when compared with conventional digital image assessment indexes. By recognizing the difference in image quality from the application effect perspective, using a machine learning algorithm to extract regional gray scale features of typical objects in the image for analysis, and quantitatively assessing quality of ORSI according to the difference, this method provides a new approach for objective ORSI assessment.

  13. A method for the evaluation of image quality according to the recognition effectiveness of objects in the optical remote sensing image using machine learning algorithm.

    Science.gov (United States)

    Yuan, Tao; Zheng, Xinqi; Hu, Xuan; Zhou, Wei; Wang, Wei

    2014-01-01

    Objective and effective image quality assessment (IQA) is directly related to the application of optical remote sensing images (ORSI). In this study, a new IQA method of standardizing the target object recognition rate (ORR) is presented to reflect quality. First, several quality degradation treatments with high-resolution ORSIs are implemented to model the ORSIs obtained in different imaging conditions; then, a machine learning algorithm is adopted for recognition experiments on a chosen target object to obtain ORRs; finally, a comparison with commonly used IQA indicators was performed to reveal their applicability and limitations. The results showed that the ORR of the original ORSI was calculated to be up to 81.95%, whereas the ORR ratios of the quality-degraded images to the original images were 65.52%, 64.58%, 71.21%, and 73.11%. The results show that these data can more accurately reflect the advantages and disadvantages of different images in object identification and information extraction when compared with conventional digital image assessment indexes. By recognizing the difference in image quality from the application effect perspective, using a machine learning algorithm to extract regional gray scale features of typical objects in the image for analysis, and quantitatively assessing quality of ORSI according to the difference, this method provides a new approach for objective ORSI assessment.

  14. Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different Machine Learning Algorithms in the Kurdistan Region, NE Iraq

    Science.gov (United States)

    Othman, Arsalan A.; Gloaguen, Richard

    2017-09-01

    Lithological mapping in mountainous regions is often impeded by limited accessibility due to relief. This study aims to evaluate (1) the performance of different supervised classification approaches using remote sensing data and (2) the use of additional information such as geomorphology. We exemplify the methodology in the Bardi-Zard area in NE Iraq, a part of the Zagros Fold - Thrust Belt, known for its chromite deposits. We highlighted the improvement of remote sensing geological classification by integrating geomorphic features and spatial information in the classification scheme. We performed a Maximum Likelihood (ML) classification method besides two Machine Learning Algorithms (MLA): Support Vector Machine (SVM) and Random Forest (RF) to allow the joint use of geomorphic features, Band Ratio (BR), Principal Component Analysis (PCA), spatial information (spatial coordinates) and multispectral data of the Advanced Space-borne Thermal Emission and Reflection radiometer (ASTER) satellite. The RF algorithm showed reliable results and discriminated serpentinite, talus and terrace deposits, red argillites with conglomerates and limestone, limy conglomerates and limestone conglomerates, tuffites interbedded with basic lavas, limestone and Metamorphosed limestone and reddish green shales. The best overall accuracy (∼80%) was achieved by Random Forest (RF) algorithms in the majority of the sixteen tested combination datasets.

  15. Artificial Mangrove Species Mapping Using Pléiades-1: An Evaluation of Pixel-Based and Object-Based Classifications with Selected Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Dezhi Wang

    2018-02-01

    Full Text Available In the dwindling natural mangrove today, mangrove reforestation projects are conducted worldwide to prevent further losses. Due to monoculture and the low survival rate of artificial mangroves, it is necessary to pay attention to mapping and monitoring them dynamically. Remote sensing techniques have been widely used to map mangrove forests due to their capacity for large-scale, accurate, efficient, and repetitive monitoring. This study evaluated the capability of a 0.5-m Pléiades-1 in classifying artificial mangrove species using both pixel-based and object-based classification schemes. For comparison, three machine learning algorithms—decision tree (DT, support vector machine (SVM, and random forest (RF—were used as the classifiers in the pixel-based and object-based classification procedure. The results showed that both the pixel-based and object-based approaches could recognize the major discriminations between the four major artificial mangrove species. However, the object-based method had a better overall accuracy than the pixel-based method on average. For pixel-based image analysis, SVM produced the highest overall accuracy (79.63%; for object-based image analysis, RF could achieve the highest overall accuracy (82.40%, and it was also the best machine learning algorithm for classifying artificial mangroves. The patches produced by object-based image analysis approaches presented a more generalized appearance and could contiguously depict mangrove species communities. When the same machine learning algorithms were compared by McNemar’s test, a statistically significant difference in overall classification accuracy between the pixel-based and object-based classifications only existed in the RF algorithm. Regarding species, monoculture and dominant mangrove species Sonneratia apetala group 1 (SA1 as well as partly mixed and regular shape mangrove species Hibiscus tiliaceus (HT could well be identified. However, for complex and easily

  16. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users or consumers are now demanding smarter devices, intelligent systems are revolutionizing by utilizing machine learning. Machine learning as part of intelligent systems is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. You can train machines to perform some things, so that they can automatically detect, diagnose, and solve a variety of problems. The intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, the intelligent systems make widely applications in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines and to improve decision making models. And this book specializes in the developments of advanced intelligent systems through machine learning. It...

  17. Application of machine learning methods in bioinformatics

    Science.gov (United States)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  18. Diagnostic information system dynamics in the evaluation of machine learning algorithms for the supervision of energy efficiency of district heating-supplied buildings

    International Nuclear Information System (INIS)

    Kiluk, Sebastian

    2017-01-01

    Highlights: • Energy efficiency classification sustainability benefits from knowledge prediction. • Diagnostic classification can be validated with its dynamics and current data. • Diagnostic classification dynamics provides novelty extraction for knowledge update. • Data mining comparison can be performed with knowledge dynamics and uncertainty. • Diagnostic information refinement benefits form comparing classifiers dynamics. - Abstract: Modern ways of exploring the diagnostic knowledge provided by data mining and machine learning raise some concern about the ways of evaluating the quality of output knowledge, usually represented by information systems. Especially in district heating, the stationarity of efficiency models, and thus the relevance of diagnostic classification system, cannot be ensured due to the impact of social, economic or technological changes, which are hard to identify or predict. Therefore, data mining and machine learning have become an attractive strategy for automatically and continuously absorbing such dynamics. This paper presents a new method of evaluation and comparison of diagnostic information systems gathered algorithmically in district heating efficiency supervision based on exploring the evolution of information system and analyzing its dynamic features. The process of data mining and knowledge discovery was applied to the data acquired from district heating substations’ energy meters to provide the automated discovery of diagnostic knowledge base necessary for the efficiency supervision of district heating-supplied buildings. The implemented algorithm consists of several steps of processing the billing data, including preparation, segmentation, aggregation and knowledge discovery stage, where classes of abstract models representing energy efficiency constitute an information system representing diagnostic knowledge about the energy efficiency of buildings favorably operating under similar climate conditions and

  19. Machine Learning and Applied Linguistics

    OpenAIRE

    Vajjala, Sowmya

    2018-01-01

    This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.

  20. Machine learning applications in genetics and genomics.

    Science.gov (United States)

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  1. Machine Learning of Fault Friction

    Science.gov (United States)

    Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.

    2017-12-01

    We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025

  2. Machine learning techniques applied to system characterization and equalization

    DEFF Research Database (Denmark)

    Zibar, Darko; Thrane, Jakob; Wass, Jesper

    2016-01-01

    Linear signal processing algorithms are effective in combating linear fibre channel impairments. We demonstrate the ability of machine learning algorithms to combat nonlinear fibre channel impairments and perform parameter extraction from directly detected signals.......Linear signal processing algorithms are effective in combating linear fibre channel impairments. We demonstrate the ability of machine learning algorithms to combat nonlinear fibre channel impairments and perform parameter extraction from directly detected signals....

  3. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. The book provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity and the depth and breath of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  4. Algorithms for Reinforcement Learning

    CERN Document Server

    Szepesvari, Csaba

    2010-01-01

    Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms'

  5. BELM: Bayesian extreme learning machine.

    Science.gov (United States)

    Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J

    2011-03-01

    The theory of extreme learning machine (ELM) has become very popular on the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need of applying methods that are computationally intensive, e.g., bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM in several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. Achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of probability of model overfitting, and use of a priori knowledge.

  6. Assessment of Beer Quality Based on a Robotic Pourer, Computer Vision, and Machine Learning Algorithms Using Commercial Beers.

    Science.gov (United States)

    Gonzalez Viejo, Claudia; Fuentes, Sigfredo; Torrico, Damir D; Howell, Kate; Dunshea, Frank R

    2018-05-01

    Sensory attributes of beer are directly linked to perceived foam-related parameters and beer color. The aim of this study was to develop an objective predictive model using machine learning modeling to assess the intensity levels of sensory descriptors in beer using the physical measurements of color and foam-related parameters. A robotic pourer (RoboBEER), was used to obtain 15 color and foam-related parameters from 22 different commercial beer samples. A sensory session using quantitative descriptive analysis (QDA ® ) with trained panelists was conducted to assess the intensity of 10 beer descriptors. Results showed that the principal component analysis explained 64% of data variability with correlations found between foam-related descriptors from sensory and RoboBEER such as the positive and significant correlation between carbon dioxide and carbonation mouthfeel (R = 0.62), correlation of viscosity to sensory, and maximum volume of foam and total lifetime of foam (R = 0.75, R = 0.77, respectively). Using the RoboBEER parameters as inputs, an artificial neural network (ANN) regression model showed high correlation (R = 0.91) to predict the intensity levels of 10 related sensory descriptors such as yeast, grains and hops aromas, hops flavor, bitter, sour and sweet tastes, viscosity, carbonation, and astringency. This paper is a novel approach for food science using machine modeling techniques that could contribute significantly to rapid screenings of food and brewage products for the food industry and the implementation of Artificial Intelligence (AI). The use of RoboBEER to assess beer quality showed to be a reliable, objective, accurate, and less time-consuming method to predict sensory descriptors compared to trained sensory panels. Hence, this method could be useful as a rapid screening procedure to evaluate beer quality at the end of the production line for industry applications. © 2018 Institute of Food Technologists®.

  7. Machine Learning Algorithms Utilizing Quantitative CT Features May Predict Eventual Onset of Bronchiolitis Obliterans Syndrome After Lung Transplantation.

    Science.gov (United States)

    Barbosa, Eduardo J Mortani; Lanclus, Maarten; Vos, Wim; Van Holsbeke, Cedric; De Backer, William; De Backer, Jan; Lee, James

    2018-02-19

    Long-term survival after lung transplantation (LTx) is limited by bronchiolitis obliterans syndrome (BOS), defined as a sustained decline in forced expiratory volume in the first second (FEV 1 ) not explained by other causes. We assessed whether machine learning (ML) utilizing quantitative computed tomography (qCT) metrics can predict eventual development of BOS. Paired inspiratory-expiratory CT scans of 71 patients who underwent LTx were analyzed retrospectively (BOS [n = 41] versus non-BOS [n = 30]), using at least two different time points. The BOS cohort experienced a reduction in FEV 1 of >10% compared to baseline FEV 1 post LTx. Multifactor analysis correlated declining FEV 1 with qCT features linked to acute inflammation or BOS onset. Student t test and ML were applied on baseline qCT features to identify lung transplant patients at baseline that eventually developed BOS. The FEV 1 decline in the BOS cohort correlated with an increase in the lung volume (P = .027) and in the central airway volume at functional residual capacity (P = .018), not observed in non-BOS patients, whereas the non-BOS cohort experienced a decrease in the central airway volume at total lung capacity with declining FEV 1 (P = .039). Twenty-three baseline qCT parameters could significantly distinguish between non-BOS patients and eventual BOS developers (P machine), we could identify BOS developers at baseline with an accuracy of 85%, using only three qCT parameters. ML utilizing qCT could discern distinct mechanisms driving FEV 1 decline in BOS and non-BOS LTx patients and predict eventual onset of BOS. This approach may become useful to optimize management of LTx patients. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  8. Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets.

    Science.gov (United States)

    McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne

    2018-04-01

    Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogleNet convolutional neural networks (CNNs), initially trained using ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with MatConvNet package, to extract features from food image datasets; Food 5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from CNNs and used to train machine learning classifiers including artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with SVM with RBF kernel can accurately detect food items with 99.4% accuracy using Food-5K validation food image dataset and 98.8% with Food-5K evaluation dataset using ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, ANN can achieve 91.34%, 99.28% when applied to Food-11 and RawFooT-DB food image datasets respectively and SVM with RBF kernel can achieve 64.98% with Food-101 image dataset. From this research it is clear that using deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Intelligent Vehicle Power Management Using Machine Learning and Fuzzy Logic

    National Research Council Canada - National Science Library

    Chen, ZhiHang; Masrur, M. A; Murphey, Yi L

    2008-01-01

    .... A machine learning algorithm, LOPPS, has been developed to learn about optimal power source combinations with respect to minimum power loss for all possible load requests and various system power states...

  10. Machine learning topological states

    Science.gov (United States)

    Deng, Dong-Ling; Li, Xiaopeng; Das Sarma, S.

    2017-11-01

    Artificial neural networks and machine learning have now reached a new era after several decades of improvement where applications are to explode in many fields of science, industry, and technology. Here, we use artificial neural networks to study an intriguing phenomenon in quantum physics—the topological phases of matter. We find that certain topological states, either symmetry-protected or with intrinsic topological order, can be represented with classical artificial neural networks. This is demonstrated by using three concrete spin systems, the one-dimensional (1D) symmetry-protected topological cluster state and the 2D and 3D toric code states with intrinsic topological orders. For all three cases, we show rigorously that the topological ground states can be represented by short-range neural networks in an exact and efficient fashion—the required number of hidden neurons is as small as the number of physical spins and the number of parameters scales only linearly with the system size. For the 2D toric-code model, we find that the proposed short-range neural networks can describe the excited states with Abelian anyons and their nontrivial mutual statistics as well. In addition, by using reinforcement learning we show that neural networks are capable of finding the topological ground states of nonintegrable Hamiltonians with strong interactions and studying their topological phase transitions. Our results demonstrate explicitly the exceptional power of neural networks in describing topological quantum states, and at the same time provide valuable guidance to machine learning of topological phases in generic lattice models.

  11. Parsimonious Wavelet Kernel Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Wang Qin

    2015-11-01

    Full Text Available In this study, a parsimonious scheme for wavelet kernel extreme learning machine (named PWKELM was introduced by combining wavelet theory and a parsimonious algorithm into kernel extreme learning machine (KELM. In the wavelet analysis, bases that were localized in time and frequency to represent various signals effectively were used. Wavelet kernel extreme learning machine (WELM maximized its capability to capture the essential features in “frequency-rich” signals. The proposed parsimonious algorithm also incorporated significant wavelet kernel functions via iteration in virtue of Householder matrix, thus producing a sparse solution that eased the computational burden and improved numerical stability. The experimental results achieved from the synthetic dataset and a gas furnace instance demonstrated that the proposed PWKELM is efficient and feasible in terms of improving generalization accuracy and real time performance.

  12. Combination of mass spectrometry-based targeted lipidomics and supervised machine learning algorithms in detecting adulterated admixtures of white rice.

    Science.gov (United States)

    Lim, Dong Kyu; Long, Nguyen Phuoc; Mo, Changyeun; Dong, Ziyuan; Cui, Lingmei; Kim, Giyoung; Kwon, Sung Won

    2017-10-01

    The mixing of extraneous ingredients with original products is a common adulteration practice in food and herbal medicines. In particular, authenticity of white rice and its corresponding blended products has become a key issue in food industry. Accordingly, our current study aimed to develop and evaluate a novel discrimination method by combining targeted lipidomics with powerful supervised learning methods, and eventually introduce a platform to verify the authenticity of white rice. A total of 30 cultivars were collected, and 330 representative samples of white rice from Korea and China as well as seven mixing ratios were examined. Random forests (RF), support vector machines (SVM) with a radial basis function kernel, C5.0, model averaged neural network, and k-nearest neighbor classifiers were used for the classification. We achieved desired results, and the classifiers effectively differentiated white rice from Korea to blended samples with high prediction accuracy for the contamination ratio as low as five percent. In addition, RF and SVM classifiers were generally superior to and more robust than the other techniques. Our approach demonstrated that the relative differences in lysoGPLs can be successfully utilized to detect the adulterated mixing of white rice originating from different countries. In conclusion, the present study introduces a novel and high-throughput platform that can be applied to authenticate adulterated admixtures from original white rice samples. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Performance evaluation of the machine learning algorithms used in inference mechanism of a medical decision support system.

    Science.gov (United States)

    Bal, Mert; Amasyali, M Fatih; Sever, Hayri; Kose, Guven; Demirhan, Ayse

    2014-01-01

    The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.

  14. Performance Evaluation of the Machine Learning Algorithms Used in Inference Mechanism of a Medical Decision Support System

    Directory of Open Access Journals (Sweden)

    Mert Bal

    2014-01-01

    Full Text Available The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.

  15. Adaptive Machine Aids to Learning.

    Science.gov (United States)

    Starkweather, John A.

    With emphasis on man-machine relationships and on machine evolution, computer-assisted instruction (CAI) is examined in this paper. The discussion includes the background of machine assistance to learning, the current status of CAI, directions of development, the development of criteria for successful instruction, meeting the needs of users,…

  16. Machine Learning: developing an image recognition program : with Python, Scikit Learn and OpenCV

    OpenAIRE

    Nguyen, Minh

    2016-01-01

    Machine Learning is one of the most debated topic in computer world these days, especially after the first Computer Go program has beaten human Go world champion. Among endless application of Machine Learning, image recognition, which problem is processing enormous amount of data from dynamic input. This thesis will present the basic concept of Machine Learning, Machine Learning algorithms, Python programming language and Scikit Learn – a simple and efficient tool for data analysis in P...

  17. Comparison Between Manual Auditing and a Natural Language Process With Machine Learning Algorithm to Evaluate Faculty Use of Standardized Reports in Radiology.

    Science.gov (United States)

    Guimaraes, Carolina V; Grzeszczuk, Robert; Bisset, George S; Donnelly, Lane F

    2018-03-01

    When implementing or monitoring department-sanctioned standardized radiology reports, feedback about individual faculty performance has been shown to be a useful driver of faculty compliance. Most commonly, these data are derived from manual audit, which can be both time-consuming and subject to sampling error. The purpose of this study was to evaluate whether a software program using natural language processing and machine learning could accurately audit radiologist compliance with the use of standardized reports compared with performed manual audits. Radiology reports from a 1-month period were loaded into such a software program, and faculty compliance with use of standardized reports was calculated. For that same period, manual audits were performed (25 reports audited for each of 42 faculty members). The mean compliance rates calculated by automated auditing were then compared with the confidence interval of the mean rate by manual audit. The mean compliance rate for use of standardized reports as determined by manual audit was 91.2% with a confidence interval between 89.3% and 92.8%. The mean compliance rate calculated by automated auditing was 92.0%, within that confidence interval. This study shows that by use of natural language processing and machine learning algorithms, an automated analysis can accurately define whether reports are compliant with use of standardized report templates and language, compared with manual audits. This may avoid significant labor costs related to conducting the manual auditing process. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  18. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    Science.gov (United States)

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  19. ML Confidential : machine learning on encrypted data

    NARCIS (Netherlands)

    Graepel, T.; Lauter, K.; Naehrig, M.; Kwon, T.; Lee, M.-K.; Kwon, D.

    2013-01-01

    We demonstrate that, by using a recently proposed leveled homomorphic encryption scheme, it is possible to delegate the execution of a machine learning algorithm to a computing service while retaining con¿dentiality of the training and test data. Since the computational complexity of the homomorphic

  20. ML Confidential : machine learning on encrypted data

    NARCIS (Netherlands)

    Graepel, T.; Lauter, K.; Naehrig, M.

    2012-01-01

    We demonstrate that by using a recently proposed somewhat homomorphic encryption (SHE) scheme it is possible to delegate the execution of a machine learning (ML) algorithm to a compute service while retaining confidentiality of the training and test data. Since the computational complexity of the

  1. Document Classification Using Distributed Machine Learning

    OpenAIRE

    Aydin, Galip; Hallac, Ibrahim Riza

    2018-01-01

    In this paper, we investigate the performance and success rates of Na\\"ive Bayes Classification Algorithm for automatic classification of Turkish news into predetermined categories like economy, life, health etc. We use Apache Big Data technologies such as Hadoop, HDFS, Spark and Mahout, and apply these distributed technologies to Machine Learning.

  2. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  3. Constructing first-principles phase diagrams of amorphous LixSi using machine-learning-assisted sampling with an evolutionary algorithm

    Science.gov (United States)

    Artrith, Nongnuch; Urban, Alexander; Ceder, Gerbrand

    2018-06-01

    The atomistic modeling of amorphous materials requires structure sizes and sampling statistics that are challenging to achieve with first-principles methods. Here, we propose a methodology to speed up the sampling of amorphous and disordered materials using a combination of a genetic algorithm and a specialized machine-learning potential based on artificial neural networks (ANNs). We show for the example of the amorphous LiSi alloy that around 1000 first-principles calculations are sufficient for the ANN-potential assisted sampling of low-energy atomic configurations in the entire amorphous LixSi phase space. The obtained phase diagram is validated by comparison with the results from an extensive sampling of LixSi configurations using molecular dynamics simulations and a general ANN potential trained to ˜45 000 first-principles calculations. This demonstrates the utility of the approach for the first-principles modeling of amorphous materials.

  4. Predictive models to assess risk of type 2 diabetes, hypertension and comorbidity: machine-learning algorithms and validation using national health data from Kuwait--a cohort study.

    Science.gov (United States)

    Farran, Bassam; Channanath, Arshad Mohamed; Behbehani, Kazem; Thanaraj, Thangavel Alphonse

    2013-05-14

    We build classification models and risk assessment tools for diabetes, hypertension and comorbidity using machine-learning algorithms on data from Kuwait. We model the increased proneness in diabetic patients to develop hypertension and vice versa. We ascertain the importance of ethnicity (and natives vs expatriate migrants) and of using regional data in risk assessment. Retrospective cohort study. Four machine-learning techniques were used: logistic regression, k-nearest neighbours (k-NN), multifactor dimensionality reduction and support vector machines. The study uses fivefold cross validation to obtain generalisation accuracies and errors. Kuwait Health Network (KHN) that integrates data from primary health centres and hospitals in Kuwait. 270 172 hospital visitors (of which, 89 858 are diabetic, 58 745 hypertensive and 30 522 comorbid) comprising Kuwaiti natives, Asian and Arab expatriates. Incident type 2 diabetes, hypertension and comorbidity. Classification accuracies of >85% (for diabetes) and >90% (for hypertension) are achieved using only simple non-laboratory-based parameters. Risk assessment tools based on k-NN classification models are able to assign 'high' risk to 75% of diabetic patients and to 94% of hypertensive patients. Only 5% of diabetic patients are seen assigned 'low' risk. Asian-specific models and assessments perform even better. Pathological conditions of diabetes in the general population or in hypertensive population and those of hypertension are modelled. Two-stage aggregate classification models and risk assessment tools, built combining both the component models on diabetes (or on hypertension), perform better than individual models. Data on diabetes, hypertension and comorbidity from the cosmopolitan State of Kuwait are available for the first time. This enabled us to apply four different case-control models to assess risks. These tools aid in the preliminary non-intrusive assessment of the population. Ethnicity is seen significant

  5. A Novel Flavour Tagging Algorithm using Machine Learning Techniques and a Precision Measurement of the $B^0 - \\overline{B^0}$ Oscillation Frequency at the LHCb Experiment

    CERN Document Server

    Kreplin, Katharina

    This thesis presents a novel flavour tagging algorithm using machine learning techniques and a precision measurement of the $B^0 -\\overline{B^0}$ oscillation frequency $\\Delta m_d$ using semileptonic $B^0$ decays. The LHC Run I data set is used which corresponds to $3 \\textrm{fb}^{-1}$ of data taken by the LHCb experiment at a center-of-mass energy of 7 TeV and 8 TeV. The performance of flavour tagging algorithms, exploiting the $b\\bar{b}$ pair production and the $b$ quark hadronization, is relatively low at the LHC due to the large amount of soft QCD background in inelastic proton-proton collisions. The standard approach is a cut-based selection of particles, whose charges are correlated to the production flavour of the $B$ meson. The novel tagging algorithm classifies the particles using an artificial neural network (ANN). It assigns higher weights to particles, which are likely to be correlated to the $b$ flavour. A second ANN combines the particles with the highest weights to derive the tagging decision. ...

  6. Machine learning enhanced optical distance sensor

    Science.gov (United States)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized training model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces a training set and testing set distance measurement errors of learning. Applications for the proposed sensor include industrial scenario distance sensing where target material specific training models can be generated to realize low <1% measurement error distance measurements.

  7. Quantum algorithm for support matrix machines

    Science.gov (United States)

    Duan, Bojia; Yuan, Jiabin; Liu, Ying; Li, Dan

    2017-09-01

    We propose a quantum algorithm for support matrix machines (SMMs) that efficiently addresses an image classification problem by introducing a least-squares reformulation. This algorithm consists of two core subroutines: a quantum matrix inversion (Harrow-Hassidim-Lloyd, HHL) algorithm and a quantum singular value thresholding (QSVT) algorithm. The two algorithms can be implemented on a universal quantum computer with complexity O[log(npq) ] and O[log(pq)], respectively, where n is the number of the training data and p q is the size of the feature space. By iterating the algorithms, we can find the parameters for the SMM classfication model. Our analysis shows that both HHL and QSVT algorithms achieve an exponential increase of speed over their classical counterparts.

  8. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  9. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  10. Machine learning of molecular properties: Locality and active learning

    Science.gov (United States)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, the machine learning techniques have shown great potent1ial in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on another hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach the chemical accuracy and also show large errors for the so-called outliers—the out-of-sample molecules, not well-represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  11. Machine learning applied to crime prediction

    OpenAIRE

    Vaquero Barnadas, Miquel

    2016-01-01

    Machine Learning is a cornerstone when it comes to artificial intelligence and big data analysis. It provides powerful algorithms that are capable of recognizing patterns, classifying data, and, basically, learn by themselves to perform a specific task. This field has incredibly grown in popularity these days, however, it still remains unknown for the majority of people, and even for most professionals. This project intends to provide an understandable explanation of what is it, what types ar...

  12. A Learning Algorithm based on High School Teaching Wisdom

    OpenAIRE

    Philip, Ninan Sajeeth

    2010-01-01

    A learning algorithm based on primary school teaching and learning is presented. The methodology is to continuously evaluate a student and to give them training on the examples for which they repeatedly fail, until, they can correctly answer all types of questions. This incremental learning procedure produces better learning curves by demanding the student to optimally dedicate their learning time on the failed examples. When used in machine learning, the algorithm is found to train a machine...

  13. Virtual Things for Machine Learning Applications

    OpenAIRE

    Bovet , Gérôme; Ridi , Antonio; Hennebert , Jean

    2014-01-01

    International audience; Internet-of-Things (IoT) devices, especially sensors are pro-ducing large quantities of data that can be used for gather-ing knowledge. In this field, machine learning technologies are increasingly used to build versatile data-driven models. In this paper, we present a novel architecture able to ex-ecute machine learning algorithms within the sensor net-work, presenting advantages in terms of privacy and data transfer efficiency. We first argument that some classes of ...

  14. The ATLAS Higgs Machine Learning Challenge

    CERN Document Server

    Cowan, Glen; The ATLAS collaboration; Bourdarios, Claire

    2015-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s with Artificial Neural Net and more recently with Boosted Decision Trees, Random Forest etc. Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, data scientists are developing new Machine Learning algorithms to extract meaning from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, and at the same time data scientists have advanced algorithms: the goal of the HiggsML project was to bring the two together by a “challenge”: participants from all over the world and any scientific background could compete online to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Instead of HEP physicists browsing through machine learning papers and trying to infer which new algorithms might be useful for HEP, then c...

  15. Unsupervised learning algorithms

    CERN Document Server

    Aydin, Kemal

    2016-01-01

    This book summarizes the state-of-the-art in unsupervised learning. The contributors discuss how with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They present how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation have resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...

  16. Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

    Science.gov (United States)

    Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

    2016-02-01

    Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  17. Computer and machine vision theory, algorithms, practicalities

    CERN Document Server

    Davies, E R

    2012-01-01

    Computer and Machine Vision: Theory, Algorithms, Practicalities (previously entitled Machine Vision) clearly and systematically presents the basic methodology of computer and machine vision, covering the essential elements of the theory while emphasizing algorithmic and practical design constraints. This fully revised fourth edition has brought in more of the concepts and applications of computer vision, making it a very comprehensive and up-to-date tutorial text suitable for graduate students, researchers and R&D engineers working in this vibrant subject. Key features include: Practical examples and case studies give the 'ins and outs' of developing real-world vision systems, giving engineers the realities of implementing the principles in practice New chapters containing case studies on surveillance and driver assistance systems give practical methods on these cutting-edge applications in computer vision Necessary mathematics and essential theory are made approachable by careful explanations and well-il...

  18. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief.

    Science.gov (United States)

    Douglas, P K; Harris, Sam; Yuille, Alan; Cohen, Mark S

    2011-05-15

    Machine learning (ML) has become a popular tool for mining functional neuroimaging data, and there are now hopes of performing such analyses efficiently in real-time. Towards this goal, we compared accuracy of six different ML algorithms applied to neuroimaging data of persons engaged in a bivariate task, asserting their belief or disbelief of a variety of propositional statements. We performed unsupervised dimension reduction and automated feature extraction using independent component (IC) analysis and extracted IC time courses. Optimization of classification hyperparameters across each classifier occurred prior to assessment. Maximum accuracy was achieved at 92% for Random Forest, followed by 91% for AdaBoost, 89% for Naïve Bayes, 87% for a J48 decision tree, 86% for K*, and 84% for support vector machine. For real-time decoding applications, finding a parsimonious subset of diagnostic ICs might be useful. We used a forward search technique to sequentially add ranked ICs to the feature subspace. For the current data set, we determined that approximately six ICs represented a meaningful basis set for classification. We then projected these six IC spatial maps forward onto a later scanning session within subject. We then applied the optimized ML algorithms to these new data instances, and found that classification accuracy results were reproducible. Additionally, we compared our classification method to our previously published general linear model results on this same data set. The highest ranked IC spatial maps show similarity to brain regions associated with contrasts for belief > disbelief, and disbelief < belief. Copyright © 2010 Elsevier Inc. All rights reserved.

  19. A Teaching System To Learn Programming: the Programmer's Learning Machine

    OpenAIRE

    Quinson , Martin; Oster , Gérald

    2015-01-01

    International audience; The Programmer's Learning Machine (PLM) is an interactive exerciser for learning programming and algorithms. Using an integrated and graphical environment that provides a short feedback loop, it allows students to learn in a (semi)-autonomous way. This generic platform also enables teachers to create specific programming microworlds that match their teaching goals. This paper discusses our design goals and motivations, introduces the existing material and the proposed ...

  20. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents Applications in Recommender Systems which are making recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the books approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  1. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    . Due to promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn a special attention of the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as the mean of inferring botnet......-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool of identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview on the field by summarizing current scientific efforts....... The contribution of the paper is three-fold. First, the paper provides a detailed insight on the existing detection methods by investigating which bot-related heuristic were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  2. Introduction to machine learning for brain imaging.

    Science.gov (United States)

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a working horse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally machine learning techniques should be usable for any non-expert, however, unfortunately they are typically not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.

  3. Quantum machine learning: a classical perspective.

    Science.gov (United States)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  4. Quantum machine learning: a classical perspective

    Science.gov (United States)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  5. Newton Methods for Large Scale Problems in Machine Learning

    Science.gov (United States)

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  6. Machine learning for healthcare technologies

    CERN Document Server

    Clifton, David A

    2016-01-01

    This book brings together chapters on the state-of-the-art in machine learning (ML) as it applies to the development of patient-centred technologies, with a special emphasis on 'big data' and mobile data.

  7. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that the INSPIRE is developing and integrating in order to automatize as much as possible content selection and curation in a subject based repository.

  8. Using human brain activity to guide machine learning.

    Science.gov (United States)

    Fong, Ruth C; Scheirer, Walter J; Cox, David D

    2018-03-29

    Machine learning is a field of computer science that builds algorithms that learn. In many cases, machine learning algorithms are used to recreate a human ability like adding a caption to a photo, driving a car, or playing a game. While the human brain has long served as a source of inspiration for machine learning, little effort has been made to directly use data collected from working brains as a guide for machine learning algorithms. Here we demonstrate a new paradigm of "neurally-weighted" machine learning, which takes fMRI measurements of human brain activity from subjects viewing images, and infuses these data into the training process of an object recognition learning algorithm to make it more consistent with the human brain. After training, these neurally-weighted classifiers are able to classify images without requiring any additional neural data. We show that our neural-weighting approach can lead to large performance gains when used with traditional machine vision features, as well as to significant improvements with already high-performing convolutional neural network features. The effectiveness of this approach points to a path forward for a new class of hybrid machine learning algorithms which take both inspiration and direct constraints from neuronal data.

  9. Machine Learning of Musical Gestures

    OpenAIRE

    Caramiaux, Baptiste; Tanaka, Atau

    2013-01-01

    We present an overview of machine learning (ML) techniques and theirapplication in interactive music and new digital instruments design. We firstgive to the non-specialist reader an introduction to two ML tasks,classification and regression, that are particularly relevant for gesturalinteraction. We then present a review of the literature in current NIMEresearch that uses ML in musical gesture analysis and gestural sound control.We describe the ways in which machine learning is useful for cre...

  10. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning.Organized into 15 chapters, this book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credi

  11. Machine learning in Python essential techniques for predictive analysis

    CERN Document Server

    Bowles, Michael

    2015-01-01

    Learn a simpler and more effective way to analyze data and predict outcomes with Python Machine Learning in Python shows you how to successfully analyze data using only two core machine learning algorithms, and how to apply them using Python. By focusing on two algorithm families that effectively predict outcomes, this book is able to provide full descriptions of the mechanisms at work, and the examples that illustrate the machinery with specific, hackable code. The algorithms are explained in simple terms with no complex math and applied using Python, with guidance on algorithm selection, d

  12. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vectors Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such a

  13. SU-C-BRA-05: Delineating High-Dose Clinical Target Volumes for Head and Neck Tumors Using Machine Learning Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Cardenas, C [Department of Radiation Physics, The University of Texas M.D. Anderson Cancer Center, Houston, TX (United States); The University of Texas Graduate School of Biomedical Sciences, Houston, TX (United States); Wong, A [Department of Radiation Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, TX (United States); School of Medicine, The University of Texas Health Sciences Center at San Antonio, San Antonio, TX (United States); Mohamed, A; Fuller, C [Department of Radiation Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, TX (United States); Yang, J; Court, L; Aristophanous, M [Department of Radiation Physics, The University of Texas M.D. Anderson Cancer Center, Houston, TX (United States); Rao, A [Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, TX (United States)

    2016-06-15

    Purpose: To develop and test population-based machine learning algorithms for delineating high-dose clinical target volumes (CTVs) in H&N tumors. Automating and standardizing the contouring of CTVs can reduce both physician contouring time and inter-physician variability, which is one of the largest sources of uncertainty in H&N radiotherapy. Methods: Twenty-five node-negative patients treated with definitive radiotherapy were selected (6 right base of tongue, 11 left and 9 right tonsil). All patients had GTV and CTVs manually contoured by an experienced radiation oncologist prior to treatment. This contouring process, which is driven by anatomical, pathological, and patient specific information, typically results in non-uniform margin expansions about the GTV. Therefore, we tested two methods to delineate high-dose CTV given a manually-contoured GTV: (1) regression-support vector machines(SVM) and (2) classification-SVM. These models were trained and tested on each patient group using leave-one-out cross-validation. The volume difference(VD) and Dice similarity coefficient(DSC) between the manual and auto-contoured CTV were calculated to evaluate the results. Distances from GTV-to-CTV were computed about each patient’s GTV and these distances, in addition to distances from GTV to surrounding anatomy in the expansion direction, were utilized in the regression-SVM method. The classification-SVM method used categorical voxel-information (GTV, selected anatomical structures, else) from a 3×3×3cm3 ROI centered about the voxel to classify voxels as CTV. Results: Volumes for the auto-contoured CTVs ranged from 17.1 to 149.1cc and 17.4 to 151.9cc; the average(range) VD between manual and auto-contoured CTV were 0.93 (0.48–1.59) and 1.16(0.48–1.97); while average(range) DSC values were 0.75(0.59–0.88) and 0.74(0.59–0.81) for the regression-SVM and classification-SVM methods, respectively. Conclusion: We developed two novel machine learning methods to delineate

  14. SU-C-BRA-05: Delineating High-Dose Clinical Target Volumes for Head and Neck Tumors Using Machine Learning Algorithms

    International Nuclear Information System (INIS)

    Cardenas, C; Wong, A; Mohamed, A; Fuller, C; Yang, J; Court, L; Aristophanous, M; Rao, A

    2016-01-01

    Purpose: To develop and test population-based machine learning algorithms for delineating high-dose clinical target volumes (CTVs) in H&N tumors. Automating and standardizing the contouring of CTVs can reduce both physician contouring time and inter-physician variability, which is one of the largest sources of uncertainty in H&N radiotherapy. Methods: Twenty-five node-negative patients treated with definitive radiotherapy were selected (6 right base of tongue, 11 left and 9 right tonsil). All patients had GTV and CTVs manually contoured by an experienced radiation oncologist prior to treatment. This contouring process, which is driven by anatomical, pathological, and patient specific information, typically results in non-uniform margin expansions about the GTV. Therefore, we tested two methods to delineate high-dose CTV given a manually-contoured GTV: (1) regression-support vector machines(SVM) and (2) classification-SVM. These models were trained and tested on each patient group using leave-one-out cross-validation. The volume difference(VD) and Dice similarity coefficient(DSC) between the manual and auto-contoured CTV were calculated to evaluate the results. Distances from GTV-to-CTV were computed about each patient’s GTV and these distances, in addition to distances from GTV to surrounding anatomy in the expansion direction, were utilized in the regression-SVM method. The classification-SVM method used categorical voxel-information (GTV, selected anatomical structures, else) from a 3×3×3cm3 ROI centered about the voxel to classify voxels as CTV. Results: Volumes for the auto-contoured CTVs ranged from 17.1 to 149.1cc and 17.4 to 151.9cc; the average(range) VD between manual and auto-contoured CTV were 0.93 (0.48–1.59) and 1.16(0.48–1.97); while average(range) DSC values were 0.75(0.59–0.88) and 0.74(0.59–0.81) for the regression-SVM and classification-SVM methods, respectively. Conclusion: We developed two novel machine learning methods to delineate

  15. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre eAbraham

    2014-02-01

    Full Text Available Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  16. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  17. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    Science.gov (United States)

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  18. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  19. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  20. A novel flavour tagging algorithm using machine learning techniques and a precision measurement of the B0- anti B0 oscillation frequency at the LHCb experiment

    International Nuclear Information System (INIS)

    Kreplin, Katharina

    2015-01-01

    This thesis presents a novel flavour tagging algorithm using machine learning techniques and a precision measurement of the B 0 - anti B 0 oscillation frequency Δm d using semileptonic B 0 decays. The LHC Run I data set is used which corresponds to 3 fb -1 of data taken by the LHCb experiment at a center-of-mass energy of 7 TeV and 8 TeV. The performance of flavour tagging algorithms, exploiting the b anti b pair production and the b quark hadronization, is relatively low at the LHC due to the large amount of soft QCD background in inelastic proton-proton collisions. The standard approach is a cut-based selection of particles, whose charges are correlated to the production flavour of the B meson. The novel tagging algorithm classifies the particles using an artificial neural network (ANN). It assigns higher weights to particles, which are likely to be correlated to the b flavour. A second ANN combines the particles with the highest weights to derive the tagging decision. An increase of the opposite side kaon tagging performance of 50% and 30% is achieved on B + → J/ψK + data. The second number corresponds to a readjustment of the algorithm to the B 0 s production topology. This algorithm is employed in the precision measurement of Δm d . A data set of 3.2 x 10 6 semileptonic B 0 decays is analysed, where the B 0 decays into a D - (K + π - π - ) or D *- (π - anti D 0 (K + π - )) and a μ + ν μ pair. The ν μ is not reconstructed, therefore, the B 0 momentum needs to be statistically corrected for the missing momentum of the neutrino to compute the correct B 0 decay time. A result of Δm d =0.503±0.002(stat.)±0.001(syst.) ps -1 is obtained. This is the world's best measurement of this quantity.

  1. Machine learning with quantum relative entropy

    Energy Technology Data Exchange (ETDEWEB)

    Tsuda, Koji [Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tuebingen, 72076 (Germany)], E-mail: koji.tsuda@tuebingen.mpg.de

    2009-12-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  2. Machine learning with quantum relative entropy

    International Nuclear Information System (INIS)

    Tsuda, Koji

    2009-01-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  3. A Machine Learning Concept for DTN Routing

    Science.gov (United States)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.

  4. Kernel learning algorithms for face recognition

    CERN Document Server

    Li, Jun-Bao; Pan, Jeng-Shyang

    2013-01-01

    Kernel Learning Algorithms for Face Recognition covers the framework of kernel based face recognition. This book discusses the advanced kernel learning algorithms and its application on face recognition. This book also focuses on the theoretical deviation, the system framework and experiments involving kernel based face recognition. Included within are algorithms of kernel based face recognition, and also the feasibility of the kernel based face recognition method. This book provides researchers in pattern recognition and machine learning area with advanced face recognition methods and its new

  5. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights

  6. A Modified Method Combined with a Support Vector Machine and Bayesian Algorithms in Biological Information

    Directory of Open Access Journals (Sweden)

    Wen-Gang Zhou

    2015-06-01

    Full Text Available With the deep research of genomics and proteomics, the number of new protein sequences has expanded rapidly. With the obvious shortcomings of high cost and low efficiency of the traditional experimental method, the calculation method for protein localization prediction has attracted a lot of attention due to its convenience and low cost. In the machine learning techniques, neural network and support vector machine (SVM are often used as learning tools. Due to its complete theoretical framework, SVM has been widely applied. In this paper, we make an improvement on the existing machine learning algorithm of the support vector machine algorithm, and a new improved algorithm has been developed, combined with Bayesian algorithms. The proposed algorithm can improve calculation efficiency, and defects of the original algorithm are eliminated. According to the verification, the method has proved to be valid. At the same time, it can reduce calculation time and improve prediction efficiency.

  7. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research. Specially in Particle Physics when there is a classification problem like discrimination of Signals from Backgrounds originating from the collisions of particles. On the other hand, Monte Carlo simulations can be used in order to generate a known data set of Signals and Backgrounds based on theoretical physics. The aim of Machine Learning is to train some algorithms on known data set and then apply these trained algorithms to the unknown data sets. However, the most common framework for data analysis in Particle Physics is ROOT. In order to use Machine Learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, specially Cross-Validation and BDT.

  8. Machine Learning for ATLAS DDM Network Metrics

    CERN Document Server

    Lassnig, Mario; The ATLAS collaboration; Vamosi, Ralf

    2016-01-01

    The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from our ongoing automation efforts. First, we describe our framework for distributed data management and network metrics, automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  9. Machine learning in cardiovascular medicine: are we there yet?

    Science.gov (United States)

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  10. TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning

    OpenAIRE

    Tang, Yuan

    2016-01-01

    TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to simplify the process of creating, configuring, training, evaluating, and experimenting a machine learning model. TF.Learn integrates a wide range of state-of-art machine learning algorithms built on top of TensorFlow's low level APIs for small to large-scale supervised and unsupervised problems. This module focuses on bringing machine learning t...

  11. Two-Dimensional Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Bo Jia

    2015-01-01

    (BP networks. However, like many other methods, ELM is originally proposed to handle vector pattern while nonvector patterns in real applications need to be explored, such as image data. We propose the two-dimensional extreme learning machine (2DELM based on the very natural idea to deal with matrix data directly. Unlike original ELM which handles vectors, 2DELM take the matrices as input features without vectorization. Empirical studies on several real image datasets show the efficiency and effectiveness of the algorithm.

  12. Gaussian processes for machine learning.

    Science.gov (United States)

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations.13,78,31 The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.

  13. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance and that each MeSH heading presents a different behavior......Map and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI...

  14. Data Mining and Machine Learning in Astronomy

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  15. Genomic risk prediction of aromatase inhibitor-related arthralgia in patients with breast cancer using a novel machine-learning algorithm.

    Science.gov (United States)

    Reinbolt, Raquel E; Sonis, Stephen; Timmers, Cynthia D; Fernández-Martínez, Juan Luis; Cernea, Ana; de Andrés-Galiana, Enrique J; Hashemi, Sepehr; Miller, Karin; Pilarski, Robert; Lustberg, Maryam B

    2018-01-01

    Many breast cancer (BC) patients treated with aromatase inhibitors (AIs) develop aromatase inhibitor-related arthralgia (AIA). Candidate gene studies to identify AIA risk are limited in scope. We evaluated the potential of a novel analytic algorithm (NAA) to predict AIA using germline single nucleotide polymorphisms (SNP) data obtained before treatment initiation. Systematic chart review of 700 AI-treated patients with stage I-III BC identified asymptomatic patients (n = 39) and those with clinically significant AIA resulting in AI termination or therapy switch (n = 123). Germline DNA was obtained and SNP genotyping performed using the Affymetrix UK BioBank Axiom Array to yield 695,277 SNPs. SNP clusters that most closely defined AIA risk were discovered using an NAA that sequentially combined statistical filtering and a machine-learning algorithm. NCBI PhenGenI and Ensemble databases defined gene attribution of the most discriminating SNPs. Phenotype, pathway, and ontologic analyses assessed functional and mechanistic validity. Demographics were similar in cases and controls. A cluster of 70 SNPs, correlating to 57 genes, was identified. This SNP group predicted AIA occurrence with a maximum accuracy of 75.93%. Strong associations with arthralgia, breast cancer, and estrogen phenotypes were seen in 19/57 genes (33%) and were functionally consistent. Using a NAA, we identified a 70 SNP cluster that predicted AIA risk with fair accuracy. Phenotype, functional, and pathway analysis of attributed genes was consistent with clinical phenotypes. This study is the first to link a specific SNP/gene cluster to AIA risk independent of candidate gene bias. © 2017 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

  16. Implementing Machine Learning in Radiology Practice and Research.

    Science.gov (United States)

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  17. Machine Learning and Inverse Problem in Geodynamics

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R.

    2017-12-01

    During the past few decades numerical modeling and traditional HPC have been widely deployed in many diverse fields for problem solutions. However, in recent years the rapid emergence of machine learning (ML), a subfield of the artificial intelligence (AI), in many fields of sciences, engineering, and finance seems to mark a turning point in the replacement of traditional modeling procedures with artificial intelligence-based techniques. The study of the circulation in the interior of Earth relies on the study of high pressure mineral physics, geochemistry, and petrology where the number of the mantle parameters is large and the thermoelastic parameters are highly pressure- and temperature-dependent. More complexity arises from the fact that many of these parameters that are incorporated in the numerical models as input parameters are not yet well established. In such complex systems the application of machine learning algorithms can play a valuable role. Our focus in this study is the application of supervised machine learning (SML) algorithms in predicting mantle properties with the emphasis on SML techniques in solving the inverse problem. As a sample problem we focus on the spin transition in ferropericlase and perovskite that may cause slab and plume stagnation at mid-mantle depths. The degree of the stagnation depends on the degree of negative density anomaly at the spin transition zone. The training and testing samples for the machine learning models are produced by the numerical convection models with known magnitudes of density anomaly (as the class labels of the samples). The volume fractions of the stagnated slabs and plumes which can be considered as measures for the degree of stagnation are assigned as sample features. The machine learning models can determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine (SVM) algorithms we show that SML techniques

  18. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  19. Kernel based machine learning algorithm for the efficient prediction of type III polyketide synthase family of proteins

    Directory of Open Access Journals (Sweden)

    Mallika V

    2010-03-01

    Full Text Available Type III Polyketide synthases (PKS are family of proteins considered to have significant role in the biosynthesis of various polyketides in plants, fungi and bacteria. As these proteins show positive effects to human health, more researches are going on regarding this particular protein. Developing a tool to identify the probability of sequence, being a type III polyketide synthase will minimize the time consumption and manpower efforts. In this approach, we have designed and implemented PKSIIIpred, a high performance prediction server for type III PKS where the classifier is Support Vector Machine (SVM. Based on the limited training dataset, the tool efficiently predicts the type III PKS superfamily of proteins with high sensitivity and specificity. PKSIIIpred is available at http://type3pks.in/prediction/. We expect that this tool may serve as a useful resource for type III PKS researchers. Currently work is being progressed for further betterment of prediction accuracy by including more sequence features in the training dataset.

  20. A review of machine learning in obesity.

    Science.gov (United States)

    DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M

    2018-05-01

    Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to National Health and Nutrition Examination Survey to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.

  1. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Full Text Available Learning incorporates a broad range of complex procedures. Machine learning (ML is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc. that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  2. Quantum machine learning what quantum computing means to data mining

    CERN Document Server

    Wittek, Peter

    2014-01-01

    Quantum Machine Learning bridges the gap between abstract developments in quantum computing and the applied research on machine learning. Paring down the complexity of the disciplines involved, it focuses on providing a synthesis that explains the most important machine learning algorithms in a quantum framework. Theoretical advances in quantum computing are hard to follow for computer scientists, and sometimes even for researchers involved in the field. The lack of a step-by-step guide hampers the broader understanding of this emergent interdisciplinary body of research. Quantum Machine L

  3. Deep learning: Using machine learning to study biological vision

    OpenAIRE

    Majaj, Najib; Pelli, Denis

    2017-01-01

    Today most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a reference of attainable performance based on learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

  4. Higgs Machine Learning Challenge 2014

    CERN Document Server

    Olivier, A-P; Bourdarios, C ; LAL / Orsay; Goldfarb, S ; University of Michigan

    2014-01-01

    High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees (paper) and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics. We are now exploring new ways to improve the cross-fertilization of the two fields by setting up a data challenge, following the footsteps of, among others, the astrophysics community (dark matter and galaxy zoo challenges) and neurobiology (connectomics and decoding the human brain). The organization committee consists of ATLAS physicists and machine learning researchers. The Challenge will run from Monday 12th to September 2014.

  5. Parallel algorithms on the ASTRA SIMD machine

    International Nuclear Information System (INIS)

    Odor, G.; Rohrbach, F.; Vesztergombi, G.; Varga, G.; Tatrai, F.

    1996-01-01

    In view of the tremendous computing power jump of modern RISC processors the interest in parallel computing seems to be thinning out. Why use a complicated system of parallel processors, if the problem can be solved by a single powerful micro-chip. It is a general law, however, that exponential growth will always end by some kind of a saturation, and then parallelism will again become a hot topic. We try to prepare ourselves for this eventuality. The MPPC project started in 1990 in the keydeys of parallelism and produced four ASTRA machines (presented at CHEP's 92) with 4k processors (which are expandable to 16k) based on yesterday's chip-technology (chip presented at CHEP'91). These machines now provide excellent test-beds for algorithmic developments in a complete, real environment. We are developing for example fast-pattern recognition algorithms which could be used in high-energy physics experiments at the LHC (planned to be operational after 2004 at CERN) for triggering and data reduction. The basic feature of our ASP (Associate String Processor) approach is to use extremely simple (thus very cheap) processor elements but in huge quantities (up to millions of processors) connected together by a very simple string-like communication chain. In this paper we present powerful algorithms based on this architecture indicating the performance perspectives if the hardware quality reaches present or even future technology levels. (author)

  6. Exploring the Potential of WorldView-2 Red-Edge Band-Based Vegetation Indices for Estimation of Mangrove Leaf Area Index with Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Yuanhui Zhu

    2017-10-01

    Full Text Available To accurately estimate leaf area index (LAI in mangrove areas, the selection of appropriate models and predictor variables is critical. However, there is a major challenge in quantifying and mapping LAI using multi-spectral sensors due to the saturation effects of traditional vegetation indices (VIs for mangrove forests. WorldView-2 (WV2 imagery has proven to be effective to estimate LAI of grasslands and forests, but the sensitivity of its vegetation indices (VIs has been uncertain for mangrove forests. Furthermore, the single model may exhibit certain randomness and instability in model calibration and estimation accuracy. Therefore, this study aims to explore the sensitivity of WV2 VIs for estimating mangrove LAI by comparing artificial neural network regression (ANNR, support vector regression (SVR and random forest regression (RFR. The results suggest that the RFR algorithm yields the best results (RMSE = 0.45, 14.55% of the average LAI, followed by ANNR (RMSE = 0.49, 16.04% of the average LAI, and then SVR (RMSE = 0.51, 16.56% of the average LAI algorithms using 5-fold cross validation (CV using all VIs. Quantification of the variable importance shows that the VIs derived from the red-edge band consistently remain the most important contributor to LAI estimation. When the red-edge band-derived VIs are removed from the models, estimation accuracies measured in relative RMSE (RMSEr decrease by 3.79%, 2.70% and 4.47% for ANNR, SVR and RFR models respectively. VIs derived from red-edge band also yield better accuracy compared with other traditional bands of WV2, such as near-infrared-1 and near-infrared-2 band. Furthermore, the estimated LAI values vary significantly across different mangrove species. The study demonstrates the utility of VIs of WV2 imagery and the selected machine-learning algorithms in developing LAI models in mangrove forests. The results indicate that the red-edge band of WV2 imagery can help alleviate the saturation

  7. Manifold learning in machine vision and robotics

    Science.gov (United States)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool to automate extraction patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensors data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of images content, navigation of mobile autonomous robot in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and other. Usually such data have high dimensionality, however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part in high dimensional "observation space" with smaller intrinsic dimensionality. Generally accepted model of such data is manifold model in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of Manifold learning technique in Machine vision and Robotics, which discovers a low-dimensional structure of high dimensional data and results in effective algorithms for solving of a large number of various subject-oriented tasks, is the content of the conference plenary speech some topics of which are in the paper.

  8. Machine learning spatial geometry from entanglement features

    Science.gov (United States)

    You, Yi-Zhuang; Yang, Zhao; Qi, Xiao-Liang

    2018-02-01

    Motivated by the close relations of the renormalization group with both the holography duality and the deep learning, we propose that the holographic geometry can emerge from deep learning the entanglement feature of a quantum many-body state. We develop a concrete algorithm, call the entanglement feature learning (EFL), based on the random tensor network (RTN) model for the tensor network holography. We show that each RTN can be mapped to a Boltzmann machine, trained by the entanglement entropies over all subregions of a given quantum many-body state. The goal is to construct the optimal RTN that best reproduce the entanglement feature. The RTN geometry can then be interpreted as the emergent holographic geometry. We demonstrate the EFL algorithm on a 1D free fermion system and observe the emergence of the hyperbolic geometry (AdS3 spatial geometry) as we tune the fermion system towards the gapless critical point (CFT2 point).

  9. Learning About Climate and Atmospheric Models Through Machine Learning

    Science.gov (United States)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  10. Using Machine Learning to Search for MSSM Higgs Bosons

    CERN Document Server

    Diesing, Rebecca

    2016-01-01

    This paper examines the performance of machine learning in the identification of Minimally Su- persymmetric Standard Model (MSSM) Higgs Bosons, and compares this performance to that of traditional cut strategies. Two boosted decision tree algorithms were tested, scikit-learn and XGBoost. These tests indicated that machine learning can perform significantly better than traditional cuts. However, since machine learning in this form cannot be directly implemented in a real MSSM Higgs analysis, this performance information was instead used to better understand the relationships between training variables. Further studies might use this information to construct an improved cut strategy.

  11. The ATLAS Higgs machine learning challenge

    CERN Document Server

    Davey, W; The ATLAS collaboration; Rousseau, D; Cowan, G; Kegl, B; Germain-Renaud, C; Guyon, I

    2014-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 90's with Artificial Neural Net for example, more recently with Boosted Decision Trees, Random Forest etc... Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, Data Scientists are developing new Machine Learning algorithms to extract sense from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, data scientists have advanced algorithms: the goal of the HiggsML project is to bring the two together by a “challenge”: participants from all over the world and any scientific background can compete online ( https://www.kaggle.com/c/higgs-boson ) to obtain the best Higgs to tau tau signal significance on a set of ATLAS full simulated Monte Carlo signal and background. Winners with the best scores will receive money prizes ; authors of the best method (most usable) will be invited t...

  12. Machine Learning applications in CMS

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine Learning is used in many aspects of CMS data taking, monitoring, processing and analysis. We review a few of these use cases and the most recent developments, with an outlook to future applications in the LHC Run III and for the High-Luminosity phase.

  13. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down task driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse grained global features and a larger set of low...

  14. Visible Machine Learning for Biomedicine.

    Science.gov (United States)

    Yu, Michael K; Ma, Jianzhu; Fisher, Jasmin; Kreisberg, Jason F; Raphael, Benjamin J; Ideker, Trey

    2018-06-14

    A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology. Copyright © 2018. Published by Elsevier Inc.

  15. Machine Learning Techniques for Optical Performance Monitoring from Directly Detected PDM-QAM Signals

    DEFF Research Database (Denmark)

    Thrane, Jakob; Wass, Jesper; Piels, Molly

    2017-01-01

    Linear signal processing algorithms are effective in dealing with linear transmission channel and linear signal detection, while the nonlinear signal processing algorithms, from the machine learning community, are effective in dealing with nonlinear transmission channel and nonlinear signal...... detection. In this paper, a brief overview of the various machine learning methods and their application in optical communication is presented and discussed. Moreover, supervised machine learning methods, such as neural networks and support vector machine, are experimentally demonstrated for in-band optical...

  16. Optimizing placements of ground-based snow sensors for areal snow cover estimation using a machine-learning algorithm and melt-season snow-LiDAR data

    Science.gov (United States)

    Oroza, C.; Zheng, Z.; Glaser, S. D.; Bales, R. C.; Conklin, M. H.

    2016-12-01

    We present a structured, analytical approach to optimize ground-sensor placements based on time-series remotely sensed (LiDAR) data and machine-learning algorithms. We focused on catchments within the Merced and Tuolumne river basins, covered by the JPL Airborne Snow Observatory LiDAR program. First, we used a Gaussian mixture model to identify representative sensor locations in the space of independent variables for each catchment. Multiple independent variables that govern the distribution of snow depth were used, including elevation, slope, and aspect. Second, we used a Gaussian process to estimate the areal distribution of snow depth from the initial set of measurements. This is a covariance-based model that also estimates the areal distribution of model uncertainty based on the independent variable weights and autocorrelation. The uncertainty raster was used to strategically add sensors to minimize model uncertainty. We assessed the temporal accuracy of the method using LiDAR-derived snow-depth rasters collected in water-year 2014. In each area, optimal sensor placements were determined using the first available snow raster for the year. The accuracy in the remaining LiDAR surveys was compared to 100 configurations of sensors selected at random. We found the accuracy of the model from the proposed placements to be higher and more consistent in each remaining survey than the average random configuration. We found that a relatively small number of sensors can be used to accurately reproduce the spatial patterns of snow depth across the basins, when placed using spatial snow data. Our approach also simplifies sensor placement. At present, field surveys are required to identify representative locations for such networks, a process that is labor intensive and provides limited guarantees on the networks' representation of catchment independent variables.

  17. Prediction of compounds activity in nuclear receptor signaling and stress pathway assays using machine learning algorithms and low dimensional molecular descriptors

    Directory of Open Access Journals (Sweden)

    Filip eStefaniak

    2015-12-01

    Full Text Available Toxicity evaluation of newly synthesized or used compounds is one of the main challenges during product development in many areas of industry. For example, toxicity is the second reason - after lack of efficacy - for failure in preclinical and clinical studies of drug candidates. To avoid attrition at the late stage of the drug development process, the toxicity analyses are employed at the early stages of a discovery pipeline, along with activity and selectivity enhancing. Although many assays for screening in vitro toxicity are available, their massive application is not always time and cost effective. Thus the need for fast and reliable in silico tools, which can be used not only for toxicity prediction of existing compounds, but also for prioritization of compounds planned for synthesis or acquisition. Here I present the benchmark results of the combination of various attribute selection methods and machine learning algorithms and their application to the data sets of the Tox21 Data Challenge. The best performing method: Best First for attribute selection with the Rotation Forest/ADTree classifier offers good accuracy for most tested cases. For 11 out of 12 targets, the AUROC value for the final evaluation set was ≥0.72, while for three targets the AUROC value was ≥ 0.80, with the average AUROC being 0.784±0.069. The use of two-dimensional descriptors sets enables fast screening and compound prioritization even for a very large database. Open source tools used in this project make the presented approach widely available and encourage the community to further improve the presented scheme.

  18. Comparison between extreme learning machine and wavelet neural networks in data classification

    Science.gov (United States)

    Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri

    2017-03-01

    Extreme learning Machine is a well known learning algorithm in the field of machine learning. It's about a feed forward neural network with a single-hidden layer. It is an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the Extreme learning Machine with wavelet neural networks, which is a very used algorithm. We have used six benchmark data sets to evaluate each technique. These datasets Including Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results have shown that both extreme learning machine and wavelet neural networks have reached good results.

  19. Machine learning for micro-tomography

    Science.gov (United States)

    Parkinson, Dilworth Y.; Pelt, Daniël. M.; Perciano, Talita; Ushizima, Daniela; Krishnan, Harinarayan; Barnard, Harold S.; MacDowell, Alastair A.; Sethian, James

    2017-09-01

    Machine learning has revolutionized a number of fields, but many micro-tomography users have never used it for their work. The micro-tomography beamline at the Advanced Light Source (ALS), in collaboration with the Center for Applied Mathematics for Energy Research Applications (CAMERA) at Lawrence Berkeley National Laboratory, has now deployed a series of tools to automate data processing for ALS users using machine learning. This includes new reconstruction algorithms, feature extraction tools, and image classification and recommen- dation systems for scientific image. Some of these tools are either in automated pipelines that operate on data as it is collected or as stand-alone software. Others are deployed on computing resources at Berkeley Lab-from workstations to supercomputers-and made accessible to users through either scripting or easy-to-use graphical interfaces. This paper presents a progress report on this work.

  20. Automated training for algorithms that learn from genomic data.

    Science.gov (United States)

    Cilingir, Gokcen; Broschat, Shira L

    2015-01-01

    Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not incorporated into published machine learning algorithms which thereby can become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable.

  1. Effect of Co-segregating Markers on High-Density Genetic Maps and Prediction of Map Expansion Using Machine Learning Algorithms.

    Science.gov (United States)

    N'Diaye, Amidou; Haile, Jemanesh K; Fowler, D Brian; Ammar, Karim; Pozniak, Curtis J

    2017-01-01

    Advances in sequencing and genotyping methods have enable cost-effective production of high throughput single nucleotide polymorphism (SNP) markers, making them the choice for linkage mapping. As a result, many laboratories have developed high-throughput SNP assays and built high-density genetic maps. However, the number of markers may, by orders of magnitude, exceed the resolution of recombination for a given population size so that only a minority of markers can accurately be ordered. Another issue attached to the so-called 'large p, small n' problem is that high-density genetic maps inevitably result in many markers clustering at the same position (co-segregating markers). While there are a number of related papers, none have addressed the impact of co-segregating markers on genetic maps. In the present study, we investigated the effects of co-segregating markers on high-density genetic map length and marker order using empirical data from two populations of wheat, Mohawk × Cocorit (durum wheat) and Norstar × Cappelle Desprez (bread wheat). The maps of both populations consisted of 85% co-segregating markers. Our study clearly showed that excess of co-segregating markers can lead to map expansion, but has little effect on markers order. To estimate the inflation factor (IF), we generated a total of 24,473 linkage maps (8,203 maps for Mohawk × Cocorit and 16,270 maps for Norstar × Cappelle Desprez). Using seven machine learning algorithms, we were able to predict with an accuracy of 0.7 the map expansion due to the proportion of co-segregating markers. For example in Mohawk × Cocorit, with 10 and 80% co-segregating markers the length of the map inflated by 4.5 and 16.6%, respectively. Similarly, the map of Norstar × Cappelle Desprez expanded by 3.8 and 11.7% with 10 and 80% co-segregating markers. With the increasing number of markers on SNP-chips, the proportion of co-segregating markers in high-density maps will continue to increase making map expansion

  2. Effect of Co-segregating Markers on High-Density Genetic Maps and Prediction of Map Expansion Using Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Amidou N’Diaye

    2017-08-01

    Full Text Available Advances in sequencing and genotyping methods have enable cost-effective production of high throughput single nucleotide polymorphism (SNP markers, making them the choice for linkage mapping. As a result, many laboratories have developed high-throughput SNP assays and built high-density genetic maps. However, the number of markers may, by orders of magnitude, exceed the resolution of recombination for a given population size so that only a minority of markers can accurately be ordered. Another issue attached to the so-called ‘large p, small n’ problem is that high-density genetic maps inevitably result in many markers clustering at the same position (co-segregating markers. While there are a number of related papers, none have addressed the impact of co-segregating markers on genetic maps. In the present study, we investigated the effects of co-segregating markers on high-density genetic map length and marker order using empirical data from two populations of wheat, Mohawk × Cocorit (durum wheat and Norstar × Cappelle Desprez (bread wheat. The maps of both populations consisted of 85% co-segregating markers. Our study clearly showed that excess of co-segregating markers can lead to map expansion, but has little effect on markers order. To estimate the inflation factor (IF, we generated a total of 24,473 linkage maps (8,203 maps for Mohawk × Cocorit and 16,270 maps for Norstar × Cappelle Desprez. Using seven machine learning algorithms, we were able to predict with an accuracy of 0.7 the map expansion due to the proportion of co-segregating markers. For example in Mohawk × Cocorit, with 10 and 80% co-segregating markers the length of the map inflated by 4.5 and 16.6%, respectively. Similarly, the map of Norstar × Cappelle Desprez expanded by 3.8 and 11.7% with 10 and 80% co-segregating markers. With the increasing number of markers on SNP-chips, the proportion of co-segregating markers in high-density maps will continue to increase

  3. Automated quantification of cerebral edema following hemispheric infarction: Application of a machine-learning algorithm to evaluate CSF shifts on serial head CTs

    Directory of Open Access Journals (Sweden)

    Yasheng Chen

    2016-01-01

    Full Text Available Although cerebral edema is a major cause of death and deterioration following hemispheric stroke, there remains no validated biomarker that captures the full spectrum of this critical complication. We recently demonstrated that reduction in intracranial cerebrospinal fluid (CSF volume (∆CSF on serial computed tomography (CT scans provides an accurate measure of cerebral edema severity, which may aid in early triaging of stroke patients for craniectomy. However, application of such a volumetric approach would be too cumbersome to perform manually on serial scans in a real-world setting. We developed and validated an automated technique for CSF segmentation via integration of random forest (RF based machine learning with geodesic active contour (GAC segmentation. The proposed RF + GAC approach was compared to conventional Hounsfield Unit (HU thresholding and RF segmentation methods using Dice similarity coefficient (DSC and the correlation of volumetric measurements, with manual delineation serving as the ground truth. CSF spaces were outlined on scans performed at baseline (<6 h after stroke onset and early follow-up (FU (closest to 24 h in 38 acute ischemic stroke patients. RF performed significantly better than optimized HU thresholding (p < 10−4 in baseline and p < 10−5 in FU and RF + GAC performed significantly better than RF (p < 10−3 in baseline and p < 10−5 in FU. Pearson correlation coefficients between the automatically detected ∆CSF and the ground truth were r = 0.178 (p = 0.285, r = 0.876 (p < 10−6 and r = 0.879 (p < 10−6 for thresholding, RF and RF + GAC, respectively, with a slope closer to the line of identity in RF + GAC. When we applied the algorithm trained from images of one stroke center to segment CTs from another center, similar findings held. In conclusion, we have developed and validated an accurate automated approach to segment CSF and calculate its shifts on serial CT scans

  4. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    International Nuclear Information System (INIS)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.; McEwen, Jason D.

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  5. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    Energy Technology Data Exchange (ETDEWEB)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K. [Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT (United Kingdom); McEwen, Jason D., E-mail: dr.michelle.lochner@gmail.com [Mullard Space Science Laboratory, University College London, Surrey RH5 6NT (United Kingdom)

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  6. Source localization in an ocean waveguide using supervised machine learning.

    Science.gov (United States)

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  7. Ensemble Machine Learning Methods and Applications

    CERN Document Server

    Ma, Yunqian

    2012-01-01

    It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed “ensemble learning” by researchers in computational intelligence and machine learning, it is known to improve a decision system’s robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as “boosting” and “random forest” facilitate solutions to key computational issues such as face detection and are now being applied in areas as diverse as object trackingand bioinformatics.   Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs. At once a solid theoretical study and a practical guide, the volume is a windfall for r...

  8. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  9. Machine protection system algorithm compiler and simulator

    International Nuclear Information System (INIS)

    White, G.R.; Sherwin, G.

    1993-01-01

    The Machine Protection System (MPS) component of the SLC's beam selection system, in which integrated current is continuously monitored and limited to safe levels through careful selection and feedback of the beam repetition rate, is described elsewhere in these proceedings. The novel decision making mechanism by which that system can evaluate open-quotes safe levelsclose quotes, and choose an appropriate repetition rate in real-time, is described here. The algorithm that this mechanism uses to make its decision is written in test files and expressed in states of the accelerator and its devices, one file per accelerator region. Before being used, a file is open-quotes compiledclose quotes to a binary format which can be easily processed as a forward-chaining decision tree. It is processed by distributed microcomputers local to the accelerator regions. A parent algorithm evaluates all results, and reports directly to the beam control microprocessor. Operators can test new algorithms, or changes they make to them, with an online graphical MPS simulator

  10. Status Checking System of Home Appliances using machine learning

    Directory of Open Access Journals (Sweden)

    Yoon Chi-Yurl

    2017-01-01

    Full Text Available This paper describes status checking system of home appliances based on machine learning, which can be applied to existing household appliances without networking function. Designed status checking system consists of sensor modules, a wireless communication module, cloud server, android application and a machine learning algorithm. The developed system applied to washing machine analyses and judges the four-kinds of appliance’s status such as staying, washing, rinsing and spin-drying. The measurements of sensor and transmission of sensing data are operated on an Arduino board and the data are transmitted to cloud server in real time. The collected data are parsed by an Android application and injected into the machine learning algorithm for learning the status of the appliances. The machine learning algorithm compares the stored learning data with collected real-time data from the appliances. Our results are expected to contribute as a base technology to design an automatic control system based on machine learning technology for household appliances in real-time.

  11. Kernel Methods for Machine Learning with Life Science Applications

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie

    Kernel methods refer to a family of widely used nonlinear algorithms for machine learning tasks like classification, regression, and feature extraction. By exploiting the so-called kernel trick straightforward extensions of classical linear algorithms are enabled as long as the data only appear a...

  12. Machine learning and medicine: book review and commentary.

    Science.gov (United States)

    Koprowski, Robert; Foster, Kenneth R

    2018-02-01

    This article is a review of the book "Master machine learning algorithms, discover how they work and implement them from scratch" (ISBN: not available, 37 USD, 163 pages) edited by Jason Brownlee published by the Author, edition, v1.10 http://MachineLearningMastery.com . An accompanying commentary discusses some of the issues that are involved with use of machine learning and data mining techniques to develop predictive models for diagnosis or prognosis of disease, and to call attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. Appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.

  13. Predicting breast screening attendance using machine learning techniques.

    Science.gov (United States)

    Baskaran, Vikraman; Guergachi, Aziz; Bali, Rajeev K; Naguib, Raouf N G

    2011-03-01

    Machine learning-based prediction has been effectively applied for many healthcare applications. Predicting breast screening attendance using machine learning (prior to the actual mammogram) is a new field. This paper presents new predictor attributes for such an algorithm. It describes a new hybrid algorithm that relies on back-propagation and radial basis function-based neural networks for prediction. The algorithm has been developed in an open source-based environment. The algorithm was tested on a 13-year dataset (1995-2008). This paper compares the algorithm and validates its accuracy and efficiency with different platforms. Nearly 80% accuracy and 88% positive predictive value and sensitivity were recorded for the algorithm. The results were encouraging; 40-50% of negative predictive value and specificity warrant further work. Preliminary results were promising and provided ample amount of reasons for testing the algorithm on a larger scale.

  14. A Semisupervised Support Vector Machines Algorithm for BCI Systems

    Science.gov (United States)

    Qin, Jianzhao; Li, Yuanqing; Sun, Wei

    2007-01-01

    As an emerging technology, brain-computer interfaces (BCIs) bring us new communication interfaces which translate brain activities into control signals for devices like computers, robots, and so forth. In this study, we propose a semisupervised support vector machine (SVM) algorithm for brain-computer interface (BCI) systems, aiming at reducing the time-consuming training process. In this algorithm, we apply a semisupervised SVM for translating the features extracted from the electrical recordings of brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. Meanwhile, to reduce the time for training semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to the online BCI systems. Additionally, it is suggested in many studies that common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficient labeled data set. In order to overcome the drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141

  15. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  16. Modelling tick abundance using machine learning techniques and satellite imagery

    DEFF Research Database (Denmark)

    Kjær, Lene Jung; Korslund, L.; Kjelland, V.

    satellite images to run Boosted Regression Tree machine learning algorithms to predict overall distribution (presence/absence of ticks) and relative tick abundance of nymphs and larvae in southern Scandinavia. For nymphs, the predicted abundance had a positive correlation with observed abundance...... the predicted distribution of larvae was mostly even throughout Denmark, it was primarily around the coastlines in Norway and Sweden. Abundance was fairly low overall except in some fragmented patches corresponding to forested habitats in the region. Machine learning techniques allow us to predict for larger...... the collected ticks for pathogens and using the same machine learning techniques to develop prevalence maps of the ScandTick region....

  17. Proceedings of the IEEE Machine Learning for Signal Processing XVII

    DEFF Research Database (Denmark)

    The seventeenth of a series of workshops sponsored by the IEEE Signal Processing Society and organized by the Machine Learning for Signal Processing Technical Committee (MLSP-TC). The field of machine learning has matured considerably in both methodology and real-world application domains and has...... become particularly important for solution of problems in signal processing. As reflected in this collection, machine learning for signal processing combines many ideas from adaptive signal/image processing, learning theory and models, and statistics in order to solve complex real-world signal processing......, and two papers from the winners of the Data Analysis Competition. The program included papers in the following areas: genomic signal processing, pattern recognition and classification, image and video processing, blind signal processing, models, learning algorithms, and applications of machine learning...

  18. Learning Machines Implemented on Non-Deterministic Hardware

    OpenAIRE

    Gupta, Suyog; Sindhwani, Vikas; Gopalakrishnan, Kailash

    2014-01-01

    This paper highlights new opportunities for designing large-scale machine learning systems as a consequence of blurring traditional boundaries that have allowed algorithm designers and application-level practitioners to stay -- for the most part -- oblivious to the details of the underlying hardware-level implementations. The hardware/software co-design methodology advocated here hinges on the deployment of compute-intensive machine learning kernels onto compute platforms that trade-off deter...

  19. Global Bathymetry: Machine Learning for Data Editing

    Science.gov (United States)

    Sandwell, D. T.; Tea, B.; Freund, Y.

    2017-12-01

    The accuracy of global bathymetry depends primarily on the coverage and accuracy of the sounding data and secondarily on the depth predicted from gravity. A main focus of our research is to add newly-available data to the global compilation. Most data sources have 1-12% of erroneous soundings caused by a wide array of blunders and measurement errors. Over the years we have hand-edited this data using undergraduate employees at UCSD (440 million soundings at 500 m resolution). We are developing a machine learning approach to refine the flagging of the older soundings and provide automated editing of newly-acquired soundings. The approach has three main steps: 1) Combine the sounding data with additional information that may inform the machine learning algorithm. The additional parameters include: depth predicted from gravity; distance to the nearest sounding from other cruises; seafloor age; spreading rate; sediment thickness; and vertical gravity gradient. 2) Use available edit decisions as training data sets for a boosted tree algorithm with a binary logistic objective function and L2 regularization. Initial results with poor quality single beam soundings show that the automated algorithm matches the hand-edited data 89% of the time. The results show that most of the information for detecting outliers comes from predicted depth with secondary contributions from distance to the nearest sounding and longitude. A similar analysis using very high quality multibeam data shows that the automated algorithm matches the hand-edited data 93% of the time. Again, most of the information for detecting outliers comes from predicted depth secondary contributions from distance to the nearest sounding and longitude. 3) The third step in the process is to use the machine learning parameters, derived from the training data, to edit 12 million newly acquired single beam sounding data provided by the National Geospatial-Intelligence Agency. The output of the learning algorithm will be

  20. Challenges in the Verification of Reinforcement Learning Algorithms

    Science.gov (United States)

    Van Wesel, Perry; Goodloe, Alwyn E.

    2017-01-01

    Machine learning (ML) is increasingly being applied to a wide array of domains from search engines to autonomous vehicles. These algorithms, however, are notoriously complex and hard to verify. This work looks at the assumptions underlying machine learning algorithms as well as some of the challenges in trying to verify ML algorithms. Furthermore, we focus on the specific challenges of verifying reinforcement learning algorithms. These are highlighted using a specific example. Ultimately, we do not offer a solution to the complex problem of ML verification, but point out possible approaches for verification and interesting research opportunities.

  1. Galaxy Classification using Machine Learning

    Science.gov (United States)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  2. Machine Learning Methods to Predict Diabetes Complications.

    Science.gov (United States)

    Dagliati, Arianna; Marini, Simone; Sacchi, Lucia; Cogni, Giulia; Teliti, Marsida; Tibollo, Valentina; De Cata, Pasquale; Chiovato, Luca; Bellazzi, Riccardo

    2018-03-01

    One of the areas where Artificial Intelligence is having more impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies, to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. Such pipeline comprises clinical center profiling, predictive model targeting, predictive model construction and model validation. After having dealt with missing data by means of random forest (RF) and having applied suitable strategies to handle class imbalance, we have used Logistic Regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy, at different time scenarios, at 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). Considered variables are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. Final models, tailored in accordance with the complications, provided an accuracy up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models easy to translate to the clinical practice.

  3. Cascade Error Projection Learning Algorithm

    Science.gov (United States)

    Duong, T. A.; Stubberud, A. R.; Daud, T.

    1995-01-01

    A detailed mathematical analysis is presented for a new learning algorithm termed cascade error projection (CEP) and a general learning frame work. This frame work can be used to obtain the cascade correlation learning algorithm by choosing a particular set of parameters.

  4. Quantum machine learning: a classical perspective

    Science.gov (United States)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed. PMID:29434508

  5. Machine Learning Algorithms for the Forecasting of  Wastewater Quality Indicators

    Directory of Open Access Journals (Sweden)

    Francesco Granata

    2017-02-01

    Full Text Available Stormwater runoff is often contaminated by human activities. Stormwater discharge into  water bodies significantly contributes to environmental pollution. The choice of suitable treatment  technologies is dependent on the pollutant concentrations. Wastewater quality indicators such as  biochemical oxygen demand (BOD5, chemical oxygen demand (COD, total suspended solids (TSS,  and total dissolved solids (TDS give a measure of the main pollutants. The aim of this study is to  provide an indirect methodology for the estimation of the main wastewater quality indicators, based  on some characteristics of the drainage basin. The catchment is seen as a black box: the physical  processes of accumulation, washing, and transport of pollutants are not mathematically described.  Two models deriving from studies on artificial intelligence have been used in this research: Support  Vector Regression (SVR and Regression Trees (RT. Both the models showed robustness, reliability,  and high generalization capability. However, with reference to coefficient of determination R2 and  root‐mean square error, Support Vector Regression showed a better performance than Regression  Tree in predicting TSS, TDS, and COD. As regards BOD5, the two models showed a comparable  performance. Therefore, the considered machine learning algorithms may be useful for providing  an estimation of the values to be considered for the sizing of the treatment units in absence of direct  measures.

  6. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM...

  7. MoleculeNet: a benchmark for molecular machine learning.

    Science.gov (United States)

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay

    2018-01-14

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

  8. Assessment of various supervised learning algorithms using different performance metrics

    Science.gov (United States)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.

  9. Towards Machine Learning of Motor Skills

    Science.gov (United States)

    Peters, Jan; Schaal, Stefan; Schölkopf, Bernhard

    Autonomous robots that can adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

  10. Risk assessment of atmospheric emissions using machine learning

    OpenAIRE

    Cervone, G.; Franzese, P.; Ezber, Y.; Boybeyi, Z.

    2008-01-01

    Supervised and unsupervised machine learning algorithms are used to perform statistical and logical analysis of several transport and dispersion model runs which simulate emissions from a fixed source under different atmospheric conditions.

    First, a clustering algorithm is used to automatically group the results of different transport and dispersion simulations according to specific cloud characteristics. Then, a symbolic classification algorithm is employed to find compl...

  11. Stochastic subset selection for learning with kernel machines.

    Science.gov (United States)

    Rhinelander, Jason; Liu, Xiaoping P

    2012-06-01

    Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.

  12. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem being discussed about for a decade that has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learning capability of the particle swarm optimization (PSO algorithm with the extreme learning machine (ELM classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN, proved to be an excellent classifier with large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is experimented on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  13. Distributed Extreme Learning Machine for Nonlinear Learning over Network

    Directory of Open Access Journals (Sweden)

    Songyan Huang

    2015-02-01

    Full Text Available Distributed data collection and analysis over a network are ubiquitous, especially over a wireless sensor network (WSN. To our knowledge, the data model used in most of the distributed algorithms is linear. However, in real applications, the linearity of systems is not always guaranteed. In nonlinear cases, the single hidden layer feedforward neural network (SLFN with radial basis function (RBF hidden neurons has the ability to approximate any continuous functions and, thus, may be used as the nonlinear learning system. However, confined by the communication cost, using the distributed version of the conventional algorithms to train the neural network directly is usually prohibited. Fortunately, based on the theorems provided in the extreme learning machine (ELM literature, we only need to compute the output weights of the SLFN. Computing the output weights itself is a linear learning problem, although the input-output mapping of the overall SLFN is still nonlinear. Using the distributed algorithmto cooperatively compute the output weights of the SLFN, we obtain a distributed extreme learning machine (dELM for nonlinear learning in this paper. This dELM is applied to the regression problem and classification problem to demonstrate its effectiveness and advantages.

  14. Reverse hypothesis machine learning a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge- acquisition -based learning is constrained by large volumes of data and is time consuming. Hence Knowledge innovation based learning is the need of time. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change the paradigm remains the same—the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  15. A Review of Related Work on Machine Learning in Semiconductor Manufacturing and Assembly Lines

    OpenAIRE

    Stanisavljevic, Darko; Spitzer, Michael

    2017-01-01

    This paper deals with applications of machine learning algorithms in manufacturing. Machine learning can be defined as a field of computer science that gives computers the ability to learn without explicitly developing the needed algorithms. Manufacturing is the production of merchandise by manual labour, machines and tools. The focus of this paper is on automatic production lines. The areas of interest of this paper are semiconductor manufacturing and production on assembly lines. The purpos...

  16. Predicting the concentration of residual methanol in industrial formalin using machine learning

    OpenAIRE

    Heidkamp, William

    2016-01-01

    In this thesis, a machine learning approach was used to develop a predictive model for residual methanol concentration in industrial formalin produced at the Akzo Nobel factory in Kristinehamn, Sweden. The MATLABTM computational environment supplemented with the Statistics and Machine LearningTM toolbox from the MathWorks were used to test various machine learning algorithms on the formalin production data from Akzo Nobel. As a result, the Gaussian Process Regression algorithm was found to pr...

  17. Pileup Mitigation with Machine Learning (PUMML)

    Science.gov (United States)

    Komiske, Patrick T.; Metodiev, Eric M.; Nachman, Benjamin; Schwartz, Matthew D.

    2017-12-01

    Pileup involves the contamination of the energy distribution arising from the primary collision of interest (leading vertex) by radiation from soft collisions (pileup). We develop a new technique for removing this contamination using machine learning and convolutional neural networks. The network takes as input the energy distribution of charged leading vertex particles, charged pileup particles, and all neutral particles and outputs the energy distribution of particles coming from leading vertex alone. The PUMML algorithm performs remarkably well at eliminating pileup distortion on a wide range of simple and complex jet observables. We test the robustness of the algorithm in a number of ways and discuss how the network can be trained directly on data.

  18. Using Machine Learning to Predict MCNP Bias

    Energy Technology Data Exchange (ETDEWEB)

    Grechanuk, Pavel Aleksandrovi [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2018-01-09

    For many real-world applications in radiation transport where simulations are compared to experimental measurements, like in nuclear criticality safety, the bias (simulated - experimental keff) in the calculation is an extremely important quantity used for code validation. The objective of this project is to accurately predict the bias of MCNP6 [1] criticality calculations using machine learning (ML) algorithms, with the intention of creating a tool that can complement the current nuclear criticality safety methods. In the latest release of MCNP6, the Whisper tool is available for criticality safety analysts and includes a large catalogue of experimental benchmarks, sensitivity profiles, and nuclear data covariance matrices. This data, coming from 1100+ benchmark cases, is used in this study of ML algorithms for criticality safety bias predictions.

  19. Machine Learning and Quantum Mechanics

    Science.gov (United States)

    Chapline, George

    The author has previously pointed out some similarities between selforganizing neural networks and quantum mechanics. These types of neural networks were originally conceived of as away of emulating the cognitive capabilities of the human brain. Recently extensions of these networks, collectively referred to as deep learning networks, have strengthened the connection between self-organizing neural networks and human cognitive capabilities. In this note we consider whether hardware quantum devices might be useful for emulating neural networks with human-like cognitive capabilities, or alternatively whether implementations of deep learning neural networks using conventional computers might lead to better algorithms for solving the many body Schrodinger equation.

  20. MLBCD: a machine learning tool for big clinical data.

    Science.gov (United States)

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.

  1. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    Science.gov (United States)

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  2. A machine learning model with human cognitive biases capable of learning from small and biased datasets.

    Science.gov (United States)

    Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro

    2018-05-09

    Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.

  3. Archetypal Analysis for Machine Learning

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai

    2010-01-01

    Archetypal analysis (AA) proposed by Cutler and Breiman in [1] estimates the principal convex hull of a data set. As such AA favors features that constitute representative ’corners’ of the data, i.e. distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering - ...... for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensional reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering....

  4. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols......, learning from weak labels, and interpretation and evaluation of results....

  5. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  6. Trends in Machine Learning for Signal Processing

    DEFF Research Database (Denmark)

    Adali, Tulay; Miller, David J.; Diamantaras, Konstantinos I.

    2011-01-01

    By putting the accent on learning from the data and the environment, the Machine Learning for SP (MLSP) Technical Committee (TC) provides the essential bridge between the machine learning and SP communities. While the emphasis in MLSP is on learning and data-driven approaches, SP defines the main...... applications of interest, and thus the constraints and requirements on solutions, which include computational efficiency, online adaptation, and learning with limited supervision/reference data....

  7. Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    There is a deep connection between machine learning and jet physics - after all, jets are defined by unsupervised learning algorithms. Jet physics has been a driving force for studying modern machine learning in high energy physics. Domain specific challenges require new techniques to make full use of the algorithms. A key focus is on understanding how and what the algorithms learn. Modern machine learning techniques for jet physics are demonstrated for classification, regression, and generation. In addition to providing powerful baseline performance, we show how to train complex models directly on data and to generate sparse stacked images with non-uniform granularity.

  8. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional methods of data analysis have difficulty to identify outliers and patterns in big data and data with multiple exposure / outcome variables and analysis-rules for surveys and questionnaires, currently common methods of data collection, are, essentially, missing. Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning and the current 100 page cookbook should be helpful to that aim. It covers in a condensed form the subjects reviewed in the 750 page three volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (ed. by Springer, Heidelberg, Germany, 2013) and was written as a hand-hold presentation and must-read publication. It was written not only to investigators and students in the fields, but also to jaded clinicians new to the methods and lacking time to read the entire textbooks. General purposes ...

  9. Machine Learning in Medical Imaging.

    Science.gov (United States)

    Giger, Maryellen L

    2018-03-01

    Advances in both imaging and computers have synergistically led to a rapid rise in the potential use of artificial intelligence in various radiological imaging tasks, such as risk assessment, detection, diagnosis, prognosis, and therapy response, as well as in multi-omics disease discovery. A brief overview of the field is given here, allowing the reader to recognize the terminology, the various subfields, and components of machine learning, as well as the clinical potential. Radiomics, an expansion of computer-aided diagnosis, has been defined as the conversion of images to minable data. The ultimate benefit of quantitative radiomics is to (1) yield predictive image-based phenotypes of disease for precision medicine or (2) yield quantitative image-based phenotypes for data mining with other -omics for discovery (ie, imaging genomics). For deep learning in radiology to succeed, note that well-annotated large data sets are needed since deep networks are complex, computer software and hardware are evolving constantly, and subtle differences in disease states are more difficult to perceive than differences in everyday objects. In the future, machine learning in radiology is expected to have a substantial clinical impact with imaging examinations being routinely obtained in clinical practice, providing an opportunity to improve decision support in medical image interpretation. The term of note is decision support, indicating that computers will augment human decision making, making it more effective and efficient. The clinical impact of having computers in the routine clinical practice may allow radiologists to further integrate their knowledge with their clinical colleagues in other medical specialties and allow for precision medicine. Copyright © 2018. Published by Elsevier Inc.

  10. Gradient Evolution-based Support Vector Machine Algorithm for Classification

    Science.gov (United States)

    Zulvia, Ferani E.; Kuo, R. J.

    2018-03-01

    This paper proposes a classification algorithm based on a support vector machine (SVM) and gradient evolution (GE) algorithms. SVM algorithm has been widely used in classification. However, its result is significantly influenced by the parameters. Therefore, this paper aims to propose an improvement of SVM algorithm which can find the best SVMs’ parameters automatically. The proposed algorithm employs a GE algorithm to automatically determine the SVMs’ parameters. The GE algorithm takes a role as a global optimizer in finding the best parameter which will be used by SVM algorithm. The proposed GE-SVM algorithm is verified using some benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than other algorithms tested in this paper.

  11. Machine learning of network metrics in ATLAS Distributed Data Management

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00218873; The ATLAS collaboration; Toler, Wesley; Vamosi, Ralf; Bogado Garcia, Joaquin Ignacio

    2017-01-01

    The increasing volume of physics data poses a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from one of our ongoing automation efforts that focuses on network metrics. First, we describe our machine learning framework built atop the ATLAS Analytics Platform. This framework can automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our m...

  12. Machine learning of network metrics in ATLAS Distributed Data Management

    Science.gov (United States)

    Lassnig, Mario; Toler, Wesley; Vamosi, Ralf; Bogado, Joaquin; ATLAS Collaboration

    2017-10-01

    The increasing volume of physics data poses a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from one of our ongoing automation efforts that focuses on network metrics. First, we describe our machine learning framework built atop the ATLAS Analytics Platform. This framework can automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for networkaware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  13. Evaluating the diagnostic utility of applying a machine learning algorithm to diffusion tensor MRI measures in individuals with major depressive disorder.

    Science.gov (United States)

    Schnyer, David M; Clasen, Peter C; Gonzalez, Christopher; Beevers, Christopher G

    2017-06-30

    Using MRI to diagnose mental disorders has been a long-term goal. Despite this, the vast majority of prior neuroimaging work has been descriptive rather than predictive. The current study applies support vector machine (SVM) learning to MRI measures of brain white matter to classify adults with Major Depressive Disorder (MDD) and healthy controls. In a precisely matched group of individuals with MDD (n =25) and healthy controls (n =25), SVM learning accurately (74%) classified patients and controls across a brain map of white matter fractional anisotropy values (FA). The study revealed three main findings: 1) SVM applied to DTI derived FA maps can accurately classify MDD vs. healthy controls; 2) prediction is strongest when only right hemisphere white matter is examined; and 3) removing FA values from a region identified by univariate contrast as significantly different between MDD and healthy controls does not change the SVM accuracy. These results indicate that SVM learning applied to neuroimaging data can classify the presence versus absence of MDD and that predictive information is distributed across brain networks rather than being highly localized. Finally, MDD group differences revealed through typical univariate contrasts do not necessarily reveal patterns that provide accurate predictive information. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  14. Robust Matching Pursuit Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Zejian Yuan

    2018-01-01

    Full Text Available Extreme learning machine (ELM is a popular learning algorithm for single hidden layer feedforward networks (SLFNs. It was originally proposed with the inspiration from biological learning and has attracted massive attentions due to its adaptability to various tasks with a fast learning ability and efficient computation cost. As an effective sparse representation method, orthogonal matching pursuit (OMP method can be embedded into ELM to overcome the singularity problem and improve the stability. Usually OMP recovers a sparse vector by minimizing a least squares (LS loss, which is efficient for Gaussian distributed data, but may suffer performance deterioration in presence of non-Gaussian data. To address this problem, a robust matching pursuit method based on a novel kernel risk-sensitive loss (in short KRSLMP is first proposed in this paper. The KRSLMP is then applied to ELM to solve the sparse output weight vector, and the new method named the KRSLMP-ELM is developed for SLFN learning. Experimental results on synthetic and real-world data sets confirm the effectiveness and superiority of the proposed method.

  15. Diagnosis of major depressive disorder by combining multimodal information from heart rate dynamics and serum proteomics using machine-learning algorithm.

    Science.gov (United States)

    Kim, Eun Young; Lee, Min Young; Kim, Se Hyun; Ha, Kyooseob; Kim, Kwang Pyo; Ahn, Yong Min

    2017-06-02

    Major depressive disorder (MDD) is a systemic and multifactorial disorder that involves abnormalities in multiple biochemical pathways and the autonomic nervous system. This study applied a machine-learning method to classify MDD and control groups by incorporating data from serum proteomic analysis and heart rate variability (HRV) analysis for the identification of novel peripheral biomarkers. The study subjects consisted of 25 drug-free female MDD patients and 25 age- and sex-matched healthy controls. First, quantitative serum proteome profiles were analyzed by liquid chromatography-tandem mass spectrometry using pooled serum samples from 10 patients and 10 controls. Next, candidate proteins were quantified with multiple reaction monitoring (MRM) in 50 subjects. We also analyzed 22 linear and nonlinear HRV parameters in 50 subjects. Finally, we identified a combined biomarker panel consisting of proteins and HRV indexes using a support vector machine with recursive feature elimination. A separation between MDD and control groups was achieved using five parameters (apolipoprotein B, group-specific component, ceruloplasmin, RMSSD, and SampEn) at 80.1% classification accuracy. A combination of HRV and proteomic data achieved better classification accuracy. A high classification accuracy can be achieved by combining multimodal information from heart rate dynamics and serum proteomics in MDD. Our approach can be helpful for accurate clinical diagnosis of MDD. Further studies using larger, independent cohorts are needed to verify the role of these candidate biomarkers for MDD diagnosis. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Machine Learning Interface for Medical Image Analysis.

    Science.gov (United States)

    Zhang, Yi C; Kagen, Alexander C

    2017-10-01

    TensorFlow is a second-generation open-source machine learning software library with a built-in framework for implementing neural networks in wide variety of perceptual tasks. Although TensorFlow usage is well established with computer vision datasets, the TensorFlow interface with DICOM formats for medical imaging remains to be established. Our goal is to extend the TensorFlow API to accept raw DICOM images as input; 1513 DaTscan DICOM images were obtained from the Parkinson's Progression Markers Initiative (PPMI) database. DICOM pixel intensities were extracted and shaped into tensors, or n-dimensional arrays, to populate the training, validation, and test input datasets for machine learning. A simple neural network was constructed in TensorFlow to classify images into normal or Parkinson's disease groups. Training was executed over 1000 iterations for each cross-validation set. The gradient descent optimization and Adagrad optimization algorithms were used to minimize cross-entropy between the predicted and ground-truth labels. Cross-validation was performed ten times to produce a mean accuracy of 0.938 ± 0.047 (95 % CI 0.908-0.967). The mean sensitivity was 0.974 ± 0.043 (95 % CI 0.947-1.00) and mean specificity was 0.822 ± 0.207 (95 % CI 0.694-0.950). We extended the TensorFlow API to enable DICOM compatibility in the context of DaTscan image analysis. We implemented a neural network classifier that produces diagnostic accuracies on par with excellent results from previous machine learning models. These results indicate the potential role of TensorFlow as a useful adjunct diagnostic tool in the clinical setting.

  17. Introduction to Machine Learning: Class Notes 67577

    OpenAIRE

    Shashua, Amnon

    2009-01-01

    Introduction to Machine learning covering Statistical Inference (Bayes, EM, ML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering), and PAC learning (the Formal model, VC dimension, Double Sampling theorem).

  18. Building machine learning systems with Python

    CERN Document Server

    Coelho, Luis Pedro

    2015-01-01

    This book primarily targets Python developers who want to learn and use Python's machine learning capabilities and gain valuable insights from data to develop effective solutions for business problems.

  19. Machine learning for adaptive many-core machines a practical approach

    CERN Document Server

    Lopes, Noel

    2015-01-01

    The overwhelming data produced everyday and the increasing performance and cost requirements of applications?are transversal to a wide range of activities in society, from science to industry. In particular, the magnitude and complexity of the tasks that Machine Learning (ML) algorithms have to solve are driving the need to devise adaptive many-core machines that scale well with the volume of data, or in other words, can handle Big Data.This book gives a concise view on how to extend the applicability of well-known ML algorithms in Graphics Processing Unit (GPU) with data scalability in mind.

  20. Machine Learning Methods for Attack Detection in the Smart Grid.

    Science.gov (United States)

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  1. Predicting Solar Activity Using Machine-Learning Methods

    Science.gov (United States)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.

  2. 2015 International Conference on Machine Learning and Signal Processing

    CERN Document Server

    Woo, Wai; Sulaiman, Hamzah; Othman, Mohd; Saat, Mohd

    2016-01-01

    This book presents important research findings and recent innovations in the field of machine learning and signal processing. A wide range of topics relating to machine learning and signal processing techniques and their applications are addressed in order to provide both researchers and practitioners with a valuable resource documenting the latest advances and trends. The book comprises a careful selection of the papers submitted to the 2015 International Conference on Machine Learning and Signal Processing (MALSIP 2015), which was held on 15–17 December 2015 in Ho Chi Minh City, Vietnam with the aim of offering researchers, academicians, and practitioners an ideal opportunity to disseminate their findings and achievements. All of the included contributions were chosen by expert peer reviewers from across the world on the basis of their interest to the community. In addition to presenting the latest in design, development, and research, the book provides access to numerous new algorithms for machine learni...

  3. A review of supervised machine learning applied to ageing research.

    Science.gov (United States)

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  4. Parameter optimization of electrochemical machining process using black hole algorithm

    Science.gov (United States)

    Singh, Dinesh; Shukla, Rajkamal

    2017-12-01

    Advanced machining processes are significant as higher accuracy in machined component is required in the manufacturing industries. Parameter optimization of machining processes gives optimum control to achieve the desired goals. In this paper, electrochemical machining (ECM) process is considered to evaluate the performance of the considered process using black hole algorithm (BHA). BHA considers the fundamental idea of a black hole theory and it has less operating parameters to tune. The two performance parameters, material removal rate (MRR) and overcut (OC) are considered separately to get optimum machining parameter settings using BHA. The variations of process parameters with respect to the performance parameters are reported for better and effective understanding of the considered process using single objective at a time. The results obtained using BHA are found better while compared with results of other metaheuristic algorithms, such as, genetic algorithm (GA), artificial bee colony (ABC) and bio-geography based optimization (BBO) attempted by previous researchers.

  5. Learning as a Machine: Crossovers between Humans and Machines

    Science.gov (United States)

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  6. Investigations of Calorimeter Clustering in ATLAS using Machine Learning

    CERN Document Server

    AUTHOR|(CDS)2153685

    The Large Hadron Collider (LHC) at CERN is designed to search for new physics by colliding protons with a center-of-mass energy of 13 TeV. The ATLAS detector is a multipurpose particle detector built to record these proton-proton collisions. In order to improve sensitivity to new physics at the LHC, luminosity increases are planned for 2018 and beyond. With this greater luminosity comes an increase in the number of simultaneous proton-proton collisions per bunch crossing (pile-up). This extra pile- up has adverse effects on algorithms for clustering the ATLAS detector's calorimeter cells. These adverse effects stem from overlapping energy deposits originating from distinct particles and could lead to diffculties in accurately reconstructing events. Machine learning algorithms provide a new tool that has potential to clustering per- formance. Recent developments in computer science have given rise to new set of machine learning algorithms that, in many circumstances, out-perform more conven- tional algorithms....

  7. Machine learning for network-based malware detection

    DEFF Research Database (Denmark)

    Stevanovic, Matija

    and based on different, mutually complementary, principles of traffic analysis. The proposed approaches rely on machine learning algorithms (MLAs) for automated and resource-efficient identification of the patterns of malicious network traffic. We evaluated the proposed methods through extensive evaluations...

  8. What is the machine learning.

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. In this talk, I explore a procedure for identifying combinations of variables -- aided by physical intuition -- that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. I will demonstrate these features in both an easy to understand toy model and an idealized LHC resonance scenario.

  9. Teaching machine learning to design students

    NARCIS (Netherlands)

    Vlist, van der B.J.J.; van de Westelaken, H.F.M.; Bartneck, C.; Hu, J.; Ahn, R.M.C.; Barakova, E.I.; Delbressine, F.L.M.; Feijs, L.M.G.; Pan, Z.; Zhang, X.; El Rhalibi, A.

    2008-01-01

    Machine learning is a key technology to design and create intelligent systems, products, and related services. Like many other design departments, we are faced with the challenge to teach machine learning to design students, who often do not have an inherent affinity towards technology. We

  10. Considerations upon the Machine Learning Technologies

    OpenAIRE

    Alin Munteanu; Cristina Ofelia Sofran

    2006-01-01

    Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  11. Considerations upon the Machine Learning Technologies

    Directory of Open Access Journals (Sweden)

    Alin Munteanu

    2006-01-01

    Full Text Available Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  12. Boosting Learning Algorithm for Stock Price Forecasting

    Science.gov (United States)

    Wang, Chengzhang; Bai, Xiaoming

    2018-03-01

    To tackle complexity and uncertainty of stock market behavior, more studies have introduced machine learning algorithms to forecast stock price. ANN (artificial neural network) is one of the most successful and promising applications. We propose a boosting-ANN model in this paper to predict the stock close price. On the basis of boosting theory, multiple weak predicting machines, i.e. ANNs, are assembled to build a stronger predictor, i.e. boosting-ANN model. New error criteria of the weak studying machine and rules of weights updating are adopted in this study. We select technical factors from financial markets as forecasting input variables. Final results demonstrate the boosting-ANN model works better than other ones for stock price forecasting.

  13. Optimal Placement Algorithms for Virtual Machines

    OpenAIRE

    Bellur, Umesh; Rao, Chetan S; SD, Madhu Kumar

    2010-01-01

    Cloud computing provides a computing platform for the users to meet their demands in an efficient, cost-effective way. Virtualization technologies are used in the clouds to aid the efficient usage of hardware. Virtual machines (VMs) are utilized to satisfy the user needs and are placed on physical machines (PMs) of the cloud for effective usage of hardware resources and electricity in the cloud. Optimizing the number of PMs used helps in cutting down the power consumption by a substantial amo...

  14. Advances in Machine Learning and Data Mining for Astronomy

    Science.gov (United States)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  15. Machine vision systems using machine learning for industrial product inspection

    Science.gov (United States)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF), and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we will present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Board (PCB) and Vacuum Florescent Displaying (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its displaying patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies and the PCB inspection system is the process of being deployed in a manufacturing plant.

  16. Adaptive Learning Systems: Beyond Teaching Machines

    Science.gov (United States)

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn, what instructor should do to help students during their learning process. We have adaptive learning technologies that can create much more student oriented learning environments. The purpose of this article is to present these changes and its…

  17. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  18. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  19. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution.

    Science.gov (United States)

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis.

  20. QUASI-STELLAR OBJECT SELECTION ALGORITHM USING TIME VARIABILITY AND MACHINE LEARNING: SELECTION OF 1620 QUASI-STELLAR OBJECT CANDIDATES FROM MACHO LARGE MAGELLANIC CLOUD DATABASE

    International Nuclear Information System (INIS)

    Kim, Dae-Won; Protopapas, Pavlos; Alcock, Charles; Trichas, Markos; Byun, Yong-Ik; Khardon, Roni

    2011-01-01

    We present a new quasi-stellar object (QSO) selection algorithm using a Support Vector Machine, a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events using 58 known QSOs, 1629 variable stars, and 4288 non-variables in the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ∼80% of known QSOs with a 25% false-positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) data set, which consists of 40 million light curves, and found 1620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false-positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.

  1. Assessment of Machine Learning Algorithms for Automatic Benthic Cover Monitoring and Mapping Using Towed Underwater Video Camera and High-Resolution Satellite Images

    Directory of Open Access Journals (Sweden)

    Hassan Mohamed

    2018-05-01

    Full Text Available Benthic habitat monitoring is essential for many applications involving biodiversity, marine resource management, and the estimation of variations over temporal and spatial scales. Nevertheless, both automatic and semi-automatic analytical methods for deriving ecologically significant information from towed camera images are still limited. This study proposes a methodology that enables a high-resolution towed camera with a Global Navigation Satellite System (GNSS to adaptively monitor and map benthic habitats. First, the towed camera finishes a pre-programmed initial survey to collect benthic habitat videos, which can then be converted to geo-located benthic habitat images. Second, an expert labels a number of benthic habitat images to class habitats manually. Third, attributes for categorizing these images are extracted automatically using the Bag of Features (BOF algorithm. Fourth, benthic cover categories are detected automatically using Weighted Majority Voting (WMV ensembles for Support Vector Machines (SVM, K-Nearest Neighbor (K-NN, and Bagging (BAG classifiers. Fifth, WMV-trained ensembles can be used for categorizing more benthic cover images automatically. Finally, correctly categorized geo-located images can provide ground truth samples for benthic cover mapping using high-resolution satellite imagery. The proposed methodology was tested over Shiraho, Ishigaki Island, Japan, a heterogeneous coastal area. The WMV ensemble exhibited 89% overall accuracy for categorizing corals, sediments, seagrass, and algae species. Furthermore, the same WMV ensemble produced a benthic cover map using a Quickbird satellite image with 92.7% overall accuracy.

  2. In silico machine learning methods in drug development.

    Science.gov (United States)

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures, "noisy" and high-dimensional data to complement QSAR methods, and in cases of unavailable receptor 3D structure to complement structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also regard a number of applications of the machine learning algorithms based on common classes of diseases.

  3. Using a vision cognitive algorithm to schedule virtual machines

    Directory of Open Access Journals (Sweden)

    Zhao Jiaqi

    2014-09-01

    Full Text Available Scheduling virtual machines is a major research topic for cloud computing, because it directly influences the performance, the operation cost and the quality of services. A large cloud center is normally equipped with several hundred thousand physical machines. The mission of the scheduler is to select the best one to host a virtual machine. This is an NPhard global optimization problem with grand challenges for researchers. This work studies the Virtual Machine (VM scheduling problem on the cloud. Our primary concern with VM scheduling is the energy consumption, because the largest part of a cloud center operation cost goes to the kilowatts used. We designed a scheduling algorithm that allocates an incoming virtual machine instance on the host machine, which results in the lowest energy consumption of the entire system. More specifically, we developed a new algorithm, called vision cognition, to solve the global optimization problem. This algorithm is inspired by the observation of how human eyes see directly the smallest/largest item without comparing them pairwisely. We theoretically proved that the algorithm works correctly and converges fast. Practically, we validated the novel algorithm, together with the scheduling concept, using a simulation approach. The adopted cloud simulator models different cloud infrastructures with various properties and detailed runtime information that can usually not be acquired from real clouds. The experimental results demonstrate the benefit of our approach in terms of reducing the cloud center energy consumption

  4. Machine learning techniques to examine large patient databases.

    Science.gov (United States)

    Meyfroidt, Geert; Güiza, Fabian; Ramon, Jan; Bruynooghe, Maurice

    2009-03-01

    Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential applications for these techniques in medicine is large, few medical doctors are familiar with their methodology, advantages and pitfalls. A general overview of machine learning techniques, with a more detailed discussion of some of these algorithms, is presented in this review.

  5. Advances in independent component analysis and learning machines

    CERN Document Server

    Bingham, Ella; Laaksonen, Jorma; Lampinen, Jouko

    2015-01-01

    In honour of Professor Erkki Oja, one of the pioneers of Independent Component Analysis (ICA), this book reviews key advances in the theory and application of ICA, as well as its influence on signal processing, pattern recognition, machine learning, and data mining. Examples of topics which have developed from the advances of ICA, which are covered in the book are: A unifying probabilistic model for PCA and ICA Optimization methods for matrix decompositions Insights into the FastICA algorithmUnsupervised deep learning Machine vision and image retrieval A review of developments in the t

  6. Comparison of adaptive neuro-fuzzy inference system (ANFIS) and Gaussian processes for machine learning (GPML) algorithms for the prediction of skin temperature in lower limb prostheses.

    Science.gov (United States)

    Mathur, Neha; Glesk, Ivan; Buis, Arjan

    2016-10-01

    Monitoring of the interface temperature at skin level in lower-limb prosthesis is notoriously complicated. This is due to the flexible nature of the interface liners used impeding the required consistent positioning of the temperature sensors during donning and doffing. Predicting the in-socket residual limb temperature by monitoring the temperature between socket and liner rather than skin and liner could be an important step in alleviating complaints on increased temperature and perspiration in prosthetic sockets. In this work, we propose to implement an adaptive neuro fuzzy inference strategy (ANFIS) to predict the in-socket residual limb temperature. ANFIS belongs to the family of fused neuro fuzzy system in which the fuzzy system is incorporated in a framework which is adaptive in nature. The proposed method is compared to our earlier work using Gaussian processes for machine learning. By comparing the predicted and actual data, results indicate that both the modeling techniques have comparable performance metrics and can be efficiently used for non-invasive temperature monitoring. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  7. Trends in extreme learning machines: a review.

    Science.gov (United States)

    Huang, Gao; Huang, Guang-Bin; Song, Shiji; You, Keyou

    2015-01-01

    Extreme learning machine (ELM) has gained increasing interest from various research fields recently. In this review, we aim to report the current state of the theoretical research and practical advances on this subject. We first give an overview of ELM from the theoretical perspective, including the interpolation theory, universal approximation capability, and generalization ability. Then we focus on the various improvements made to ELM which further improve its stability, sparsity and accuracy under general or specific conditions. Apart from classification and regression, ELM has recently been extended for clustering, feature selection, representational learning and many other learning tasks. These newly emerging algorithms greatly expand the applications of ELM. From implementation aspect, hardware implementation and parallel computation techniques have substantially sped up the training of ELM, making it feasible for big data processing and real-time reasoning. Due to its remarkable efficiency, simplicity, and impressive generalization performance, ELM have been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, and control and robotics. In this review, we try to provide a comprehensive view of these advances in ELM together with its future perspectives.

  8. Some multigrid algorithms for SIMD machines

    Energy Technology Data Exchange (ETDEWEB)

    Dendy, J.E. Jr. [Los Alamos National Lab., NM (United States)

    1996-12-31

    Previously a semicoarsening multigrid algorithm suitable for use on SIMD architectures was investigated. Through the use of new software tools, the performance of this algorithm has been considerably improved. The method has also been extended to three space dimensions. The method performs well for strongly anisotropic problems and for problems with coefficients jumping by orders of magnitude across internal interfaces. The parallel efficiency of this method is analyzed, and its actual performance on the CM-5 is compared with its performance on the CRAY-YMP. A standard coarsening multigrid algorithm is also considered, and we compare its performance on these two platforms as well.

  9. A Review of Current Machine Learning Techniques Used in Manufacturing Diagnosis

    OpenAIRE

    Ademujimi , Toyosi ,; Brundage , Michael ,; Prabhu , Vittaldas ,

    2017-01-01

    Part 6: Intelligent Diagnostics and Maintenance Solutions; International audience; Artificial intelligence applications are increasing due to advances in data collection systems, algorithms, and affordability of computing power. Within the manufacturing industry, machine learning algorithms are often used for improving manufacturing system fault diagnosis. This study focuses on a review of recent fault diagnosis applications in manufacturing that are based on several prominent machine learnin...

  10. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  11. Introduction to machine learning: k-nearest neighbors.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-06-01

    Machine learning techniques have been widely used in many scientific fields, but its use in medical literature is limited partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. The article introduces some basic ideas underlying the kNN algorithm, and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After prediction of outcome with kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the mostly widely used statistic to reflect the kNN algorithm. Factors such as k value, distance calculation and choice of appropriate predictors all have significant impact on the model performance.

  12. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at its earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing objectively the clinical decision-making process. This manuscript presents a review in multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be present. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allows creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionally (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  13. Machine Learning Method Applied in Readout System of Superheated Droplet Detector

    Science.gov (United States)

    Liu, Yi; Sullivan, Clair Julia; d'Errico, Francesco

    2017-07-01

    Direct readability is one advantage of superheated droplet detectors in neutron dosimetry. Utilizing such a distinct characteristic, an imaging readout system analyzes image of the detector for neutron dose readout. To improve the accuracy and precision of algorithms in the imaging readout system, machine learning algorithms were developed. Deep learning neural network and support vector machine algorithms are applied and compared with generally used Hough transform and curvature analysis methods. The machine learning methods showed a much higher accuracy and better precision in recognizing circular gas bubbles.

  14. Machine learning of the reactor core loading pattern critical parameters

    International Nuclear Information System (INIS)

    Trontl, K.; Pevec, D.; Smuc, T.

    2007-01-01

    The usual approach to loading pattern optimization involves high degree of engineering judgment, a set of heuristic rules, an optimization algorithm and a computer code used for evaluating proposed loading patterns. The speed of the optimization process is highly dependent on the computer code used for the evaluation. In this paper we investigate the applicability of a machine learning model which could be used for fast loading pattern evaluation. We employed a recently introduced machine learning technique, Support Vector Regression (SVR), which has a strong theoretical background in statistical learning theory. Superior empirical performance of the method has been reported on difficult regression problems in different fields of science and technology. SVR is a data driven, kernel based, nonlinear modelling paradigm, in which model parameters are automatically determined by solving a quadratic optimization problem. The main objective of the work reported in this paper was to evaluate the possibility of applying SVR method for reactor core loading pattern modelling. The starting set of experimental data for training and testing of the machine learning algorithm was obtained using a two-dimensional diffusion theory reactor physics computer code. We illustrate the performance of the solution and discuss its applicability, i.e., complexity, speed and accuracy, with a projection to a more realistic scenario involving machine learning from the results of more accurate and time consuming three-dimensional core modelling code. (author)

  15. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    Science.gov (United States)

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  16. [Algorithms, machine intelligence, big data : general considerations].

    Science.gov (United States)

    Radermacher, F J

    2015-08-01

    We are experiencing astonishing developments in the areas of big data and artificial intelligence. They follow a pattern that we have now been observing for decades: according to Moore's Law,the performance and efficiency in the area of elementary arithmetic operations increases a thousand-fold every 20 years. Although we have not achieved the status where in the singular sense machines have become as "intelligent" as people, machines are becoming increasingly better. The Internet of Things has again helped to massively increase the efficiency of machines. Big data and suitable analytics do the same. If we let these processes simply continue, our civilization may be endangerd in many instances. If the "containment" of these processes succeeds in the context of a reasonable political global governance, a worldwide eco-social market economy, andan economy of green and inclusive markets, many desirable developments that are advantageous for our future may result. Then, at some point in time, the constant need for more and faster innovation may even stop. However, this is anything but certain. We are facing huge challenges.

  17. Prediction of Allogeneic Hematopoietic Stem-Cell Transplantation Mortality 100 Days After Transplantation Using a Machine Learning Algorithm: A European Group for Blood and Marrow Transplantation Acute Leukemia Working Party Retrospective Data Mining Study.

    Science.gov (United States)

    Shouval, Roni; Labopin, Myriam; Bondi, Ori; Mishan-Shamay, Hila; Shimoni, Avichai; Ciceri, Fabio; Esteve, Jordi; Giebel, Sebastian; Gorin, Norbert C; Schmid, Christoph; Polge, Emmanuelle; Aljurf, Mahmoud; Kroger, Nicolaus; Craddock, Charles; Bacigalupo, Andrea; Cornelissen, Jan J; Baron, Frederic; Unger, Ron; Nagler, Arnon; Mohty, Mohamad

    2015-10-01

    Allogeneic hematopoietic stem-cell transplantation (HSCT) is potentially curative for acute leukemia (AL), but carries considerable risk. Machine learning algorithms, which are part of the data mining (DM) approach, may serve for transplantation-related mortality risk prediction. This work is a retrospective DM study on a cohort of 28,236 adult HSCT recipients from the AL registry of the European Group for Blood and Marrow Transplantation. The primary objective was prediction of overall mortality (OM) at 100 days after HSCT. Secondary objectives were estimation of nonrelapse mortality, leukemia-free survival, and overall survival at 2 years. Donor, recipient, and procedural characteristics were analyzed. The alternating decision tree machine learning algorithm was applied for model development on 70% of the data set and validated on the remaining data. OM prevalence at day 100 was 13.9% (n=3,936). Of the 20 variables considered, 10 were selected by the model for OM prediction, and several interactions were discovered. By using a logistic transformation function, the crude score was transformed into individual probabilities for 100-day OM (range, 3% to 68%). The model's discrimination for the primary objective performed better than the European Group for Blood and Marrow Transplantation score (area under the receiver operating characteristics curve, 0.701 v 0.646; Prisk evaluation of patients with AL before HSCT, and is available online (http://bioinfo.lnx.biu.ac.il/∼bondi/web1.html). It is presented as a continuous probabilistic score for the prediction of day 100 OM, extending prediction to 2 years. The DM method has proved useful for clinical prediction in HSCT. © 2015 by American Society of Clinical Oncology.

  18. A comparative study of the SVM and K-nn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals.

    Science.gov (United States)

    Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian

    2014-06-27

    Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database.

  19. Multitemporal Modelling of Socio-Economic Wildfire Drivers in Central Spain between the 1980s and the 2000s: Comparing Generalized Linear Models to Machine Learning Algorithms.

    Science.gov (United States)

    Vilar, Lara; Gómez, Israel; Martínez-Vega, Javier; Echavarría, Pilar; Riaño, David; Martín, M Pilar

    2016-01-01

    The socio-economic factors are of key importance during all phases of wildfire management that include prevention, suppression and restoration. However, modeling these factors, at the proper spatial and temporal scale to understand fire regimes is still challenging. This study analyses socio-economic drivers of wildfire occurrence in central Spain. This site represents a good example of how human activities play a key role over wildfires in the European Mediterranean basin. Generalized Linear Models (GLM) and machine learning Maximum Entropy models (Maxent) predicted wildfire occurrence in the 1980s and also in the 2000s to identify changes between each period in the socio-economic drivers affecting wildfire occurrence. GLM base their estimation on wildfire presence-absence observations whereas Maxent on wildfire presence-only. According to indicators like sensitivity or commission error Maxent outperformed GLM in both periods. It achieved a sensitivity of 38.9% and a commission error of 43.9% for the 1980s, and 67.3% and 17.9% for the 2000s. Instead, GLM obtained 23.33, 64.97, 9.41 and 18.34%, respectively. However GLM performed steadier than Maxent in terms of the overall fit. Both models explained wildfires from predictors such as population density and Wildland Urban Interface (WUI), but differed in their relative contribution. As a result of the urban sprawl and an abandonment of rural areas, predictors like WUI and distance to roads increased their contribution to both models in the 2000s, whereas Forest-Grassland Interface (FGI) influence decreased. This study demonstrates that human component can be modelled with a spatio-temporal dimension to integrate it into wildfire risk assessment.

  20. Testing a machine-learning algorithm to predict the persistence and severity of major depressive disorder from baseline self-reports.

    Science.gov (United States)

    Kessler, R C; van Loo, H M; Wardenaar, K J; Bossarte, R M; Brenner, L A; Cai, T; Ebert, D D; Hwang, I; Li, J; de Jonge, P; Nierenberg, A A; Petukhova, M V; Rosellini, A J; Sampson, N A; Schoevers, R A; Wilcox, M A; Zaslavsky, A M

    2016-10-01

    Heterogeneity of major depressive disorder (MDD) illness course complicates clinical decision-making. Although efforts to use symptom profiles or biomarkers to develop clinically useful prognostic subtypes have had limited success, a recent report showed that machine-learning (ML) models developed from self-reports about incident episode characteristics and comorbidities among respondents with lifetime MDD in the World Health Organization World Mental Health (WMH) Surveys predicted MDD persistence, chronicity and severity with good accuracy. We report results of model validation in an independent prospective national household sample of 1056 respondents with lifetime MDD at baseline. The WMH ML models were applied to these baseline data to generate predicted outcome scores that were compared with observed scores assessed 10-12 years after baseline. ML model prediction accuracy was also compared with that of conventional logistic regression models. Area under the receiver operating characteristic curve based on ML (0.63 for high chronicity and 0.71-0.76 for the other prospective outcomes) was consistently higher than for the logistic models (0.62-0.70) despite the latter models including more predictors. A total of 34.6-38.1% of respondents with subsequent high persistence chronicity and 40.8-55.8% with the severity indicators were in the top 20% of the baseline ML-predicted risk distribution, while only 0.9% of respondents with subsequent hospitalizations and 1.5% with suicide attempts were in the lowest 20% of the ML-predicted risk distribution. These results confirm that clinically useful MDD risk-stratification models can be generated from baseline patient self-reports and that ML methods improve on conventional methods in developing such models.

  1. Study of Environmental Data Complexity using Extreme Learning Machine

    Science.gov (United States)

    Leuenberger, Michael; Kanevski, Mikhail

    2017-04-01

    The main goals of environmental data science using machine learning algorithm deal, in a broad sense, around the calibration, the prediction and the visualization of hidden relationship between input and output variables. In order to optimize the models and to understand the phenomenon under study, the characterization of the complexity (at different levels) should be taken into account. Therefore, the identification of the linear or non-linear behavior between input and output variables adds valuable information for the knowledge of the phenomenon complexity. The present research highlights and investigates the different issues that can occur when identifying the complexity (linear/non-linear) of environmental data using machine learning algorithm. In particular, the main attention is paid to the description of a self-consistent methodology for the use of Extreme Learning Machines (ELM, Huang et al., 2006), which recently gained a great popularity. By applying two ELM models (with linear and non-linear activation functions) and by comparing their efficiency, quantification of the linearity can be evaluated. The considered approach is accompanied by simulated and real high dimensional and multivariate data case studies. In conclusion, the current challenges and future development in complexity quantification using environmental data mining are discussed. References - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press; Lausanne, Switzerland, p.392. - Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.

  2. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    Science.gov (United States)

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  3. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2016-01-01

    Machine learning techniques relevant for nonlinearity mitigation, carrier recovery, and nanoscale device characterization are reviewed and employed. Markov Chain Monte Carlo in combination with Bayesian filtering is employed within the nonlinear state-space framework and demonstrated for parameter...

  4. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2015-01-01

    Techniques from the machine learning community are reviewed and employed for laser characterization, signal detection in the presence of nonlinear phase noise, and nonlinearity mitigation. Bayesian filtering and expectation maximization are employed within nonlinear state-space framework...

  5. Computer vision and machine learning for archaeology

    NARCIS (Netherlands)

    van der Maaten, L.J.P.; Boon, P.; Lange, G.; Paijmans, J.J.; Postma, E.

    2006-01-01

    Until now, computer vision and machine learning techniques barely contributed to the archaeological domain. The use of these techniques can support archaeologists in their assessment and classification of archaeological finds. The paper illustrates the use of computer vision techniques for

  6. Using Machine Learning for Land Suitability Classification

    African Journals Online (AJOL)

    User

    West African Journal of Applied Ecology, vol. ... evidence for the utility of machine learning methods in land suitability classification especially MCS methods. ... Artificial intelligence tools. ..... Numerical values of index for the various classes.

  7. Machine Learning and Conflict Prediction: A Use Case

    Directory of Open Access Journals (Sweden)

    Chris Perry

    2013-10-01

    Full Text Available For at least the last two decades, the international community in general and the United Nations specifically have attempted to develop robust, accurate and effective conflict early warning system for conflict prevention. One potential and promising component of integrated early warning systems lies in the field of machine learning. This paper aims at giving conflict analysis a basic understanding of machine learning methodology as well as to test the feasibility and added value of such an approach. The paper finds that the selection of appropriate machine learning methodologies can offer substantial improvements in accuracy and performance. It also finds that even at this early stage in testing machine learning on conflict prediction, full models offer more predictive power than simply using a prior outbreak of violence as the leading indicator of current violence. This suggests that a refined data selection methodology combined with strategic use of machine learning algorithms could indeed offer a significant addition to the early warning toolkit. Finally, the paper suggests a number of steps moving forward to improve upon this initial test methodology.

  8. Model-Agnostic Interpretability of Machine Learning

    OpenAIRE

    Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos

    2016-01-01

    Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. In some applications, such models are as accurate as non-interpretable ones, and thus are preferred f...

  9. Implementing Machine Learning in the PCWG Tool

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    2016-12-13

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  10. Support vector machines optimization based theory, algorithms, and extensions

    CERN Document Server

    Deng, Naiyang; Zhang, Chunhua

    2013-01-01

    Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions presents an accessible treatment of the two main components of support vector machines (SVMs)-classification problems and regression problems. The book emphasizes the close connection between optimization theory and SVMs since optimization is one of the pillars on which SVMs are built.The authors share insight on many of their research achievements. They give a precise interpretation of statistical leaning theory for C-support vector classification. They also discuss regularized twi

  11. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predi......Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility...... of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We...... suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate...

  12. Machine learning approach for single molecule localisation microscopy.

    Science.gov (United States)

    Colabrese, Silvia; Castello, Marco; Vicidomini, Giuseppe; Del Bue, Alessio

    2018-04-01

    Single molecule localisation (SML) microscopy is a fundamental tool for biological discoveries; it provides sub-diffraction spatial resolution images by detecting and localizing "all" the fluorescent molecules labeling the structure of interest. For this reason, the effective resolution of SML microscopy strictly depends on the algorithm used to detect and localize the single molecules from the series of microscopy frames. To adapt to the different imaging conditions that can occur in a SML experiment, all current localisation algorithms request, from the microscopy users, the choice of different parameters. This choice is not always easy and their wrong selection can lead to poor performance. Here we overcome this weakness with the use of machine learning. We propose a parameter-free pipeline for SML learning based on support vector machine (SVM). This strategy requires a short supervised training that consists in selecting by the user few fluorescent molecules (∼ 10-20) from the frames under analysis. The algorithm has been extensively tested on both synthetic and real acquisitions. Results are qualitatively and quantitatively consistent with the state of the art in SML microscopy and demonstrate that the introduction of machine learning can lead to a new class of algorithms competitive and conceived from the user point of view.

  13. Automatic microseismic event picking via unsupervised machine learning

    Science.gov (United States)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.

  14. Splendidly blended: a machine learning set up for CDU control

    Science.gov (United States)

    Utzny, Clemens

    2017-06-01

    As the concepts of machine learning and artificial intelligence continue to grow in importance in the context of internet related applications it is still in its infancy when it comes to process control within the semiconductor industry. Especially the branch of mask manufacturing presents a challenge to the concepts of machine learning since the business process intrinsically induces pronounced product variability on the background of small plate numbers. In this paper we present the architectural set up of a machine learning algorithm which successfully deals with the demands and pitfalls of mask manufacturing. A detailed motivation of this basic set up followed by an analysis of its statistical properties is given. The machine learning set up for mask manufacturing involves two learning steps: an initial step which identifies and classifies the basic global CD patterns of a process. These results form the basis for the extraction of an optimized training set via balanced sampling. A second learning step uses this training set to obtain the local as well as global CD relationships induced by the manufacturing process. Using two production motivated examples we show how this approach is flexible and powerful enough to deal with the exacting demands of mask manufacturing. In one example we show how dedicated covariates can be used in conjunction with increased spatial resolution of the CD map model in order to deal with pathological CD effects at the mask boundary. The other example shows how the model set up enables strategies for dealing tool specific CD signature differences. In this case the balanced sampling enables a process control scheme which allows usage of the full tool park within the specified tight tolerance budget. Overall, this paper shows that the current rapid developments off the machine learning algorithms can be successfully used within the context of semiconductor manufacturing.

  15. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  16. Simulation-driven machine learning: Bearing fault classification

    Science.gov (United States)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data is often only valid for the specific circumstances and components for which it was collected. This work directly addresses these challenges for roller bearings with race faults by generating training data using information gained from high resolution simulations of roller bearing dynamics, which is used to train machine learning algorithms that are then validated against four experimental datasets. Several different machine learning methodologies are compared starting from well-established statistical feature-based methods to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter free method for race fault detection.

  17. Machine learning techniques for optical communication system optimization

    DEFF Research Database (Denmark)

    Zibar, Darko; Wass, Jesper; Thrane, Jakob

    In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction.......In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction....

  18. Empirical tests of the Gradual Learning Algorithm

    NARCIS (Netherlands)

    Boersma, P.; Hayes, B.

    1999-01-01

    The Gradual Learning Algorithm (Boersma 1997) is a constraint ranking algorithm for learning Optimality-theoretic grammars. The purpose of this article is to assess the capabilities of the Gradual Learning Algorithm, particularly in comparison with the Constraint Demotion algorithm of Tesar and

  19. Empirical tests of the Gradual Learning Algorithm

    NARCIS (Netherlands)

    Boersma, P.; Hayes, B.

    2001-01-01

    The Gradual Learning Algorithm (Boersma 1997) is a constraint-ranking algorithm for learning optimality-theoretic grammars. The purpose of this article is to assess the capabilities of the Gradual Learning Algorithm, particularly in comparison with the Constraint Demotion algorithm of Tesar and

  20. Intelligent Machine Learning Approaches for Aerospace Applications

    Science.gov (United States)

    Sathyan, Anoop

    Machine Learning is a type of artificial intelligence that provides machines or networks the ability to learn from data without the need to explicitly program them. There are different kinds of machine learning techniques. This thesis discusses the applications of two of these approaches: Genetic Fuzzy Logic and Convolutional Neural Networks (CNN). Fuzzy Logic System (FLS) is a powerful tool that can be used for a wide variety of applications. FLS is a universal approximator that reduces the need for complex mathematics and replaces it with expert knowledge of the system to produce an input-output mapping using If-Then rules. The expert knowledge of a system can help in obtaining the parameters for small-scale FLSs, but for larger networks we will need to use sophisticated approaches that can automatically train the network to meet the design requirements. This is where Genetic Algorithms (GA) and EVE come into the picture. Both GA and EVE can tune the FLS parameters to minimize a cost function that is designed to meet the requirements of the specific problem. EVE is an artificial intelligence developed by Psibernetix that is trained to tune large scale FLSs. The parameters of an FLS can include the membership functions and rulebase of the inherent Fuzzy Inference Systems (FISs). The main issue with using the GFS is that the number of parameters in a FIS increase exponentially with the number of inputs thus making it increasingly harder to tune them. To reduce this issue, the FLSs discussed in this thesis consist of 2-input-1-output FISs in cascade (Chapter 4) or as a layer of parallel FISs (Chapter 7). We have obtained extremely good results using GFS for different applications at a reduced computational cost compared to other algorithms that are commonly used to solve the corresponding problems. In this thesis, GFSs have been designed for controlling an inverted double pendulum, a task allocation problem of clustering targets amongst a set of UAVs, a fire

  1. Outsmarting neural networks: an alternative paradigm for machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably and Approximately Correct (PAC) framework. We show that, under certain conditions, one can reduce the three problems above to the regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria, but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs to solve the regression estimation. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  2. Poster abstract: A machine learning approach for vehicle classification using passive infrared and ultrasonic sensors

    KAUST Repository

    Warriach, Ehsan Ullah; Claudel, Christian G.

    2013-01-01

    This article describes the implementation of four different machine learning techniques for vehicle classification in a dual ultrasonic/passive infrared traffic flow sensors. Using k-NN, Naive Bayes, SVM and KNN-SVM algorithms, we show that KNN

  3. Leave-two-out stability of ontology learning algorithm

    International Nuclear Information System (INIS)

    Wu, Jianzhang; Yu, Xiao; Zhu, Linli; Gao, Wei

    2016-01-01

    Ontology is a semantic analysis and calculation model, which has been applied to many subjects. Ontology similarity calculation and ontology mapping are employed as machine learning approaches. The purpose of this paper is to study the leave-two-out stability of ontology learning algorithm. Several leave-two-out stabilities are defined in ontology learning setting and the relationship among these stabilities are presented. Furthermore, the results manifested reveal that leave-two-out stability is a sufficient and necessary condition for ontology learning algorithm.

  4. Designing machines for lattice physics and algorithm investigation

    International Nuclear Information System (INIS)

    Fischler, M.; Atac, R.; Cook, A.

    1989-10-01

    Special-purpose computers are appropriate tools for the study of lattice gauge theory. While these machines deliver considerable processing power, it is also important to be able to program complex physics ideas and investigate algorithms on them. We examine features that facilitate coding of physics problems, and flexibility in algorithms. Appropriate balances among power, memory, communications and I/O capabilities are presented. 10 refs

  5. Research into Financial Position of Listed Companies following Classification via Extreme Learning Machine Based upon DE Optimization

    OpenAIRE

    Fu Yu; Mu Jiong; Duan Xu Liang

    2016-01-01

    By means of the model of extreme learning machine based upon DE optimization, this article particularly centers on the optimization thinking of such a model as well as its application effect in the field of listed company’s financial position classification. It proves that the improved extreme learning machine algorithm based upon DE optimization eclipses the traditional extreme learning machine algorithm following comparison. Meanwhile, this article also intends to introduce certain research...

  6. Multivariate Mapping of Environmental Data Using Extreme Learning Machines

    Science.gov (United States)

    Leuenberger, Michael; Kanevski, Mikhail

    2014-05-01

    In most real cases environmental data are multivariate, highly variable at several spatio-temporal scales, and are generated by nonlinear and complex phenomena. Mapping - spatial predictions of such data, is a challenging problem. Machine learning algorithms, being universal nonlinear tools, have demonstrated their efficiency in modelling of environmental spatial and space-time data (Kanevski et al. 2009). Recently, a new approach in machine learning - Extreme Learning Machine (ELM), has gained a great popularity. ELM is a fast and powerful approach being a part of the machine learning algorithm category. Developed by G.-B. Huang et al. (2006), it follows the structure of a multilayer perceptron (MLP) with one single-hidden layer feedforward neural networks (SLFNs). The learning step of classical artificial neural networks, like MLP, deals with the optimization of weights and biases by using gradient-based learning algorithm (e.g. back-propagation algorithm). Opposed to this optimization phase, which can fall into local minima, ELM generates randomly the weights between the input layer and the hidden layer and also the biases in the hidden layer. By this initialization, it optimizes just the weight vector between the hidden layer and the output layer in a single way. The main advantage of this algorithm is the speed of the learning step. In a theoretical context and by growing the number of hidden nodes, the algorithm can learn any set of training data with zero error. To avoid overfitting, cross-validation method or "true validation" (by randomly splitting data into training, validation and testing subsets) are recommended in order to find an optimal number of neurons. With its universal property and solid theoretical basis, ELM is a good machine learning algorithm which can push the field forward. The present research deals with an extension of ELM to multivariate output modelling and application of ELM to the real data case study - pollution of the sediments in

  7. The Dropout Learning Algorithm

    Science.gov (United States)

    Baldi, Pierre; Sadowski, Peter

    2014-01-01

    Dropout is a recently introduced algorithm for training neural network by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful to understand the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between normalized geometric means of logistic functions with the logistic of the means, which mathematically characterizes logistic functions; and (3) the linearity of the means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, for the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations. PMID:24771879

  8. Methodology for creating dedicated machine and algorithm on sunflower counting

    Science.gov (United States)

    Muracciole, Vincent; Plainchault, Patrick; Mannino, Maria-Rosaria; Bertrand, Dominique; Vigouroux, Bertrand

    2007-09-01

    In order to sell grain lots in European countries, seed industries need a government certification. This certification requests purity testing, seed counting in order to quantify specified seed species and other impurities in lots, and germination testing. These analyses are carried out within the framework of international trade according to the methods of the International Seed Testing Association. Presently these different analyses are still achieved manually by skilled operators. Previous works have already shown that seeds can be characterized by around 110 visual features (morphology, colour, texture), and thus have presented several identification algorithms. Until now, most of the works in this domain are computer based. The approach presented in this article is based on the design of dedicated electronic vision machine aimed to identify and sort seeds. This machine is composed of a FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor) and a PC bearing the GUI (Human Machine Interface) of the system. Its operation relies on the stroboscopic image acquisition of a seed falling in front of a camera. A first machine was designed according to this approach, in order to simulate all the vision chain (image acquisition, feature extraction, identification) under the Matlab environment. In order to perform this task into dedicated hardware, all these algorithms were developed without the use of the Matlab toolbox. The objective of this article is to present a design methodology for a special purpose identification algorithm based on distance between groups into dedicated hardware machine for seed counting.

  9. Machine Learning Phases of Strongly Correlated Fermions

    Directory of Open Access Journals (Sweden)

    Kelvin Ch’ng

    2017-08-01

    Full Text Available Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling. We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  10. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  11. Diagnosing Coronary Heart Disease using Ensemble Machine Learning

    OpenAIRE

    Kathleen H. Miao; Julia H. Miao; George J. Miao

    2016-01-01

    Globally, heart disease is the leading cause of death for both men and women. One in every four people is afflicted with and dies of heart disease. Early and accurate diagnoses of heart disease thus are crucial in improving the chances of long-term survival for patients and saving millions of lives. In this research, an advanced ensemble machine learning technology, utilizing an adaptive Boosting algorithm, is developed for accurate coronary heart disease diagnosis and outcome predictions. Th...

  12. A Machine Learning Perspective on Predictive Coding with PAQ

    OpenAIRE

    Knoll, Byron; de Freitas, Nando

    2011-01-01

    PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a sta...

  13. Machine Learning Classification of Buildings for Map Generalization

    Directory of Open Access Journals (Sweden)

    Jaeeun Lee

    2017-10-01

    Full Text Available A critical problem in mapping data is the frequent updating of large data sets. To solve this problem, the updating of small-scale data based on large-scale data is very effective. Various map generalization techniques, such as simplification, displacement, typification, elimination, and aggregation, must therefore be applied. In this study, we focused on the elimination and aggregation of the building layer, for which each building in a large scale was classified as “0-eliminated,” “1-retained,” or “2-aggregated.” Machine-learning classification algorithms were then used for classifying the buildings. The data of 1:1000 scale and 1:25,000 scale digital maps obtained from the National Geographic Information Institute were used. We applied to these data various machine-learning classification algorithms, including naive Bayes (NB, decision tree (DT, k-nearest neighbor (k-NN, and support vector machine (SVM. The overall accuracies of each algorithm were satisfactory: DT, 88.96%; k-NN, 88.27%; SVM, 87.57%; and NB, 79.50%. Although elimination is a direct part of the proposed process, generalization operations, such as simplification and aggregation of polygons, must still be performed for buildings classified as retained and aggregated. Thus, these algorithms can be used for building classification and can serve as preparatory steps for building generalization.

  14. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning litterature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  15. A Machine LearningFramework to Forecast Wave Conditions

    Science.gov (United States)

    Zhang, Y.; James, S. C.; O'Donncha, F.

    2017-12-01

    Recently, significant effort has been undertaken to quantify and extract wave energy because it is renewable, environmental friendly, abundant, and often close to population centers. However, a major challenge is the ability to accurately and quickly predict energy production, especially across a 48-hour cycle. Accurate forecasting of wave conditions is a challenging undertaking that typically involves solving the spectral action-balance equation on a discretized grid with high spatial resolution. The nature of the computations typically demands high-performance computing infrastructure. Using a case-study site at Monterey Bay, California, a machine learning framework was trained to replicate numerically simulated wave conditions at a fraction of the typical computational cost. Specifically, the physics-based Simulating WAves Nearshore (SWAN) model, driven by measured wave conditions, nowcast ocean currents, and wind data, was used to generate training data for machine learning algorithms. The model was run between April 1st, 2013 and May 31st, 2017 generating forecasts at three-hour intervals yielding 11,078 distinct model outputs. SWAN-generated fields of 3,104 wave heights and a characteristic period could be replicated through simple matrix multiplications using the mapping matrices from machine learning algorithms. In fact, wave-height RMSEs from the machine learning algorithms (9 cm) were less than those for the SWAN model-verification exercise where those simulations were compared to buoy wave data within the model domain (>40 cm). The validated machine learning approach, which acts as an accurate surrogate for the SWAN model, can now be used to perform real-time forecasts of wave conditions for the next 48 hours using available forecasted boundary wave conditions, ocean currents, and winds. This solution has obvious applications to wave-energy generation as accurate wave conditions can be forecasted with over a three-order-of-magnitude reduction in

  16. Machine learning strategies for systems with invariance properties

    Science.gov (United States)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.

  17. (Machine) learning to do more with less

    Science.gov (United States)

    Cohen, Timothy; Freytsis, Marat; Ostdiek, Bryan

    2018-02-01

    Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard "fully supervised" approach (which relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called "weakly supervised" technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail — both analytically and numerically — with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to a class of systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided on GitHub.

  18. Semi-supervised and unsupervised extreme learning machines.

    Science.gov (United States)

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.

  19. Cascade Error Projection: A New Learning Algorithm

    Science.gov (United States)

    Duong, T. A.; Stubberud, A. R.; Daud, T.; Thakoor, A. P.

    1995-01-01

    A new neural network architecture and a hardware implementable learning algorithm is proposed. The algorithm, called cascade error projection (CEP), handles lack of precision and circuit noise better than existing algorithms.

  20. Machine learning methods for metabolic pathway prediction

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2010-01-01

    Full Text Available Abstract Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.