WorldWideScience

Sample records for machine learning technique

  1. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques to diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at their earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patient management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve the sensitivity and specificity of disease detection and monitoring, adding objectivity to the clinical decision-making process. This manuscript presents a review of multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches are presented. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allow creating homogeneous groups (unsupervised learning), or creating a classifier that predicts group membership of new cases (supervised learning), when a group label is available for each case. To ensure good performance of machine learning techniques on a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated, and the data dimensionality (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring is presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples are presented in glaucoma, age-related macular degeneration…
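
    The abstract's split between unsupervised grouping and supervised classification, together with the preprocessing it prescribes (treating missing data, rescaling features), can be sketched roughly as follows. This is an illustrative outline with made-up data, assuming scikit-learn; it is not code from the review.

        import numpy as np
        from sklearn.impute import SimpleImputer
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.cluster import KMeans
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 8))        # hypothetical ocular measurements
        y = rng.integers(0, 2, size=100)     # hypothetical disease labels

        # Preprocessing to limit bias sources: impute missing values, rescale features.
        preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
        Xp = preprocess.fit_transform(X)

        # Unsupervised learning: form homogeneous groups when no labels exist.
        groups = KMeans(n_clusters=2, n_init=10).fit_predict(Xp)

        # Supervised learning: predict group membership of a new case from labels.
        clf = SVC().fit(Xp, y)
        print(clf.predict(preprocess.transform(rng.normal(size=(1, 8)))))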

  2. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2015-01-01

    Techniques from the machine learning community are reviewed and employed for laser characterization, signal detection in the presence of nonlinear phase noise, and nonlinearity mitigation. Bayesian filtering and expectation maximization are employed within nonlinear state-space framework...

  3. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2016-01-01

    Machine learning techniques relevant for nonlinearity mitigation, carrier recovery, and nanoscale device characterization are reviewed and employed. Markov Chain Monte Carlo in combination with Bayesian filtering is employed within the nonlinear state-space framework and demonstrated for parameter...

  4. MACHINE LEARNING TECHNIQUES USED IN BIG DATA

    Directory of Open Access Journals (Sweden)

    STEFANIA LOREDANA NITA

    2016-07-01

    Full Text Available The classical tools used in data analysis are not sufficient to realize all the advantages of big data. The amount of information is too large for a complete investigation, and possible connections and relations between data could be missed, because it is difficult or even impossible to verify all assumptions over the information. Machine learning is a good solution for finding concealed correlations or relationships in data, because it runs at scale and works very well with large data sets. The more data we have, the more useful the machine learning algorithm is, because it “learns” from the existing data and applies the found rules to new entries. In this paper, we present some machine learning algorithms and techniques used in big data.
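
    A toy sketch of the paper's point that an algorithm "learns" from existing data and applies the found rules to new entries; the records, features and labels below are invented, and scikit-learn is assumed:

        from sklearn.tree import DecisionTreeClassifier, export_text

        # Existing data: each row is a known record, with its class label.
        X_train = [[25, 0], [40, 1], [35, 1], [22, 0]]
        y_train = ["reject", "accept", "accept", "reject"]

        model = DecisionTreeClassifier().fit(X_train, y_train)
        print(export_text(model))        # the "rules" found in the existing data
        print(model.predict([[30, 1]]))  # applying those rules to a new entry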

  5. Machine learning techniques and drug design.

    Science.gov (United States)

    Gertrudes, J C; Maltarollo, V G; Silva, R A; Oliveira, P R; Honório, K M; da Silva, A B F

    2012-01-01

    Interest in the application of machine learning techniques (MLT) as drug design tools has grown over the last decades. The reason for this is that drug design is very complex and requires the use of hybrid techniques. A brief review of some MLT such as self-organizing maps, multilayer perceptrons, Bayesian neural networks, counter-propagation neural networks and support vector machines is given in this paper. A comparison between the performance of the described methods and some classical statistical methods (such as partial least squares and multiple linear regression) shows that MLT have significant advantages. Nowadays, the number of studies in medicinal chemistry that employ these techniques has increased considerably, in particular the use of support vector machines. The state of the art and the future trends of MLT applications encompass the use of these techniques to construct more reliable QSAR models. The models obtained from MLT can be used in virtual screening studies as well as filters to develop/discover new chemicals. An important challenge in the drug design field is the prediction of pharmacokinetic and toxicity properties, which can avoid failures in the clinical phases. Therefore, this review provides a critical point of view on the main MLT and shows their potential as a valuable tool in drug design.

  6. BENCHMARKING MACHINE LEARNING TECHNIQUES FOR SOFTWARE DEFECT DETECTION

    Directory of Open Access Journals (Sweden)

    Saiqa Aleem

    2015-06-01

    Full Text Available Machine learning approaches are good at solving problems for which little information is available. In most cases, software domain problems can be characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify modules into defective and non-defective. Machine learning techniques help developers retrieve useful information after classification and enable them to analyse data from different perspectives. Machine learning techniques have proven useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.
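
    The benchmarking the study describes amounts to cross-validating several classifiers on the same module-level dataset and comparing their scores. A minimal sketch, assuming scikit-learn and using a synthetic stand-in for the public defect datasets:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.ensemble import RandomForestClassifier

        # Stand-in dataset: rows are software modules, labels defective/non-defective.
        X, y = make_classification(n_samples=500, n_features=20, random_state=0)

        for name, clf in [("NaiveBayes", GaussianNB()),
                          ("DecisionTree", DecisionTreeClassifier(random_state=0)),
                          ("RandomForest", RandomForestClassifier(random_state=0))]:
            scores = cross_val_score(clf, X, y, cv=5)
            print(f"{name}: mean accuracy {scores.mean():.3f}")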

  7. Machine learning approximation techniques using dual trees

    OpenAIRE

    Ergashbaev, Denis

    2015-01-01

    This master thesis explores a dual-tree framework as applied to a particular class of machine learning problems that are collectively referred to as generalized n-body problems. It builds a new algorithm on top of this framework and improves the existing Boosted OGE classifier.

  8. Machine learning techniques in dialogue act recognition

    Directory of Open Access Journals (Sweden)

    Mark Fišel

    2007-05-01

    Full Text Available This report addresses dialogue acts, their existing applications and techniques for automatically recognizing them, in Estonia as well as elsewhere. Three main applications are described: in dialogue systems, to determine the intention of the speaker; in dialogue systems with machine translation, to resolve ambiguities in the possible translation variants; and in speech recognition, to reduce the word recognition error rate. Several recognition techniques are described at a surface level: how they work and how they are trained. A summary of the corresponding representation methods is provided for each technique. The paper also includes examples of applying the techniques to dialogue act recognition. The author comes to the conclusion that, using the current evaluation metric, it is impossible to compare dialogue act recognition techniques when these are applied to different dialogue act tag sets. Dialogue acts remain an open research area, with space and need for developing new recognition techniques and methods of evaluation.

  9. Comparison of Machine Learning Techniques for Target Detection

    NARCIS (Netherlands)

    Vink, J.P.; Haan, G. de

    2013-01-01

    This paper focuses on machine learning techniques for real-time detection. Although many supervised learning techniques have been described in the literature, no technique always performs best. Several comparative studies are available, but have not always been performed carefully, leading to invalid…

  10. Modelling tick abundance using machine learning techniques and satellite imagery

    DEFF Research Database (Denmark)

    Kjær, Lene Jung; Korslund, L.; Kjelland, V.

    satellite images to run Boosted Regression Tree machine learning algorithms to predict overall distribution (presence/absence of ticks) and relative tick abundance of nymphs and larvae in southern Scandinavia. For nymphs, the predicted abundance had a positive correlation with observed abundance...... the predicted distribution of larvae was mostly even throughout Denmark, it was primarily around the coastlines in Norway and Sweden. Abundance was fairly low overall except in some fragmented patches corresponding to forested habitats in the region. Machine learning techniques allow us to predict for larger...... the collected ticks for pathogens and using the same machine learning techniques to develop prevalence maps of the ScandTick region....

  11. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    Science.gov (United States)

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  12. Machine Learning Techniques in Optimal Design

    Science.gov (United States)

    Cerbone, Giuseppe

    1992-01-01

    Many important applications can be formalized as constrained optimization tasks. For example, we are studying the engineering domain of two-dimensional (2-D) structural design. In this task, the goal is to design a structure of minimum weight that bears a set of loads. A solution to a design problem in which there is a single load (L) and two stationary support points (S1 and S2), consisting of four members, E1, E2, E3, and E4, that connect the load to the support points, is discussed. In principle, optimal solutions to problems of this kind can be found by numerical optimization techniques. However, in practice [Vanderplaats, 1984] these methods are slow and they can produce different local solutions whose quality (ratio to the global optimum) varies with the choice of starting points. Hence, their applicability to real-world problems is severely restricted. To overcome these limitations, we propose to augment numerical optimization by first performing a symbolic compilation stage to produce: (a) objective functions that are faster to evaluate and that depend less on the choice of the starting point and (b) selection rules that associate problem instances with a set of recommended solutions. These goals are accomplished by successive specializations of the problem class and of the associated objective functions. In the end, this process reduces the problem to a collection of independent functions that are fast to evaluate, that can be differentiated symbolically, and that represent smaller regions of the overall search space. However, the specialization process can produce a large number of sub-problems. This is overcome by inductively deriving selection rules which associate problems with small sets of specialized independent sub-problems. Each set of candidate solutions is chosen to minimize a cost function which expresses the tradeoff between the quality of the solution that can be obtained from the sub-problem and the time it takes to produce it. The overall solution…

  13. Modern machine learning techniques and their applications in cartoon animation research

    CERN Document Server

    Yu, Jun

    2013-01-01

    The integration of machine learning techniques and cartoon animation research is fast becoming a hot topic. This book helps readers learn the latest machine learning techniques, including patch alignment framework; spectral clustering, graph cuts, and convex relaxation; ensemble manifold learning; multiple kernel learning; multiview subspace learning; and multiview distance metric learning. It then presents the applications of these modern machine learning techniques in cartoon animation research. With these techniques, users can efficiently utilize the cartoon materials to generate animations.

  14. Machine learning techniques for razor triggers

    CERN Document Server

    Kolosova, Marina

    2015-01-01

    My project was focused on the development of a neural network which can predict whether an event passes a razor trigger. Using synthetic data containing jets and missing transverse energy, we built and trained a razor network by supervised learning. We accomplished a ∼91% agreement between the output of the neural network and the target, while the remaining ∼9% was due to the noise of the neural network. We could apply such networks during the L1 trigger using neuromorphic hardware. Neuromorphic chips are electronic systems that function in a way similar to an actual brain; they are faster than GPUs or CPUs, but they can only be used with spiking neural networks.

  15. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of today’s Internet, bringing financial damage to companies and annoying individual users. Spam emails invade users’ mailboxes without their consent, and consume network capacity as well as time spent checking and deleting spam mail. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most users want to do the right thing to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken, spam has not yet been eradicated, and when countermeasures are over-sensitive, even legitimate emails are eliminated. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on more sophisticated classifier-related issues. Recently, machine learning for spam classification has become an important research issue. This work explores and identifies the use of different learning algorithms for classifying spam messages from e-mail, and a comparative analysis among the algorithms is also presented.
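
    A minimal sketch of a learning-based spam filter of the kind the paper compares, assuming scikit-learn and a toy corpus; the study's actual algorithms and features are not reproduced here:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Tiny stand-in corpus; a real study would use thousands of labelled e-mails.
        emails = ["win money now", "meeting at noon",
                  "cheap pills offer", "project report attached"]
        labels = ["spam", "ham", "spam", "ham"]

        filt = make_pipeline(TfidfVectorizer(), MultinomialNB())
        filt.fit(emails, labels)
        print(filt.predict(["free money offer"]))  # likely ['spam']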

  16. Analysing CMS transfers using Machine Learning techniques

    CERN Document Server

    Diotalevi, Tommaso

    2016-01-01

    LHC experiments transfer more than 10 PB/week between all grid sites using the FTS transfer service. In particular, CMS manages almost 5 PB/week of FTS transfers with PhEDEx (Physics Experiment Data Export). FTS sends metrics about each transfer (e.g. transfer rate, duration, size) to a central HDFS storage at CERN. The work done during these three months as a Summer Student involved the usage of ML techniques, using a CMS framework called DCAFPilot, to process this new data and generate predictions of transfer latencies on all links between Grid sites. This analysis will provide, as a future service, the necessary information to proactively identify and possibly fix latency-affected transfers over the WLCG.

  17. Machine Learning

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  18. Decision Support System for Diabetes Mellitus through Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Tarik A. Rashid

    2016-07-01

    Full Text Available Recently, the diseases of diabetes mellitus have grown into extremely feared problems that can have damaging effects on the health condition of sufferers globally. In this regard, several machine learning models have been used to predict and classify diabetes types. Nevertheless, most of these models attempted to solve two problems: categorizing patients in terms of diabetic types and forecasting the blood sugar rate of patients. This paper presents an automatic decision support system for diabetes mellitus through machine learning techniques, taking into account the above problems, plus reflecting the skills of medical specialists who believe that there is a great relationship between a patient’s symptoms with some chronic diseases and the blood sugar rate. Data sets are collected from Layla Qasim Clinical Center in Kurdistan Region; then the data is cleaned and processed using feature selection techniques such as Sequential Forward Selection and the Correlation Coefficient; finally, the refined data is fed into machine learning models for prediction, classification, and description purposes. This system enables physicians and doctors to provide diabetes mellitus (DM) patients with good health treatments and recommendations.

  1. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    Science.gov (United States)

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
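
    For a binary outcome the C statistic is the area under the ROC curve, so the random-forest-versus-LR comparison can be sketched as below. Synthetic data and scikit-learn are assumed; the trial's variables, bootstrap validation and decile analysis are not reproduced:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.linear_model import LogisticRegression
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score

        # 50/50 derivation/validation split, as in the study design.
        X, y = make_classification(n_samples=1000, n_features=30, random_state=1)
        X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

        for name, m in [("LR", LogisticRegression(max_iter=1000)),
                        ("RandomForest", RandomForestClassifier(random_state=1))]:
            m.fit(X_dev, y_dev)
            p = m.predict_proba(X_val)[:, 1]
            print(name, "C statistic:", round(roc_auc_score(y_val, p), 3))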

  2. Data mining practical machine learning tools and techniques

    CERN Document Server

    Witten, Ian H

    2005-01-01

    As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same…

  3. Estimation of Alpine Skier Posture Using Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Bojan Nemec

    2014-10-01

    Full Text Available High precision Global Navigation Satellite System (GNSS) measurements are becoming more and more popular in alpine skiing due to the relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements that are defined with the antenna placed typically behind the skier’s neck. A key issue is how to estimate other more relevant parameters of the skier’s body, like the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier’s body with an inverted-pendulum model that oversimplified the skier’s body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and ski trajectories based on a more faithful approximation of the skier’s body with nine degrees of freedom. The first method utilizes the well-established approach of artificial neural networks, while the second method is based on a state-of-the-art statistical generalization method. Both methods were evaluated using reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our results outperform those of commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques in biomechanical measurements of alpine skiing.

  4. Classification of Phishing Email Using Random Forest Machine Learning Technique

    Directory of Open Access Journals (Sweden)

    Andronicus A. Akinyelu

    2014-01-01

    Full Text Available Phishing is one of the major challenges faced by the world of e-commerce today. Thanks to phishing attacks, billions of dollars have been lost by many companies and individuals. In 2012, an online report put the loss due to phishing attacks at about $1.5 billion. The global impact of phishing attacks will continue to increase, and thus more efficient phishing detection techniques are required to curb the menace. This paper investigates and reports the use of the random forest machine learning algorithm in the classification of phishing attacks, with the major objective of developing an improved phishing email classifier with better prediction accuracy and a smaller number of features. From a dataset consisting of 2000 phishing and ham emails, a set of prominent phishing email features (identified from the literature) were extracted and used by the machine learning algorithm, with a resulting classification accuracy of 99.7% and low false negative (FN) and false positive (FP) rates.
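
    The reported false negative and false positive rates fall directly out of a confusion matrix; a hedged sketch with a synthetic stand-in for the 2000-email feature set, assuming scikit-learn:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import confusion_matrix

        # Stand-in for 2000 labelled e-mails described by extracted phishing features.
        X, y = make_classification(n_samples=2000, n_features=15, random_state=2)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

        rf = RandomForestClassifier(random_state=2).fit(X_tr, y_tr)
        tn, fp, fn, tp = confusion_matrix(y_te, rf.predict(X_te)).ravel()
        print("FP rate:", fp / (fp + tn), "FN rate:", fn / (fn + tp))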

  5. Machine learning techniques applied to system characterization and equalization

    DEFF Research Database (Denmark)

    Zibar, Darko; Thrane, Jakob; Wass, Jesper

    2016-01-01

    Linear signal processing algorithms are effective in combating linear fibre channel impairments. We demonstrate the ability of machine learning algorithms to combat nonlinear fibre channel impairments and perform parameter extraction from directly detected signals.

  6. Machine Learning Techniques in Diagnosis of Pulmonary Embolism

    Directory of Open Access Journals (Sweden)

    Goksu Berikol

    2016-04-01

    Full Text Available Aim: Pulmonary embolism (PE) is a high-mortality disease for which clinical suspicion and a variety of diagnostic laboratory and imaging results are highly important in diagnosis. Anticoagulation and fibrinolytic treatments are hard to decide on in some cases; therefore, early diagnosis is important in emergency medicine. Material and Method: The study was designed retrospectively based on the records of 201 patients presenting to the Emergency Department with pulmonary complaints, including dyspnea and chest pain, between January 2010 and October 2013. Results: Machine learning techniques were used to calculate the success in diagnosing PE. The success rate of the classification tree method for detection of PE was 95%, which was higher than that of KNN classification (75%) and Naive Bayes classification (88.5%). Discussion: The classification tree and Bayesian methods can be selected to diagnose or assess the possibility of pulmonary embolism in emergency centers with limited diagnostic tests and for patients who are difficult to diagnose.

  7. Improving Alzheimer's disease diagnosis with machine learning techniques.

    Science.gov (United States)

    Trambaiolli, Lucas R; Lorena, Ana C; Fraga, Francisco J; Kanda, Paulo A M; Anghinah, Renato; Nitrini, Ricardo

    2011-07-01

    There is no specific test to diagnose Alzheimer's disease (AD). Its diagnosis should be based upon clinical history, neuropsychological and laboratory tests, neuroimaging and electroencephalography (EEG). Therefore, new approaches are necessary to enable earlier and more accurate diagnosis and to follow treatment results. In this study we used a Machine Learning (ML) technique, named Support Vector Machine (SVM), to search for patterns in EEG epochs to differentiate AD patients from controls. As a result, we developed a quantitative EEG (qEEG) processing method for automatic differentiation of patients with AD from normal individuals, as a complement to the diagnosis of probable dementia. We studied EEGs from 19 normal subjects (14 females/5 males, mean age 71.6 years) and 16 probable mild to moderate AD patients (14 females/2 males, mean age 73.4 years). The results obtained from analysis of EEG epochs were an accuracy of 79.9% and a sensitivity of 83.2%. The analysis considering the diagnosis of each individual patient reached 87.0% accuracy and 91.7% sensitivity.

  8. Machine learning in Python essential techniques for predictive analysis

    CERN Document Server

    Bowles, Michael

    2015-01-01

    Learn a simpler and more effective way to analyze data and predict outcomes with Python. Machine Learning in Python shows you how to successfully analyze data using only two core machine learning algorithms, and how to apply them using Python. By focusing on two algorithm families that effectively predict outcomes, this book is able to provide full descriptions of the mechanisms at work, and the examples that illustrate the machinery with specific, hackable code. The algorithms are explained in simple terms with no complex math and applied using Python, with guidance on algorithm selection…

  9. Machine learning techniques for energy optimization in mobile embedded systems

    Science.gov (United States)

    Donohoo, Brad Kyoshi

    Mobile smartphones and other portable battery operated embedded systems (PDAs, tablets) are pervasive computing devices that have emerged in recent years as essential instruments for communication, business, and social interactions. While performance, capabilities, and design are all important considerations when purchasing a mobile device, a long battery lifetime is one of the most desirable attributes. Battery technology and capacity has improved over the years, but it still cannot keep pace with the power consumption demands of today's mobile devices. This key limiter has led to a strong research emphasis on extending battery lifetime by minimizing energy consumption, primarily using software optimizations. This thesis presents two strategies that attempt to optimize mobile device energy consumption with negligible impact on user perception and quality of service (QoS). The first strategy proposes an application and user interaction aware middleware framework that takes advantage of user idle time between interaction events of the foreground application to optimize CPU and screen backlight energy consumption. The framework dynamically classifies mobile device applications based on their received interaction patterns, then invokes a number of different power management algorithms to adjust processor frequency and screen backlight levels accordingly. The second strategy proposes the usage of machine learning techniques to learn a user's mobile device usage pattern pertaining to spatiotemporal and device contexts, and then predict energy-optimal data and location interface configurations. By learning where and when a mobile device user uses certain power-hungry interfaces (3G, WiFi, and GPS), the techniques, which include variants of linear discriminant analysis, linear logistic regression, non-linear logistic regression, and k-nearest neighbor, are able to dynamically turn off unnecessary interfaces at runtime in order to save energy.
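
    Of the techniques the thesis lists, k-nearest neighbor is the simplest to illustrate. The sketch below uses hypothetical spatiotemporal context features and scikit-learn; it is not the thesis's implementation:

        from sklearn.neighbors import KNeighborsClassifier

        # Hypothetical context rows: [hour of day, latitude, longitude].
        X = [[9, 40.01, -105.27], [13, 40.01, -105.27],
             [20, 40.10, -105.30], [22, 40.10, -105.30]]
        y = [1, 1, 0, 0]  # 1: WiFi actually used here, 0: interface can be powered off

        knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
        print(knn.predict([[10, 40.01, -105.27]]))  # predicted energy-optimal state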

  10. Using machine learning techniques to differentiate acute coronary syndrome

    Directory of Open Access Journals (Sweden)

    Sougand Setareh

    2015-02-01

    Full Text Available Background: Acute coronary syndrome (ACS) is an unstable and dynamic process that includes unstable angina, ST-elevation myocardial infarction, and non-ST-elevation myocardial infarction. Despite recent technological advances in the early diagnosis of ACS, differentiating between different types of coronary diseases in the early hours of admission is controversial. The present study aimed to accurately differentiate between various coronary events, using machine learning techniques. Such methods, as a subset of artificial intelligence, include algorithms that allow computers to learn, and play a major role in treatment decisions. Methods: 1902 patients diagnosed with ACS and admitted to hospital were selected according to the Euro Heart Survey on ACS. Patients were classified based on the J48 decision tree. A bagging aggregation algorithm was implemented to increase the efficiency of the algorithm. Results: The performance of the classifiers was estimated and compared based on their accuracy, computed from the confusion matrix. The accuracy rates of the decision tree and the bagging algorithm were calculated to be 91.74% and 92.53%, respectively. Conclusion: The methods used in this study proved able to identify various types of ACS. In addition, using the confusion matrix, an acceptable number of subjects with acute coronary syndrome were identified in each class.
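
    A rough sketch of the decision-tree-plus-bagging setup, assuming scikit-learn (whose CART trees stand in for Weka's J48) and synthetic patient features:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.ensemble import BaggingClassifier

        # Stand-in for the 1902 ACS patient records.
        X, y = make_classification(n_samples=1902, n_features=12, random_state=3)

        tree = DecisionTreeClassifier(random_state=3)  # CART here; the study used J48
        bagged = BaggingClassifier(tree, n_estimators=50, random_state=3)
        for name, m in [("tree", tree), ("bagged tree", bagged)]:
            print(name, "accuracy:", round(cross_val_score(m, X, y, cv=5).mean(), 4))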

  11. Short-term wind speed predictions with machine learning techniques

    Science.gov (United States)

    Ghorbani, M. A.; Khatibi, R.; FazeliFard, M. H.; Naghipour, L.; Makarynskyy, O.

    2016-02-01

    Hourly wind speed forecasting is presented by a modeling study with possible applications to practical problems including wind energy farming, aircraft safety and airport operations. Modeling techniques employed in this paper for such short-term predictions are based on the machine learning techniques of artificial neural networks (ANNs) and genetic expression programming (GEP). Recorded values of wind speed were used, comprising 8 years of data collected at the Kersey site, Colorado, USA. The January data over the first 7 years (2005-2011) were used for model training, and the January data for 2012 were used for model testing. A number of model structures were investigated to validate the robustness of these two techniques. The prediction results were compared with those of a multiple linear regression (MLR) method and with the Persistence method developed for the data. The model performances were evaluated using the correlation coefficient, root mean square error, Nash-Sutcliffe efficiency coefficient and Akaike information criterion. The results indicate that forecasting wind speed is feasible using past records of wind speed alone, but the maximum lead time for the data was found to be 14 h. The results show that different techniques lead to different results, and the choice between them is not easy. Thus, decision making has to be informed by these modeling results, and decisions should be arrived at on the basis of an understanding of inherent uncertainties. The results show that both GEP and ANN are equally credible selections, and even MLR should not be dismissed, as it has its uses.
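
    A compact sketch of forecasting from past records alone: lagged wind speeds feed both an MLR baseline and a small neural network, and RMSE is compared on a held-out period. The series is synthetic and scikit-learn is assumed; the paper's GEP models and remaining metrics are not reproduced:

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.neural_network import MLPRegressor
        from sklearn.metrics import mean_squared_error

        rng = np.random.default_rng(4)
        series = np.abs(rng.normal(6, 2, size=2000))  # synthetic hourly wind speeds
        X = np.column_stack([series[i:-(3 - i)] for i in range(3)])  # 3 lagged inputs
        y = series[3:]

        split = 1500  # earlier data for training, later data for testing
        for name, m in [("MLR", LinearRegression()),
                        ("ANN", MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                                             random_state=4))]:
            m.fit(X[:split], y[:split])
            rmse = mean_squared_error(y[split:], m.predict(X[split:])) ** 0.5
            print(name, "RMSE:", round(rmse, 3))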

  12. Forecasting daily and monthly exchange rates with machine learning techniques

    OpenAIRE

    Papadimitriou, Theophilos; Gogas, Periklis; Plakandaras, Vasilios

    2013-01-01

    We combine signal processing with machine learning methodologies by introducing a hybrid Ensemble Empirical Mode Decomposition (EEMD), Multivariate Adaptive Regression Splines (MARS) and Support Vector Regression (SVR) model in order to forecast the monthly and daily Euro (EUR)/United States Dollar (USD), USD/Japanese Yen (JPY), Australian Dollar (AUD)/Norwegian Krone (NOK), New Zealand Dollar (NZD)/Brazilian Real (BRL) and South African Rand (ZAR)/Philippine Peso (PHP) exchange rates. After th...

  13. Kernel-based machine learning techniques for infrasound signal classification

    Science.gov (United States)

    Tuma, Matthias; Igel, Christian; Mialle, Pierrick

    2014-05-01

    Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates for trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-a-vis the current rule-based and task-tailored expert system. To this purpose, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers. All

  14. Analysis Of Machine Learning Techniques By Using Blogger Data

    Directory of Open Access Journals (Sweden)

    Gowsalya.R,

    2014-04-01

    Full Text Available Blogs are a recent, fast-progressing medium that depends on information systems and technological advancement. In developing countries, where mass media is less developed and schemes are framed in governmental terms, blogs provide a channel for sharing knowledge and ideas. This article performed simulations on obtained information, 100 instances of bloggers, using the Weka 3.6 tool, applying many machine learning algorithms and analyzing the values of accuracy, precision, recall and F-measure to anticipate users' future tendency toward blogging for use in strategic areas.

  15. MUMAL: Multivariate analysis in shotgun proteomics using machine learning techniques

    Directory of Open Access Journals (Sweden)

    Cerqueira Fabio R

    2012-10-01

    Full Text Available Abstract Background The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of obtained peptide-spectrum matches (PSMs) needs to be evaluated, also by algorithms, as manual curation of these huge datasets would be impractical. The target-decoy database strategy is largely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. Besides, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant augmentation in sensitivity. Results Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, leading to a higher chance of retrieving more true positives. Furthermore, we need few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches. Conclusion Our approach not only enhances the computational performance…

  16. A comparison of machine learning techniques for predicting downstream acid mine drainage

    CSIR Research Space (South Africa)

    van Zyl, TL

    2014-07-01

    Full Text Available Presented at the Canadian Symposium on Remote Sensing (IGARSS) 2014, Quebec, Canada, 13-18 July 2014, by Terence L van Zyl, EOSIT, Meraka Institute, CSIR, Pretoria, South Africa.

  17. Use of machine learning techniques for modeling of snow depth

    Directory of Open Access Journals (Sweden)

    G. V. Ayzel

    2017-01-01

    Full Text Available Snow exerts a significant regulating effect on the land hydrological cycle, since it controls the intensity of heat and water exchange between the soil-vegetative cover and the atmosphere. Estimating spring flood runoff or rain floods on mountainous rivers requires understanding of the snow cover dynamics on a watershed. In our work, the problem of snow cover depth modeling is solved based on both available databases of hydro-meteorological observations and easily accessible scientific software that allows complete reproduction of the investigation results and further development of this theme by the scientific community. In this research we used daily observational data on the snow cover and surface meteorological parameters, obtained at three stations situated in different geographical regions: Col de Porte (France), Sodankyla (Finland), and Snoqualmie Pass (USA). Statistical modeling of the snow cover depth is based on a complex of freely distributed present-day machine learning models: Decision Trees, Adaptive Boosting, Gradient Boosting. It is demonstrated that the combination of modern machine learning methods with available meteorological data provides good accuracy of snow cover modeling. The best results of snow cover depth modeling for every investigated site were obtained by the ensemble method of gradient boosting over decision trees: this model reproduces well both the periods of snow cover accumulation and its melting. The purposeful character of the learning process for models of the gradient boosting type, their ensemble character, and the use of combined redundancy of a test sample in the learning procedure make this type of model a good and sustainable research tool. The results obtained can be used for estimating snow cover characteristics for river basins where hydro-meteorological information is absent or insufficient.
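
    A minimal sketch of the winning approach, gradient boosting over decision trees, with invented daily meteorological predictors and scikit-learn assumed (the study's actual predictors and station data are not reproduced):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import GradientBoostingRegressor

        # Hypothetical daily predictors: [air temperature, precipitation, day of year].
        rng = np.random.default_rng(5)
        X = np.column_stack([rng.normal(-5, 10, 3000),
                             rng.exponential(2, 3000),
                             rng.integers(1, 366, 3000)])
        snow_depth = np.clip(-3 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 5, 3000), 0, None)

        X_tr, X_te, y_tr, y_te = train_test_split(X, snow_depth, random_state=5)
        gbm = GradientBoostingRegressor(random_state=5).fit(X_tr, y_tr)
        print("R^2 on held-out days:", round(gbm.score(X_te, y_te), 3))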

  18. Machine Learning Techniques for Prediction of Early Childhood Obesity.

    Science.gov (United States)

    Dugan, T M; Mukhopadhyay, S; Carroll, A; Downs, S

    2015-01-01

    This paper aims to predict childhood obesity after age two, using only data collected prior to the second birthday by a clinical decision support system called CHICA. Analyses of six different machine learning methods: RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes trained on CHICA data show that an accurate, sensitive model can be created. Of the methods analyzed, the ID3 model trained on the CHICA dataset proved the best overall performance with accuracy of 85% and sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors seen in the ID3 modeling of the CHICA dataset have been independently validated in the literature as correlated with obesity, thereby supporting the validity of the model. This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.

  19. The impact of machine learning techniques in the study of bipolar disorder: A systematic review.

    Science.gov (United States)

    Librenza-Garcia, Diego; Kotzian, Bruno Jaskulski; Yang, Jessica; Mwangi, Benson; Cao, Bo; Pereira Lima, Luiza Nunes; Bermudez, Mariane Bagatin; Boeira, Manuela Vianna; Kapczinski, Flávio; Passos, Ives Cavalcante

    2017-07-18

    Machine learning techniques provide new methods to predict diagnosis and clinical outcomes at an individual level. We aim to review the existing literature on the use of machine learning techniques in the assessment of subjects with bipolar disorder. We systematically searched PubMed, Embase and Web of Science for articles published in any language up to January 2017. We found 757 abstracts and included 51 studies in our review. Most of the included studies used multiple levels of biological data to distinguish the diagnosis of bipolar disorder from other psychiatric disorders or healthy controls. We also found studies that assessed the prediction of clinical outcomes and studies using unsupervised machine learning to build more consistent clinical phenotypes of bipolar disorder. We concluded that given the clinical heterogeneity of samples of patients with BD, machine learning techniques may provide clinicians and researchers with important insights in fields such as diagnosis, personalized treatment and prognosis orientation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques

    Science.gov (United States)

    Tout, Hicham

    2013-01-01

    The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning…

  1. FRC Separatrix inference using machine-learning techniques

    Science.gov (United States)

    Romero, Jesus; Roche, Thomas; the TAE Team

    2016-10-01

    As Field Reversed Configuration (FRC) devices approach lifetimes exceeding the characteristic time of conductive structures external to the plasma, plasma stabilization cannot be achieved solely by the flux-conserving effect of the external structures, and active control systems are then necessary. An essential component of such control systems is a reconstruction method for the plasma separatrix suitable for real time. We report on a method to infer the separatrix in an FRC using the information from magnetic probes located externally to the plasma. The method uses machine learning methods, namely Bayesian inference of Gaussian Processes, to obtain the most likely plasma current density distribution given the measurements of magnetic field external to the plasma. From the current sources, the flux function and in particular the separatrix are easily computed. The reconstruction method is non-iterative and hence suitable for deterministic real-time applications. Validation results with numerical simulations and application to separatrix inference of C-2U plasma discharges will be presented.
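
    As a loose illustration of Gaussian-process inference from external magnetic probes, the sketch below fits a GP to synthetic probe readings and returns a posterior mean with uncertainty. scikit-learn is assumed; the actual method infers current-density sources and the separatrix from the field, which this sketch does not attempt:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        # Probe positions along the machine axis and noisy field readings (synthetic).
        z = np.linspace(-1, 1, 12).reshape(-1, 1)
        b = np.exp(-z.ravel() ** 2) + np.random.default_rng(6).normal(0, 0.01, 12)

        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4).fit(z, b)
        mean, std = gp.predict(np.linspace(-1, 1, 50).reshape(-1, 1), return_std=True)
        print(mean[:3], std[:3])  # most likely field profile and its uncertainty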

  2. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  3. Machine learning techniques for gait biometric recognition using the ground reaction force

    CERN Document Server

    Mason, James Eric; Woungang, Isaac

    2016-01-01

    This book focuses on how machine learning techniques can be used to analyze and make use of one particular category of behavioral biometrics known as the gait biometric. A comprehensive Ground Reaction Force (GRF)-based Gait Biometrics Recognition framework is proposed and validated by experiments. In addition, an in-depth analysis of existing recognition techniques that are best suited for performing footstep GRF-based person recognition is also proposed, as well as a comparison of feature extractors, normalizers, and classifiers configurations that were never directly compared with one another in any previous GRF recognition research. Finally, a detailed theoretical overview of many existing machine learning techniques is presented, leading to a proposal of two novel data processing techniques developed specifically for the purpose of gait biometric recognition using GRF. This book introduces novel machine-learning-based temporal normalization techniques, and bridges research gaps concerning the effect of ...

  4. Arabic Keyphrase Extraction using Linguistic knowledge and Machine Learning Techniques

    CERN Document Server

    El-shishtawy, Tarek

    2012-01-01

    In this paper, a supervised learning technique for extracting keyphrases from Arabic documents is presented. The extractor is supplied with linguistic knowledge to enhance its efficiency, instead of relying only on statistical information such as term frequency and distance. During analysis, an annotated Arabic corpus is used to extract the required lexical features of the document words. The knowledge also includes syntactic rules based on part-of-speech tags and allowed word sequences to extract the candidate keyphrases. In this work, the abstract form of Arabic words is used instead of the stem form to represent the candidate terms. The abstract form hides most of the inflections found in Arabic words. The paper introduces new features of keyphrases based on linguistic knowledge, to capture titles and subtitles of a document. A simple ANOVA test is used to evaluate the validity of selected features. Then, the learning model is built using LDA - Linear Discriminant Analysis - and training documents. Althou...

  5. Resistance gene identification from Larimichthys crocea with machine learning techniques

    Science.gov (United States)

    Cai, Yinyin; Liao, Zhijun; Ju, Ying; Liu, Juan; Mao, Yong; Liu, Xiangrong

    2016-12-01

    The research on resistance genes (R-genes) plays a vital role in bioinformatics, as these genes confer the capability of coping with adverse changes in the external environment, forming the corresponding resistance protein by transcription and translation. It is meaningful to identify and predict the R-genes of Larimichthys crocea (L. crocea); doing so also benefits breeding and the marine environment. Many of L. crocea's immune mechanisms have been explored by biological methods, but much about them is still unclear. In order to move beyond the limited understanding of L. crocea's immune mechanisms and to detect new R-genes and R-gene-like genes, this paper proposes a more useful combined prediction method, which extracts and classifies features of available genomic data by machine learning. The effectiveness of the feature extraction and classification methods in identifying potential novel R-genes was evaluated, and different statistical analyses were utilized to explore the reliability of the prediction method, which can help us further understand the immune mechanisms of L. crocea against pathogens. In this paper, a webserver called LCRG-Pred is available at http://server.malab.cn/rg_lc/.

  6. Reconstructing muscle activation during normal walking: a comparison of symbolic and connectionist machine learning techniques

    NARCIS (Netherlands)

    Heller, Ben W.; Veltink, Peter H.; Rijkhoff, Nico J.M.; Rutten, Wim L.C.; Andrews, Brian J.

    1993-01-01

    One symbolic (rule-based inductive learning) and one connectionist (neural network) machine learning technique were used to reconstruct muscle activation patterns from kinematic data measured during normal human walking at several speeds. The activation patterns (or desired outputs) consisted of sur

  7. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as examples of a widely used data-driven classification/modeling strategy.

  8. DIAGNOSIS OF DIABETIC RETINOPATHY USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    R. Priya

    2013-07-01

    Full Text Available Diabetic retinopathy (DR) is an eye disease caused by the complications of diabetes, and it should be detected early for effective treatment. As diabetes progresses, the vision of a patient may start to deteriorate, leading to diabetic retinopathy. Two groups are identified, namely non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR). In this paper, to diagnose diabetic retinopathy, three models, Probabilistic Neural Network (PNN), Bayesian Classification and Support Vector Machine (SVM), are described and their performances compared. The amount of the disease spread in the retina can be identified by extracting the features of the retina. Features like blood vessels and haemorrhages of NPDR images and exudates of PDR images are extracted from the raw images using image processing techniques and fed to the classifier for classification. A total of 350 fundus images were used, out of which 100 were used for training and 250 for testing. Experimental results show that PNN has an accuracy of 89.6%, the Bayes classifier has an accuracy of 94.4% and SVM has an accuracy of 97.6%, which infers that the SVM model outperforms all other models. Our system was also run on 130 images available from “DIARETDB0: Evaluation Database and Methodology for Diabetic Retinopathy”, and the results show that PNN has an accuracy of 87.69%, the Bayes classifier has an accuracy of 90.76% and SVM has an accuracy of 95.38%.

  11. Locomotion training of legged robots using hybrid machine learning techniques

    Science.gov (United States)

    Simon, William E.; Doerschuk, Peggy I.; Zhang, Wen-Ran; Li, Andrew L.

    1995-01-01

    In this study artificial neural networks and fuzzy logic are used to control the jumping behavior of a three-link uniped robot; the biped locomotion control problem is an incremental extension of uniped locomotion control. Study of legged locomotion dynamics indicates that a hierarchical controller is required to control the behavior of a legged robot. A structured control strategy is suggested which includes navigator, motion planner, biped coordinator and uniped controllers. A three-link uniped robot simulation was developed to be used as the plant. Neurocontrollers were trained both online and offline. In the case of online training, a reinforcement learning technique was used to train the neurocontroller to make the robot jump to a specified height. After several hundred iterations of training, the plant output achieved an accuracy of 7.4%. However, when jump distance and body angular momentum were also included in the control objectives, training time became impractically long. In the case of offline training, a three-layered backpropagation (BP) network was first used with three inputs, three outputs and 15 to 40 hidden nodes. Pre-generated data were presented to the network with a learning rate as low as 0.003 in order to reach convergence. The low learning rate required for convergence resulted in a very slow training process which took weeks to learn 460 examples. After training, performance of the neurocontroller was rather poor. Consequently, the BP network was replaced by a Cerebellar Model Articulation Controller (CMAC) network. Subsequent experiments described in this document show that the CMAC network is more suitable for uniped locomotion control problems in terms of both learning efficiency and performance. A new approach is also introduced: a self-organizing multiagent cerebellar model for fuzzy-neural control of uniped locomotion is suggested to improve training efficiency. This is currently being evaluated for a possible

  12. Machine and deep learning techniques in heavy-ion collisions with ALICE arXiv

    CERN Document Server

    INSPIRE-00382877

    Over the last years, machine learning tools have been successfully applied to a wealth of problems in high-energy physics. A typical example is the classification of physics objects. Supervised machine learning methods allow for significant improvements in classification problems by taking into account observable correlations and by learning the optimal selection from examples, e.g. from Monte Carlo simulations. Even more promising is the usage of deep learning techniques. Methods like deep convolutional networks might be able to catch features from low-level parameters that are not exploited by default cut-based methods. These ideas could be particularly beneficial for measurements in heavy-ion collisions, because of the very large multiplicities. Indeed, machine learning methods potentially perform much better in systems with a large number of degrees of freedom compared to cut-based methods. Moreover, many key heavy-ion observables are most interesting at low transverse momentum where the underlying event ...

  13. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques.

    Science.gov (United States)

    Wang, Guanjin; Lam, Kin-Man; Deng, Zhaohong; Choi, Kup-Sze

    2015-08-01

    Bladder cancer is a common genitourinary malignancy. For muscle-invasive bladder cancer, surgical removal of the bladder, i.e. radical cystectomy, is in general the definitive treatment which, unfortunately, carries significant morbidity and mortality. Accurate prediction of the mortality of radical cystectomy is therefore needed. Statistical methods have conventionally been used for this purpose, despite the complex interactions of high-dimensional medical data. Machine learning has emerged as a promising technique for handling high-dimensional data, with increasing application in clinical decision support, e.g. cancer prediction and prognosis. Its ability to reveal hidden nonlinear interactions and interpretable rules between dependent and independent variables is favorable for constructing models of effective generalization performance. In this paper, seven machine learning methods are utilized to predict the 5-year mortality of radical cystectomy, including the back-propagation neural network (BPN), radial basis function network (RBFN), extreme learning machine (ELM), regularized ELM (RELM), support vector machine (SVM), naive Bayes (NB) classifier and k-nearest neighbour (KNN), on a clinicopathological dataset of 117 patients of the urology unit of a hospital in Hong Kong. The experimental results indicate that RELM achieved the highest average prediction accuracy of 0.8 at a fast learning speed. The research findings demonstrate the potential of applying machine learning techniques to support clinical decision making. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. A novel method to estimate model uncertainty using machine learning techniques

    NARCIS (Netherlands)

    Solomatine, D.P.; Lal Shrestha, D.

    2009-01-01

    A novel method is presented for model uncertainty estimation using machine learning techniques and its application in rainfall runoff modeling. In this method, first, the probability distribution of the model error is estimated separately for different hydrological situations and second, the

  15. Exploring Machine Learning Techniques Using Patient Interactions in Online Health Forums to Classify Drug Safety

    Science.gov (United States)

    Chee, Brant Wah Kwong

    2011-01-01

    This dissertation explores the use of personal health messages collected from online message forums to predict drug safety using natural language processing and machine learning techniques. Drug safety is defined as any drug with an active safety alert from the US Food and Drug Administration (FDA). It is believed that this is the first…

  17. Machine learning in image steganalysis

    CERN Document Server

    Schaathun, Hans Georg

    2012-01-01

    "The only book to look at steganalysis from the perspective of machine learning theory, and to apply the common technique of machine learning to the particular field of steganalysis; ideal for people working in both disciplines"--

  18. Classification of the Regional Ionospheric Disturbance Based on Machine Learning Techniques

    Science.gov (United States)

    Terzi, Merve Begum; Arikan, Orhan; Karatay, Secil; Arikan, Feza; Gulyaeva, Tamara

    2016-08-01

    In this study, Total Electron Content (TEC) estimated from GPS receivers is used to model the regional and local variability that differs from global activity, along with solar and geomagnetic indices. For the automated classification of regional disturbances, a classification technique based on a robust machine learning method that has found widespread use, the Support Vector Machine (SVM), is proposed. The performance of the developed classification technique is demonstrated for the midlatitude ionosphere over Anatolia using TEC estimates generated from GPS data provided by the Turkish National Permanent GPS Network (TNPGN-Active) for the solar maximum year of 2011. By applying the developed classification technique to Global Ionospheric Map (GIM) TEC data, which is provided by the NASA Jet Propulsion Laboratory (JPL), it is shown that SVM can be a suitable learning method to detect anomalies in TEC variations.

  19. Down syndrome detection from facial photographs using machine learning techniques

    Science.gov (United States)

    Zhao, Qian; Rosenbaum, Kenneth; Sze, Raymond; Zand, Dina; Summar, Marshall; Linguraru, Marius George

    2013-02-01

    Down syndrome is the most commonly occurring chromosomal condition; one in every 691 babies in the United States is born with it. Patients with Down syndrome have an increased risk of heart defects, respiratory and hearing problems, and early detection of the syndrome is fundamental for managing the disease. Clinically, facial appearance is an important indicator in diagnosing Down syndrome, and it paves the way for computer-aided diagnosis based on facial image analysis. In this study, we propose a novel method to detect Down syndrome using photography for computer-assisted image-based facial dysmorphology. Geometric features based on facial anatomical landmarks, and local texture features based on the Contourlet transform and the local binary pattern, are investigated to represent facial characteristics. A support vector machine classifier is then used to discriminate normal and abnormal cases; accuracy, precision and recall are used to evaluate the method. The comparison among the geometric, local texture and combined features was performed using leave-one-out validation. Our method achieved 97.92% accuracy with high precision and recall for the combined features; the detection results were higher than using only geometric or texture features. The promising results indicate that our method has the potential for automated assessment of Down syndrome from simple, noninvasive imaging data.
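
    As a rough illustration of the texture branch of this approach, the sketch below computes a local binary pattern histogram with scikit-image and hints at feeding it to an SVM; the parameter choices and the commented-out usage are assumptions, not the authors' pipeline.

        import numpy as np
        from skimage.feature import local_binary_pattern

        def lbp_histogram(gray_image, P=8, R=1):
            """Uniform local binary pattern histogram: a simple
            texture descriptor over a grayscale image."""
            lbp = local_binary_pattern(gray_image, P, R, method="uniform")
            hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2),
                                   density=True)
            return hist

        # Hypothetical usage: texture features from facial photographs,
        # concatenated with geometric landmark features and fed to an SVM.
        # X = np.array([lbp_histogram(img) for img in face_images])
        # clf = sklearn.svm.SVC(kernel="rbf").fit(X, labels)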

  20. Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.

    Directory of Open Access Journals (Sweden)

    Akshay Amolik

    2015-12-01

    Full Text Available Sentiment analysis is concerned with the analysis of emotions and opinions in text; it is also referred to as opinion mining. Sentiment analysis identifies the sentiment of a person with respect to a given source of content. Social media contain a huge amount of sentiment data in the form of tweets, blogs, status updates, posts, etc. Sentiment analysis of this largely generated data is very useful for expressing the opinion of the mass. Twitter sentiment analysis is tricky compared to broad sentiment analysis because of slang words, misspellings and repeated characters, and because each tweet is limited to 140 characters, so it is very important to identify the correct sentiment of each word. In this project we propose a highly accurate model for sentiment analysis of tweets with respect to the latest reviews of upcoming Bollywood or Hollywood movies. With the help of feature vectors and classifiers such as the Support Vector Machine and Naïve Bayes, we classify these tweets as positive, negative or neutral to give the sentiment of each tweet.
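
    A minimal sketch of this kind of tweet classifier, assuming a TF-IDF feature vector and a Naïve Bayes classifier in scikit-learn; the three example tweets and their labels are invented for illustration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Three invented tweets; a real study would use thousands of
        # labelled movie-review tweets.
        tweets = ["loved the movie, brilliant acting!!",
                  "worst film ever, total waste of time",
                  "new trailer released today"]
        labels = ["positive", "negative", "neutral"]

        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                              MultinomialNB())
        model.fit(tweets, labels)
        print(model.predict(["such an amazing movie"]))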

  1. State of the Art Review for Applying Computational Intelligence and Machine Learning Techniques to Portfolio Optimisation

    CERN Document Server

    Hurwitz, Evan

    2009-01-01

    Computational techniques have shown much promise in the field of finance, owing to their ability to extract sense out of dauntingly complex systems. This paper reviews the most promising of these techniques, from traditional computational intelligence methods to their machine learning siblings, with a particular view to their application in optimising the management of a portfolio of financial instruments. The current state of the art is assessed, and prospective further work is recommended.

  2. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review.

    Science.gov (United States)

    Dallora, Ana Luiza; Eivazzadeh, Shahryar; Mendes, Emilia; Berglund, Johan; Anderberg, Peter

    2017-01-01

    Dementia is a complex disorder characterized by poor outcomes for patients and high costs of care. After decades of research, little is known about its mechanisms. Prognostic estimates about dementia can help researchers, patients and public entities in dealing with this disorder. Thus, health data, machine learning and microsimulation techniques could be employed in developing prognostic estimates for dementia. The goal of this paper is to present evidence on the state of the art of studies investigating the prognosis of dementia using machine learning and microsimulation techniques. To achieve this goal we carried out a systematic literature review, in which three large databases, PubMed, Scopus and Web of Science, were searched to select studies that employed machine learning or microsimulation techniques for the prognosis of dementia. A single round of backward snowballing was done to identify further studies. A quality checklist was also employed to assess the quality of the evidence presented by the selected studies, and low-quality studies were removed. Finally, data from the final set of studies were extracted into summary tables. In total 37 papers were included. The data summary showed that current research is focused on patients with mild cognitive impairment that will evolve to Alzheimer's disease, using machine learning techniques. Microsimulation studies were concerned with cost estimation and had a populational focus. Neuroimaging was the most commonly used variable. Prediction of conversion from MCI to AD is the dominant theme in the selected studies. Most studies used ML techniques on neuroimaging data. Only a few data sources have been recruited by most studies, and the ADNI database is the one most commonly used. Only two studies have investigated the prediction of epidemiological aspects of dementia using either ML or MS techniques. Finally, care should be taken when interpreting the reported accuracy of ML

  3. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    CERN Document Server

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  4. Remotely sensed data assimilation technique to develop machine learning models for use in water management

    Science.gov (United States)

    Zaman, Bushra

    Increasing population and water conflicts are making water management one of the most important issues of the present world. It has become absolutely necessary to find ways to manage water more efficiently. Technological advancement has introduced various techniques for data acquisition and analysis, and these tools can be used to address some of the critical issues that challenge water resource management. This research used machine learning techniques and information acquired through remote sensing to solve problems related to soil moisture estimation and crop identification on large spatial scales. In this dissertation, solutions were proposed in three problem areas that can be important in the decision-making process related to water management in irrigated systems. A data assimilation technique was used to build a machine learning model that generated soil moisture estimates commensurate with the scale of the data. The research was taken further by developing a multivariate machine learning algorithm to predict root zone soil moisture both in space and time. Further, a model was developed for supervised classification of multi-spectral reflectance data using a multi-class machine learning algorithm. The procedure was designed for classifying crops, but the model is data dependent and can be used with other datasets, and hence can be applied to other landcover classification problems. The dissertation compared the performance of the relevance vector machine and the support vector machine in estimating soil moisture. A multivariate relevance vector machine algorithm was tested in the spatio-temporal prediction of soil moisture, and the multi-class relevance vector machine model was used for classifying different crop types. It was concluded that the classification scheme may uncover important data patterns contributing greatly to knowledge bases, and to scientific and medical research. The results for the soil moisture models would give a rough idea to farmers

  5. ISOLATED SPEECH RECOGNITION SYSTEM FOR TAMIL LANGUAGE USING STATISTICAL PATTERN MATCHING AND MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    VIMALA C.

    2015-05-01

    Full Text Available In recent years, speech technology has become a vital part of our daily lives. Various techniques have been proposed for developing Automatic Speech Recognition (ASR) systems and have achieved great success in many applications. Among them, template matching techniques like Dynamic Time Warping (DTW), statistical pattern matching techniques such as the Hidden Markov Model (HMM) and Gaussian Mixture Models (GMM), and machine learning techniques such as Neural Networks (NN), Support Vector Machines (SVM) and Decision Trees (DT) are most popular. The main objective of this paper is to design and develop a speaker-independent isolated speech recognition system for the Tamil language using the above speech recognition techniques. The background of ASR systems, the steps involved in ASR, the merits and demerits of the conventional and machine learning algorithms, and the observations made based on the experiments are presented in this paper. For the developed system, the highest word recognition accuracy was achieved with the HMM technique: it offered 100% accuracy during the training process and 97.92% during testing.
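
    Of the techniques listed, DTW is compact enough to sketch directly. The function below computes the classic dynamic time warping distance between two feature sequences (per-frame MFCC vectors, say); an isolated word would then be labelled by its nearest template. This is a generic textbook implementation, not the paper's code, and the commented usage is hypothetical.

        import numpy as np

        def dtw_distance(a, b):
            """Dynamic time warping distance between two sequences of
            feature vectors, for template matching of isolated words."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1],
                                         D[i - 1, j - 1])
            return D[n, m]

        # Hypothetical usage: label an utterance by its nearest template.
        # word = min(templates, key=lambda t: dtw_distance(feats, t.feats))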

  6. Auditory inspired machine learning techniques can improve speech intelligibility and quality for hearing-impaired listeners.

    Science.gov (United States)

    Monaghan, Jessica J M; Goehring, Tobias; Yang, Xin; Bolner, Federico; Wang, Shangqiguo; Wright, Matthew C M; Bleeck, Stefan

    2017-03-01

    Machine-learning based approaches to speech enhancement have recently shown great promise for improving speech intelligibility for hearing-impaired listeners. Here, the performance of three machine-learning algorithms and one classical algorithm, Wiener filtering, was compared. Two algorithms based on neural networks were examined, one using a previously reported feature set and one using a feature set derived from an auditory model. The third machine-learning approach was a dictionary-based sparse-coding algorithm. Speech intelligibility and quality scores were obtained for participants with mild-to-moderate hearing impairments listening to sentences in speech-shaped noise and multi-talker babble following processing with the algorithms. Intelligibility and quality scores were significantly improved by each of the three machine-learning approaches, but not by the classical approach. The largest improvements for both speech intelligibility and quality were found by implementing a neural network using the feature set based on auditory modeling. Furthermore, neural network based techniques appeared more promising than dictionary-based, sparse coding in terms of performance and ease of implementation.

  7. Retrieval of Similar Objects in Simulation Data Using Machine Learning Techniques

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E; Cheung, S-C; Kamath, C

    2003-06-19

    Comparing the output of a physics simulation with an experiment is often done by visually comparing the two outputs. In order to determine which simulation is a closer match to the experiment, more quantitative measures are needed. This paper describes our early experiences with this problem by considering the slightly simpler problem of finding objects in an image that are similar to a given query object. Focusing on a dataset from a fluid mixing problem, we report on our experiments using classification techniques from machine learning to retrieve the objects of interest in the simulation data. The early results reported in this paper suggest that machine learning techniques can retrieve more objects that are similar to the query than distance-based similarity methods.

  8. Hybrid Machine Learning Technique for Forecasting Dhaka Stock Market Timing Decisions

    OpenAIRE

    Shipra Banik; Khodadad Khan, A. F. M.; Mohammad Anwer

    2014-01-01

    Forecasting the stock market has been a difficult job for applied researchers owing to the noisy and time-varying nature of the data. Nevertheless, a number of empirical studies have shown that machine learning techniques can be applied efficiently to forecast stock markets. This paper studied stock prediction for the use of investors. Investors typically incur losses because of uncertain investment purposes and unsigh...

  9. An Empirical Study of Machine Learning Techniques for Classifying Emotional States from EEG Data

    OpenAIRE

    2012-01-01

    With the great advancement in robot technology, smart human-robot interaction is considered to be the most wanted success by researchers these days. If a robot can identify the emotions and intentions of a human interacting with it, that would make robots more useful. Electroencephalography (EEG) is considered an effective way of recording the emotions and motivations of a human via brain signals. Various machine learning techniques have been used successfully to classify EEG data accurately. K-Nearest Neig...

  10. Machine Learning Techniques for Optical Performance Monitoring from Directly Detected PDM-QAM Signals

    DEFF Research Database (Denmark)

    Thrane, Jakob; Wass, Jesper; Piels, Molly

    2017-01-01

    Linear signal processing algorithms are effective in dealing with linear transmission channels and linear signal detection, while nonlinear signal processing algorithms, from the machine learning community, are effective in dealing with nonlinear transmission channels and nonlinear signal ... detection. In this paper, a brief overview of the various machine learning methods and their application in optical communication is presented and discussed. Moreover, supervised machine learning methods, such as neural networks and support vector machines, are experimentally demonstrated for in-band optical...

  11. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focus on six categories of applications in radiology: medical image segmentation; registration; computer-aided detection and diagnosis; brain function or activity analysis and neurological disease diagnosis from fMR images; content-based image retrieval systems for CT or MRI images; and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  12. Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

    Science.gov (United States)

    Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

    2017-07-01

    Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it is becoming clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” has emerged to address the problem of automatically determining the polarity (positive, negative, neutral, …) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions; thus, building a classifier which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain on sentiment classification when using machine learning techniques. In our study three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K-nearest neighbors (KNN), were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperform the other classifiers in all domains, achieving at least 74.75% accuracy with a standard deviation of 4.08.

  13. Machine learning techniques for astrophysical modelling and photometric redshift estimation of quasars in optical sky surveys

    CERN Document Server

    Kumar, N Daniel

    2008-01-01

    Machine learning techniques are utilised in several areas of astrophysical research today. This dissertation addresses the application of ML techniques to two classes of problems in astrophysics, namely, the analysis of individual astronomical phenomena over time and the automated, simultaneous analysis of thousands of objects in large optical sky surveys. Specifically investigated are (1) techniques to approximate the precise orbits of the satellites of Jupiter and Saturn given Earth-based observations as well as (2) techniques to quickly estimate the distances of quasars observed in the Sloan Digital Sky Survey. Learning methods considered include genetic algorithms, particle swarm optimisation, artificial neural networks, and radial basis function networks. The first part of this dissertation demonstrates that GAs and PSOs can both be efficiently used to model functions that are highly non-linear in several dimensions. It is subsequently demonstrated in the second part that ANNs and RBFNs can be used as ef...

  14. Developing Fire Detection Algorithms by Geostationary Orbiting Platforms and Machine Learning Techniques

    Science.gov (United States)

    Salvador, Pablo; Sanz, Julia; Garcia, Miguel; Casanova, Jose Luis

    2016-08-01

    Fires in general, and forest fires in particular, are a major concern in terms of economic and biological losses. Remote sensing research has focused on developing several algorithms, adapted to a wide range of sensors, platforms and regions, in order to detect hotspots as fast as possible. The aim of this study is to establish an automatic methodology for developing hotspot detection algorithms for the Spinning Enhanced Visible and Infrared Imager (SEVIRI) sensor on board the Meteosat Second Generation (MSG) platform, based on machine learning techniques that can be exported to other geostationary platforms and sensors and to any area of the Earth. The sensitivity (SE), specificity (SP) and accuracy (AC) parameters were analyzed in order to develop the final machine learning algorithm, taking into account the preferences and final use of the predicted data.

  15. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focussed on building models; we process nearly all information we perceive into a model. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called machine learning. Much research is devoted to developing comp

  16. Prediction of activity type in preschool children using machine learning techniques.

    Science.gov (United States)

    Hagenbuchner, Markus; Cliff, Dylan P; Trost, Stewart G; Van Tuc, Nguyen; Peoples, Gregory E

    2015-07-01

    Recent research has shown that machine learning techniques can accurately predict activity classes from accelerometer data in adolescents and adults. The purpose of this study is to develop and test machine learning models for predicting activity type in preschool-aged children. Participants completed 12 standardised activity trials (TV, reading, tablet game, quiet play, art, treasure hunt, cleaning up, active game, obstacle course, bicycle riding) over two laboratory visits. Eleven children aged 3-6 years (mean age=4.8±0.87; 55% girls) completed the activity trials while wearing an ActiGraph GT3X+ accelerometer on the right hip. Activities were categorised into five activity classes: sedentary activities, light activities, moderate to vigorous activities, walking, and running. A standard feed-forward Artificial Neural Network and a Deep Learning Ensemble Network were trained on features in the accelerometer data used in previous investigations (10th, 25th, 50th, 75th and 90th percentiles and the lag-one autocorrelation). Overall recognition accuracy for the standard feed forward Artificial Neural Network was 69.7%. Recognition accuracy for sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running was 82%, 79%, 64%, 36% and 46%, respectively. In comparison, overall recognition accuracy for the Deep Learning Ensemble Network was 82.6%. For sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running recognition accuracy was 84%, 91%, 79%, 73% and 73%, respectively. Ensemble machine learning approaches such as Deep Learning Ensemble Network can accurately predict activity type from accelerometer data in preschool children. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
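
    The feature set used here (five percentiles plus the lag-one autocorrelation per window) is concrete enough to sketch. The NumPy fragment below computes it for a single accelerometer window; the window length and sampling rate are illustrative assumptions, not the study's settings.

        import numpy as np

        def window_features(signal):
            """Percentile and lag-one autocorrelation features for one
            accelerometer window."""
            feats = np.percentile(signal, [10, 25, 50, 75, 90])
            c = signal - signal.mean()
            lag1 = (c[:-1] @ c[1:]) / (c @ c)   # lag-one autocorrelation
            return np.append(feats, lag1)

        # Hypothetical 10 s window sampled at 30 Hz.
        window = np.random.default_rng(0).normal(size=300)
        print(window_features(window))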

  17. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. It provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity, and depth and breadth of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  18. Classification of Cytochrome P450 1A2 Inhibitors and Non-Inhibitors by Machine Learning Techniques

    DEFF Research Database (Denmark)

    Vasanthanathan, Poongavanam; Taboureau, Olivier; Oostenbrink, Chris

    2009-01-01

    of CYP1A2 inhibitors and non-inhibitors. Training and test sets consisted of about 400 and 7000 compounds, respectively. Various machine learning techniques, like binary QSAR, support vector machines (SVM), random forest, k-nearest neighbors (kNN), and decision tree methods were used to develop

  19. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  20. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    Science.gov (United States)

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  2. Machine Learning for Security

    CERN Document Server

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques identifying failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  3. GPR Signal Characterization for Automated Landmine and UXO Detection Based on Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Xavier Núñez-Nieto

    2014-10-01

    Full Text Available Landmine clearance is an ongoing problem that currently affects millions of people around the world. This study evaluates the effectiveness of ground penetrating radar (GPR) in demining and unexploded ordnance detection using 2.3-GHz and 1-GHz high-frequency antennas. An automated detection tool based on machine learning techniques is also presented with the aim of automatically detecting underground explosive artifacts. A GPR survey was conducted on a designed scenario that included the items most commonly buried in historic battlefields, such as mines, projectiles and mortar grenades. The buried targets were identified using both frequencies, although the higher vertical resolution provided by the 2.3-GHz antenna allowed for better recognition of the reflection patterns. The targets were also detected automatically using machine learning techniques. Neural networks and logistic regression algorithms were shown to be able to discriminate between potential targets and clutter. The neural network had the most success, with accuracies of 89% and 92% for the 1-GHz and 2.3-GHz antennas, respectively.

  4. A New Profile Learning Model for Recommendation System based on Machine Learning Technique

    Directory of Open Access Journals (Sweden)

    Shereen H. Ali

    2016-03-01

    Full Text Available Recommender systems (RSs) have been used to successfully address the information overload problem by providing personalized and targeted recommendations to end users. RSs are software tools and techniques that provide suggestions for items likely to be of use to a user; hence, they typically apply techniques and methodologies from data mining. The main contribution of this paper is to introduce a new user profile learning model to promote the recommendation accuracy of vertical recommendation systems. The proposed profile learning model employs the vertical classifier that has been used in the multi-classification module of the Intelligent Adaptive Vertical Recommendation (IAVR) system to discover the user's area of interest, and then builds the user's profile accordingly. Experimental results have proven the effectiveness of the proposed profile learning model, which accordingly will promote the recommendation accuracy.

  5. Acronym Disambiguation in Spanish Electronic Health Narratives Using Machine Learning Techniques.

    Science.gov (United States)

    Rubio-López, Ignacio; Costumero, Roberto; Ambit, Héctor; Gonzalo-Martín, Consuelo; Menasalvas, Ernestina; Rodríguez González, Alejandro

    2017-01-01

    Electronic Health Records (EHRs) are now being massively used in hospitals, which has motivated the development of new methods to process clinical narratives (unstructured data), making it possible to perform context-based searches. Current approaches to processing the unstructured texts in EHRs are based on applying text mining or natural language processing (NLP) techniques to the data. In particular, Named Entity Recognition (NER) is of paramount importance for retrieving specific biomedical concepts from the text, providing the semantic type of the concept retrieved. However, it is very common for clinical notes to contain many acronyms that cannot be identified by NER processes, and even when they are identified, an acronym may correspond to several meanings, so disambiguation of the found term is needed. In this work we provide an approach to perform acronym disambiguation in Spanish EHRs using machine learning techniques.

  6. A hybrid stock trading framework integrating technical analysis with machine learning techniques

    Directory of Open Access Journals (Sweden)

    Rajashree Dash

    2016-03-01

    Full Text Available In this paper, a novel decision support system using a computationally efficient functional link artificial neural network (CEFLANN) and a set of rules is proposed to generate trading decisions more effectively. Here the problem of stock trading decision prediction is articulated as a classification problem with three class values representing the buy, hold and sell signals. The CEFLANN network used in the decision support system produces a set of continuous trading signals within the range 0–1 by analyzing the nonlinear relationship that exists between a few popular technical indicators. The output trading signals are then used to track the trend and to produce a trading decision based on that trend using a set of trading rules. The novelty of the approach is to generate profitable stock trading decision points by integrating the learning ability of the CEFLANN neural network with technical analysis rules. For assessing the potential use of the proposed method, the model performance is also compared with some other machine learning techniques such as the Support Vector Machine (SVM), naive Bayes, K-nearest neighbor (KNN) and Decision Tree (DT) models.

  7. Markerless gating for lung cancer radiotherapy based on machine learning techniques

    Energy Technology Data Exchange (ETDEWEB)

    Lin Tong; Li Ruijiang; Tang Xiaoli; Jiang, Steve B [Department of Radiation Oncology, University of California San Diego, La Jolla, CA 92093 (United States); Dy, Jennifer G [Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115 (United States)], E-mail: sbjiang@ucsd.edu

    2009-03-21

    In lung cancer radiotherapy, radiation to a mobile target can be delivered by respiratory gating, for which we need to know whether the target is inside or outside a predefined gating window at any time point during the treatment. This can be achieved by tracking one or more fiducial markers implanted inside or near the target, either fluoroscopically or electromagnetically. However, the clinical implementation of marker tracking is limited for lung cancer radiotherapy mainly due to the risk of pneumothorax. Therefore, gating without implanted fiducial markers is a promising clinical direction. We have developed several template-matching methods for fluoroscopic marker-less gating. Recently, we have modeled the gating problem as a binary pattern classification problem, in which principal component analysis (PCA) and support vector machine (SVM) are combined to perform the classification task. Following the same framework, we investigated different combinations of dimensionality reduction techniques (PCA and four nonlinear manifold learning methods) and two machine learning classification methods (artificial neural networks-ANN and SVM). Performance was evaluated on ten fluoroscopic image sequences of nine lung cancer patients. We found that among all combinations of dimensionality reduction techniques and classification methods, PCA combined with either ANN or SVM achieved a better performance than the other nonlinear manifold learning methods. ANN when combined with PCA achieves a better performance than SVM in terms of classification accuracy and recall rate, although the target coverage is similar for the two classification methods. Furthermore, the running time for both ANN and SVM with PCA is within tolerance for real-time applications. Overall, ANN combined with PCA is a better candidate than other combinations we investigated in this work for real-time gated radiotherapy.
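
    A minimal sketch of the PCA-plus-SVM pattern classification framework described above, with random arrays standing in for the fluoroscopic frames; the image size, component count and kernel are assumptions, not the authors' settings.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Random stand-ins for fluoroscopic frames: each row is a
        # flattened image; each label says whether the target is inside
        # (1) or outside (0) the gating window in that frame.
        rng = np.random.default_rng(0)
        frames = rng.normal(size=(200, 64 * 64))
        in_window = rng.integers(0, 2, size=200)

        gating = make_pipeline(StandardScaler(),
                               PCA(n_components=10),
                               SVC(kernel="rbf"))
        gating.fit(frames, in_window)
        print(gating.predict(frames[:5]))   # 1 = beam on, 0 = beam off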

  8. A First Look at creating mock catalogs with machine learning techniques

    CERN Document Server

    Xu, Xiaoying; Trac, Hy; Schneider, Jeff; Poczos, Barnabas; Ntampaka, Michelle

    2013-01-01

    We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test 2 algorithms: support vector machines (SVM) and k-nearest-neighbour (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following 6 halo properties: number of particles, M_200, \\sigma_v, v_max, half-mass radius and spin. For Millennium, our predicted N_gal values have a mea...

  9. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling

    Science.gov (United States)

    Goetz, J. N.; Brenning, A.; Petschko, H.; Leopold, P.

    2015-08-01

    Statistical and, more recently, machine learning prediction methods have been gaining popularity in the field of landslide susceptibility modeling. In particular, these data-driven approaches show promise when tackling the challenge of mapping landslide-prone areas for large regions, which may not have sufficient geotechnical data for physically-based methods. Currently, there is no best method for empirical susceptibility modeling. Therefore, this study presents a comparison of traditional statistical and novel machine learning models applied to regional-scale landslide susceptibility modeling. These methods were evaluated by spatial k-fold cross-validation estimation of the predictive performance, by assessment of variable importance for gaining insights into model behavior, and by the appearance of the prediction (i.e. susceptibility) map. The modeling techniques applied were logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). These modeling methods were tested for three areas in the province of Lower Austria, Austria, characterized by different geological and morphological settings. Random forest and bundling classification techniques had the overall best predictive performances. However, the performances of all modeling techniques were for the most part not significantly different from each other; depending on the area of interest, the overall median estimated area under the receiver operating characteristic curve (AUROC) differences ranged from 2.9 to 8.9 percentage points, and the overall median estimated true positive rate (TPR) measured at a 10% false positive rate (FPR) differences ranged from 11 to 15 percentage points. The relative importance of each predictor was generally different between the modeling methods. However, slope angle, surface roughness and plan
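
    A rough sketch of this style of model comparison in scikit-learn: cross-validated AUROC for a logistic regression (GLM) against a random forest. The data are random placeholders, and plain k-fold is used here where the study used spatial cross-validation.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Random placeholders for terrain predictors (slope angle,
        # roughness, ...) and binary landslide labels.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 8))
        y = rng.integers(0, 2, size=1000)

        for name, model in [("GLM", LogisticRegression(max_iter=1000)),
                            ("RF", RandomForestClassifier(n_estimators=200))]:
            auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
            print(name, "AUROC: %.3f +/- %.3f" % (auc.mean(), auc.std()))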

  10. Hybrid machine learning technique for forecasting Dhaka stock market timing decisions.

    Science.gov (United States)

    Banik, Shipra; Khodadad Khan, A F M; Anwer, Mohammad

    2014-01-01

    Forecasting the stock market has been a difficult job for applied researchers owing to the noisy and time-varying nature of the data. Nevertheless, a number of empirical studies have shown that machine learning techniques can be applied efficiently to forecast stock markets. This paper studied stock prediction for the use of investors, who typically incur losses because of uncertain investment purposes and unsighted assets. This paper proposes a rough set model, a neural network model, and a hybrid neural network and rough set model to find optimal buy and sell points for a share on the Dhaka Stock Exchange. Experimental findings demonstrate that our proposed hybrid model has higher precision than the single rough set model and the neural network model. We believe these findings will help stock investors decide on optimal buy and/or sell times on the Dhaka Stock Exchange.

  11. Discussion on "Techniques for Massive-Data Machine Learning in Astronomy" by A. Gray

    CERN Document Server

    Ball, Nicholas M

    2011-01-01

    Astronomy is increasingly encountering two fundamental truths: (1) The field is faced with the task of extracting useful information from extremely large, complex, and high dimensional datasets; (2) The techniques of astroinformatics and astrostatistics are the only way to make this tractable, and bring the required level of sophistication to the analysis. Thus, an approach which provides these tools in a way that scales to these datasets is not just desirable, it is vital. The expertise required spans not just astronomy, but also computer science, statistics, and informatics. As a computer scientist and expert in machine learning, Alex's contribution of expertise and a large number of fast algorithms designed to scale to large datasets, is extremely welcome. We focus in this discussion on the questions raised by the practical application of these algorithms to real astronomical datasets. That is, what is needed to maximally leverage their potential to improve the science return? This is not a trivial task. W...

  12. Fast supersymmetry phenomenology at the Large Hadron Collider using machine learning techniques

    CERN Document Server

    Buckley, A; White, M J

    2011-01-01

    A pressing problem for supersymmetry (SUSY) phenomenologists is how to incorporate Large Hadron Collider search results into parameter fits designed to measure or constrain the SUSY parameters. Owing to the computational expense of fully simulating lots of points in a generic SUSY space to aid the calculation of the likelihoods, the limits published by experimental collaborations are frequently interpreted in slices of reduced parameter spaces. For example, both ATLAS and CMS have presented results in the Constrained Minimal Supersymmetric Model (CMSSM) by fixing two of four parameters, and generating a coarse grid in the remaining two. We demonstrate that by generating a grid in the full space of the CMSSM, one can interpolate between the output of an LHC detector simulation using machine learning techniques, thus obtaining a superfast likelihood calculator for LHC-based SUSY parameter fits. We further investigate how much training data is required to obtain usable results, finding that approximately 2000 po...

  13. Quantum-state anomaly detection for arbitrary errors using a machine-learning technique

    Science.gov (United States)

    Hara, Satoshi; Ono, Takafumi; Okamoto, Ryo; Washio, Takashi; Takeuchi, Shigeki

    2016-10-01

    The accurate detection of small deviations in given density matrices is important for quantum information processing, but it is a difficult task because of the intrinsic fluctuation in density matrices reconstructed using a limited number of experiments. We previously proposed a method for decoherence error detection using a machine-learning technique [S. Hara, T. Ono, R. Okamoto, T. Washio, and S. Takeuchi, Phys. Rev. A 89, 022104 (2014), 10.1103/PhysRevA.89.022104]. However, the previous method is not valid when the errors are just changes in phase. Here, we propose a method that is valid for arbitrary errors in density matrices. The performance of the proposed method is verified using both numerical simulation data and real experimental data.

  14. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  15. Influence of Heartwood on Wood Density and Pulp Properties Explained by Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Carla Iglesias

    2017-01-01

    Full Text Available The aim of this work is to develop a tool to predict some pulp properties, e.g., pulp yield, Kappa number, ISO brightness (ISO 2470:2008), fiber length and fiber width, using the sapwood and heartwood proportion in the raw material. For this purpose, Acacia melanoxylon trees were collected from four sites in Portugal. The percentage of sapwood and heartwood, area and stem eccentricity (in N-S and E-W directions) were measured on transversal stem sections of A. melanoxylon R. Br. The relative position of the samples with respect to the total tree height was also considered as an input variable. Different configurations were tested until the maximum correlation coefficient was achieved. A classical mathematical technique (multiple linear regression) and machine learning methods (classification and regression trees, multi-layer perceptron and support vector machines) were tested. Classification and regression trees (CART) was the most accurate model for the prediction of pulp ISO brightness (R = 0.85). The other parameters could be predicted with fair results (R = 0.64–0.75) by CART. Hence, the proportion of heartwood and sapwood is a relevant parameter for pulping and pulp properties, and should be taken as a quality trait when assessing a pulpwood resource.

  16. A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Xu Xiaoying; Ho, Shirley; Trac, Hy; Schneider, Jeff; Ntampaka, Michelle [McWilliams Center for Cosmology, Department of Physics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (United States); Poczos, Barnabas [School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (United States)

    2013-08-01

    We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following six halo properties: number of particles, M_200, σ_v, v_max, half-mass radius, and spin. For Millennium, our predicted N_gal values have a mean-squared error (MSE) of ~0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ~5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N_gal. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M_star, low M_star). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.

  17. A First Look at Creating Mock Catalogs with Machine Learning Techniques

    Science.gov (United States)

    Xu, Xiaoying; Ho, Shirley; Trac, Hy; Schneider, Jeff; Poczos, Barnabas; Ntampaka, Michelle

    2013-08-01

    We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following six halo properties: number of particles, M_200, σ_v, v_max, half-mass radius, and spin. For Millennium, our predicted N_gal values have a mean-squared error (MSE) of ~0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ~5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N_gal. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M_star, low M_star). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.
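
    A minimal sketch of the kNN branch of this approach in scikit-learn, with a random stand-in for the Millennium halo catalog; the neighbor count and the train/test split are illustrative assumptions, not the paper's settings.

        import numpy as np
        from sklearn.metrics import mean_squared_error
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsRegressor

        # Random stand-in for the halo catalog: six properties per halo
        # (particle count, M_200, sigma_v, v_max, half-mass radius, spin)
        # and the number of galaxies per halo.
        rng = np.random.default_rng(0)
        halos = rng.normal(size=(5000, 6))
        n_gal = rng.poisson(2, size=5000)

        X_tr, X_te, y_tr, y_te = train_test_split(halos, n_gal,
                                                  random_state=0)
        knn = KNeighborsRegressor(n_neighbors=10).fit(X_tr, y_tr)
        print("MSE:", mean_squared_error(y_te, knn.predict(X_te)))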

  18. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    Science.gov (United States)

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2016-07-19

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for the development of risk prediction models. Typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider the problem of predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning.

  19. Multivariate Time Series Forecasting of Crude Palm Oil Price Using Machine Learning Techniques

    Science.gov (United States)

    Kanchymalay, Kasturi; Salim, N.; Sukprasert, Anupong; Krishnan, Ramesh; Raba'ah Hashim, Ummi

    2017-08-01

    The aim of this paper was to study the correlation between the crude palm oil (CPO) price, selected vegetable oil prices (soybean oil, coconut oil, olive oil, rapeseed oil and sunflower oil), the crude oil price and the monthly exchange rate. Comparative analysis was then performed on CPO price forecasting results using machine learning techniques. Monthly CPO prices, selected vegetable oil prices, crude oil prices and monthly exchange rate data from January 1987 to February 2017 were utilized. Preliminary analysis showed a positive and high correlation between the CPO price and the soybean oil price, and also between the CPO price and the crude oil price. Experiments were conducted using multi-layer perceptron, support vector regression and Holt-Winters exponential smoothing techniques. The results were assessed using the criteria of root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and direction accuracy (DA). Among these three techniques, support vector regression (SVR) with the sequential minimal optimization (SMO) algorithm showed relatively better results compared to the multi-layer perceptron and Holt-Winters exponential smoothing methods.
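
    A minimal sketch of the SVR forecasting step with the error criteria named above (RMSE, MAE, MAPE). The series are synthetic stand-ins for the monthly price and exchange-rate data, and the hyperparameters are illustrative, not the paper's SMO configuration:

```python
# Minimal sketch: SVR on lagged multivariate series, scored with RMSE/MAE/MAPE.
# Synthetic series stand in for the CPO, vegetable-oil, crude-oil and FX data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(2)
n = 360                                            # ~30 years of monthly data
drivers = rng.normal(size=(n, 4)).cumsum(axis=0)   # soybean, crude oil, FX, ...
cpo = 100 + 0.6 * drivers[:, 0] + 0.3 * drivers[:, 1] + rng.normal(0, 0.5, n)

X, y = drivers[:-1], cpo[1:]              # predict next month from current drivers
split = int(0.8 * n)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = mean_squared_error(y[split:], pred) ** 0.5
mae = mean_absolute_error(y[split:], pred)
mape = np.mean(np.abs((y[split:] - pred) / y[split:])) * 100  # prices stay positive here
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.2f}%")
```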

  20. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.
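
    A minimal sketch of ligand-based screening with a naive Bayesian classifier: binary fingerprints stand in for real molecular descriptors, and the activity rule is invented, so this only shows the shape of the ranking workflow:

```python
# Minimal sketch: a naive Bayes classifier ranking molecules by predicted
# activity, using random binary fingerprints as placeholder descriptors.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(3)
fingerprints = rng.integers(0, 2, size=(1000, 256))         # toy 256-bit fingerprints
active = (fingerprints[:, :8].sum(axis=1) > 5).astype(int)  # invented activity rule

clf = BernoulliNB().fit(fingerprints[:800], active[:800])
scores = clf.predict_proba(fingerprints[800:])[:, 1]        # P(active)
ranking = np.argsort(scores)[::-1]                          # screen best-scored first
print("top-10 candidates:", ranking[:10])
```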

  1. Current breathomics-a review on data pre-processing techniques and machine learning in metabolomics breath analysis

    DEFF Research Database (Denmark)

    Smolinska, A.; Hauschild, A. C.; Fijten, R. R. R.

    2014-01-01

    been extensively developed. Yet, the application of machine learning methods for fingerprinting VOC profiles in the breathomics is still in its infancy. Therefore, in this paper, we describe the current state of the art in data pre-processing and multivariate analysis of breathomics data. We start...... different conditions (e.g. disease stage, treatment). Independently of the utilized analytical method, the most important question, 'which VOCs are discriminatory?', remains the same. Answers can be given by several modern machine learning techniques (multivariate statistics) and, therefore, are the focus...

  2. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide covers all of the need-to-know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks. Intended for those who want to learn how to use R's machine learning capabilities and gain insight from their data. Perhaps you already know a bit about machine learning, but have never used R; or

  3. Analysed potential of big data and supervised machine learning techniques in effectively forecasting travel times from fused data

    Directory of Open Access Journals (Sweden)

    Ivana Šemanjski

    2015-12-01

    Travel time forecasting is an interesting topic for many ITS services. Increased availability of data collection sensors increases the availability of predictor variables, but also highlights the processing issues related to this big data availability. In this paper we aimed to analyse the potential of big data and supervised machine learning techniques in effectively forecasting travel times. For this purpose we used fused data from three data sources (Global Positioning System vehicle tracks, road network infrastructure data and meteorological data) and four machine learning techniques (k-nearest neighbours, support vector machines, boosting trees and random forest). To evaluate the forecasting results we compared them between different road classes in the context of absolute values, measured in minutes, and the mean squared percentage error. For the road classes with high average speeds and long road segments, machine learning techniques forecasted travel times with small relative error, while for the road classes with small average speeds and segment lengths this was a more demanding task. All three data sources proved to have a high impact on travel time forecast accuracy, and the best results (taking into account all road classes) were achieved for the k-nearest neighbours and random forest techniques.
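
    A minimal sketch comparing the four regressors named above under the mean squared percentage error criterion. The features and travel times are randomly generated placeholders for the fused GPS, road-network and weather data:

```python
# Minimal sketch comparing the four regressors on placeholder travel-time data;
# scored with mean squared percentage error (MSPE) as in the study.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(3000, 8))          # GPS, road-network and weather features (toy)
y = 5 + np.abs(X[:, 0]) * 3 + rng.gamma(2.0, 1.0, 3000)   # travel time, minutes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

models = {
    "kNN": KNeighborsRegressor(n_neighbors=15),
    "SVR": SVR(),
    "boosting": GradientBoostingRegressor(random_state=4),
    "random forest": RandomForestRegressor(random_state=4),
}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    mspe = np.mean(((y_te - pred) / y_te) ** 2) * 100
    print(f"{name:14s} MSPE = {mspe:.2f}%")
```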

  4. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  5. Machine learning techniques in disease forecasting: a case study on rice blast prediction

    Directory of Open Access Journals (Sweden)

    Kapoor Amar S

    2006-11-01

    Background: Diverse modeling approaches, viz. neural networks and multiple regression, have been followed to date for disease prediction in plant populations. However, due to their inability to predict values for unknown data points and their longer training times, there is a need to explore new prediction software for a better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help plant researchers or farmers in the timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases. Results: Six significant weather variables were selected as predictor variables. Two series of models (cross-location and cross-year) were developed and validated using a five-fold cross-validation procedure. For cross-year models, the conventional multiple regression (REG) approach achieved an average correlation coefficient (r) of 0.50, which increased to 0.60, and percent mean absolute error (%MAE) decreased from 65.42 to 52.24, when a back-propagation neural network (BPNN) was used. With a generalized regression neural network (GRNN), r increased to 0.70 and %MAE improved to 46.30; these further improved to r = 0.77 and %MAE = 36.66 when the support vector machine (SVM) based method was used. Similarly, cross-location validation achieved r = 0.48, 0.56 and 0.66 using REG, BPNN and GRNN respectively, with corresponding %MAE of 77.54, 66.11 and 58.26. The SVM-based method outperformed all three approaches, further increasing r to 0.74 and improving %MAE to 44.12. Overall, this SVM-based prediction approach will open new vistas in the area of forecasting plant diseases of various crops. Conclusion: Our case study demonstrated that SVM is better than existing machine learning techniques and conventional REG approaches in forecasting plant diseases. In this direction, we have also
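
    A minimal sketch of the cross-validated SVM step: support vector regression on six weather predictors, scored with r and %MAE. Here %MAE is computed as the MAE relative to the mean observed value, an assumption, since the abstract does not state its exact definition; data are synthetic:

```python
# Minimal sketch: SVM regression on weather predictors with five-fold
# cross-validation, reporting r and %MAE on toy data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
weather = rng.normal(size=(200, 6))                 # six weather variables (toy)
severity = 2 * weather[:, 0] - weather[:, 3] + rng.normal(0, 1, 200) + 10

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
pred = cross_val_predict(model, weather, severity, cv=5)
r = np.corrcoef(severity, pred)[0, 1]
pct_mae = np.mean(np.abs(severity - pred)) / np.mean(np.abs(severity)) * 100
print(f"r = {r:.2f}, %MAE = {pct_mae:.2f}")
```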

  6. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. (©)RSNA, 2017.

  7. Phase segmentation of X-ray computer tomography rock images using machine learning techniques: an accuracy and performance study

    Science.gov (United States)

    Chauhan, Swarup; Rühaak, Wolfram; Anbergen, Hauke; Kabdenov, Alen; Freise, Marcus; Wille, Thorsten; Sass, Ingo

    2016-07-01

    Performance and accuracy of machine learning techniques to segment rock grains, matrix and pore voxels from a 3-D volume of X-ray tomographic (XCT) grayscale rock images were evaluated. The segmentation and classification capability of unsupervised (k-means, fuzzy c-means, self-organizing maps), supervised (artificial neural networks, least-squares support vector machines) and ensemble classifiers (bagging and boosting) was tested using XCT images of andesite volcanic rock, Berea sandstone, Rotliegend sandstone and a synthetic sample. The averaged porosity obtained for andesite (15.8 ± 2.5 %), Berea sandstone (16.3 ± 2.6 %), Rotliegend sandstone (13.4 ± 7.4 %) and the synthetic sample (48.3 ± 13.3 %) is in very good agreement with the respective laboratory measurement data and varies by a factor of 0.2. The k-means algorithm is the fastest of all the machine learning algorithms, whereas the least-squares support vector machine is the most computationally expensive. The metrics entropy, purity, root-mean-square error, the receiver operating characteristic curve and 10-fold cross-validation were used to determine the accuracy of the unsupervised, supervised and ensemble classifier techniques. In general, accuracy was found to be largely affected by the feature vector selection scheme. As there is always a trade-off between performance and accuracy, it is difficult to isolate one particular machine learning algorithm which is best suited for the complex phase segmentation problem. Therefore, our investigation provides parameters that can help in selecting the appropriate machine learning technique for phase segmentation.
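
    A minimal sketch of the k-means stage on voxel intensities, with porosity read off as the share of the darkest cluster. The random volume is a placeholder for real XCT data, which would also require the feature-vector selection and filtering steps the study describes:

```python
# Minimal sketch: k-means segmentation of grayscale voxels into pore,
# matrix and grain classes; a random volume stands in for the XCT data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
volume = rng.normal(size=(64, 64, 64))          # placeholder XCT grayscale volume
features = volume.reshape(-1, 1)                # one feature: voxel intensity

km = KMeans(n_clusters=3, n_init=10, random_state=6).fit(features)
labels = km.labels_.reshape(volume.shape)       # 0/1/2 phase map

# Call the darkest cluster "pore" and estimate porosity from its voxel share.
pore = np.argsort(km.cluster_centers_.ravel())[0]
print("porosity estimate: %.1f%%" % (100 * np.mean(labels == pore)))
```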

  8. A methodology for automated CPA extraction using liver biopsy image analysis and machine learning techniques.

    Science.gov (United States)

    Tsipouras, Markos G; Giannakeas, Nikolaos; Tzallas, Alexandros T; Tsianou, Zoe E; Manousou, Pinelopi; Hall, Andrew; Tsoulos, Ioannis; Tsianos, Epameinondas

    2017-03-01

    Collagen proportional area (CPA) extraction in liver biopsy images provides the degree of fibrosis expansion in liver tissue, which is the most characteristic histological alteration in hepatitis C virus (HCV). Assessment of the fibrotic tissue is currently based on semiquantitative staging scores such as Ishak and Metavir. Since its introduction as a fibrotic tissue assessment technique, CPA calculation based on image analysis techniques has proven to be more accurate than semiquantitative scores. However, CPA has yet to reach everyday clinical practice, since the lack of standardized and robust methods for computerized image analysis for CPA assessment has proven to be a major limitation. The current work introduces a three-stage fully automated methodology for CPA extraction based on machine learning techniques. Specifically, clustering algorithms have been employed for background-tissue separation, as well as for fibrosis detection in liver tissue regions, in the first and the third stage of the methodology, respectively. Due to the existence of several types of tissue regions in the image (such as blood clots, muscle tissue, structural collagen, etc.), classification algorithms have been employed to identify liver tissue regions and exclude all other non-liver tissue regions from CPA computation. For the evaluation of the methodology, 79 liver biopsy images have been employed, obtaining 1.31% mean absolute CPA error, with 0.923 concordance correlation coefficient. The proposed methodology is designed to (i) avoid manual threshold-based and region selection processes, widely used in similar approaches presented in the literature, and (ii) minimize CPA calculation time. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. It covers the basics of various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  10. Taxi Time Prediction at Charlotte Airport Using Fast-Time Simulation and Machine Learning Techniques

    Science.gov (United States)

    Lee, Hanbong

    2016-01-01

    Accurate taxi time prediction is required for enabling efficient runway scheduling that can increase runway throughput and reduce taxi times and fuel consumption on the airport surface. Currently, NASA and American Airlines are jointly developing a decision-support tool called Spot and Runway Departure Advisor (SARDA) that assists airport ramp controllers in making gate pushback decisions and improving the overall efficiency of airport surface traffic. In this presentation, we propose to use Linear Optimized Sequencing (LINOS), a discrete-event fast-time simulation tool, to predict taxi times and provide the estimates to the runway scheduler in real-time airport operations. To assess its prediction accuracy, we also introduce a data-driven analytical method using machine learning techniques. These two taxi time prediction methods are evaluated with actual taxi time data obtained from the SARDA human-in-the-loop (HITL) simulation for Charlotte Douglas International Airport (CLT) using various performance measurement metrics. Based on the taxi time prediction results, we also discuss how the prediction accuracy can be affected by the operational complexity at this airport and how we can improve the fast-time simulation model before implementing it with an airport scheduling algorithm in a real-time environment.

  11. Evaluating machine-learning techniques for recruitment forecasting of seven North East Atlantic fish species

    KAUST Repository

    Fernandes, José Antonio

    2015-01-01

    The effect of different factors (spawning biomass, environmental conditions) on recruitment is a subject of great importance in the management of fisheries, recovery plans and scenario exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitment of seven fish species of the North East Atlantic (anchovy, sardine, mackerel, horse mackerel, hake, blue whiting and albacore), using spawning, environmental and climatic data. In addition, the use of the probabilistic flexible naive Bayes classifier (FNBC) is proposed as a modelling approach in order to reduce uncertainty for fisheries management purposes. These improvements aim to provide better probability estimates for each possible outcome (low, medium and high recruitment) based on kernel density estimation, which is crucial for informed management decision making under high uncertainty. Finally, a comparison between goodness-of-fit and generalization power is provided, in order to assess the reliability of the final forecasting models. It is found that in most cases the proposed methodology provides useful information for management, whereas the case of horse mackerel is an example of the limitations of the approach. The proposed improvements allow for a better probabilistic estimation of the different scenarios, i.e. they reduce the uncertainty in the provided forecasts.

  12. Predicting Software Faults in Large Space Systems using Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Bhekisipho Twala

    2011-07-01

    Recently, the use of machine learning (ML) algorithms has proven to be of great practical value in solving a variety of engineering problems, including the prediction of failure, fault, and defect-proneness as space system software becomes complex. One of the most active areas of recent research in ML has been the use of ensemble classifiers. We show how ML techniques (or classifiers) can be used to predict software faults in space systems, including many aerospace systems, and further use ensembles of individual classifiers, having them vote for the most popular class, to improve system software fault-proneness prediction. Benchmarking results on four NASA public datasets show the Naive Bayes classifier as the more robust software fault predictor, while most ensembles with a decision tree classifier as one of their components achieve higher accuracy rates. Defence Science Journal, 2011, 61(4), pp. 306-316, DOI: http://dx.doi.org/10.14429/dsj.61.1088
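
    A minimal sketch of the voting-ensemble idea: combine a naive Bayes, a decision tree and a logistic regression, and let them vote for the most popular class. The metrics and fault labels are synthetic placeholders for the NASA datasets:

```python
# Minimal sketch: majority-vote ensemble for fault-proneness prediction;
# random features stand in for real static software metrics.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 20))                       # static code metrics (toy)
y = (X[:, 0] + X[:, 1] > 0).astype(int)               # toy fault-proneness label

ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(random_state=7)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard")                                    # vote for the popular class
print("accuracy: %.3f" % cross_val_score(ensemble, X, y, cv=5).mean())
```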

  13. Machine Learning Techniques Applied to Sensor Data Correction in Building Technologies

    Energy Technology Data Exchange (ETDEWEB)

    Smith, Matt K [ORNL; Castello, Charles C [ORNL; New, Joshua Ryan [ORNL

    2013-01-01

    Since commercial and residential buildings account for nearly half of the United States' energy consumption, making them more energy-efficient is a vital part of the nation's overall energy strategy. Sensors play an important role in this research by collecting data needed to analyze performance of components, systems, and whole buildings. Given this reliance on sensors, ensuring that sensor data are valid is a crucial problem. The solutions being researched are machine learning techniques, namely artificial neural networks and Bayesian networks. The types of data investigated in this study are: (1) temperature; (2) humidity; (3) refrigerator energy consumption; (4) heat pump liquid pressure; and (5) water flow. These data are taken from Oak Ridge National Laboratory's (ORNL) ZEBRAlliance research project, which is composed of four single-family homes in Oak Ridge, TN. Results show that for the temperature, humidity, pressure, and flow sensors, data can mostly be predicted with root-mean-square error (RMSE) of less than 10% of the respective sensor's mean value. Results for the energy sensor are not as good; RMSE are centered about 100% of the mean value and are often well above 200%. Bayesian networks have RMSE of less than 5% of the respective sensor's mean value, but took substantially longer to train.

  14. Estimating gypsum requirement under no-till based on machine learning technique

    Directory of Open Access Journals (Sweden)

    Alaine Margarete Guimarães

    Chemical stratification, including of pH, occurs under no-till systems, with higher levels forming from the soil surface towards the deeper layers. Subsoil acidity is a limiting factor for yield. Gypsum has been suggested when subsoil acidity limits the crops' root growth, i.e., when the calcium (Ca) level is low and/or the aluminum (Al) level is toxic in the subsoil layers. However, there are doubts about the more efficient methods to estimate the gypsum requirement. This study was carried out to develop numerical models to estimate the gypsum requirement in soils under a no-till system by the use of machine learning techniques. Computational analyses of the dataset were made applying the M5'Rules algorithm, based on regression models. The dataset comprised soil chemical properties collected from experiments under no-till that received gypsum rates on the soil surface, throughout eight years after the application, in Southern Brazil. The results showed that the numerical models generated by the M5'Rules rule-induction algorithm were useful for estimating gypsum requirements under no-till. The models showed that Ca saturation in the effective cation exchange capacity (ECEC) was a more important attribute than Al saturation for estimating gypsum requirement in no-till soils.

  15. Modelling and analysing track cycling Omnium performances using statistical and machine learning techniques.

    Science.gov (United States)

    Ofoghi, Bahadorreza; Zeleznikow, John; Dwyer, Dan; Macmahon, Clare

    2013-01-01

    This article describes the utilisation of an unsupervised machine learning technique and statistical approaches (e.g., the Kolmogorov-Smirnov test) that assist cycling experts in the crucial decision-making processes for athlete selection, training, and strategic planning in the track cycling Omnium. The Omnium is a multi-event competition that will be included in the summer Olympic Games for the first time in 2012. Presently, selectors and cycling coaches make decisions based on experience and intuition. They rarely have access to objective data. We analysed both the old five-event (first raced internationally in 2007) and new six-event (first raced internationally in 2011) Omniums and found that the addition of the elimination race component to the Omnium has, contrary to expectations, not favoured track endurance riders. We analysed the Omnium data and also determined the inter-relationships between different individual events as well as between those events and the final standings of riders. In further analysis, we found that there is no maximum ranking (poorest performance) in each individual event that riders can afford whilst still winning a medal. We also found the required times for riders to finish the timed components that are necessary for medal winning. The results of this study consider the scoring system of the Omnium and inform decision-making toward successful participation in future major Omnium competitions.

  16. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has exhibited impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  17. Accuracy comparison among different machine learning techniques for detecting malicious codes

    Science.gov (United States)

    Narang, Komal

    2016-03-01

    In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware, i.e. zero-day attacks, by analyzing operation codes on the Android operating system. The accuracies of Naïve Bayes, Support Vector Machine (SVM) and Neural Network classifiers for detecting malicious code have been compared for the proposed model. In the experiment, 400 benign files, 100 system files and 500 malicious files were used to construct the model. The model yields the best accuracy, 88.9%, when a neural network is used as the classifier, achieving 95% and 82.8% for sensitivity and specificity, respectively.
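
    A minimal sketch of such a comparison, computing accuracy, sensitivity and specificity for the three classifiers from a confusion matrix. The opcode-frequency features and the malicious/benign labels are toy stand-ins:

```python
# Minimal sketch: comparing three classifiers on placeholder opcode-count
# features and reporting accuracy, sensitivity (Se) and specificity (Sp).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(8)
X = rng.poisson(3.0, size=(1000, 50)).astype(float)   # opcode counts (toy)
y = (X[:, :5].sum(axis=1) > 16).astype(int)           # 1 = malicious (toy rule)
X = StandardScaler().fit_transform(X)                 # scale for SVM / NN
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=8)

for name, clf in [("NB", GaussianNB()), ("SVM", SVC()),
                  ("NN", MLPClassifier(max_iter=1000, random_state=8))]:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"Se={tp / (tp + fn):.3f} Sp={tn / (tn + fp):.3f}")
```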

  18. Analysis and design of machine learning techniques: evolutionary solutions for regression, prediction, and control problems

    CERN Document Server

    Stalph, Patrick

    2014-01-01

    Manipulating or grasping objects seems like a trivial task for humans, as these are motor skills of everyday life. Nevertheless, motor skills are not easy to learn for humans, and this is also an active research topic in robotics. However, most solutions are optimized for industrial applications and, thus, few are plausible explanations for human learning. The fundamental challenge that motivates Patrick Stalph originates from cognitive science: How do humans learn their motor skills? The author makes a connection between robotics and the cognitive sciences by analyzing motor skill learning using implementations that could be found in the human brain – at least to some extent. Therefore, three suitable machine learning algorithms are selected – algorithms that are plausible from a cognitive viewpoint and feasible for the roboticist. The power and scalability of those algorithms is evaluated in theoretical simulations and more realistic scenarios with the iCub humanoid robot. Convincing results confirm the...

  19. Forecasting the NOK/USD Exchange Rate with Machine Learning Techniques

    OpenAIRE

    Theophilos Papadimitriou; Periklis Gogas; Vasilios Plakandaras

    2013-01-01

    In this paper, we approximate the empirical findings of Papadamou and Markopoulos (2012) on the NOK/USD exchange rate under a Machine Learning (ML) framework. By applying Support Vector Regression (SVR) on a general monetary exchange rate model and a Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) to extract model structure, we test the validity of popular monetary exchange rate models. We reach mixed results, since the coefficient sign of the interest rate differential is in favor o...

  20. Application of Machine Learning Techniques for Amplitude and Phase Noise Characterization

    DEFF Research Database (Denmark)

    Zibar, Darko; de Carvalho, Luis Henrique Hecker; Piels, Molly

    2015-01-01

    In this paper, tools from the machine learning community, such as Bayesian filtering and expectation maximization parameter estimation, are presented and employed for laser amplitude and phase noise characterization. We show that phase noise estimation based on Bayesian filtering outperforms...... conventional time-domain approach in the presence of moderate measurement noise. Additionally, carrier synchronization based on Bayesian filtering, in combination with expectation maximization, is demonstrated for the first time experimentally....

  1. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  2. Machine Learning Techniques for the Detection of Shockable Rhythms in Automated External Defibrillators

    Science.gov (United States)

    Irusta, Unai; Morgado, Eduardo; Aramendi, Elisabete; Ayala, Unai; Wik, Lars; Kramer-Johansen, Jo; Eftestøl, Trygve; Alonso-Atienza, Felipe

    2016-01-01

    Early recognition of ventricular fibrillation (VF) and electrical therapy are key for the survival of out-of-hospital cardiac arrest (OHCA) patients treated with automated external defibrillators (AED). AED algorithms for VF-detection are customarily assessed using Holter recordings from public electrocardiogram (ECG) databases, which may be different from the ECG seen during OHCA events. This study evaluates VF-detection using data from both OHCA patients and public Holter recordings. ECG-segments of 4-s and 8-s duration were analyzed. For each segment 30 features were computed and fed to state-of-the-art machine learning (ML) algorithms. ML-algorithms with built-in feature selection capabilities were used to determine the optimal feature subsets for both databases. Patient-wise bootstrap techniques were used to evaluate algorithm performance in terms of sensitivity (Se), specificity (Sp) and balanced error rate (BER). Performance was significantly better for public data, with a mean Se of 96.6%, Sp of 98.8% and BER of 2.2%, compared to a mean Se of 94.7%, Sp of 96.5% and BER of 4.4% for OHCA data. OHCA data required two times more features than the data from public databases for an accurate detection (6 vs 3). No significant differences in performance were found for different segment lengths; the BER differences were below 0.5 points in all cases. Our results show that VF-detection is more challenging for OHCA data than for data from public databases, and that accurate VF-detection is possible with segments as short as 4-s. PMID:27441719
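
    A minimal sketch of patient-wise bootstrap evaluation: patients, not segments, are resampled, and the balanced error rate BER = 1 - (Se + Sp)/2 is aggregated over replicates. Detector outputs and labels are simulated, and duplicated patients are not re-weighted, a simplification:

```python
# Minimal sketch of patient-wise bootstrap evaluation: resample patients
# (not segments) and report the balanced error rate (BER) across replicates.
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(9)
patients = np.arange(100)
pat_of_seg = rng.integers(0, 100, 2000)        # patient id of each 4-s segment
truth = rng.integers(0, 2, 2000)               # 1 = VF segment (toy labels)
pred = np.where(rng.random(2000) < 0.95, truth, 1 - truth)  # toy 95%-accurate detector

bers = []
for b in range(200):
    chosen = resample(patients, replace=True, random_state=b)  # bootstrap patients
    mask = np.isin(pat_of_seg, chosen)     # simplification: duplicates not re-weighted
    t, p = truth[mask], pred[mask]
    se = np.mean(p[t == 1] == 1)
    sp = np.mean(p[t == 0] == 0)
    bers.append(1 - 0.5 * (se + sp))
print("BER = %.3f +/- %.3f" % (np.mean(bers), np.std(bers)))
```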

  3. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning: a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  4. Applying a Machine Learning Technique to Classification of Japanese Pressure Patterns

    Directory of Open Access Journals (Sweden)

    H Kimura

    2009-04-01

    In climate research, pressure patterns are often very important. When climatologists need to know the days of a specific pressure pattern, for example "low pressure in Western areas of Japan and high pressure in Eastern areas of Japan" (Japanese winter-type weather), they have to visually check a huge number of surface weather charts. To overcome this problem, we propose an automatic classification system using a support vector machine (SVM), which is a machine-learning method. We attempted to classify pressure patterns into two classes: "winter type" and "non-winter type". For both training and test datasets, we used the JRA-25 dataset from 1981 to 2000. An experimental evaluation showed that our method obtained an F-measure greater than 0.8. We noted that variations in results were based on differences in training datasets.

  5. The application of machine learning techniques as an adjunct to clinical decision making in alcohol dependence treatment.

    Science.gov (United States)

    Connor, J P; Symons, M; Feeney, G F X; Young, R McD; Wiles, J

    2007-01-01

    With few exceptions, research in the addictive sciences has relied on linear statistics and methodologies. Addiction involves a complex array of nonlinear behaviors. This study applies two machine learning techniques, Bayesian and decision tree classifiers, in the assessment of outcome of an alcohol dependence treatment program. These nonlinear approaches are compared to a standard linear analysis. Seventy-three alcohol-dependent subjects undertaking a 12-week cognitive-behavioral therapy (CBT) program and 66 subjects undertaking an identical program but also prescribed the relapse prevention agent Acamprosate were employed in this study. Demographic, alcohol use, dependence severity, craving, health-related quality of life, and psychological measures at baseline were used to predict abstinence at 12 weeks. Decision trees had a 77% predictive accuracy across both data sets, Bayesian networks 73%, and discriminant analysis 42%. Combined with clinical experience, machine learning approaches offer promise in understanding the complex relationships that underlie treatment outcome for abstinence-based alcohol treatment programs.

  6. Recognition of Mould Colony on Unhulled Paddy Based on Computer Vision using Conventional Machine-learning and Deep Learning Techniques

    Science.gov (United States)

    Sun, Ke; Wang, Zhengjie; Tu, Kang; Wang, Shaojin; Pan, Leiqing

    2016-11-01

    To investigate the potential of conventional and deep learning techniques to recognize the species and distribution of mould in unhulled paddy, samples were inoculated and cultivated with five species of mould, and sample images were captured. The mould recognition methods were built using support vector machine (SVM), back-propagation neural network (BPNN), convolutional neural network (CNN), and deep belief network (DBN) models. An accuracy rate of 100% was achieved by using the DBN model to identify the mould species in the sample images based on selected colour-histogram parameters, followed by the SVM and BPNN models. A pitch segmentation recognition method combined with different classification models was developed to recognize the mould colony areas in the image. The accuracy rates of the SVM and CNN models for pitch classification were approximately 90% and were higher than those of the BPNN and DBN models. The CNN and DBN models showed quicker calculation speeds for recognizing all of the pitches segmented from a single sample image. Finally, an efficient uniform CNN pitch classification model for all five types of sample images was built. This work compares multiple classification models and provides feasible recognition methods for mouldy unhulled paddy recognition.

  7. mlpy: Machine Learning Python

    CERN Document Server

    Albanese, Davide; Merler, Stefano; Riccadonna, Samantha; Jurman, Giuseppe; Furlanello, Cesare

    2012-01-01

    mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, it works with Python 2 and 3 and it is distributed under GPL3 at the website http://mlpy.fbk.eu.

  8. mlpy: Machine Learning Python

    OpenAIRE

    Albanese, Davide; Visintainer, Roberto; Merler, Stefano; Riccadonna, Samantha; Jurman, Giuseppe; Furlanello, Cesare

    2012-01-01

    mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, it works with Python 2 and 3 and it is distributed under GPL3 at the website http://mlpy.fbk.eu.

  9. Machine learning for medical images analysis.

    Science.gov (United States)

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) We show how a special type of learning models can be thought of as automatically optimized, hierarchically-structured, rule-based algorithms, and (ii) We discuss how the issue of collecting large labelled datasets applies to both conventional algorithms as well as machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods.

  10. Applying machine learning and image feature extraction techniques to the problem of cerebral aneurysm rupture

    Directory of Open Access Journals (Sweden)

    Steren Chabert

    2017-01-01

    Cerebral aneurysm is a cerebrovascular disorder characterized by a bulging in a weak area in the wall of an artery that supplies blood to the brain. It is relevant to understand the mechanisms leading to the appearance of aneurysms, to their growth and, more importantly, to their rupture. The purpose of this study is to examine the impact on aneurysm rupture of the combination of different parameters, instead of focusing on only one factor at a time as is frequently done in the literature, using machine learning and feature extraction techniques. This discussion is relevant in the context of the complex decision that physicians have to take when deciding which therapy to apply, as each intervention bears its own risks and implies the use of a complex ensemble of resources (human resources, operating rooms, etc.) in hospitals that are always under very high workload. This project was raised in our working team, composed of an interventional neuroradiologist, a radiologic technologist, informatics engineers and biomedical engineers, from the Valparaiso public hospital, Hospital Carlos van Buren, and from Universidad de Valparaíso – Facultad de Ingeniería and Facultad de Medicina. This team has been working together over the last few years, and is now participating in the implementation of an "interdisciplinary platform for innovation in health", as part of a bigger project led by Universidad de Valparaiso (PMI UVA1402). It is relevant to emphasize that this project is made feasible by the existence of this network between physicians and engineers, and by the existence of data already registered in an orderly manner, structured and recorded in digital format. The present proposal arises from the description in the current literature that the existing indicators, whether based on morphological description of the aneurysm, or based on characterization of biomechanical or other factors, were shown not to provide sufficient information in order

  11. Higgs Machine Learning Challenge 2014

    CERN Multimedia

    Olivier, A-P; Bourdarios, C; LAL / Orsay; Goldfarb, S; University of Michigan

    2014-01-01

    High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics. We are now exploring new ways to improve the cross-fertilization of the two fields by setting up a data challenge, following the footsteps of, among others, the astrophysics community (dark matter and galaxy zoo challenges) and neurobiology (connectomics and decoding the human brain). The organization committee consists of ATLAS physicists and machine learning researchers. The challenge will run from Monday 12th May to September 2014.

  12. Machine Learning Exciton Dynamics

    CERN Document Server

    Häse, Florian; Pyzer-Knapp, Edward; Aspuru-Guzik, Alán

    2015-01-01

    Obtaining the exciton dynamics of large photosynthetic complexes by using mixed quantum mechanics/molecular mechanics (QM/MM) is computationally demanding. We propose a machine learning technique, multi-layer perceptrons, as a tool to reduce the time required to compute excited state energies. With this approach we predict time-dependent density functional theory (TDDFT) excited state energies of bacteriochlorophylls in the Fenna-Matthews-Olson (FMO) complex, and compute spectral densities and exciton populations from the predictions. Different methods to determine multi-layer perceptron training sets are introduced, leading to several initial data selections. Once multi-layer perceptrons are trained, predicting excited state energies was found to be significantly faster than the corresponding QM/MM calculations. We showed that multi-layer perceptrons can successfully reproduce the energies of QM/MM calculations to a high degree o...
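
    A minimal sketch of the surrogate-model idea: a multi-layer perceptron regressing an energy-like target from placeholder descriptors, so that predictions are cheap once training is done. This is not the authors' TDDFT setup; features and target are invented:

```python
# Minimal sketch: an MLP regressing excited-state energies from placeholder
# structural descriptors, as a fast surrogate for repeated QM/MM calls.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(13)
X = rng.normal(size=(5000, 30))                   # toy structural descriptors
energy = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=5000)  # toy target
X_tr, X_te, y_tr, y_te = train_test_split(X, energy, random_state=13)

mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(64, 64),
                                 max_iter=2000, random_state=13))
mlp.fit(X_tr, y_tr)                               # train once, then predict cheaply
print("test R^2: %.3f" % mlp.score(X_te, y_te))
```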

  13. Machine Learning Techniques for Single Nucleotide Polymorphism—Disease Classification Models in Schizophrenia

    Directory of Open Access Journals (Sweden)

    Cristian R. Munteanu

    2010-07-01

    Single nucleotide polymorphisms (SNPs) can be used as inputs in disease computational studies such as pattern searching and classification models. Schizophrenia is an example of a complex disease with an important social impact. The multiple causes of this disease create the need for new genetic or proteomic patterns that can diagnose patients using biological information. This work presents a computational study of disease machine learning classification models using only single nucleotide polymorphisms at the HTR2A and DRD3 genes from Galician (Northwest Spain) schizophrenic patients. These classification models establish for the first time, to the best knowledge of the authors, a relationship between the sequence of the nucleic acid molecule and schizophrenia (Quantitative Genotype – Disease Relationships) that can automatically recognize schizophrenia DNA sequences and correctly classify between 78.3% and 93.8% of schizophrenia subjects when using datasets which include simulated negative subjects and a linear artificial neural network.

  14. From analog timers to the era of machine learning: The case of the transient hot-wire technique

    Science.gov (United States)

    Assael, Yannis M.; Antoniadis, Konstantinos D.; Assael, Marc J.

    2017-07-01

    In this work, we demonstrate how interdisciplinary knowledge can provide solutions to elusive challenges and advance science. As an example, we use the application of the transient hot-wire (THW) technique to the measurement of the thermal conductivity of solids. To obtain a solution of the equations by FEM, about 10 h were required. By employing tools from the fields of machine learning and computer science, namely (a) automating the manual pipeline using a custom framework, (b) using Bayesian optimisation efficiently to estimate the optimal thermal property values, and (c) applying further task-specific optimisations, this time was reduced to 3 min, which is acceptable, and thus the technique can be used more easily.
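
    A minimal sketch of the Bayesian-optimisation step, assuming the scikit-optimize package is available. The objective below is an invented stand-in for the misfit between the FEM solution and the measured temperature rise, and the parameter names and bounds are illustrative:

```python
# Minimal sketch of the Bayesian-optimisation step (requires scikit-optimize);
# the objective is a placeholder for the FEM-vs-measurement misfit.
import numpy as np
from skopt import gp_minimize

def misfit(params):
    """Placeholder objective: squared error between simulated and 'measured'
    response for a trial (conductivity, diffusivity) pair -- invented values."""
    k, a = params
    return (k - 1.4) ** 2 + 10 * (a - 0.8) ** 2   # toy optimum at (1.4, 0.8)

result = gp_minimize(misfit,
                     dimensions=[(0.1, 5.0), (0.1, 2.0)],  # illustrative bounds
                     n_calls=30, random_state=0)
print("best thermal properties:", result.x, "misfit:", result.fun)
```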

  15. Application of Machine Learning Techniques to High-Dimensional Clinical Data to Forecast Postoperative Complications.

    Directory of Open Access Journals (Sweden)

    Paul Thottakkara

    To compare the performance of risk prediction models for forecasting postoperative sepsis and acute kidney injury, we conducted a retrospective single-center cohort study of 50,318 adult surgical patients undergoing major surgery, admitted between 2000 and 2010. We evaluated the performance of logistic regression, generalized additive models, naïve Bayes and support vector machines for forecasting postoperative sepsis and acute kidney injury, and assessed the impact of feature reduction techniques on predictive performance. Model performance was determined using the area under the receiver operating characteristic curve, accuracy, and positive predictive value. The results were reported based on a 70/30 cross-validation procedure, where the data were randomly split into 70% used for training the model and 30% for validation. The areas under the receiver operating characteristic curve for the different models ranged between 0.797 and 0.858 for acute kidney injury and between 0.757 and 0.909 for severe sepsis. Logistic regression, generalized additive models, and support vector machines had better performance compared to the naïve Bayes model. Generalized additive models additionally accounted for non-linearity of continuous clinical variables, as depicted in their risk pattern plots. Reducing the input feature space with LASSO had minimal effect on prediction performance, while feature extraction using principal component analysis improved the performance of the models. Generalized additive models and support vector machines had good performance as risk prediction models for postoperative sepsis and AKI. Feature extraction using principal component analysis improved the predictive performance of all models.
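
    A minimal sketch of the feature-extraction pipeline the study found helpful: principal component analysis feeding a logistic-regression risk model, evaluated by AUC on a 70/30 split. The high-dimensional clinical features are synthetic:

```python
# Minimal sketch: PCA feature extraction feeding a logistic-regression risk
# model, scored by AUC on a 70/30 split; placeholder clinical data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(10)
X = rng.normal(size=(5000, 200))                 # high-dimensional clinical features
y = (X[:, :10].sum(axis=1) + rng.normal(0, 2, 5000) > 0).astype(int)  # toy outcome
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=10)

model = make_pipeline(StandardScaler(), PCA(n_components=30),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("AUC: %.3f" % roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```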

  16. Estimating Fractional Shrub Cover Using Simulated EnMAP Data: A Comparison of Three Machine Learning Regression Techniques

    Directory of Open Access Journals (Sweden)

    Marcel Schwieder

    2014-04-01

    Anthropogenic interventions in natural and semi-natural ecosystems often lead to substantial changes in their functioning and may ultimately threaten ecosystem service provision. It is, therefore, necessary to monitor these changes in order to understand their impacts and to support management decisions that help ensure sustainability. Remote sensing has proven to be a valuable tool for these purposes, and especially hyperspectral sensors are expected to provide valuable data for quantitative characterization of land change processes. In this study, simulated EnMAP data were used for mapping shrub cover fractions along a gradient of shrub encroachment, in a study region in southern Portugal. We compared three machine learning regression techniques: Support Vector Regression (SVR), Random Forest Regression (RF), and Partial Least Squares Regression (PLSR). Additionally, we compared the influence of training sample size on the prediction performance. All techniques showed reasonably good results when trained with large samples, while SVR always outperformed the other algorithms. The best model was applied to produce a fractional shrub cover map for the whole study area. The predicted patterns revealed a gradient of shrub cover between regions affected by special agricultural management schemes for nature protection and areas without land use incentives. Our results highlight the value of EnMAP data in combination with machine learning regression techniques for monitoring gradual land change processes.

  17. Machine learning phases of matter

    Science.gov (United States)

    Carrasquilla, Juan; Melko, Roger G.

    2017-02-01

    Condensed-matter physics is the study of the collective behaviour of infinitely complex assemblies of electrons, nuclei, magnetic moments, atoms or qubits. This complexity is reflected in the size of the state space, which grows exponentially with the number of particles, reminiscent of the `curse of dimensionality' commonly encountered in machine learning. Despite this curse, the machine learning community has developed techniques with remarkable abilities to recognize, classify, and characterize complex sets of data. Here, we show that modern machine learning architectures, such as fully connected and convolutional neural networks, can identify phases and phase transitions in a variety of condensed-matter Hamiltonians. Readily programmable through modern software libraries, neural networks can be trained to detect multiple types of order parameter, as well as highly non-trivial states with no conventional order, directly from raw state configurations sampled with Monte Carlo.

  18. Machine Learning with Distances

    Science.gov (United States)

    2015-02-16

    and demonstrated their usefulness in experiments. The goal of machine learning is to find useful knowledge behind data. Many machine... However, direct divergence approximators still suffer from the curse of dimensionality. A possible cure for this problem is to combine them... obtain the global optimal solution or even a good local solution without any prior knowledge. For this reason, we decided to introduce the unit-norm

  19. Automatic segmentation of airway tree based on local intensity filter and machine learning technique in 3D chest CT volume.

    Science.gov (United States)

    Meng, Qier; Kitasaka, Takayuki; Nimura, Yukitaka; Oda, Masahiro; Ueno, Junji; Mori, Kensaku

    2017-02-01

    Airway segmentation plays an important role in analyzing chest computed tomography (CT) volumes for computerized lung cancer detection, emphysema diagnosis and pre- and intra-operative bronchoscope navigation. However, obtaining a complete 3D airway tree structure from a CT volume is quite a challenging task. Several researchers have proposed automated airway segmentation algorithms based on region growing and machine learning techniques. However, these methods fail to detect the peripheral bronchial branches, which results in a large amount of leakage. This paper presents a novel approach for more accurate extraction of the complex airway tree. The proposed segmentation method is composed of three steps. First, Hessian analysis is utilized to enhance the tube-like structure in CT volumes; then, an adaptive multiscale cavity enhancement filter is employed to detect the cavity-like structure with different radii. In the second step, support vector machine learning is utilized to remove the false positive (FP) regions from the result obtained in the previous step. Finally, the graph-cut algorithm is used to refine the candidate voxels to form an integrated airway tree. A test dataset including 50 standard-dose chest CT volumes was used for evaluating our proposed method. The average extraction rate was about 79.1 %, with a significantly decreased FP rate. A new method of airway segmentation based on local intensity structure and machine learning techniques was developed. The method was shown to be feasible for airway segmentation in a computer-aided diagnosis system for a lung and bronchoscope guidance system.

  19. Reverse hypothesis machine learning: a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge-acquisition-based learning is constrained by large volumes of data and is time-consuming; hence, knowledge-innovation-based learning is needed. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is a need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change, the paradigm remains the same: the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  1. Machine Learning Markets

    CERN Document Server

    Storkey, Amos

    2011-01-01

    Prediction markets show considerable promise for developing flexible mechanisms for machine learning. Here, machine learning markets for multivariate systems are defined, and a utility-based framework is established for their analysis. This differs from the usual approach of defining static betting functions. It is shown that such markets can implement model combination methods used in machine learning, such as product of expert and mixture of expert approaches as equilibrium pricing models, by varying agent utility functions. They can also implement models composed of local potentials, and message passing methods. Prediction markets also allow for more flexible combinations, by combining multiple different utility functions. Conversely, the market mechanisms implement inference in the relevant probabilistic models. This means that market mechanism can be utilized for implementing parallelized model building and inference for probabilistic modelling.

  2. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome.

  3. Clojure for machine learning

    CERN Document Server

    Wali, Akhil

    2014-01-01

    A book that brings out the strengths of Clojure programming that help facilitate machine learning. Each topic is described in substantial detail, and examples and libraries in Clojure are also demonstrated. This book is intended for Clojure developers who want to explore the area of machine learning. Basic understanding of the Clojure programming language is required, but thorough acquaintance with the standard Clojure library or any other libraries is not required. Familiarity with theoretical concepts and notation of mathematics and statistics would be an added advantage.

  4. Multivariate Cross-Classification: Applying machine learning techniques to characterize abstraction in neural representations

    Directory of Open Access Journals (Sweden)

    Jonas eKaplan

    2015-03-01

    Here we highlight an emerging trend in the use of machine learning classifiers to test for abstraction across patterns of neural activity. When a classifier algorithm is trained on data from one cognitive context, and tested on data from another, conclusions can be drawn about the role of a given brain region in representing information that abstracts across those cognitive contexts. We call this kind of analysis Multivariate Cross-Classification (MVCC), and review several domains where it has recently made an impact. MVCC has been important in establishing correspondences among neural patterns across cognitive domains, including motor-perception matching and cross-sensory matching. It has been used to test for similarity between neural patterns evoked by perception and those generated from memory. Other work has used MVCC to investigate the similarity of representations for semantic categories across different kinds of stimulus presentation, and in the presence of different cognitive demands. We use these examples to demonstrate the power of MVCC as a tool for investigating neural abstraction and discuss some important methodological issues related to its application.
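
    A minimal sketch of cross-classification: a linear classifier is trained on patterns from one simulated context and tested on another that shares class structure but differs in offset and noise; above-chance transfer is the MVCC signature. All data here are synthetic:

```python
# Minimal sketch of cross-classification: train on one simulated cognitive
# context, test on another; shared structure yields above-chance accuracy.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(11)
n, voxels = 200, 50
signal = rng.normal(size=(2, voxels))            # one pattern per class

def make_context(offset):                        # context-specific shift + noise
    y = rng.integers(0, 2, n)
    X = signal[y] + offset + rng.normal(0, 1.0, (n, voxels))
    return X, y

X_a, y_a = make_context(offset=0.0)              # e.g. perception trials
X_b, y_b = make_context(offset=0.5)              # e.g. imagery trials

clf = LinearSVC(max_iter=5000).fit(X_a, y_a)     # train in context A only
print("cross-context accuracy: %.2f" % clf.score(X_b, y_b))
```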

  5. Hand Gesture recognition and classification by Discriminant and Principal Component Analysis using Machine Learning techniques

    Directory of Open Access Journals (Sweden)

    Sauvik Das Gupta

    2012-12-01

    This paper deals with the recognition of different hand gestures through machine learning approaches and principal component analysis. A biomedical signal amplifier is built after doing a software simulation with the help of NI Multisim. At first, a couple of surface electrodes are used to obtain the electromyogram (EMG) signals from the hands. These signals from the surface electrodes have to be amplified with the help of the biomedical signal amplifier. The biomedical signal amplifier used is basically an instrumentation amplifier made with the help of the IC AD620. The output from the instrumentation amplifier is then filtered with the help of a suitable band-pass filter. The output from the band-pass filter is then fed to an analog-to-digital converter (ADC), which in this case is the NI USB-6008. The data from the ADC is then fed into a suitable algorithm which helps in recognition of the different hand gestures. The algorithm analysis is done in MATLAB. The results shown in this paper demonstrate close to one-hundred percent (100%) classification for three given hand gestures.

  6. submitter Studies of CMS data access patterns with machine learning techniques

    CERN Document Server

    De Luca, Silvia

    This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from a deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WLCG) computing centers (Tiers), and in particular the distributed analysis system sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows one to gain valuable insight on storage occupancy ove...

  7. Machine learning techniques in searches for $t\bar{t}h$ in the $h \to b\bar{b}$ decay channel

    Science.gov (United States)

    Santos, R.; Nguyen, M.; Webster, J.; Ryu, S.; Adelman, J.; Chekanov, S.; Zhou, J.

    2017-04-01

    Study of the production of pairs of top quarks in association with a Higgs boson is one of the primary goals of the Large Hadron Collider over the next decade, as measurements of this process may help us to understand whether the uniquely large mass of the top quark plays a special role in electroweak symmetry breaking. Higgs bosons decay predominantly to $b\bar{b}$, yielding signatures for the signal that are similar to $t\bar{t}$ + jets with heavy flavor. Though particularly challenging to study due to the similar kinematics between signal and background events, such final states ($t\bar{t}b\bar{b}$) are an important channel for studying the top quark Yukawa coupling. This paper presents a systematic study of machine learning (ML) methods for detecting $t\bar{t}h$ in the $h \to b\bar{b}$ decay channel. Among the eight ML methods tested, we show that two models, extreme gradient boosted trees and neural network models, outperform alternative methods. We further study the effectiveness of ML algorithms by investigating the impact of feature set and data size, as well as the structure of the models. While an extended feature set and larger training sets expectedly lead to improved performance, shallow models deliver comparable or better performance than their deeper counterparts. Our study suggests that ensembles of trees and neurons, not necessarily deep, work effectively for the problem of $t\bar{t}h$ detection.
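
    A rough sketch of the kind of model comparison reported above, with scikit-learn's GradientBoostingClassifier standing in for the paper's extreme gradient boosted trees and synthetic features standing in for the event kinematics:

      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPClassifier

      # Synthetic stand-in for signal/background kinematic features.
      X, y = make_classification(n_samples=4000, n_features=20,
                                 n_informative=8, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                random_state=0)

      models = [("boosted trees", GradientBoostingClassifier()),
                ("shallow NN", MLPClassifier(hidden_layer_sizes=(32,),
                                             max_iter=500, random_state=0))]
      for name, clf in models:
          clf.fit(X_tr, y_tr)
          auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
          print(f"{name}: AUC = {auc:.3f}")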

  8. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  9. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection

  11. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    This paper evaluates the applicability of conventional sound analysis techniques and modern machine learning algorithms, including support vector machines and deep learning neural networks, to rotating machine health monitoring. Inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the various algorithms. Although no remarkable difference in anomaly discrimination performance was found, some methods yield very interesting eigenpatterns corresponding to normal and abnormal states. These results will be useful for future, more sensitive and robust anomaly monitoring technology.

  12. Applied genetic programming and machine learning

    CERN Document Server

    Iba, Hitoshi; Paul, Topon Kumar

    2009-01-01

    What do financial data prediction, day-trading rule development, and bio-marker selection have in common? They are just a few of the tasks that could potentially be resolved with genetic programming and machine learning techniques. Written by leaders in this field, Applied Genetic Programming and Machine Learning delineates the extension of Genetic Programming (GP) for practical applications. Reflecting rapidly developing concepts and emerging paradigms, this book outlines how to use machine learning techniques, make learning operators that efficiently sample a search space, navigate the search...

  13. High Classification Rates for Continuous Cow Activity Recognition using Low-cost GPS Positioning Sensors and Standard Machine Learning Techniques

    DEFF Research Database (Denmark)

    Godsk, Torben; Kjærgaard, Mikkel Baun

    2011-01-01

    In precision livestock farming, spotting cows in need of extra attention due to health or welfare issues is essential, since the time a farmer can devote to each animal is decreasing due to growing herd sizes and increasing efficiency demands. Often, the symptoms of health and welfare state ... activities. By preprocessing the raw cow position data, we obtain high classification rates using standard machine learning techniques to recognize cow activities. Our objectives were to (i) determine to what degree it is possible to robustly recognize cow activities from GPS positioning data, using low-cost GPS receivers; and (ii) determine which types of activities can be classified, and what robustness to expect within the different classes. To provide data for this study, low-cost GPS receivers were mounted on 14 dairy cows on grass for a day while they were observed from a distance...

  14. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence and has attracted considerable attention. An extreme learning machine (ELM) is a learning system resembling a single-layer feed-forward neural network, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of the ELM are its fast learning speed, minimal human intervention, and high scalability. This book contains selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing, China, October 15-17, 2013. The conference aimed to bring together researchers and practitioners of extreme learning machines from a variety of fields, including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...
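
    The ELM recipe summarized above (random input-to-hidden weights, linear solve for the output weights) is compact enough to sketch in a few lines of NumPy. This is a toy regression illustration of that recipe, not code from the proceedings:

      import numpy as np

      rng = np.random.RandomState(0)
      X = rng.uniform(-3, 3, size=(200, 1))
      y = np.sin(X).ravel() + rng.randn(200) * 0.1

      n_hidden = 50
      W = rng.randn(1, n_hidden)  # random input->hidden weights (never trained)
      b = rng.randn(n_hidden)     # random hidden biases
      H = np.tanh(X @ W + b)      # hidden-layer activations

      # Hidden->output weights via a linear least-squares solve.
      beta = np.linalg.pinv(H) @ y

      y_hat = H @ beta
      print("train MSE:", np.mean((y - y_hat) ** 2))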

  15. Machine Learning at Scale

    OpenAIRE

    Izrailev, Sergei; Stanley, Jeremy M.

    2014-01-01

    It takes skill to build a meaningful predictive model even with the abundance of implementations of modern machine learning algorithms and readily available computing resources. Building a model becomes challenging if hundreds of terabytes of data need to be processed to produce the training data set. In a digital advertising technology setting, we are faced with the need to build thousands of such models that predict user behavior and power advertising campaigns in a 24/7 chaotic real-time p...

  16. Improving the Performance of Machine Learning Based Multi Attribute Face Recognition Algorithm Using Wavelet Based Image Decomposition Technique

    Directory of Open Access Journals (Sweden)

    S. Sakthivel

    2011-01-01

    Full Text Available Problem statement: Recognizing a face by its attributes is an easy task for a human to perform; it is nearly automatic and requires little mental effort. A computer, on the other hand, has no innate ability to recognize a face or a facial feature and must be programmed with an algorithm to do so. Generally, to recognize a face, different kinds of facial features are used, separately or in combination. In previous work, we developed a machine learning based multi-attribute face recognition algorithm and evaluated it with different sets of weights for each input attribute; its performance is low compared to the proposed wavelet decomposition technique. Approach: In this study, a wavelet decomposition technique is applied as a preprocessing step to enhance the input face images in order to reduce the loss of classification performance due to changes in facial appearance. The experiment was specifically designed to investigate the gain in robustness against illumination and facial expression changes. Results: The proposed wavelet based image decomposition technique enhances the performance of the previously designed system by 8.54 percent. Conclusion: The proposed model was tested on face images with differences in expression and illumination conditions, with a dataset obtained from the Olivetti Research Laboratory face database.
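
    A minimal sketch of wavelet preprocessing in the spirit described above, using the PyWavelets package (an implementation choice assumed here; the study does not name one). A single-level 2D DWT keeps the coarse approximation subband of a face image:

      import numpy as np
      import pywt

      # Hypothetical 64x64 grayscale face image (random stand-in here).
      face = np.random.rand(64, 64)

      # Single-level 2D discrete wavelet transform with a Haar wavelet.
      cA, (cH, cV, cD) = pywt.dwt2(face, "haar")

      # Using only the 32x32 approximation subband cA as classifier input
      # discards fine detail that varies with illumination and expression.
      print("original:", face.shape, "-> approximation:", cA.shape)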

  17. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  18. International Conference on Extreme Learning Machine 2015

    CERN Document Server

    Mao, Kezhi; Wu, Jonathan; Lendasse, Amaury; ELM 2015; Theory, Algorithms and Applications (I); Theory, Algorithms and Applications (II)

    2016-01-01

    This book contains selected papers from the International Conference on Extreme Learning Machine 2015, which was held in Hangzhou, China, December 15-17, 2015. The conference brought together researchers and engineers to share and exchange R&D experience on both theoretical studies and practical applications of the Extreme Learning Machine (ELM) technique and brain learning. The book covers theories, algorithms and applications of ELM, giving readers a glance at the most recent advances in ELM.

  19. Classification and Ranking of Fermi LAT Gamma-ray Sources from the 3FGL Catalog using Machine Learning Techniques

    CERN Document Server

    Parkinson, P M Saz; Yu, P L H; Salvetti, D; Marelli, M; Falcone, A D

    2016-01-01

    We apply a number of statistical and machine learning techniques to classify and rank gamma-ray sources from the Third Fermi Large Area Telescope (LAT) Source Catalog (3FGL), according to their likelihood of falling into the two major classes of gamma-ray emitters: pulsars (PSR) or Active Galactic Nuclei (AGN). Using 1904 3FGL sources that have been identified/associated with AGN (1738) and PSR (166), we train (using 70% of our sample) and test (using 30%) our algorithms and find that the best overall accuracy (>96%) is obtained with the Random Forest (RF) technique, while using a logistic regression (LR) algorithm results in only marginally lower accuracy. We apply the same techniques on a sub-sample of 142 known gamma-ray pulsars to classify them into two major subcategories: young (YNG) and millisecond pulsars (MSP). Once more, the RF algorithm has the best overall accuracy (~90%), while a boosted LR analysis comes a close second. We apply our two best models (RF and LR) to the entire 3FGL catalog, providing predictions on the likely nature of unassociated sources, including the likely type of pulsar (YNG or MSP)...

  20. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users and consumers demand smarter devices, intelligent systems are being revolutionized by machine learning. Machine learning, as part of intelligent systems, is already one of the most critical components in everyday tools, ranging from search engines and credit card fraud detection to stock market analysis. Machines can be trained to detect, diagnose, and solve a variety of problems automatically. Intelligent systems have made rapid progress in advancing the state of the art in machine learning based on smart and deep perception, and are widely applied in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines, and to improve decision making models; it specializes in the development of advanced intelligent systems through machine learning. It...

  1. Quantum-Enhanced Machine Learning.

    Science.gov (United States)

    Dunjko, Vedran; Taylor, Jacob M; Briegel, Hans J

    2016-09-23

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  2. Quantum-Enhanced Machine Learning

    Science.gov (United States)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  3. Quantum adiabatic machine learning

    CERN Document Server

    Pudenz, Kristen L

    2011-01-01

    We develop an approach to machine learning and anomaly detection via quantum adiabatic evolution. In the training phase we identify an optimal set of weak classifiers, to form a single strong classifier. In the testing phase we adiabatically evolve one or more strong classifiers on a superposition of inputs in order to find certain anomalous elements in the classification space. Both the training and testing phases are executed via quantum adiabatic evolution. We apply and illustrate this approach in detail to the problem of software verification and validation.

  4. Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques

    Science.gov (United States)

    2007-08-01

    XTAG grammar used by FERGUS is a bidirectional (or reversible) grammar that has been used for parsing as well (Schabes and Joshi, 1988). The use of a ... answering (Wang et al., 2007), and syntactic parsing for resource-poor languages (Chiang et al., 2006). Shieber and Schabes (1990a,b) propose that ... (Schabes, 1990b; Bos, 2005; Zettlemoyer and Collins, 2007). In the future, we would like to devise learning algorithms similar to WASP that construct ...

  5. Stacked Extreme Learning Machines.

    Science.gov (United States)

    Zhou, Hongming; Huang, Guang-Bin; Lin, Zhiping; Wang, Han; Soh, Yeng Chai

    2015-09-01

    Extreme learning machine (ELM) has recently attracted many researchers' interest due to its very fast learning speed, good generalization ability, and ease of implementation. It provides a unified solution that can be used directly to solve regression, binary, and multiclass classification problems. In this paper, we propose stacked ELMs (S-ELMs), specially designed for solving large and complex data problems. The S-ELMs approach divides a single large ELM network into multiple stacked small ELMs which are serially connected, and can thereby approximate a very large ELM network with a small memory requirement. To further improve the testing accuracy on big data problems, the ELM autoencoder can be applied during each iteration of the S-ELMs algorithm. The simulation results show that the S-ELMs, even with random hidden nodes, can achieve testing accuracy similar to support vector machines (SVM) while having low memory requirements. With the help of the ELM autoencoder, the S-ELMs can achieve much better testing accuracy than SVM and slightly better accuracy than a deep belief network (DBN), with much faster training speed.

  6. Feature extraction and classification for EEG signals using wavelet transform and machine learning techniques.

    Science.gov (United States)

    Amin, Hafeez Ullah; Malik, Aamir Saeed; Ahmad, Rana Fayyaz; Badruddin, Nasreen; Kamel, Nidal; Hussain, Muhammad; Chooi, Weng-Tink

    2015-03-01

    This paper describes a discrete wavelet transform-based feature extraction scheme for the classification of EEG signals. In this scheme, the discrete wavelet transform is applied to EEG signals and the relative wavelet energy is calculated in terms of the detailed coefficients and the approximation coefficients of the last decomposition level. The extracted relative wavelet energy features are passed to classifiers for classification. The EEG dataset employed for the validation of the proposed method consisted of two classes: (1) EEG signals recorded during a complex cognitive task (Raven's Advanced Progressive Matrices test) and (2) EEG signals recorded in the rest condition (eyes open). The performance of four different classifiers was evaluated with four performance measures, i.e., accuracy, sensitivity, specificity and precision. Accuracy above 98% was achieved by the support vector machine, multi-layer perceptron and k-nearest neighbor classifiers with the approximation (A4) and detailed coefficients (D4), which represent the frequency ranges of 0.53-3.06 and 3.06-6.12 Hz, respectively. The findings of this study demonstrate that the proposed feature extraction approach has the potential to classify EEG signals recorded during a complex cognitive task with a high accuracy rate.
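
    The feature extraction step described above can be sketched with PyWavelets (an assumed implementation choice, not the paper's): a multi-level DWT of each EEG epoch, followed by the relative energies of the subband coefficients.

      import numpy as np
      import pywt

      def relative_wavelet_energy(epoch, wavelet="db4", level=4):
          """Relative energy of each subband of a 1-D EEG epoch."""
          coeffs = pywt.wavedec(epoch, wavelet, level=level)  # [A4, D4, D3, D2, D1]
          energies = np.array([np.sum(c ** 2) for c in coeffs])
          return energies / energies.sum()

      # Hypothetical 2-second epoch sampled at 128 Hz (random stand-in).
      epoch = np.random.randn(256)
      print(relative_wavelet_energy(epoch))  # features fed to SVM/MLP/k-NN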

  7. Prediction of Driver's Intention of Lane Change by Augmenting Sensor Information Using Machine Learning Techniques.

    Science.gov (United States)

    Kim, Il-Hwan; Bong, Jae-Hwan; Park, Jooyoung; Park, Shinsuk

    2017-06-10

    Driver assistance systems have become a major safety feature of modern passenger vehicles. The advanced driver assistance system (ADAS) is one of the active safety systems designed to improve vehicle control performance and, thus, the safety of the driver and the passengers. To use the ADAS for lane change control, rapid and correct detection of the driver's intention is essential. This study proposes a novel preprocessing algorithm for the ADAS to improve the accuracy in classifying the driver's intention for lane change by augmenting basic measurements from conventional on-board sensors. The information on the vehicle states and the road surface condition is augmented by using artificial neural network (ANN) models, and the augmented information is fed to a support vector machine (SVM) to detect the driver's intention with high accuracy. The feasibility of the developed algorithm was tested through driving simulator experiments. The results show that the classification accuracy for the driver's intention can be improved by providing an SVM model with sufficient driving information augmented by using ANN models of vehicle dynamics.
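
    The two-stage idea above - an ANN that augments the raw sensor measurements, feeding an SVM that classifies intention - might be sketched as follows. The signals, the augmented quantity, and the toy labeling rule are all hypothetical:

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPRegressor
      from sklearn.svm import SVC

      rng = np.random.RandomState(0)
      n = 1000
      raw = rng.randn(n, 6)                          # on-board sensor channels
      latent = raw[:, :3].sum(axis=1)                # unmeasured vehicle state (toy)
      intent = (latent + raw[:, 3] > 0).astype(int)  # lane change: yes/no

      # Stage 1: an ANN estimates the unmeasured state from the raw sensors.
      aug_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000,
                               random_state=0).fit(raw, latent)
      augmented = np.column_stack([raw, aug_model.predict(raw)])

      # Stage 2: an SVM classifies driver intention from augmented features.
      X_tr, X_te, y_tr, y_te = train_test_split(augmented, intent, random_state=0)
      svm = SVC().fit(X_tr, y_tr)
      print("intention accuracy:", svm.score(X_te, y_te))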

  8. Automated Classification of Heritage Buildings for As-Built Bim Using Machine Learning Techniques

    Science.gov (United States)

    Bassier, M.; Vergauwen, M.; Van Genechten, B.

    2017-08-01

    Semantically rich three-dimensional models such as Building Information Models (BIMs) are increasingly used in digital heritage. They provide the required information to varying stakeholders during the different stages of the historic building's life cycle, which is crucial in the conservation process. The creation of as-built BIM models is based on point cloud data. However, manually interpreting this data is labour intensive and often leads to misinterpretations. By automatically classifying the point cloud, the information can be processed more efficiently. A key aspect in this automated scan-to-BIM process is the classification of building objects. In this research we look to automatically recognise elements in existing buildings to create compact semantic information models. Our algorithm efficiently extracts the main structural components such as floors, ceilings, roofs, walls and beams despite the presence of significant clutter and occlusions. More specifically, Support Vector Machines (SVM) are proposed for the classification. The algorithm is evaluated using real data from a variety of existing buildings. The results prove that the used classifier recognizes the objects with both high precision and recall. As a result, entire data sets are reliably labelled at once. The approach enables experts to better document and process heritage assets.

  9. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    Science.gov (United States)

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  10. Reverse engineering smart card malware using side channel analysis with machine learning techniques

    CSIR Research Space (South Africa)

    Djonon Tsague, Hippolyte

    2016-12-01

    Full Text Available ... by evaluating its power consumption only. Besides well-studied methods from side channel analysis, we apply a combination of dimensionality reduction techniques, in the form of PCA and LDA models, to compress the large amount of data generated while preserving...

  11. CLASSIFICATION AND RANKING OF FERMI LAT GAMMA-RAY SOURCES FROM THE 3FGL CATALOG USING MACHINE LEARNING TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Saz Parkinson, P. M. [Department of Physics, The University of Hong Kong, Pokfulam Road, Hong Kong (China); Xu, H.; Yu, P. L. H. [Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong (China); Salvetti, D.; Marelli, M. [INAF—Istituto di Astrofisica Spaziale e Fisica Cosmica Milano, via E. Bassini 15, I-20133, Milano (Italy); Falcone, A. D. [Department of Astronomy and Astrophysics, The Pennsylvania State University, University Park, PA 16802 (United States)

    2016-03-20

    We apply a number of statistical and machine learning techniques to classify and rank gamma-ray sources from the Third Fermi Large Area Telescope Source Catalog (3FGL), according to their likelihood of falling into the two major classes of gamma-ray emitters: pulsars (PSR) or active galactic nuclei (AGNs). Using 1904 3FGL sources that have been identified/associated with AGNs (1738) and PSR (166), we train (using 70% of our sample) and test (using 30%) our algorithms and find that the best overall accuracy (>96%) is obtained with the Random Forest (RF) technique, while using a logistic regression (LR) algorithm results in only marginally lower accuracy. We apply the same techniques on a subsample of 142 known gamma-ray pulsars to classify them into two major subcategories: young (YNG) and millisecond pulsars (MSP). Once more, the RF algorithm has the best overall accuracy (∼90%), while a boosted LR analysis comes a close second. We apply our two best models (RF and LR) to the entire 3FGL catalog, providing predictions on the likely nature of unassociated sources, including the likely type of pulsar (YNG or MSP). We also use our predictions to shed light on the possible nature of some gamma-ray sources with known associations (e.g., binaries, supernova remnants/pulsar wind nebulae). Finally, we provide a list of plausible X-ray counterparts for some pulsar candidates, obtained using Swift, Chandra, and XMM. The results of our study will be of interest both for in-depth follow-up searches (e.g., pulsar) at various wavelengths and for broader population studies.

  12. Classification and Ranking of Fermi LAT Gamma-ray Sources from the 3FGL Catalog using Machine Learning Techniques

    Science.gov (United States)

    Saz Parkinson, P. M.; Xu, H.; Yu, P. L. H.; Salvetti, D.; Marelli, M.; Falcone, A. D.

    2016-03-01

    We apply a number of statistical and machine learning techniques to classify and rank gamma-ray sources from the Third Fermi Large Area Telescope Source Catalog (3FGL), according to their likelihood of falling into the two major classes of gamma-ray emitters: pulsars (PSR) or active galactic nuclei (AGNs). Using 1904 3FGL sources that have been identified/associated with AGNs (1738) and PSR (166), we train (using 70% of our sample) and test (using 30%) our algorithms and find that the best overall accuracy (>96%) is obtained with the Random Forest (RF) technique, while using a logistic regression (LR) algorithm results in only marginally lower accuracy. We apply the same techniques on a subsample of 142 known gamma-ray pulsars to classify them into two major subcategories: young (YNG) and millisecond pulsars (MSP). Once more, the RF algorithm has the best overall accuracy (∼90%), while a boosted LR analysis comes a close second. We apply our two best models (RF and LR) to the entire 3FGL catalog, providing predictions on the likely nature of unassociated sources, including the likely type of pulsar (YNG or MSP). We also use our predictions to shed light on the possible nature of some gamma-ray sources with known associations (e.g., binaries, supernova remnants/pulsar wind nebulae). Finally, we provide a list of plausible X-ray counterparts for some pulsar candidates, obtained using Swift, Chandra, and XMM. The results of our study will be of interest both for in-depth follow-up searches (e.g., pulsar) at various wavelengths and for broader population studies.
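
    The training protocol described above (70/30 split, Random Forest versus logistic regression) maps directly onto scikit-learn. The synthetic, imbalanced features below are hypothetical stand-ins for the 3FGL catalog parameters:

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      # Stand-in for 1904 sources with, say, 10 catalog features each;
      # the class imbalance loosely mimics the AGN vs PSR proportions.
      X, y = make_classification(n_samples=1904, n_features=10,
                                 weights=[0.91], random_state=0)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                                stratify=y, random_state=0)

      for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                        ("Logistic Regression", LogisticRegression(max_iter=1000))]:
          clf.fit(X_tr, y_tr)
          print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")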

  13. A Classical Fuzzy Approach for Software Effort Estimation on Machine Learning Technique

    OpenAIRE

    S.Malathi; Sridhar, S.

    2011-01-01

    Software Cost Estimation with resounding reliability, productivity and development effort is a challenging and onerous task. This has incited the software community to give much needed thrust and delve into extensive research in software effort estimation for evolving sophisticated methods. Estimation by analogy is one of the expedient techniques in the software effort estimation field. However, the methodology utilized for the estimation of software effort by analogy is not able to handle categorical data in an explicit and precise manner. A new approach has been developed in this paper to estimate software effort for projects represented by categorical or numerical data using reasoning by analogy and a fuzzy approach. The existing historical data sets, analyzed with fuzzy logic, produce accurate results in comparison to the data sets analyzed with the earlier methodologies.

  14. Predicting the academic success of architecture students by pre-enrolment requirement: using machine-learning techniques

    Directory of Open Access Journals (Sweden)

    Ralph Olusola Aluko

    2016-12-01

    Full Text Available In recent years, there has been an increase in the number of applicants seeking admission into architecture programmes. As expected, prior academic performance (also referred to as pre-enrolment requirements) is a major factor considered during the process of selecting applicants. In the present study, machine learning models were used to predict the academic success of architecture students based on prior academic performance. Two modeling techniques, namely K-nearest neighbour (k-NN) and linear discriminant analysis, were applied in the study. It was found that k-NN outperforms the linear discriminant analysis model in terms of accuracy. In addition, grades obtained in mathematics (at ordinary level examinations) had a significant impact on the academic success of undergraduate architecture students. This paper makes a modest contribution to the ongoing discussion on the relationship between prior academic performance and academic success of undergraduate students by evaluating this proposition. One of the issues that emerges from these findings is that prior academic performance can be used as a predictor of academic success in undergraduate architecture programmes. Overall, the developed k-NN model can serve as a valuable tool during the process of selecting new intakes into undergraduate architecture programmes in Nigeria.
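
    A sketch of the model comparison described above, with hypothetical pre-enrolment grades and a toy outcome rule that echoes the reported importance of mathematics scores:

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.RandomState(0)
      # Hypothetical pre-enrolment records: maths, English, physics scores.
      grades = rng.uniform(40, 100, size=(300, 3))
      # Toy rule: mathematics dominates the outcome.
      success = (0.6 * grades[:, 0] + 0.2 * grades[:, 1:].sum(axis=1)
                 + rng.randn(300) * 5 > 66).astype(int)

      for name, clf in [("k-NN", KNeighborsClassifier(n_neighbors=5)),
                        ("LDA", LinearDiscriminantAnalysis())]:
          acc = cross_val_score(clf, grades, success, cv=5).mean()
          print(f"{name}: CV accuracy = {acc:.3f}")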

  15. Improving the Prediction of Survival in Cancer Patients by Using Machine Learning Techniques: Experience of Gene Expression Data: A Narrative Review.

    Science.gov (United States)

    Bashiri, Azadeh; Ghazisaeedi, Marjan; Safdari, Reza; Shahmoradi, Leila; Ehtesham, Hamide

    2017-02-01

    Today, despite many advances in the early detection of diseases, cancer patients have a poor prognosis and their survival rates are low. Recently, microarray technologies have been used for gathering thousands of data points about the gene expression levels of cancer cells; such data are a key input for the survival prediction of cancer. This study highlights the improvement of survival prediction based on gene expression data by using machine learning techniques in cancer patients. This review article was conducted by searching articles published between 2000 and 2016 in scientific databases and e-journals, using keywords such as machine learning, gene expression data, survival and cancer. Studies have shown the high accuracy and effectiveness of gene expression data, in comparison with clinical data, for survival prediction. Because of the bewildering volume of such data, studies have highlighted the importance of machine learning algorithms such as Artificial Neural Networks (ANN) for finding the distinctive signatures of gene expression in cancer patients. These algorithms improve the efficiency of probing and analyzing gene expression in cancer profiles for survival prediction. Given the capabilities of machine learning techniques in proteomics and genomics applications, developing clinical decision support systems based on these methods for analyzing gene expression data can prevent potential errors in survival estimation, provide appropriate and individualized treatments to patients, and improve the prognosis of cancers.

  16. Applying machine learning techniques for forecasting flexibility of virtual power plants

    DEFF Research Database (Denmark)

    MacDougall, Pamela; Kosek, Anna Magdalena; Bindner, Henrik W.

    2016-01-01

    ... hidden layer artificial neural network (ANN). Both techniques are used to model a relationship between the aggregator portfolio state and requested ramp power, and the longevity of the delivered flexibility. Using validated individual household models, a smart-controlled aggregated virtual power plant ... is simulated. A hierarchical market-based supply-demand matching control mechanism is used to steer the heating devices in the virtual power plant. For both the training and validation set of clusters, a random number of households, between 200 and 2000, is generated with day-ahead profiles scaled accordingly...

  17. Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems

    Energy Technology Data Exchange (ETDEWEB)

    Song, Shuaiwen; Tallent, Nathan R.; Vishnu, Abhinav

    2013-09-23

    Future exascale systems must be optimized for both power and performance at scale in order to achieve DOE's goal of a sustained exaflop within 20 megawatts by 2022 [1]. The massive parallelism of future systems, combined with complex memory hierarchies, will form a barrier to efficient application and architecture design. These challenges are exacerbated with emerging complex architectures such as GPGPUs and Intel Xeon Phi, as parallelism increases by orders of magnitude and system power consumption can easily triple or quadruple. Therefore, we need techniques that can reduce the search space for optimization, isolate power-performance bottlenecks, identify root causes of software/hardware inefficiency, and effectively direct runtime scheduling.

  18. Computer aided prognosis for cell death categorization and prediction in vivo using quantitative ultrasound and machine learning techniques.

    Science.gov (United States)

    Gangeh, M J; Hashim, A; Giles, A; Sannachi, L; Czarnota, G J

    2016-12-01

    At present, a one-size-fits-all approach is typically used for cancer therapy in patients. This is mainly because there is no current imaging-based clinical standard for the early assessment and monitoring of cancer treatment response. Here, the authors have developed, for the first time, a complete computer-aided-prognosis (CAP) system based on multiparametric quantitative ultrasound (QUS) spectroscopy methods in association with texture descriptors and advanced machine learning techniques. This system was used to noninvasively categorize and predict cell death levels in fibrosarcoma mouse tumors treated using ultrasound-stimulated microbubbles as novel endothelial-cell radiosensitizers. Sarcoma xenograft tumor-bearing mice were treated using ultrasound-stimulated microbubbles, alone or in combination with x-ray radiation therapy, as a new antivascular treatment. Therapy effects were assessed at 2-3, 24, and 72 h after treatment using high-frequency ultrasound. Two-dimensional spectral parametric maps were generated using the power spectra of the raw radiofrequency echo signal. Subsequently, the distances between "pretreatment" and "post-treatment" scans were computed as an indication of treatment efficacy, using a kernel-based metric on textural features extracted from the 2D parametric maps. A supervised learning paradigm was used either to categorize cell death levels as low, medium, or high using a classifier, or to "continuously" predict the levels of cell death using a regressor. The developed CAP system performed at a high level for the classification of cell death levels. The area under the receiver operating characteristic curve was 0.87 for the classification of cell death levels to both low/medium and medium/high levels. Moreover, the prediction of cell death levels using the proposed CAP system achieved a good correlation (r = 0.68, p ...) ... course of therapy to enable switching to more efficacious treatments.

  19. A Classical Fuzzy Approach for Software Effort Estimation on Machine Learning Technique

    CERN Document Server

    Malathi, S

    2011-01-01

    Software Cost Estimation with resounding reliability, productivity and development effort is a challenging and onerous task. This has incited the software community to give much needed thrust and delve into extensive research in software effort estimation for evolving sophisticated methods. Estimation by analogy is one of the expedient techniques in the software effort estimation field. However, the methodology utilized for the estimation of software effort by analogy is not able to handle categorical data in an explicit and precise manner. A new approach has been developed in this paper to estimate software effort for projects represented by categorical or numerical data using reasoning by analogy and a fuzzy approach. The existing historical data sets, analyzed with fuzzy logic, produce accurate results in comparison to the data sets analyzed with the earlier methodologies.

  20. A Classical Fuzzy Approach for Software Effort Estimation on Machine Learning Technique

    Directory of Open Access Journals (Sweden)

    S.Malathi

    2011-11-01

    Full Text Available Software Cost Estimation with resounding reliability, productivity and development effort is a challenging and onerous task. This has incited the software community to give much needed thrust and delve into extensive research in software effort estimation for evolving sophisticated methods. Estimation by analogy is one of the expedient techniques in software effort estimation field. However, the methodology utilized for the estimation of software effort by analogy is not able to handle the categorical data in an explicit and precise manner. A new approach has been developed in this paper to estimate software effort for projects represented by categorical or numerical data using reasoning by analogy and fuzzy approach. The existing historical datasets, analyzed with fuzzy logic, produce accurate results in comparison to the dataset analyzed with the earlier methodologies.

  1. Wastewater quality monitoring system using sensor fusion and machine learning techniques.

    Science.gov (United States)

    Qin, Xusong; Gao, Furong; Chen, Guohua

    2012-03-15

    A multi-sensor water quality monitoring system incorporating an UV/Vis spectrometer and a turbidimeter was used to monitor the Chemical Oxygen Demand (COD), Total Suspended Solids (TSS) and Oil & Grease (O&G) concentrations of the effluents from the Chinese restaurant on campus and an electrocoagulation-electroflotation (EC-EF) pilot plant. In order to handle the noise and information unbalance in the fused UV/Vis spectra and turbidity measurements during calibration model building, an improved boosting method, Boosting-Iterative Predictor Weighting-Partial Least Squares (Boosting-IPW-PLS), was developed in the present study. The Boosting-IPW-PLS method incorporates IPW into the boosting scheme to suppress the quality-irrelevant variables by assigning them small weights, and builds up the models for the wastewater quality predictions based on the weighted variables. The monitoring system was tested in the field with satisfactory results, underlining the potential of this technique for the online monitoring of water quality. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. HPC Usage Behavior Analysis and Performance Estimation with Machine Learning Techniques

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Hao [ORNL; You, Haihang [ORNL; Hadri, Bilel [ORNL; Fahey, Mark R [ORNL

    2012-01-01

    Most researchers with little high performance computing (HPC) experience have difficulty using supercomputing resources productively. To address this issue, we investigated usage behaviors on Kraken, the world's fastest academic supercomputer, and built a knowledge-based recommendation system to improve user productivity. Six clustering techniques, along with three cluster validation measures, were implemented to investigate the underlying patterns of usage behaviors. Besides manually defining a category for very large job submissions, six behavior categories were identified, which cleanly separated the data-intensive jobs and the computationally intensive jobs. Then, the job statistics of each behavior category were used to develop a knowledge-based recommendation system that can provide users with instructions about choosing appropriate software packages, setting job parameter values, and estimating job queuing time and runtime. Experiments were conducted to evaluate the performance of the proposed recommendation system, covering 127 job submissions by users from different research fields. Positive feedback indicated the usefulness of the provided information. An average runtime estimation accuracy of 64.2%, with a 28.9% job termination rate, was achieved in the experiments, which almost doubled the average accuracy in the Kraken dataset.
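
    The clustering-plus-validation step described above can be sketched with k-means and the silhouette measure; the "job log" below is a hypothetical stand-in for the Kraken accounting data, and k-means/silhouette are only one of the six technique/measure pairings the study mentions:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import silhouette_score
      from sklearn.preprocessing import StandardScaler

      rng = np.random.RandomState(0)
      # Hypothetical job log: cores, memory (GB), runtime (h); two usage modes.
      jobs = np.vstack([
          rng.randn(300, 3) * [50, 8, 2] + [100, 16, 4],   # compute-intensive
          rng.randn(300, 3) * [10, 32, 1] + [16, 128, 1],  # data-intensive
      ])
      X = StandardScaler().fit_transform(jobs)

      # Pick the cluster count that the validation measure favors.
      for k in range(2, 6):
          labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
          print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")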

  3. Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal.

    Science.gov (United States)

    Hosseinifard, Behshad; Moradi, Mohammad Hassan; Rostami, Reza

    2013-03-01

    Diagnosing depression in its early, curable stages is very important and may even save the life of a patient. In this paper, we study nonlinear analysis of the EEG signal for discriminating depression patients and normal controls. Forty-five unmedicated depressed patients and 45 normal subjects participated in this study. The power of four EEG bands and four nonlinear features, including detrended fluctuation analysis (DFA), Higuchi fractal dimension, correlation dimension and Lyapunov exponent, were extracted from the EEG signal. For discriminating the two groups, k-nearest neighbor, linear discriminant analysis and logistic regression were then used as the classifiers. The highest classification accuracy of 83.3% among the individual nonlinear features was obtained with the correlation dimension and the LR classifier. For further improvement, all nonlinear features were combined and applied to the classifiers; a classification accuracy of 90% was achieved by all nonlinear features and the LR classifier. In all experiments, a genetic algorithm was employed to select the most important features. The proposed technique is compared and contrasted with other reported methods, and it is demonstrated that by combining nonlinear features, the performance is enhanced. This study shows that nonlinear analysis of EEG can be a useful method for discriminating depressed patients and normal subjects. It is suggested that this analysis may be a complementary tool to help psychiatrists diagnose depressed patients. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  4. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  5. Learning thermodynamics with Boltzmann machines

    Science.gov (United States)

    Torlai, Giacomo; Melko, Roger G.

    2016-10-01

    A Boltzmann machine is a stochastic neural network that has been extensively used in the layers of deep architectures for modern machine learning applications. In this paper, we develop a Boltzmann machine that is capable of modeling thermodynamic observables for physical systems in thermal equilibrium. Through unsupervised learning, we train the Boltzmann machine on data sets constructed with spin configurations importance sampled from the partition function of an Ising Hamiltonian at different temperatures using Monte Carlo (MC) methods. The trained Boltzmann machine is then used to generate spin states, for which we compare thermodynamic observables to those computed by direct MC sampling. We demonstrate that the Boltzmann machine can faithfully reproduce the observables of the physical system. Further, we observe that the number of neurons required to obtain accurate results increases as the system is brought close to criticality.
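
    scikit-learn's BernoulliRBM, a restricted Boltzmann machine, can serve as a rough stand-in for the model described above (the paper trains its own Boltzmann machine). The sketch fits the RBM on binarized spin configurations, draws Gibbs samples, and compares a simple observable; the random "configurations" here are placeholders for genuine Monte Carlo samples of the Ising model.

      import numpy as np
      from sklearn.neural_network import BernoulliRBM

      rng = np.random.RandomState(0)
      # Placeholder spin configurations on a 4x4 lattice, spins in {-1,+1},
      # mapped to {0,1} for the Bernoulli units. Real inputs would come from
      # Monte Carlo sampling of the Ising partition function.
      spins = rng.choice([-1, 1], size=(2000, 16), p=[0.3, 0.7])
      v = (spins + 1) // 2

      rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20,
                         random_state=0).fit(v)

      # Draw samples by iterating Gibbs steps from the training data.
      sample = v.copy()
      for _ in range(100):
          sample = rbm.gibbs(sample)

      mag = lambda s: np.mean(2.0 * s - 1.0)  # magnetization of {0,1} states
      print("data magnetization: ", mag(v))
      print("model magnetization:", mag(sample))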

  6. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The multidisciplinary nature of machine learning makes it a very fascinating and popular area for research. The book is aimed at students, practitioners and researchers, and captures the diversity and richness of the field of machine learning and intelligent systems. Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets. An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems, and spam classification, including an extremely well-written survey on Bayesian networks, sheds light on the strengths and weaknesses of the methods. Practical studies yield insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and online mining of non-stationary ...

  7. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that the INSPIRE team is developing and integrating in order to automate as much as possible content selection and curation in a subject-based repository.

  8. Machine learning for healthcare technologies

    CERN Document Server

    Clifton, David A

    2016-01-01

    This book brings together chapters on the state-of-the-art in machine learning (ML) as it applies to the development of patient-centred technologies, with a special emphasis on 'big data' and mobile data.

  9. Statistical and machine learning approaches for network analysis

    CERN Document Server

    Dehmer, Matthias

    2012-01-01

    Explore the multidisciplinary nature of complex networks through machine learning techniques. Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by international...

  10. Machine Learning for Biological Trajectory Classification Applications

    Science.gov (United States)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.

  11. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...
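
    The ease of use emphasized above comes from the package's uniform estimator API; a minimal (illustrative, not from the paper) fit/predict example:

      from sklearn.datasets import load_iris
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      X, y = load_iris(return_X_y=True)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)  # fit
      print(clf.predict(X_te[:5]))                                  # predict
      print("accuracy:", clf.score(X_te, y_te))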

  12. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning. Organized into 15 chapters, this book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credit...

  13. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vector Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications, from recognizing hand-written digits to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such as...

  14. Machine Learning Phases of Strongly Correlated Fermions

    Directory of Open Access Journals (Sweden)

    Kelvin Ch’ng

    2017-08-01

    Full Text Available Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling). We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  15. Machine Learning Phases of Strongly Correlated Fermions

    Science.gov (United States)

    Ch'ng, Kelvin; Carrasquilla, Juan; Melko, Roger G.; Khatami, Ehsan

    2017-07-01

    Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling). We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  16. Multidisciplinary Approaches to Coastal Adaptation - Applying Machine Learning Techniques to assess coastal risk in Latin America and The Caribbean

    Science.gov (United States)

    Calil, J.

    2016-12-01

    The global population, currently at 7.3 billion, is increasing by nearly 230,000 people every day. As the world's population grows to an estimated 11.2 billion by 2100, the number of people living in low elevation areas, exposed to coastal hazards, is continuing to increase. In 2013, 22 million people were displaced by extreme weather events, with 37 events displacing at least 100,000 people each. Losses from natural disasters and disaster risk are determined by a complex interaction between physical hazards and the vulnerability of a society or social-ecological system, and its exposure to such hazards. Impacts from coastal hazards depend on the number of people, value of assets, and presence of critical resources in harm's way. Moreover, coastal risks are amplified by challenging socioeconomic dynamics, including ill-advised urban development, income inequality, and poverty level. Our results demonstrate that in Latin America and the Caribbean (LAC), more than half a million people live in areas where coastal hazards, exposure (of people, assets and ecosystems), and poverty converge, creating the ideal conditions for a perfect storm. In order to identify the population at greatest risk to coastal hazards in LAC, and in response to a growing demand for multidisciplinary coastal adaptation approaches, this study employs a combination of machine learning clustering techniques (K-Means and Self Organizing Maps) and a spatial index to assess coastal risks on a comparative scale. Data for more than 13,000 coastal locations in LAC were collected and allocated into three categories: (1) Coastal Hazards (including storm surge, wave energy and El Niño); (2) Geographic Exposure (including population, agriculture, and ecosystems); and (3) Vulnerability (including income inequality, infant mortality rate and malnutrition). This study identified hotspots of coastal vulnerability and the key drivers of coastal risk at each geographic location. Our results provide important...

  17. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.
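
    As a rough sketch of one such hybridization (a fitness meta-model pre-screening candidate solutions in a (1+1)-ES), the following Python example uses scikit-learn, which the book also employs; the benchmark function, skip threshold, and step-size constants are invented for illustration, not taken from the book.

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        def sphere(x):  # toy benchmark fitness (to be minimized)
            return float(np.sum(x ** 2))

        rng = np.random.default_rng(0)
        parent = rng.normal(size=5)
        f_parent, sigma = sphere(parent), 1.0
        X_hist, y_hist = [parent.copy()], [f_parent]
        surrogate = KNeighborsRegressor(n_neighbors=1)

        for _ in range(300):
            surrogate.fit(np.array(X_hist), np.array(y_hist))
            child = parent + sigma * rng.normal(size=5)
            # meta-model pre-screening: skip the real evaluation if the child looks poor
            if surrogate.predict(child[None, :])[0] > 2.0 * f_parent:  # heuristic threshold
                sigma *= 0.95
                continue
            f_child = sphere(child)
            X_hist.append(child.copy()); y_hist.append(f_child)
            if f_child <= f_parent:  # (1+1) selection with simple step-size adaptation
                parent, f_parent, sigma = child, f_child, sigma * 1.1
            else:
                sigma *= 0.95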

  18. Parallelization of TMVA Machine Learning Algorithms

    CERN Document Server

    Hajili, Mammad

    2017-01-01

    This report reflects my work on Parallelization of TMVA Machine Learning Algorithms integrated to ROOT Data Analysis Framework during a summer internship at CERN. The report consists of four important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the resulting changes in execution time as the number of workers varies.

  19. Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

    National Research Council Canada - National Science Library

    Toledo, Cíntia Matsuda; Cunha, Andre; Scarton, Carolina; Aluísio, Sandra

    2014-01-01

    .... A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE...

  20. Machine learning methods for nanolaser characterization

    CERN Document Server

    Zibar, Darko; Winther, Ole; Moerk, Jesper; Schaeffer, Christian

    2016-01-01

    Nanocavity lasers, which are an integral part of an on-chip integrated photonic network, are setting stringent requirements on the sensitivity of the techniques used to characterize the laser performance. Current characterization tools cannot provide detailed knowledge about nanolaser noise and dynamics. In this progress article, we will present tools and concepts from Bayesian machine learning and digital coherent detection that offer novel approaches for highly sensitive laser noise characterization and inference of laser dynamics. The goal of the paper is to trigger new research directions that combine the fields of machine learning and nanophotonics for characterizing nanolasers and eventually integrated photonic networks.

  1. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making massive amounts of multimedia data searchable.
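
    The third step above is an active learning loop. A minimal sketch of one standard realization, uncertainty sampling, is shown below, assuming scikit-learn and hypothetical audio features (the paper's actual selection criterion may differ): the model queries the songs it is least sure about, and those are sent back to the annotation game.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X_labeled = rng.normal(size=(50, 16))      # features of game-annotated songs (fake)
        y_labeled = rng.integers(0, 2, 50)         # one binary tag, e.g. "funky" (fake)
        X_pool = rng.normal(size=(5000, 16))       # features of unlabeled songs (fake)

        model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
        p = model.predict_proba(X_pool)[:, 1]
        query = np.argsort(np.abs(p - 0.5))[:100]  # most uncertain songs -> next game round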

  2. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    Due to the promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn special attention from the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as a means of inferring botnet-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool for identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview of the field by summarizing current scientific efforts. The contribution of the paper is three-fold. First, the paper provides detailed insight into the existing detection methods by investigating which bot-related heuristics were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  3. Geospatial and machine learning techniques for wicked social science problems: analysis of crash severity on a regional highway corridor

    Science.gov (United States)

    Effati, Meysam; Thill, Jean-Claude; Shabani, Shahin

    2015-04-01

    The contention of this paper is that many social science research problems are too "wicked" to be suitably studied using conventional statistical and regression-based methods of data analysis. This paper argues that an integrated geospatial approach based on methods of machine learning is well suited to this purpose. Recognizing the intrinsic wickedness of traffic safety issues, such an approach is used to unravel the complexity of traffic crash severity on highway corridors as an example of such problems. The support vector machine (SVM) and coactive neuro-fuzzy inference system (CANFIS) algorithms are tested as inferential engines to predict crash severity and uncover spatial and non-spatial factors that systematically relate to crash severity, while a sensitivity analysis is conducted to determine the relative influence of crash severity factors. Different specifications of the two methods are implemented, trained, and evaluated against crash events recorded over a 4-year period on a regional highway corridor in Northern Iran. Overall, the SVM model outperforms CANFIS by a notable margin. The combined use of spatial analysis and artificial intelligence is effective at identifying leading factors of crash severity, while explicitly accounting for spatial dependence and spatial heterogeneity effects. Thanks to the demonstrated effectiveness of a sensitivity analysis, this approach produces comprehensive results that are consistent with existing traffic safety theories and supports the prioritization of effective safety measures that are geographically targeted and behaviorally sound on regional highway corridors.
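
    A hedged sketch of this kind of analysis (not the authors' code): fit an SVM classifier and rank factors with permutation importance as a simple form of sensitivity analysis, assuming scikit-learn and synthetic stand-ins for the crash records.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.inspection import permutation_importance
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        X = rng.normal(size=(400, 6))  # hypothetical spatial and non-spatial crash factors
        y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=400) > 0).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        svm = SVC().fit(X_tr, y_tr)
        r = permutation_importance(svm, X_te, y_te, n_repeats=20, random_state=0)
        ranking = np.argsort(-r.importances_mean)  # most influential severity factors first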

  4. Photometric Supernova Classification With Machine Learning

    CERN Document Server

    Lochner, Michelle; Peiris, Hiranya V; Lahav, Ofer; Winter, Max K

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), given that spectroscopic confirmation of type for all supernovae discovered with these surveys will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques fitting parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks and boosted decision trees. We test the pipeline on simulated multi-ba...
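
    The second stage of such a pipeline can be prototyped in a few lines. The sketch below (assuming scikit-learn, with random stand-ins for the extracted light-curve features) cross-validates several of the classifier families named in the abstract; GradientBoostingClassifier stands in for boosted decision trees.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.neural_network import MLPClassifier
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 10))  # stand-in for SALT2 / parametric / wavelet features
        y = rng.integers(0, 2, 300)     # hypothetical labels, e.g. type Ia vs. other

        for clf in [GaussianNB(), KNeighborsClassifier(), SVC(),
                    MLPClassifier(max_iter=500), GradientBoostingClassifier()]:
            print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())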

  5. Oscillation frequencies for 35 Kepler solar-type planet-hosting stars using Bayesian techniques and machine learning

    CERN Document Server

    Davies, G R; Bedding, T R; Handberg, R; Lund, M N; Chaplin, W J; Huber, D; White, T R; Benomar, O; Hekker, S; Basu, S; Campante, T L; Christensen-Dalsgaard, J; Elsworth, Y; Karoff, C; Kjeldsen, H; Lundkvist, M S; Metcalfe, T S; Stello, D

    2015-01-01

    Kepler has revolutionised our understanding of both exoplanets and their host stars. Asteroseismology is a valuable tool in the characterisation of stars and Kepler is an excellent observing facility to perform asteroseismology. Here we select a sample of 35 Kepler solar-type stars which host transiting exoplanets (or planet candidates) with detected solar-like oscillations. Using available Kepler short cadence data up to Quarter 16 we create power spectra optimised for asteroseismology of solar-type stars. We identify modes of oscillation and estimate mode frequencies by "peak bagging" using a Bayesian MCMC framework. In addition, we expand the methodology of quality assurance using a Bayesian unsupervised machine learning approach. We report the measured frequencies of the modes of oscillation for all 35 stars and frequency ratios commonly used in detailed asteroseismic modelling. Due to the high correlations associated with frequency ratios we report the covariance matrix of all frequencies measured ...

  6. Machine learning analysis of binaural rowing sounds

    DEFF Research Database (Denmark)

    Johard, Leonard; Ruffaldi, Emanuele; Hoffmann, Pablo F.

    2011-01-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks performs as well as deep architectures, while being much faster to train.

  7. Machine Learning Analysis of Binaural Rowing Sounds

    Directory of Open Access Journals (Sweden)

    Filippeschi Alessandro

    2011-12-01

    Full Text Available Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks performs as well as deep architectures, while being much faster to train.
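
    The combination reported above (principal component analysis feeding a shallow network) is a one-line pipeline in scikit-learn. The sketch below uses invented feature dimensions and class counts; it illustrates the idea, not the authors' actual setup.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 256))  # hypothetical spectral features per sound frame
        y = rng.integers(0, 3, 200)      # hypothetical rowing-sound classes

        clf = make_pipeline(StandardScaler(), PCA(n_components=20),
                            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000))
        clf.fit(X, y)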

  8. Machine learning methods in chemoinformatics

    Science.gov (United States)

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183. PMID:25285160

  9. The Changing Face of Former Soviet Cities: Elucidated by Remote Sensing and Machine Learning Techniques

    Science.gov (United States)

    Poghosyan, Armen

    2017-04-01

    Although remote sensing of urbanization has emerged as a powerful tool to acquire critical knowledge about urban growth and its effects on global environmental change, the human-environment interface, and environmentally sustainable urban development, there is a lack of studies utilizing remote sensing techniques to investigate urbanization trends in the Post-Soviet states. The unique challenges accompanying urbanization in the Post-Soviet republics, combined with the expected robust urban growth in developing countries over the next several decades, highlight the critical need for a quantitative assessment of the urban dynamics in the former Soviet states as they navigate towards a free market democracy. This study uses a total of 32 Level-1 precision terrain corrected (L1T) Landsat scenes with 30-m resolution as well as auxiliary population and economic data for ten cities distributed across nine former Soviet republics to quantify the urbanization patterns in the Post-Soviet region. Land cover in each urban center of this study was classified using the Support Vector Machine (SVM) learning algorithm, with overall accuracies ranging from 87% to 97% for 29 classification maps over three time steps during the past twenty-five years, in order to estimate quantities, trends and drivers of urban growth in the study area. The results demonstrated several spatial and temporal urbanization patterns observed across the Post-Soviet states, and based on urban expansion rates the cities can be divided into two groups, fast-growing and slow-growing urban centers. The relatively fast-growing urban centers have an average urban expansion rate of about 2.8% per year, whereas the slow-growing cities have an average urban expansion rate of about 1.0% per year. The total area of new land converted to urban environment ranged from as low as 26 km2 to as high as 780 km2 for the ten cities over the 1990 - 2015 period, while the overall urban land increase ranged from 11.3% to 96

  10. Machine Learning in Medicine

    National Research Council Canada - National Science Library

    Deo, Rahul C

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success...

  11. Machine learning analysis of binaural rowing sounds

    DEFF Research Database (Denmark)

    Johard, Leonard; Ruffaldi, Emanuele; Hoffmann, Pablo F.

    2011-01-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition metho...... methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks perform equally well as deep architectures, while being much faster to train.......Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition...

  12. Machine Learning and Cosmological Simulations

    Science.gov (United States)

    Kamdar, Harshil; Turk, Matthew; Brunner, Robert

    2016-01-01

    We explore the application of machine learning (ML) to the problem of galaxy formation and evolution in a hierarchical universe. Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively evaluating the extent of the influence of dark matter halo properties on small-scale structure formation. For our analyses, we use both semi-analytical models (Millennium simulation) and N-body + hydrodynamical simulations (Illustris simulation). The ML algorithms are trained on important dark matter halo properties (inputs) and galaxy properties (outputs). The trained models are able to robustly predict the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity. Moreover, the ML simulated galaxies obey fundamental observational constraints implying that the population of ML predicted galaxies is physically and statistically robust. Next, ML algorithms are trained on an N-body + hydrodynamical simulation and applied to an N-body only simulation (Dark Sky simulation, Illustris Dark), populating this new simulation with galaxies. We can examine how structure formation changes with different cosmological parameters and are able to mimic a full-blown hydrodynamical simulation in a computation time that is orders of magnitude smaller. We find that the set of ML simulated galaxies in Dark Sky obey the same observational constraints, further solidifying ML's place as an intriguing and promising technique in future galaxy formation studies and rapid mock galaxy catalog creation.
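
    The core supervised step (halo properties in, galaxy properties out) can be sketched as a multi-output regression, here with a random forest under invented inputs and outputs; this illustrates the mapping, not the study's actual configuration.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        halos = rng.normal(size=(1000, 5))     # e.g. halo mass, spin, concentration (fake)
        galaxies = rng.normal(size=(1000, 3))  # e.g. stellar mass, gas mass, g-r color (fake)

        model = RandomForestRegressor(n_estimators=200).fit(halos, galaxies)
        predicted = model.predict(halos[:10])  # populate an N-body-only catalog with galaxies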

  13. Uncertainty quantification and integration of machine learning techniques for predicting acid rock drainage chemistry: a probability bounds approach.

    Science.gov (United States)

    Betrie, Getnet D; Sadiq, Rehan; Morin, Kevin A; Tesfamariam, Solomon

    2014-08-15

    Acid rock drainage (ARD) is a major pollution problem globally that has adversely impacted the environment. Identification and quantification of uncertainties are integral parts of ARD assessment and risk mitigation; however, previous studies on predicting ARD drainage chemistry have not fully addressed issues of uncertainty. In this study, artificial neural networks (ANN) and support vector machine (SVM) are used for the prediction of ARD drainage chemistry and their predictive uncertainties are quantified using probability bounds analysis. Furthermore, the predictions of ANN and SVM are integrated using four aggregation methods to improve their individual predictions. The results of this study showed that ANN performed better than SVM in enveloping the observed concentrations. In addition, integrating the predictions of ANN and SVM using the aggregation methods improved the predictions of the individual techniques.

  14. Machine Learning in Parliament Elections

    Directory of Open Access Journals (Sweden)

    Ahmad Esfandiari

    2012-09-01

    Full Text Available Parliament is considered one of the most important pillars of a country's governance. Parliamentary elections, and predicting their outcomes, have long been of interest to scholars from various fields such as political science. Some important features are used to model the results of consultative parliament elections: reputation and popularity, political orientation, tradesmen's support, clergymen's support, support from political wings, and the type of supportive wing. The two parameters with the greatest impact on reducing prediction error, reputation and popularity and the support of clergymen and religious scholars, were used as input parameters in the implementation. In this study, the Iranian parliamentary elections are modeled and predicted using learnable machines: a neural network and a neuro-fuzzy system. The neuro-fuzzy machine combines the knowledge-representation ability of fuzzy sets with the learning power of neural networks. In predicting this social and political behavior, the neural network is first trained by two learning algorithms on the training data set and then predicts the result on the test data. Next, the neuro-fuzzy inference machine is trained. Finally, the results of the two machines are compared.

  15. Machine Learning for Education: Learning to Teach

    Science.gov (United States)

    2016-12-01

    Gombolay, Matthew C.; Jensen, Reed; Son, Sung-Hyun (Massachusetts Institute of Technology Lincoln Laboratory). ...training tools and develop military strategies within their training environment. Second, we develop methods for improving warfighter education: learning to teach... [Fig. 1: SGD enables development of automated teaching tools.]

  16. Fast, Continuous Audiogram Estimation Using Machine Learning.

    Science.gov (United States)

    Song, Xinyu D; Wallace, Brittany M; Gardner, Jacob R; Ledbetter, Noah M; Weinberger, Kilian Q; Barbour, Dennis L

    2015-01-01

    Pure-tone audiometry has been a staple of hearing assessments for decades. Many different procedures have been proposed for measuring thresholds with pure tones by systematically manipulating intensity one frequency at a time until a discrete threshold function is determined. The authors have developed a novel nonparametric approach for estimating a continuous threshold audiogram using Bayesian estimation and machine learning classification. The objective of this study was to assess the accuracy and reliability of this new method relative to a commonly used threshold measurement technique. The authors performed air conduction pure-tone audiometry on 21 participants between the ages of 18 and 90 years with varying degrees of hearing ability. Two repetitions of automated machine learning audiogram estimation and one repetition of conventional modified Hughson-Westlake ascending-descending audiogram estimation were acquired by an audiologist. The estimated hearing thresholds of these two techniques were compared at standard audiogram frequencies (i.e., 0.25, 0.5, 1, 2, 4, 8 kHz). The two threshold estimate methods delivered very similar estimates at standard audiogram frequencies. Specifically, the mean absolute difference between estimates was 4.16 ± 3.76 dB HL. The mean absolute difference between repeated measurements of the new machine learning procedure was 4.51 ± 4.45 dB HL. These values compare favorably with those of other threshold audiogram estimation procedures. Furthermore, the machine learning method generated threshold estimates from significantly fewer samples than the modified Hughson-Westlake procedure while returning a continuous threshold estimate as a function of frequency. The new machine learning audiogram estimation technique produces continuous threshold audiogram estimates accurately, reliably, and efficiently, making it a strong candidate for widespread application in clinical and research audiometry.
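
    A hedged sketch of the underlying idea (a probabilistic classifier over the frequency-intensity plane whose 50% contour is the continuous threshold audiogram), assuming scikit-learn's Gaussian process classifier and a simulated listener; the authors' actual model and sampling strategy may differ.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessClassifier
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)
        # stimuli: (frequency in octaves re 250 Hz, intensity in dB HL); response: heard or not
        X = np.column_stack([rng.uniform(0, 5, 200), rng.uniform(-10, 90, 200)])
        y = (X[:, 1] > 10 * X[:, 0] + rng.normal(scale=5, size=200)).astype(int)  # fake ears

        gpc = GaussianProcessClassifier(kernel=RBF(length_scale=[1.0, 20.0])).fit(X, y)
        # posterior detection probability along one frequency; threshold ~ where p = 0.5
        grid = np.column_stack([np.full(100, 2.0), np.linspace(-10, 90, 100)])
        p_heard = gpc.predict_proba(grid)[:, 1]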

  17. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down, task-driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse-grained global features and a larger set of low...

  18. Machine Learning applications in CMS

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Machine Learning is used in many aspects of CMS data taking, monitoring, processing and analysis. We review a few of these use cases and the most recent developments, with an outlook to future applications in the LHC Run III and for the High-Luminosity phase.

  19. Combining Formal Logic and Machine Learning for Sentiment Analysis

    DEFF Research Database (Denmark)

    Petersen, Niklas Christoffer; Villadsen, Jørgen

    2014-01-01

    This paper presents a formal logical method for deep structural analysis of the syntactical properties of texts using machine learning techniques for efficient syntactical tagging. To evaluate the method, it is used for entity-level sentiment analysis as an alternative to pure machine learning...

  20. Large-Scale Machine Learning for Classification and Search

    Science.gov (United States)

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  1. Large-Scale Machine Learning for Classification and Search

    Science.gov (United States)

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  2. Meter-scale Urban Land Cover Mapping for EPA EnviroAtlas Using Machine Learning and OBIA Remote Sensing Techniques

    Science.gov (United States)

    Pilant, A. N.; Baynes, J.; Dannenberg, M.; Riegel, J.; Rudder, C.; Endres, K.

    2013-12-01

    US EPA EnviroAtlas is an online collection of tools and resources that provides geospatial data, maps, research, and analysis on the relationships between nature, people, health, and the economy (http://www.epa.gov/research/enviroatlas/index.htm). Using EnviroAtlas, you can see and explore information related to the benefits (e.g., ecosystem services) that humans receive from nature, including clean air, clean and plentiful water, natural hazard mitigation, biodiversity conservation, food, fuel, and materials, recreational opportunities, and cultural and aesthetic value. EPA developed several urban land cover maps at very high spatial resolution (one-meter pixel size) for a portion of EnviroAtlas devoted to urban studies. This urban mapping effort supported analysis of relations among land cover, human health and demographics at the US Census Block Group level. Supervised classification of 2010 USDA NAIP (National Agriculture Imagery Program) digital aerial photos produced eight-class land cover maps for several cities, including Durham, NC, Portland, ME, Tampa, FL, New Bedford, MA, Pittsburgh, PA, Portland, OR, and Milwaukee, WI. Semi-automated feature extraction methods were used to classify the NAIP imagery: genetic algorithms/machine learning, random forest, and object-based image analysis (OBIA). In this presentation we describe the image processing and fuzzy accuracy assessment methods used, and report on some sustainability and ecosystem service metrics computed using this land cover as input (e.g., carbon sequestration from USFS iTREE model; health and demographics in relation to road buffer forest width). We also discuss the land cover classification schema (a modified Anderson Level 1 after the National Land Cover Database (NLCD)), and offer some observations on lessons learned. [Figure: Meter-scale urban land cover in Portland, OR, overlaid on a NAIP aerial photo; streets, buildings and individual trees are identifiable.]

  3. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    Science.gov (United States)

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.
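
    A minimal sketch of this kind of classifier evaluation, assuming scikit-learn and random stand-ins for the farm-management variables (not the study's data or tuning):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 8))  # hypothetical farm-management features
        y = rng.integers(0, 2, 500)    # high hock-burn prevalence: yes / no

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = SVC(probability=True).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # cf. the 0.78 reported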

  4. Advanced analytical methodologies for measuring healthy ageing and its determinants, using factor analysis and machine learning techniques: the ATHLOS project

    Science.gov (United States)

    Félix Caballero, Francisco; Soulis, George; Engchuan, Worrawat; Sánchez-Niubó, Albert; Arndt, Holger; Ayuso-Mateos, José Luis; Haro, Josep Maria; Chatterji, Somnath; Panagiotakos, Demosthenes B.

    2017-01-01

    A most challenging task for scientists that are involved in the study of ageing is the development of a measure to quantify health status across populations and over time. In the present study, a Bayesian multilevel Item Response Theory approach is used to create a health score that can be compared across different waves in a longitudinal study, using anchor items and items that vary across waves. The same approach can be applied to compare health scores across different longitudinal studies, using items that vary across studies. Data from the English Longitudinal Study of Ageing (ELSA) are employed. Mixed-effects multilevel regression and Machine Learning methods were used to identify relationships between socio-demographics and the health score created. The metric of health was created for 17,886 subjects (54.6% of women) participating in at least one of the first six ELSA waves and correlated well with already known conditions that affect health. Future efforts will implement this approach in a harmonised data set comprising several longitudinal studies of ageing. This will enable valid comparisons between clinical and community dwelling populations and help to generate norms that could be useful in day-to-day clinical practice. PMID:28281663

  5. Use of Machine Learning Techniques for Identification of Robust Teleconnections to East African Rainfall Variability in Observations and Models

    Science.gov (United States)

    Roberts, J. Brent; Robertson, Franklin R.; Funk, Chris

    2014-01-01

    Providing advance warning of East African rainfall variations is a particular focus of several groups including those participating in the Famine Early Warning Systems Network. Both seasonal and long-term model projections of climate variability are being used to examine the societal impacts of hydrometeorological variability on seasonal to interannual and longer time scales. The NASA / USAID SERVIR project, which leverages satellite and modeling-based resources for environmental decision making in developing nations, is focusing on the evaluation of both seasonal and climate model projections to develop downscaled scenarios for use in impact modeling. The utility of these projections relies on the ability of current models to capture the embedded relationships between East African rainfall and evolving forcing within the coupled ocean-atmosphere-land climate system. Previous studies have posited relationships between variations in El Niño, the Walker circulation, Pacific decadal variability (PDV), and anthropogenic forcing. This study applies machine learning methods (e.g. clustering, probabilistic graphical models, nonlinear PCA) to observational datasets in an attempt to expose the importance of local and remote forcing mechanisms of East African rainfall variability. The ability of the NASA Goddard Earth Observing System (GEOS5) coupled model to capture the associated relationships will be evaluated using Coupled Model Intercomparison Project Phase 5 (CMIP5) simulations.

  6. Machine learning techniques to identify putative genes involved in nitrogen catabolite repression in the yeast Saccharomyces cerevisiae

    Science.gov (United States)

    Kontos, Kevin; Godard, Patrice; André, Bruno; van Helden, Jacques; Bontempi, Gianluca

    2008-01-01

    Background Nitrogen is an essential nutrient for all life forms. Like most unicellular organisms, the yeast Saccharomyces cerevisiae transports and catabolizes good nitrogen sources in preference to poor ones. Nitrogen catabolite repression (NCR) refers to this selection mechanism. All known nitrogen catabolite pathways are regulated by four regulators. The ultimate goal is to infer the complete nitrogen catabolite pathways. Bioinformatics approaches offer the possibility to identify putative NCR genes and to discard uninteresting genes. Results We present a machine learning approach where the identification of putative NCR genes in the yeast Saccharomyces cerevisiae is formulated as a supervised two-class classification problem. Classifiers predict whether genes are NCR-sensitive or not from a large number of variables related to the GATA motif in the upstream non-coding sequences of the genes. The positive and negative training sets are composed of annotated NCR genes and manually-selected genes known to be insensitive to NCR, respectively. Different classifiers and variable selection methods are compared. We show that all classifiers make significant and biologically valid predictions by comparing these predictions to annotated and putative NCR genes, and by performing several negative controls. In particular, the inferred NCR genes significantly overlap with putative NCR genes identified in three genome-wide experimental and bioinformatics studies. Conclusion These results suggest that our approach can successfully identify potential NCR genes. Hence, the dimensionality of the problem of identifying all genes involved in NCR is drastically reduced. PMID:19091052

  7. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn. If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the book for you. No previous experience with machine-learning algorithms is required.

  8. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs, particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  9. Machine learning a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  10. The ATLAS Higgs Machine Learning Challenge

    CERN Document Server

    Cowan, Glen; The ATLAS collaboration; Bourdarios, Claire

    2015-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s with Artificial Neural Nets and more recently with Boosted Decision Trees, Random Forests, etc. Meanwhile, Machine Learning has become a full-blown field of computer science. With the emergence of Big Data, data scientists are developing new Machine Learning algorithms to extract meaning from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, and at the same time data scientists have advanced algorithms: the goal of the HiggsML project was to bring the two together by a “challenge”: participants from all over the world and any scientific background could compete online to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Instead of HEP physicists browsing through machine learning papers and trying to infer which new algorithms might be useful for HEP, then c...

  11. Learning from Distributions via Support Measure Machines

    CERN Document Server

    Muandet, Krikamol; Fukumizu, Kenji; Dinuzzo, Francesco

    2012-01-01

    This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in a straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provide several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.
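
    The key construction can be sketched directly: the kernel between two distributions is approximated by averaging a base kernel over all pairs of their samples (the inner product of empirical mean embeddings), and the resulting Gram matrix is fed to an SVM with a precomputed kernel. The sketch below assumes scikit-learn and invented two-dimensional sample bags.

        import numpy as np
        from sklearn.svm import SVC

        def rbf(x, y, gamma=1.0):  # base kernel k(x, y) between individual samples
            d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d)

        def mean_embedding_kernel(bags_a, bags_b):
            # K(P, Q) ~ mean over i, j of k(x_i, y_j): inner product of mean embeddings
            return np.array([[rbf(a, b).mean() for b in bags_b] for a in bags_a])

        # each training example is a bag of samples from one distribution (fake data)
        rng = np.random.default_rng(0)
        bags = [rng.normal(loc=m, size=(30, 2)) for m in (0.0, 2.0) for _ in range(10)]
        labels = [0] * 10 + [1] * 10
        G = mean_embedding_kernel(bags, bags)
        smm = SVC(kernel="precomputed").fit(G, labels)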

  12. Estimation and scaling of hydrostratigraphic units: application of unsupervised machine learning and multivariate statistical techniques to hydrogeophysical data

    Science.gov (United States)

    Friedel, Michael J.

    2016-08-01

    Numerical models provide a way to evaluate groundwater systems, but determining the hydrostratigraphic units (HSUs) used in constructing these models remains subjective, nonunique, and uncertain. A three-step machine-learning approach is proposed in which fusion, estimation, and clustering operations are performed on different data sets to arrive at HSUs at different scales. In step one, data fusion is performed by training a self-organizing map (SOM) with sparse borehole hydrogeologic (lithology, hydraulic conductivity, aqueous field parameters, dissolved constituents) and geophysical (gamma, spontaneous potential, and resistivity) measurements. Estimation is handled by iterative least-squares minimization of the SOM quantization and topographical errors. Application of the Davies-Bouldin criterion to k-means clustering of SOM nodes is used to determine the number and location of discontinuous borehole HSUs with low lateral density (based on borehole spacing at 100s of m) and high vertical density (based on cm-scale logging). In step two, a scaling network is trained using the estimated borehole HSUs, airborne electromagnetic measurements, and numerically inverted resistivity profiles. In step three, independent airborne electromagnetic measurements are applied to the scaling network, and the estimation performed to arrive at a set of continuous HSUs with high lateral density (based on sounding locations at meter (m) spacing) and medium vertical density (based on m-layer modeled structure). Performance metrics are used to evaluate each step of the approach. Efficacy of the proposed approach is demonstrated to map local-to-regional scale HSUs using hydrogeophysical data collected at a heterogeneous surficial aquifer in northwestern Nebraska, USA.
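
    The clustering step in stage one can be illustrated with scikit-learn, which provides the Davies-Bouldin criterion directly; the random matrix below is a stand-in for trained SOM node weights, not the study's data.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.metrics import davies_bouldin_score

        rng = np.random.default_rng(0)
        X = rng.random((500, 8))  # stand-in for SOM node weight vectors

        scores = {}
        for k in range(2, 10):
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            scores[k] = davies_bouldin_score(X, labels)
        best_k = min(scores, key=scores.get)  # lower Davies-Bouldin is better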

  13. Automated analysis of retinal imaging using machine learning techniques for computer vision [version 2; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Jeffrey De Fauw

    2017-06-01

    Full Text Available There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight threatening diseases, such as diabetic retinopathy and age related macular degeneration have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT), a modality that uses light waves in a similar way to how ultrasound uses sound waves. Changes in population demographics and expectations and the changing pattern of chronic diseases creates a rising demand for such imaging. Meanwhile, interrogation of such images is time consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring the therapeutic success.

  14. Estimation and scaling of hydrostratigraphic units: application of unsupervised machine learning and multivariate statistical techniques to hydrogeophysical data

    Science.gov (United States)

    Friedel, Michael J.

    2016-12-01

    Numerical models provide a way to evaluate groundwater systems, but determining the hydrostratigraphic units (HSUs) used in constructing these models remains subjective, nonunique, and uncertain. A three-step machine-learning approach is proposed in which fusion, estimation, and clustering operations are performed on different data sets to arrive at HSUs at different scales. In step one, data fusion is performed by training a self-organizing map (SOM) with sparse borehole hydrogeologic (lithology, hydraulic conductivity, aqueous field parameters, dissolved constituents) and geophysical (gamma, spontaneous potential, and resistivity) measurements. Estimation is handled by iterative least-squares minimization of the SOM quantization and topographical errors. Application of the Davies-Bouldin criterion to k-means clustering of SOM nodes is used to determine the number and location of discontinuous borehole HSUs with low lateral density (based on borehole spacing at 100s of m) and high vertical density (based on cm-scale logging). In step two, a scaling network is trained using the estimated borehole HSUs, airborne electromagnetic measurements, and numerically inverted resistivity profiles. In step three, independent airborne electromagnetic measurements are applied to the scaling network, and the estimation performed to arrive at a set of continuous HSUs with high lateral density (based on sounding locations at meter (m) spacing) and medium vertical density (based on m-layer modeled structure). Performance metrics are used to evaluate each step of the approach. Efficacy of the proposed approach is demonstrated to map local-to-regional scale HSUs using hydrogeophysical data collected at a heterogeneous surficial aquifer in northwestern Nebraska, USA.

  15. Automated analysis of retinal imaging using machine learning techniques for computer vision [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Jeffrey De Fauw

    2016-07-01

    Full Text Available There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight threatening diseases, such as diabetic retinopathy and age related macular degeneration have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT), a modality that uses light waves in a similar way to how ultrasound uses sound waves. Changes in population demographics and expectations and the changing pattern of chronic diseases creates a rising demand for such imaging. Meanwhile, interrogation of such images is time consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, Google DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring the therapeutic success.

  16. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  17. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  18. Attractor Control Using Machine Learning

    CERN Document Server

    Duriez, Thomas; Noack, Bernd R; Cordier, Laurent; Segond, Marc; Abel, Markus

    2013-01-01

    We propose a general strategy for feedback control design of complex dynamical systems exploiting the nonlinear mechanisms in a systematic unsupervised manner. These dynamical systems can have a state space of arbitrary dimension with a finite number of actuators (multiple inputs) and sensors (multiple outputs). The control law maps outputs into inputs and is optimized with respect to a cost function, containing physics via the dynamical or statistical properties of the attractor to be controlled. Thus, we are capable of exploiting nonlinear mechanisms, e.g. chaos or frequency cross-talk, serving the control objective. This optimization is based on genetic programming, a branch of machine learning. This machine learning control is successfully applied to the stabilization of nonlinearly coupled oscillators and maximization of the Lyapunov exponent of a forced Lorenz system. We foresee potential applications to most nonlinear multiple-input/multiple-output control problems, particularly in experiments.

  19. Development of automatic surveillance of animal behaviour and welfare using image analysis and machine learned segmentation technique.

    Science.gov (United States)

    Nilsson, M; Herlin, A H; Ardö, H; Guzhva, O; Åström, K; Bergsten, C

    2015-11-01

    In this paper the feasibility of extracting the proportion of pigs located in different areas of a pig pen by an advanced image analysis technique is explored, and possible applications are discussed. For example, pigs generally locate themselves in the wet dunging area at high ambient temperatures in order to avoid heat stress, as wetting the body surface is the major path to dissipate heat by evaporation. Thus, the proportions of pigs in the dunging area and resting area, respectively, could be used as an indicator of failure to control the climate in the pig environment, as pigs are not supposed to rest in the dunging area. The computer vision methodology utilizes a learning-based segmentation approach using several features extracted from the image. The learning-based approach applied is based on extended state-of-the-art features in combination with a structured prediction framework based on a logistic regression solver using elastic net regularization. In addition, the method is able to produce a probability per pixel rather than form a hard decision. This overcomes some of the limitations found in a setup using grey-scale information only. The pig pen is a difficult imaging environment because of challenging lighting conditions like shadows, poor lighting and poor contrast between pig and background. In order to test practical conditions, a pen containing nine young pigs was filmed from a top-view perspective by an Axis M3006 camera with a resolution of 640 × 480 in three 10-min sessions under different lighting conditions. The results indicate that a learning-based method improves, in comparison with greyscale methods, the possibility to reliably identify the proportions of pigs in different areas of the pen. Pigs with a changed behaviour (location) in the pen may indicate changed climate conditions. Changed individual behaviour may also indicate inferior health or acute illness.
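
    The classification core described above (an elastic-net-regularized logistic model that outputs a probability per pixel) can be sketched with scikit-learn; the feature matrix below is a random stand-in for the per-pixel image features, not the authors' extended feature set.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(10000, 12))  # one row of local image features per pixel (fake)
        y = rng.integers(0, 2, 10000)     # pixel label: pig (1) or background (0)

        clf = LogisticRegression(penalty="elasticnet", solver="saga",
                                 l1_ratio=0.5, max_iter=2000).fit(X, y)
        pig_prob = clf.predict_proba(X)[:, 1]  # probability per pixel, not a hard decision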

  20. Machine learning phases of matter

    OpenAIRE

    Carrasquilla, Juan; Melko, Roger G.

    2016-01-01

    Neural networks can be used to identify phases and phase transitions in condensed matter systems via supervised machine learning. We show that a standard feed-forward neural network, readily programmable through modern software libraries, can be trained to detect multiple types of order parameter directly from raw state configurations sampled with Monte Carlo. In addition, such networks can detect highly non-trivial states such as Coulomb phases, and if modified to a convolutional neural network, topol...

  1. Galaxy Classification using Machine Learning

    Science.gov (United States)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  2. Determination of the WW polarization fractions in pp → W±W±jj using a deep machine learning technique

    Science.gov (United States)

    Searcy, Jacob; Huang, Lillian; Pleier, Marc-André; Zhu, Junjie

    2016-05-01

    The unitarization of the longitudinal vector boson scattering (VBS) cross section by the Higgs boson is a fundamental prediction of the Standard Model which has not been experimentally verified. One of the most promising ways to measure VBS uses events containing two leptonically decaying same-electric-charge W bosons produced in association with two jets. However, the angular distributions of the leptons in the W boson rest frame, which are commonly used to fit polarization fractions, are not readily available in this process due to the presence of two neutrinos in the final state. In this paper we present a method to alleviate this problem by using a deep machine learning technique to recover these angular distributions from measurable event kinematics and demonstrate how the longitudinal-longitudinal scattering fraction could be studied. We show that this method doubles the expected sensitivity when compared to previous proposals.
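
    Stripped to its essentials, the method is a regression from measurable kinematics to the unmeasurable decay angle. A hedged sketch follows (a scikit-learn MLP instead of the paper's deep network, with synthetic stand-ins for the event kinematics and the simulation-truth target):

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(5000, 9))       # measurable event kinematics (fake features)
        y = np.tanh(X @ rng.normal(size=9))  # stand-in for true lepton cos(theta*) from MC

        reg = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)
        cos_theta_hat = reg.predict(X[:10])  # recovered angle, to be histogrammed and fit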

  3. Determination of the $WW$ polarization fractions in $pp \\to W^\\pm W^\\pm jj$ using a deep machine learning technique

    CERN Document Server

    Searcy, Jacob; Pleier, Marc-André; Zhu, Junjie

    2015-01-01

    The unitarization of the longitudinal vector boson scattering (VBS) cross section by the Higgs boson is a fundamental prediction of the Standard Model which has not been experimentally verified. One of the most promising ways to measure VBS uses events containing two leptonically-decaying same-electric-charge $W$ bosons produced in association with two jets. However, the angular distributions of the leptons in the $W$ boson rest frame, which are commonly used to fit polarization fractions, are not readily available in this process due to the presence of two neutrinos in the final state. In this paper we present a method to alleviate this problem by using a deep machine learning technique to recover these angular distributions from measurable event kinematics and demonstrate how the longitudinal-longitudinal scattering fraction could be studied. We show that this method doubles the expected sensitivity when compared to previous proposals.

  4. A Benchmark for Banks’ Strategy in Online Presence – An Innovative Approach Based on Elements of Search Engine Optimization (SEO) and Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Camelia Elena CIOLAC

    2011-06-01

    Full Text Available This paper aims to offer a new decision tool to assist banks in evaluating the efficiency of their Internet presence and in planning IT investments towards gaining better Internet popularity. The methodology used in this paper goes beyond simple website interface analysis and uses web crawling as a source for collecting website performance data and the web technologies and servers employed. The paper complements this technical perspective with a proposed scorecard used to assess the efforts of banks in Internet presence, reflecting the banks’ commitment to the Internet as a distribution channel. An innovative approach based on machine learning techniques, the K-Nearest Neighbor algorithm, is proposed by the author to estimate the Internet popularity that a bank is likely to achieve based on its size and efforts in Internet presence.
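
    The estimation step named above reduces to nearest-neighbor regression. A minimal sketch, assuming scikit-learn and invented bank features (size plus scorecard scores) and popularity values:

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(60, 4))   # bank size and Internet-presence effort scores (fake)
        y = rng.uniform(0, 100, 60)    # observed Internet-popularity measure (fake)

        knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
        estimate = knn.predict(X[:1])  # likely popularity for a bank with this profile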

  5. Special techniques in ultra-precision machining

    Science.gov (United States)

    Li, Li; Min, Xu; Chen, Dong; Wang, JunHua

    2007-12-01

    With the development of ultra-precision machining, SPDT (single-point diamond turning) has been applied to the manufacture of a variety of optical components for its high precision, versatility and lower manufacturing cost. However, the improvement of ultra-precision machining depends not only on the most advanced equipment in the world but also on the special techniques of ultra-precision machining. The industrialization and marketization of ultra-precision machining will therefore not be realized without these special techniques. This paper introduces the principle, traits and applications of some important special techniques that can complement SPDT effectively: the FTS, STS, SSS, ACT, VQ, LADT and UADT techniques.

  6. Machine learning in geosciences and remote sensing

    Institute of Scientific and Technical Information of China (English)

    David J. Lary; Amir H. Alavi; Amir H. Gandomi; Annette L. Walker

    2016-01-01

    Learning incorporates a broad range of complex procedures. Machine learning (ML) is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc.) that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  7. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Learning incorporates a broad range of complex procedures. Machine learning (ML) is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc.) that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  8. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM) approach is able to perform one-shot learning in a reinforcement learning task without catastrophic forgetting of previously stored associations.

  9. Ensemble Machine Learning Methods and Applications

    CERN Document Server

    Ma, Yunqian

    2012-01-01

    It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed “ensemble learning” by researchers in computational intelligence and machine learning, it is known to improve a decision system’s robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as “boosting” and “random forest” facilitate solutions to key computational issues such as face detection and are now being applied in areas as diverse as object tracking and bioinformatics.   Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs. At once a solid theoretical study and a practical guide, the volume is a windfall for r...

  10. On-the-Fly Learning in a Perpetual Learning Machine

    OpenAIRE

    2015-01-01

    Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine, a new type of DNN that is capable of brain-like dynamic 'on the fly' learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we provide the means to unify learning and memory within a machine learning framework. We also explore ...

  11. Cells, Agents, and Support Vectors in Interaction - Modeling Urban Sprawl based on Machine Learning and Artificial Intelligence Techniques in a Post-Industrial Region

    Science.gov (United States)

    Rienow, A.; Menz, G.

    2015-12-01

    Since the beginning of the millennium, artificial intelligence techniques such as cellular automata (CA) and multi-agent systems (MAS) have been incorporated into land-system simulations to address the complex challenges of transitions in urban areas as open, dynamic systems. The study presents a hybrid modeling approach for modeling the two antagonistic processes of urban sprawl and urban decline at once. The simulation power of support vector machines (SVM), cellular automata (CA) and multi-agent systems (MAS) is integrated into one modeling framework and applied to the largest agglomeration of Central Europe: the Ruhr. A modified version of SLEUTH (short for Slope, Land-use, Exclusion, Urban, Transport, and Hillshade) functions as the CA component. SLEUTH makes use of historic urban land-use data sets and growth coefficients for the purpose of modeling physical urban expansion. The machine learning algorithm of SVM is applied in order to enhance SLEUTH. Thus, the stochastic variability of the CA is reduced and information about the human and ecological forces driving the local suitability of urban sprawl is incorporated. Subsequently, the supported CA is coupled with the MAS ReHoSh (Residential Mobility and the Housing Market of Shrinking City Systems). The MAS models population patterns, housing prices, and housing demand in shrinking regions based on interactions between household and city agents. Semi-explicit urban weights are introduced as a possibility of modeling from and to the pixel simultaneously. Three scenarios of changing housing preferences reveal the urban development of the region in terms of quantity and location. They reflect the dissemination of sustainable thinking among stakeholders versus the steady dream of owning a house in sub- and exurban areas. Additionally, the outcomes are transferred into a digital petri dish reflecting a synthetic environment with perfect conditions of growth. Hence, the generic growth elements affecting the future

  12. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results.

  13. Archetypal Analysis for Machine Learning

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai

    2010-01-01

    Archetypal analysis (AA) proposed by Cutler and Breiman in [1] estimates the principal convex hull of a data set. As such AA favors features that constitute representative ’corners’ of the data, i.e. distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering ... for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensionality reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering.

  14. Potential for optimizing the Higgs boson CP measurement in H → ττ decays at the LHC including machine learning techniques

    Science.gov (United States)

    Józefowicz, R.; Richter-Was, E.; Was, Z.

    2016-11-01

    We investigate the potential for measuring the CP state of the Higgs boson in H → ττ decays with consecutive τ-lepton decays in the combined channels τ± → ρ±ντ and τ± → a1±ντ. Subsequent decays ρ± → π±π0, a1± → ρ0π± and ρ0 → π+π- are taken into account. We explore extensions of the method in which the acoplanarity angle of the planes built on the visible decay products, π±π0 of τ± → π±π0ντ, was used. The angle is sensitive to transverse spin correlations, and thus to parity. We show that in the case of the cascade decays τ → a1ν, information on the CP state of the Higgs boson can be extracted from the acoplanarity angles as well. Because in the cascade decay a1± → ρ0π±, ρ0 → π+π- up to four planes can be defined, up to 16 distinct acoplanarity angles are available for H → ττ → a1+a1-νν decays. These acoplanarities carry in part supplementary but also correlated information. It is thus cumbersome to evaluate an overall sensitivity. We investigate the sensitivity potential of such an analysis by developing and applying machine learning techniques. We quantify possible improvements when the multidimensional phase space of outgoing decay product directions is used instead of one-dimensional projections, i.e. the acoplanarity angles. We do not take into account ambiguities resulting from detector uncertainties or background contamination; we concentrate on the usefulness of machine learning methods and τ → 3πν decays for the Higgs boson parity measurement.

  15. Nontraditional manufacturing technique - nano machining technique based on SPM

    Institute of Scientific and Technical Information of China (English)

    DONG Shen; YAN Yongda; SUN Tao; LIANG Yingchun; CHENG

    2004-01-01

    Nano machining based on SPM is a novel, nontraditional advanced manufacturing technique. There are three main machining methods based on SPM, i.e. single atom manipulation, surface modification using physical or chemical actions, and mechanical scratching. The current development of this technique is summarized. Based on the analysis of the mechanical scratching mechanism, a 5 μm micro inflation hole is fabricated on the surface of an inertial confinement fusion (ICF) target. The processing technique is optimized. The machining properties of a brittle material, single crystal Ge, are investigated. A micro machining system combining SPM and a high accuracy stage is developed. Some 2D and 3D microstructures are fabricated using the system. This method has broad applications in the field of nano machining.

  16. Extreme Learning Machine for land cover classification

    OpenAIRE

    Pal, Mahesh

    2008-01-01

    This paper explores the potential of an extreme learning machine based supervised classification algorithm for land cover classification. In comparison to a backpropagation neural network, which requires the setting of several user-defined parameters and may produce local minima, an extreme learning machine requires the setting of only one parameter and produces a unique solution. An ETM+ multispectral data set (England) was used to judge the suitability of the extreme learning machine for remote sensing classifications...
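
    The "one parameter" referred to above is the number of hidden nodes: an extreme learning machine draws its input weights at random and never trains them, so the output weights follow from a single least-squares solve. A minimal sketch, assuming NumPy only, with synthetic stand-ins for the multispectral bands and land cover labels:

```python
# Minimal extreme learning machine (ELM) classifier sketch; the data is a
# synthetic placeholder for multispectral pixels and land cover classes.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=100):
    """Fit an ELM: random input weights, output weights by least squares."""
    n_classes = y.max() + 1
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    T = np.eye(n_classes)[y]                      # one-hot targets
    beta = np.linalg.pinv(H) @ T                  # unique least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy usage with synthetic "multispectral" data (6 bands, 3 cover classes).
X = rng.normal(size=(300, 6))
y = rng.integers(0, 3, size=300)
W, b, beta = elm_fit(X, y, n_hidden=50)
print((elm_predict(X, W, b, beta) == y).mean())
```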

  17. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  18. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  19. Trends in Machine Learning for Signal Processing

    DEFF Research Database (Denmark)

    Adali, Tulay; Miller, David J.; Diamantaras, Konstantinos I.

    2011-01-01

    By putting the accent on learning from the data and the environment, the Machine Learning for SP (MLSP) Technical Committee (TC) provides the essential bridge between the machine learning and SP communities. While the emphasis in MLSP is on learning and data-driven approaches, SP defines the main applications of interest, and thus the constraints and requirements on solutions, which include computational efficiency, online adaptation, and learning with limited supervision/reference data.

  20. The ATLAS Higgs machine learning challenge

    CERN Document Server

    Davey, W; The ATLAS collaboration; Rousseau, D; Cowan, G; Kegl, B; Germain-Renaud, C; Guyon, I

    2014-01-01

    High Energy Physics has been using machine learning techniques (commonly known as multivariate analysis) since the 90's, with artificial neural networks for example, and more recently with boosted decision trees, random forests, etc. Meanwhile, machine learning has become a full-blown field of computer science. With the emergence of Big Data, data scientists are developing new machine learning algorithms to extract sense from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal; data scientists have advanced algorithms. The goal of the HiggsML project is to bring the two together in a “challenge”: participants from all over the world and any scientific background can compete online ( https://www.kaggle.com/c/higgs-boson ) to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background events. Winners with the best scores will receive money prizes; authors of the best (most usable) method will be invited t...

  1. Implementing Machine Learning in Radiology Practice and Research.

    Science.gov (United States)

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  2. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional methods of data analysis have difficulty identifying outliers and patterns in big data and in data with multiple exposure/outcome variables, and analysis rules for surveys and questionnaires, currently common methods of data collection, are essentially missing. Obviously, it is time that medical and health professionals overcame their reluctance to use machine learning, and the current 100-page cookbook should be helpful to that aim. It covers in condensed form the subjects reviewed in the 750-page, three-volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (ed. by Springer, Heidelberg, Germany, 2013), and was written as a hand-hold presentation and must-read publication. It was written not only for investigators and students in the fields, but also for jaded clinicians new to the methods and lacking time to read the entire textbooks. General purposes ...

  3. Principles and techniques for designing precision machines

    Energy Technology Data Exchange (ETDEWEB)

    Hale, L C

    1999-02-01

    This thesis is written to advance the reader's knowledge of precision-engineering principles and their application to designing machines that achieve both sufficient precision and minimum cost. It provides the concepts and tools necessary for the engineer to create new precision machine designs. Four case studies demonstrate the principles and showcase approaches and solutions to specific problems that generally have wider applications. These come from projects at the Lawrence Livermore National Laboratory in which the author participated: the Large Optics Diamond Turning Machine, Accuracy Enhancement of High- Productivity Machine Tools, the National Ignition Facility, and Extreme Ultraviolet Lithography. Although broad in scope, the topics go into sufficient depth to be useful to practicing precision engineers and often fulfill more academic ambitions. The thesis begins with a chapter that presents significant principles and fundamental knowledge from the Precision Engineering literature. Following this is a chapter that presents engineering design techniques that are general and not specific to precision machines. All subsequent chapters cover specific aspects of precision machine design. The first of these is Structural Design, guidelines and analysis techniques for achieving independently stiff machine structures. The next chapter addresses dynamic stiffness by presenting several techniques for Deterministic Damping, damping designs that can be analyzed and optimized with predictive results. Several chapters present a main thrust of the thesis, Exact-Constraint Design. A main contribution is a generalized modeling approach developed through the course of creating several unique designs. The final chapter is the primary case study of the thesis, the Conceptual Design of a Horizontal Machining Center.

  4. Building machine learning systems with Python

    CERN Document Server

    Coelho, Luis Pedro

    2015-01-01

    This book primarily targets Python developers who want to learn and use Python's machine learning capabilities and gain valuable insights from data to develop effective solutions for business problems.

  5. Applying Machine Learning to Star Cluster Classification

    Science.gov (United States)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time-consuming, thus it creates a bottleneck, and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) through applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data is HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.

  6. Learning as a Machine: Crossovers between Humans and Machines

    Science.gov (United States)

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  7. AVP-IC50 Pred: Multiple machine learning techniques-based prediction of peptide antiviral activity in terms of half maximal inhibitory concentration (IC50).

    Science.gov (United States)

    Qureshi, Abid; Tandon, Himani; Kumar, Manoj

    2015-11-01

    Peptide-based antiviral therapeutics has gradually paved its way into mainstream drug discovery research. Experimental determination of peptides' antiviral activity as expressed by their IC50 values involves a lot of effort. Therefore, we have developed "AVP-IC50 Pred," a regression-based algorithm to predict antiviral activity in terms of IC50 values (μM). A total of 759 non-redundant peptides from AVPdb and HIPdb were divided into a training/test set having 683 peptides (T683) and a validation set with 76 independent peptides (V76) for evaluation. We utilized important peptide sequence features like amino-acid composition, a binary profile of N8-C8 residues, physicochemical properties, and their hybrids. Four different machine learning techniques (MLTs), namely support vector machine, random forest, instance-based classifier, and K-Star, were employed. During 10-fold cross validation, we achieved maximum Pearson correlation coefficients (PCCs) of 0.66, 0.64, 0.56, and 0.55, respectively, for the above MLTs using the best combination of feature sets. All the predictive models also performed well on the independent validation dataset and achieved maximum PCCs of 0.74, 0.68, 0.59, and 0.57, respectively, on the best combination of feature sets. The AVP-IC50 Pred web server is anticipated to assist researchers working on antiviral therapeutics by enabling them to computationally screen many compounds and focus experimental validation on the most promising set of peptides, thus reducing cost and time efforts. The server is available at http://crdd.osdd.net/servers/ic50avp.
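
    The recipe above combines simple sequence-derived features with regression and PCC-based evaluation. The sketch below is not the published AVP-IC50 Pred code; it only illustrates that general recipe with amino-acid composition features, a scikit-learn support vector regressor, and made-up peptides and IC50 values:

```python
# Illustrative sketch of composition features + SVR scored by the Pearson
# correlation coefficient; peptides and IC50 values are fabricated toys.
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR

AA = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 amino acids in the peptide."""
    return np.array([seq.count(a) / len(seq) for a in AA])

peptides = ["GLFDIVKKVV", "ACDEFGHIKL", "KKLLPIVVNN", "MNPQRSTVWY"] * 10
ic50 = np.abs(np.random.default_rng(1).normal(10, 5, size=len(peptides)))

X = np.array([aa_composition(p) for p in peptides])
model = SVR(kernel="rbf").fit(X, ic50)
pcc, _ = pearsonr(ic50, model.predict(X))
print(f"training PCC = {pcc:.2f}")
```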

  8. Prediction of Driver’s Intention of Lane Change by Augmenting Sensor Information Using Machine Learning Techniques

    Science.gov (United States)

    Kim, Il-Hwan; Bong, Jae-Hwan; Park, Jooyoung; Park, Shinsuk

    2017-01-01

    Driver assistance systems have become a major safety feature of modern passenger vehicles. The advanced driver assistance system (ADAS) is one of the active safety systems to improve the vehicle control performance and, thus, the safety of the driver and the passengers. To use the ADAS for lane change control, rapid and correct detection of the driver’s intention is essential. This study proposes a novel preprocessing algorithm for the ADAS to improve the accuracy in classifying the driver’s intention for lane change by augmenting basic measurements from conventional on-board sensors. The information on the vehicle states and the road surface condition is augmented by using artificial neural network (ANN) models, and the augmented information is fed to a support vector machine (SVM) to detect the driver’s intention with high accuracy. The feasibility of the developed algorithm was tested through driving simulator experiments. The results show that the classification accuracy for the driver’s intention can be improved by providing an SVM model with sufficient driving information augmented by using ANN models of vehicle dynamics. PMID:28604582

  9. Prediction of Driver’s Intention of Lane Change by Augmenting Sensor Information Using Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Il-Hwan Kim

    2017-06-01

    Driver assistance systems have become a major safety feature of modern passenger vehicles. The advanced driver assistance system (ADAS) is one of the active safety systems to improve the vehicle control performance and, thus, the safety of the driver and the passengers. To use the ADAS for lane change control, rapid and correct detection of the driver’s intention is essential. This study proposes a novel preprocessing algorithm for the ADAS to improve the accuracy in classifying the driver’s intention for lane change by augmenting basic measurements from conventional on-board sensors. The information on the vehicle states and the road surface condition is augmented by using artificial neural network (ANN) models, and the augmented information is fed to a support vector machine (SVM) to detect the driver’s intention with high accuracy. The feasibility of the developed algorithm was tested through driving simulator experiments. The results show that the classification accuracy for the driver’s intention can be improved by providing an SVM model with sufficient driving information augmented by using ANN models of vehicle dynamics.
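
    A hedged sketch of the two-stage idea described above (not the authors' implementation): an ANN model of vehicle dynamics augments the raw sensor readings with estimated states, and an SVM classifies lane-change intention from the augmented feature vector. All signal names and data below are synthetic placeholders:

```python
# Two-stage pipeline sketch: ANN feature augmentation feeding an SVM.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

rng = np.random.default_rng(2)
raw = rng.normal(size=(500, 4))            # e.g. steering angle, yaw rate, ...
latent = raw @ rng.normal(size=(4, 2))     # stand-in for unmeasured states
intent = (latent[:, 0] > 0).astype(int)    # 1 = lane change intended

# Stage 1: an ANN model estimates the unmeasured vehicle states.
ann = MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                   max_iter=1000, random_state=0)
ann.fit(raw, latent)

# Stage 2: the SVM sees the raw sensors plus the ANN-augmented information.
X_aug = np.hstack([raw, ann.predict(raw)])
svm = SVC(kernel="rbf").fit(X_aug, intent)
print("training accuracy:", svm.score(X_aug, intent))
```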

  10. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal events and a majority of non-Higgs-boson “background” events. No knowledge of particle physics was required for the challenge, but skills in machine learning - the training of computers to recognise patterns in data - were essential. The Challenge, hosted by Ka...

  11. Machine learning in sedimentation modelling.

    Science.gov (United States)

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    The paper presents machine learning (ML) models that predict sedimentation in the harbour basin of the Port of Rotterdam. The important factors affecting the sedimentation process, such as waves, wind, tides, surge, river discharge, etc., are studied, the corresponding time series data are analysed, missing values are estimated, and the most important variables behind the process are chosen as the inputs. Two ML methods are used: an MLP ANN and the M5 model tree. The latter is a collection of piece-wise linear regression models, each being an expert for a particular region of the input space. The models are trained on the data collected during 1992-1998 and tested on the data of 1999-2000. The predictive accuracy of the models is found to be adequate for potential use in operational decision making.

  12. Machine learning in motion control

    Science.gov (United States)

    Su, Renjeng; Kermiche, Noureddine

    1989-01-01

    The existing methodologies for robot programming originate primarily from robotic applications to manufacturing, where uncertainties of the robots and their task environment may be minimized by repeated off-line modeling and identification. In space applications of robots, however, a higher degree of automation is required for robot programming because of the desire to minimize human intervention. We discuss a new paradigm of robotic programming which is based on the concept of machine learning. The goal is to let robots practice tasks by themselves, with the operational data used to automatically improve their motion performance. The underlying mathematical problem is to solve the dynamical inverse problem by iterative methods. One of the key questions is how to ensure the convergence of the iterative process. There have been a few small steps taken into this important approach to robot programming. We give a representative result on the convergence problem.

  14. Mapping Urban Areas with Integration of DMSP/OLS Nighttime Light and MODIS Data Using Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Wenlong Jing

    2015-09-01

    Mapping urban areas at global and regional scales is an urgent and crucial task for detecting urbanization and human activities throughout the world and is useful for discerning the influence of urban expansion upon the ecosystem and the surrounding environment. DMSP-OLS stable nighttime lights have provided an effective way to monitor human activities on a global scale. Threshold-based algorithms have been widely used for extracting urban areas and estimating urban expansion, but the accuracy can decrease because of the empirical and subjective selection of threshold values. This paper proposes an approach for extracting urban areas with the integration of DMSP-OLS stable nighttime lights and MODIS data, utilizing training sample datasets selected from DMSP-OLS and MODIS NDVI based on several simple strategies. Four classification algorithms were implemented for comparison: classification and regression trees (CART), k-nearest-neighbors (k-NN), support vector machines (SVM), and random forests (RF). A case study was carried out on the eastern part of China, covering 99 cities and 1,027,700 km2. The classification results were validated using an independent land cover dataset, and then compared with an existing contextual classification method. The results showed that the new method can achieve results with comparable accuracies, and is easier to implement and less sensitive to the initial thresholds than the contextual method. Among the four classifiers implemented, RF achieved the most stable results and the highest average Kappa, while CART produced highly overestimated results compared to the other three classifiers. Although k-NN and SVM tended to produce similar accuracy, less-bright areas around the urban cores seemed to be ignored when using SVM, which led to the underestimation of urban areas. Furthermore, quantity assessment showed that the results produced by k-NN, SVM, and RF exhibited better agreement in larger cities and low
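
    The four-classifier comparison can be reproduced in miniature with scikit-learn. In this sketch the two features (a nighttime-light digital number and NDVI) and the urban labels are synthetic placeholders, with Cohen's kappa as the agreement metric used in the paper:

```python
# Compare CART, k-NN, SVM, and RF on toy nighttime-light/NDVI features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.uniform(size=(2000, 2))                        # [night-light DN, NDVI]
y = ((X[:, 0] > 0.6) & (X[:, 1] < 0.4)).astype(int)    # 1 = urban

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, m in models.items():
    kappa = cohen_kappa_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: Kappa = {kappa:.3f}")
```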

  15. Machine learning techniques to select Be star candidates. An application in the OGLE-IV Gaia south ecliptic pole field

    Science.gov (United States)

    Pérez-Ortiz, M. F.; García-Varela, A.; Quiroz, A. J.; Sabogal, B. E.; Hernández, J.

    2017-09-01

    Context. Optical and infrared variability surveys produce a large number of high quality light curves. Statistical pattern recognition methods have provided competitive solutions for variable star classification at a relatively low computational cost. In order to perform supervised classification, a set of features is proposed and used to train an automatic classification system. Quantities related to the magnitude density of the light curves and their Fourier coefficients have been chosen as features in previous studies. However, some of these features are not robust to the presence of outliers and the calculation of Fourier coefficients is computationally expensive for large data sets. Aims: We propose and evaluate the performance of a new robust set of features using supervised classifiers in order to look for new Be star candidates in the OGLE-IV Gaia south ecliptic pole field. Methods: We calculated the proposed set of features on six types of variable stars and also on a set of Be star candidates reported in the literature. We evaluated the performance of these features using classification trees and random forests along with the K-nearest neighbours, support vector machines, and gradient boosted trees methods. We tuned the classifiers with 10-fold cross-validation and a grid search. We then validated the performance of the best classifier on a set of OGLE-IV light curves and applied this to find new Be star candidates. Results: The random forest classifier outperformed the others. By using the random forest classifier and colours criteria we found 50 Be star candidates in the direction of the Gaia south ecliptic pole field, four of which have infrared colours that are consistent with Herbig Ae/Be stars. Conclusions: Supervised methods are very useful in order to obtain preliminary samples of variable stars extracted from large databases. As usual, the stars classified as Be star candidates must be checked for the colours and spectroscopic characteristics
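
    The tuning protocol described in the Methods (a grid search with 10-fold cross-validation over a random forest) maps directly onto scikit-learn. In this sketch the eight light-curve features and the labels are placeholders, not the paper's actual feature set:

```python
# Grid search + 10-fold CV over a random forest, as in the tuning protocol.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 8))        # e.g. robust amplitude, skewness, ...
y = rng.integers(0, 2, size=600)     # 1 = Be star candidate (toy labels)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_features": ["sqrt", None]},
    cv=10,                           # 10-fold cross-validation as in the paper
    scoring="f1",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```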

  16. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  17. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  18. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them.This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  19. Adaptive Learning Systems: Beyond Teaching Machines

    Science.gov (United States)

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since the 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn and what instructors should do to help students during their learning process. We have adaptive learning technologies that can create much more student-oriented learning environments. The purpose of this article is to present these changes and its…

  20. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology

    Directory of Open Access Journals (Sweden)

    Jieru Zhang

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machines and basic sequence features (n-grams), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.
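
    One way to realize a hybrid scheme in this spirit is to combine fingerprint-style n-gram features with a soft-voting ensemble of classifiers. The sketch below uses scikit-learn with toy sequences and labels; it illustrates the general idea, not the study's exact method:

```python
# Character n-gram features from protein sequences feeding a soft-voting
# ensemble of SVM + random forest; sequences and labels are toys.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

seqs = ["MKTAYIAKQR", "GAVLIPFMWS", "MKKLLAVAVA", "TNYDEQKRHG"] * 25
labels = np.array([1, 0, 1, 0] * 25)           # 1 = cancerlectin (toy)

vec = CountVectorizer(analyzer="char", ngram_range=(1, 2), lowercase=False)
X = vec.fit_transform(seqs)

ensemble = VotingClassifier(
    [("svm", SVC(probability=True)),
     ("rf", RandomForestClassifier(random_state=0))],
    voting="soft",                             # average predicted probabilities
)
ensemble.fit(X, labels)
print("training accuracy:", ensemble.score(X, labels))
```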

  1. A Novel Flavour Tagging Algorithm using Machine Learning Techniques and a Precision Measurement of the $B^0 - \\overline{B^0}$ Oscillation Frequency at the LHCb Experiment

    CERN Document Server

    Kreplin, Katharina

    This thesis presents a novel flavour tagging algorithm using machine learning techniques and a precision measurement of the $B^0 -\overline{B^0}$ oscillation frequency $\Delta m_d$ using semileptonic $B^0$ decays. The LHC Run I data set is used, which corresponds to $3 \textrm{fb}^{-1}$ of data taken by the LHCb experiment at center-of-mass energies of 7 TeV and 8 TeV. The performance of flavour tagging algorithms, exploiting the $b\bar{b}$ pair production and the $b$ quark hadronization, is relatively low at the LHC due to the large amount of soft QCD background in inelastic proton-proton collisions. The standard approach is a cut-based selection of particles whose charges are correlated to the production flavour of the $B$ meson. The novel tagging algorithm classifies the particles using an artificial neural network (ANN). It assigns higher weights to particles which are likely to be correlated to the $b$ flavour. A second ANN combines the particles with the highest weights to derive the tagging decision. ...

  2. Scaling Datalog for Machine Learning on Big Data

    CERN Document Server

    Bu, Yingyi; Carey, Michael J; Rosen, Joshua; Polyzotis, Neoklis; Condie, Tyson; Weimer, Markus; Ramakrishnan, Raghu

    2012-01-01

    In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for the use of recursive queries to program a variety of machine learning systems. By taking this approach, database query optimization techniques can be utilized to identify effective execution plans, and the resulting runtime plans can be executed on a single unified data-parallel query processing engine. As a proof of concept, we consider two programming models, Pregel and Iterative Map-Reduce-Update, from the machine learning domain, and show how they can be captured in Datalog, tuned for a specific task, and then compiled into an optimized physical plan. Experiments performed on a large computing cluster with real data demonstrate that this declarative approach can provide very good performance while offering both increased generality and programming ease.

  3. International Conference on Extreme Learning Machines 2014

    CERN Document Server

    Mao, Kezhi; Cambria, Erik; Man, Zhihong; Toh, Kar-Ann

    2015-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2014, which was held in Singapore, December 8-10, 2014. This conference brought together the researchers and practitioners of Extreme Learning Machine (ELM) from a variety of fields to promote research and development of “learning without iterative tuning”.  The book covers theories, algorithms and applications of ELM. It gives the readers a glance of the most recent advances of ELM.  

  4. Visual quality assessment by machine learning

    CERN Document Server

    Xu, Long; Kuo, C -C Jay

    2015-01-01

    The book encompasses state-of-the-art visual quality assessment (VQA) and learning based visual quality assessment (LB-VQA) by providing a comprehensive overview of the existing relevant methods. It delivers to the readers the basic knowledge, a systematic overview, and the new developments of VQA. It also covers the preliminary knowledge of applying Machine Learning (ML) to VQA tasks and newly developed ML techniques for the purpose. Hence, it is firstly particularly helpful for beginner readers (including research students) entering the VQA field in general and LB-VQA in particular. Secondly, new developments in VQA and LB-VQA are detailed in this book, which will give peer researchers and engineers new insights into VQA.

  5. Machine learning methods without tears: a primer for ecologists.

    Science.gov (United States)

    Olden, Julian D; Lawler, Joshua J; Poff, N LeRoy

    2008-06-01

    Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machine learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

  6. Photometric Supernova Classification with Machine Learning

    Science.gov (United States)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDT algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
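
    The second stage of such a pipeline is straightforward to sketch: extracted light-curve features (here random placeholders standing in for SALT2 or wavelet coefficients) are fed to boosted decision trees and scored with the AUC metric used in the paper:

```python
# Boosted decision trees on placeholder light-curve features, scored by AUC.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 20))            # extracted light-curve features
y = rng.integers(0, 2, size=1000)          # 1 = type Ia (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
bdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")   # 1.0 would be perfect classification
```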

  7. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  8. CD process control through machine learning

    Science.gov (United States)

    Utzny, Clemens

    2016-10-01

    For the specific requirements of the 14 nm and 20 nm site applications, a new CD map approach was developed at the AMTC. This approach relies on a well established machine learning technique called recursive partitioning. Recursive partitioning is a powerful technique which creates a decision tree by successively testing whether the quantity of interest can be explained by one of the supplied covariates. The test performed is generally a statistical test with a pre-supplied significance level. Once the test indicates a significant association between the variable of interest and a covariate, a split is performed at the threshold value which minimizes the variation within the newly attained groups. This partitioning recurs until either no significant association can be detected or the resulting subgroup size falls below a pre-supplied level.
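
    A toy version of recursive partitioning as just described can be written in a few lines: test each covariate for a significant association with the response, split at the variance-minimizing threshold, and recurse until nothing is significant or the subgroups get too small. The covariates and the CD response below are synthetic stand-ins:

```python
# Toy recursive partitioning: significance test, then variance-minimizing split.
import numpy as np
from scipy.stats import pearsonr

def partition(X, y, alpha=0.05, min_size=20, depth=0):
    if len(y) < 2 * min_size:
        return
    # Pick the covariate most significantly associated with y.
    pvals = [pearsonr(X[:, j], y)[1] for j in range(X.shape[1])]
    j = int(np.argmin(pvals))
    if pvals[j] > alpha:
        return
    # Split at the threshold minimizing total within-group variation.
    cuts = np.quantile(X[:, j], np.linspace(0.1, 0.9, 17))
    best = min(cuts, key=lambda c: y[X[:, j] <= c].var() * (X[:, j] <= c).sum()
                                   + y[X[:, j] > c].var() * (X[:, j] > c).sum())
    print("  " * depth + f"split on x{j} at {best:.2f} (p={pvals[j]:.1e})")
    left = X[:, j] <= best
    partition(X[left], y[left], alpha, min_size, depth + 1)
    partition(X[~left], y[~left], alpha, min_size, depth + 1)

rng = np.random.default_rng(6)
X = rng.uniform(size=(400, 3))                        # e.g. mask-position covariates
y = 2.0 * (X[:, 0] > 0.5) + rng.normal(0, 0.3, 400)   # CD offset with a step
partition(X, y)
```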

  9. Prediction of antiepileptic drug treatment outcomes using machine learning

    Science.gov (United States)

    Colic, Sinisa; Wither, Robert G.; Lang, Min; Zhang, Liang; Eubanks, James H.; Bardakjian, Berj L.

    2017-02-01

    Objective. Antiepileptic drug (AED) treatments produce inconsistent outcomes, often necessitating patients to go through several drug trials until a successful treatment can be found. This study proposes the use of machine learning techniques to predict epilepsy treatment outcomes of commonly used AEDs. Approach. Machine learning algorithms were trained and evaluated using features obtained from intracranial electroencephalogram (iEEG) recordings of the epileptiform discharges observed in the Mecp2-deficient mouse model of Rett syndrome. Previous work has linked the presence of cross-frequency coupling (I_CFC) of the delta (2-5 Hz) rhythm with the fast ripple (400-600 Hz) rhythm in epileptiform discharges. Using the I_CFC to label post-treatment outcomes, we compared support vector machine (SVM) and random forest (RF) machine learning classifiers for providing likelihood scores of successful treatment outcomes. Main results. (a) There was heterogeneity in AED treatment outcomes, (b) machine learning techniques could be used to rank the efficacy of AEDs by estimating likelihood scores for successful treatment outcome, (c) I_CFC features yielded the most effective a priori identification of appropriate AED treatment, and (d) both classifiers performed comparably. Significance. Machine learning approaches yielded predictions of successful drug treatment outcomes which in turn could reduce the burdens of drug trials and lead to substantial improvements in patient quality of life.
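
    Ranking treatments by likelihood scores reduces, in scikit-learn terms, to training classifiers that expose predicted probabilities. In this sketch the iEEG-derived coupling features and outcome labels are random placeholders:

```python
# SVM and RF likelihood scores for treatment outcomes on placeholder features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 6))          # e.g. delta/fast-ripple coupling features
y = rng.integers(0, 2, size=200)       # 1 = successful treatment outcome

for model in (SVC(probability=True), RandomForestClassifier(random_state=0)):
    model.fit(X, y)
    scores = model.predict_proba(X[:5])[:, 1]   # likelihood of success
    print(type(model).__name__, np.round(scores, 2))
```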

  10. An introduction to machine learning with Scikit-Learn

    CERN Document Server

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.
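
    In the spirit of that tutorial, a basic supervised analysis with Scikit-Learn fits in a dozen lines: a preprocessing-plus-classifier pipeline, cross-validated selection, and held-out evaluation. The two toy features below merely stand in for HEP event kinematics:

```python
# Basic supervised-learning workflow with a Scikit-Learn pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 2))                 # e.g. missing ET, lepton pT
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression())
print("CV accuracy:", cross_val_score(clf, X_tr, y_tr, cv=5).mean())
print("test accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
```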

  11. PCP-ML: protein characterization package for machine learning.

    Science.gov (United States)

    Eickholt, Jesse; Wang, Zheng

    2014-11-18

    Machine Learning (ML) has a number of demonstrated applications in protein prediction tasks such as protein structure prediction. To speed further development of machine learning based tools and their release to the community, we have developed a package which characterizes several aspects of a protein commonly used for protein prediction tasks with machine learning. A number of software libraries and modules exist for handling protein related data. The package we present in this work, PCP-ML, is unique in its small footprint and emphasis on machine learning. Its primary focus is on characterizing various aspects of a protein through sets of numerical data. The generated data can then be used with machine learning tools and/or techniques. PCP-ML is very flexible in how the generated data is formatted and as a result is compatible with a variety of existing machine learning packages. Given its small size, it can be directly packaged and distributed with community developed tools for protein prediction tasks. Source code and example programs are available under a BSD license at http://mlid.cps.cmich.edu/eickh1jl/tools/PCPML/. The package is implemented in C++ and accessible as a Python module.
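
    PCP-ML itself is a C++ package with a Python module, and its actual API is not reproduced here. The stand-in below merely illustrates the core idea of characterizing a protein as a fixed-length vector of numerical data for downstream machine learning tools:

```python
# Hypothetical protein characterization helper (not PCP-ML's API): composition
# plus two simple global descriptors, as a fixed-length numeric vector.
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")

def characterize(seq):
    """Return a fixed-length numeric description of a protein sequence."""
    comp = [seq.count(a) / len(seq) for a in AA]       # composition (20 values)
    frac_h = sum(c in HYDROPHOBIC for c in seq) / len(seq)
    return np.array(comp + [frac_h, len(seq)])

print(characterize("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```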

  12. 2015 International Conference on Machine Learning and Signal Processing

    CERN Document Server

    Woo, Wai; Sulaiman, Hamzah; Othman, Mohd; Saat, Mohd

    2016-01-01

    This book presents important research findings and recent innovations in the field of machine learning and signal processing. A wide range of topics relating to machine learning and signal processing techniques and their applications are addressed in order to provide both researchers and practitioners with a valuable resource documenting the latest advances and trends. The book comprises a careful selection of the papers submitted to the 2015 International Conference on Machine Learning and Signal Processing (MALSIP 2015), which was held on 15–17 December 2015 in Ho Chi Minh City, Vietnam with the aim of offering researchers, academicians, and practitioners an ideal opportunity to disseminate their findings and achievements. All of the included contributions were chosen by expert peer reviewers from across the world on the basis of their interest to the community. In addition to presenting the latest in design, development, and research, the book provides access to numerous new algorithms for machine learni...

  13. Less is more: regularization perspectives on large scale machine learning

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Deep learning based techniques provide a possible solution at the expense of theoretical guidance and, especially, of computational requirements. It is then a key challenge for large scale machine learning to devise approaches guaranteed to be accurate and yet computationally efficient. In this talk, we will consider a regularization perspective on machine learning, appealing to classical ideas in linear algebra and inverse problems to dramatically scale up nonparametric methods such as kernel methods, often dismissed because of prohibitive costs. Our analysis derives optimal theoretical guarantees while providing experimental results on par with or outperforming state of the art approaches.
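
    One concrete instance of this "less is more" programme is to replace the full kernel solve with a small random subsample of landmark points. The sketch below uses scikit-learn's Nystroem approximation feeding a linear ridge regressor on a synthetic 1-D problem; it illustrates the general idea rather than the speaker's specific algorithm:

```python
# Nystroem kernel approximation + ridge regression instead of a full kernel solve.
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)
X = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 5000)

# Only 100 landmark points approximate the full 5000x5000 kernel matrix.
model = make_pipeline(Nystroem(kernel="rbf", n_components=100, random_state=0),
                      Ridge(alpha=1e-3))
model.fit(X, y)
print("R^2:", model.score(X, y))
```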

  14. A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Hayden Wimmer

    2014-11-01

    While research has been conducted on machine learning algorithms and on privacy-preserving data mining (PPDM), a gap exists in the literature combining the aforementioned areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Bayesian classifiers. This applied research reveals practical implications for applying PPDM to data mining and machine learning, and serves as a critical first step in learning how to apply PPDM to machine learning algorithms and in understanding the effects of PPDM on machine learning. Results indicate that certain machine learning algorithms are more suited for use with PPDM techniques.
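
    The experimental design can be sketched as follows: train the same classifier on raw data and on a crudely anonymized version in which quasi-identifiers are generalized into bins, then compare accuracy. The anonymization below is purely illustrative, not a full K-Anonymity implementation:

```python
# Compare classifier accuracy on raw vs. generalized (bin-coarsened) features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(10)
age = rng.integers(18, 90, size=1000)
zipcode = rng.integers(10000, 99999, size=1000)
y = (age > 50).astype(int)                       # toy target

X_raw = np.column_stack([age, zipcode])
X_anon = np.column_stack([age // 10 * 10,        # generalize age to decades
                          zipcode // 1000 * 1000])  # truncate zip digits

clf = DecisionTreeClassifier(random_state=0)
for name, X in [("raw", X_raw), ("anonymized", X_anon)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```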

  15. Stacking for machine learning redshifts applied to SDSS galaxies

    OpenAIRE

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-01-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised...
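
    One stacking round reduces to appending the base algorithm's out-of-fold redshift estimate as an extra input feature for a second learning round. The magnitudes and redshifts below are synthetic placeholders:

```python
# One stacking round: feed the base estimate back in as an extra feature.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(11)
X = rng.normal(size=(2000, 5))                       # e.g. ugriz magnitudes
z = np.abs(X @ rng.normal(size=5)) * 0.1 + rng.normal(0, 0.02, 2000)

base = RandomForestRegressor(n_estimators=100, random_state=0)
# Out-of-fold estimates avoid leaking the training targets into the new feature.
z_hat = cross_val_predict(base, X, z, cv=5)

X_stacked = np.column_stack([X, z_hat])              # stacking-layer input
z_hat2 = cross_val_predict(base, X_stacked, z, cv=5)
print(f"base MSE: {mean_squared_error(z, z_hat):.5f}")
print(f"stacked MSE: {mean_squared_error(z, z_hat2):.5f}")
```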

  16. Lane Detection Based on Machine Learning Algorithm

    National Research Council Canada - National Science Library

    Chao Fan; Jingbo Xu; Shuai Di

    2013-01-01

    In order to improve accuracy and robustness of the lane detection in complex conditions, such as the shadows and illumination changing, a novel detection algorithm was proposed based on machine learning...

  17. Implementing Machine Learning in the PCWG Tool

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    2016-12-13

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  18. Development of Machine Learning Tools in ROOT

    Science.gov (United States)

    Gleyzer, S. V.; Moneta, L.; Zapata, Omar A.

    2016-10-01

    ROOT is a framework for large-scale data analysis that provides basic and advanced statistical methods used by the LHC experiments. These include machine learning algorithms from the ROOT-integrated Toolkit for Multivariate Analysis (TMVA). We present several recent developments in TMVA, including a new modular design, new algorithms for variable importance and cross-validation, interfaces to other machine-learning software packages and integration of TMVA with Jupyter, making it accessible with a browser.

  19. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate of the uncertainty when the width is comparable to that in the training data.

  20. Addressing uncertainty in atomistic machine learning.

    Science.gov (United States)

    Peterson, Andrew A; Christensen, Rune; Khorshidi, Alireza

    2017-05-10

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate of the uncertainty when the width is comparable to that in the training data. Intriguingly, we also show that the uncertainty can be localized to specific atoms in the simulation, which may offer hints for the generation of training data to strategically improve the machine-learned representation.
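
    A minimal sketch of the bootstrap-ensemble idea on a toy one-dimensional "potential energy" curve, standing in for a real atomistic calculator; the network sizes, ensemble size, and data are illustrative assumptions.

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.RandomState(0)
      X = rng.uniform(-2, 2, size=(200, 1))
      y = X.ravel() ** 2 + 0.05 * rng.randn(200)  # stand-in for computed energies

      ensemble = []
      for seed in range(10):
          idx = rng.randint(0, len(X), len(X))    # bootstrap resample
          net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                             random_state=seed).fit(X[idx], y[idx])
          ensemble.append(net)

      X_new = np.array([[0.0], [3.0]])            # inside vs. outside the training range
      preds = np.stack([net.predict(X_new) for net in ensemble])
      print("mean prediction:", preds.mean(axis=0))
      print("ensemble width: ", preds.std(axis=0))  # much larger at x = 3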

  1. Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

    Directory of Open Access Journals (Sweden)

    Cíntia Matsuda Toledo

    Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually), besides description time; type of scene described (simple or complex); presentation order (which type of picture was described first); and age. In this study, the descriptions by 144 of the subjects studied in Toledo18 were used, which included 200 healthy Brazilians of both genders. RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.

  2. Machine learning of Calabi-Yau volumes

    Science.gov (United States)

    Krefl, Daniel; Seong, Rak-Kyeong

    2017-09-01

    We employ machine learning techniques to investigate the volume minimum of Sasaki-Einstein base manifolds of noncompact toric Calabi-Yau three-folds. We find that the minimum volume can be approximated via a second-order multiple linear regression on standard topological quantities obtained from the corresponding toric diagram. The approximation improves further after invoking a convolutional neural network with the full toric diagram of the Calabi-Yau three-folds as the input. We are thereby able to circumvent any minimization procedure that was previously necessary and find an explicit mapping between the minimum volume and the topological quantities of the toric diagram. Under the AdS/CFT correspondence, the minimum volumes of Sasaki-Einstein manifolds correspond to central charges of a class of 4d N=1 superconformal field theories. We therefore find empirical evidence for a function that gives values of central charges without the usual extremization procedure.
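
    As a sketch of the paper's first step, a "second-order multiple linear regression" can be realized as a linear fit on quadratic features; the inputs and target below are synthetic stand-ins for toric-diagram quantities and minimum volumes.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures

      rng = np.random.RandomState(0)
      X = rng.randint(1, 10, size=(500, 3)).astype(float)      # e.g., lattice point counts
      vol = 1.0 / (X[:, 0] * X[:, 1]) + 0.01 * rng.randn(500)  # toy "minimum volume"

      # Degree-2 features turn the linear model into a second-order regression.
      model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
      model.fit(X, vol)
      print("R^2:", model.score(X, vol))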

  3. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    Science.gov (United States)

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
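
    A sketch of pool-based active learning with uncertainty sampling, one standard selection strategy (the paper's criterion may differ); the synthetic data, initial label budget, and number of query rounds are assumptions.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression

      X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                                 random_state=0)
      labeled = list(range(30))                 # small initial training set
      pool = [i for i in range(len(X)) if i not in labeled]

      for _ in range(20):                       # 20 query rounds
          clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
          proba = clf.predict_proba(X[pool])
          # Query the pool item whose predicted class is least certain;
          # a human expert would then supply its label.
          query = pool[int(np.argmin(proba.max(axis=1)))]
          labeled.append(query)
          pool.remove(query)

      print("accuracy after querying:", clf.score(X, y))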

  4. Machine Learning with Operational Costs

    CERN Document Server

    Tulabandhula, Theja

    2011-01-01

    This work concerns the way that statistical models are used to make decisions. In particular, we aim to merge the way estimation algorithms are designed with how they are used for a subsequent task. Our methodology considers the operational cost of carrying out a policy, based on a predictive model. The operational cost becomes a regularization term in the learning algorithm's objective function, allowing either an optimistic or pessimistic view of possible costs. Limiting the operational cost reduces the hypothesis space for the predictive model, and can thus improve generalization. We show that different types of operational problems can lead to the same type of restriction on the hypothesis space, namely the restriction to an intersection of an ℓ_q ball with a halfspace. We bound the complexity of such hypothesis spaces by proposing a technique that involves counting integer points in polyhedrons.

  5. Teraflop-scale Incremental Machine Learning

    CERN Document Server

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-using previous solutions, learning programming idioms and discovery of frequent subprograms. Experiments with two training sequences demonstrate that our approach to incremental learning is effective.

  6. Applications of Machine Learning in Cancer Prediction and Prognosis

    Directory of Open Access Journals (Sweden)

    Joseph A. Cruz

    2006-01-01

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

  7. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.

  8. Automated mapping of building facades by machine learning

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    Facades of buildings contain various types of objects which have to be recorded for information systems. The article describes a solution for this task focussing on automated classification by means of machine learning techniques. Stereo pairs of oblique images are used to derive 3D point clouds...

  9. Machine learning versus knowledge based classification of legal texts

    NARCIS (Netherlands)

    de Maat, E.; Krabben, K.; Winkels, R.

    2010-01-01

    This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurately (>90%) as the pattern-based one, but

  10. Induction and physical theory formation by Machine Learning

    CERN Document Server

    Svozil, Alexander

    2016-01-01

    Machine learning presents a general, systematic framework for the generation of formal theoretical models for physical description and prediction. Standard linear modeling techniques are tentatively reviewed, followed by a brief discussion of generalizations to deep forward networks for approximating nonlinear phenomena.

  11. Acquiring Software Design Schemas: A Machine Learning Perspective

    Science.gov (United States)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system is presented. The paper also addresses issues associated with generalizing common features, such as biases. The generalization process is illustrated using an example.

  12. Advances in machine learning and data mining for astronomy

    CERN Document Server

    Way, Michael J

    2012-01-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health

  13. Machine learning models in breast cancer survival prediction.

    Science.gov (United States)

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers, with a high mortality rate among women. With early diagnosis, breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for its early diagnosis. The proposed model is a combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of breast cancer survival. We use a dataset with eight attributes that includes the records of 900 patients, of whom 876 (97.3%) were female and 24 (2.7%) were male. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-fold cross-validation were used with the proposed model for the prediction of breast cancer survival. The performance of the machine learning techniques was evaluated with accuracy, precision, sensitivity, specificity, and area under the ROC curve. Out of 900 patients, 803 were alive and 97 were dead. In this study, the Trees Random Forest (TRF) technique showed better results in comparison to the other techniques (NB, 1NN, AD, SVM, RBFN, MLP). The accuracy, sensitivity and area under the ROC curve of TRF are 96%, 96%, and 93%, respectively. However, the 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under the ROC curve 78%). This study demonstrates that the Trees Random Forest model (TRF), which is a rule-based classification model, was the best model with the highest level of
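
    A hedged sketch of the evaluation protocol described above (10-fold cross-validation with accuracy, sensitivity, and area under the ROC curve); the synthetic, imbalanced data stands in for the 900-patient dataset, which is not reproduced here.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_validate

      # ~89% majority class, mimicking the alive/dead imbalance.
      X, y = make_classification(n_samples=900, weights=[0.89], random_state=0)
      scores = cross_validate(RandomForestClassifier(random_state=0), X, y, cv=10,
                              scoring=["accuracy", "recall", "roc_auc"])  # recall = sensitivity
      for name in ("test_accuracy", "test_recall", "test_roc_auc"):
          print(name, round(scores[name].mean(), 3))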

  14. Machine Learning Tools for Geomorphic Mapping of Planetary Surfaces

    OpenAIRE

    Stepinski, Tomasz F.; Vilalta, Ricardo

    2010-01-01

    Geomorphic auto-mapping of planetary surfaces is a challenging problem. Here we have described how machine learning techniques, such as clustering or classification, can be utilized to automate the process of geomorphic mapping for exploratory and exploitation purposes. Relatively coarse resolution of planetary topographic data limits the number of features that can be used in the learning process and makes planetary auto-mapping more challenging than terrestrial auto-mapping. With this cavea...

  15. Deep Extreme Learning Machine and Its Application in EEG Classification

    OpenAIRE

    Shifei Ding; Nan Zhang; Xinzheng Xu; Lili Guo; Jian Zhang

    2015-01-01

    Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating the complicated function and alleviating the optimization difficulty associated with deep models. Multilayer extreme learning machine (MLELM) is a learning algorithm of an artificial neural network which takes advantages of deep learning and extreme learning machine. Not only does MLELM appr...

  16. Heterogeneous versus Homogeneous Machine Learning Ensembles

    Directory of Open Access Journals (Sweden)

    Petrakova Aleksandra

    2015-12-01

    The research demonstrates the efficiency of applying a heterogeneous model ensemble to a cancer diagnostic procedure. The machine learning methods used for training the ensemble models are neural networks, random forest, support vector machine and offspring selection genetic algorithm. Training of the models and the ensemble design is performed by means of the HeuristicLab software. The data used in the research have been provided by the General Hospital of Linz, Austria.
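
    A sketch of a heterogeneous ensemble built with scikit-learn rather than HeuristicLab (the paper's tool): three different model families combined by soft voting; the synthetic data stands in for the hospital dataset.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.neural_network import MLPClassifier
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=500, random_state=0)
      ensemble = VotingClassifier(
          estimators=[("net", MLPClassifier(max_iter=2000, random_state=0)),
                      ("rf", RandomForestClassifier(random_state=0)),
                      ("svm", SVC(probability=True, random_state=0))],  # soft voting needs probabilities
          voting="soft")
      print("ensemble accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())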

  17. Virtual Machine Monitor Indigenous Memory Reclamation Technique

    Directory of Open Access Journals (Sweden)

    Muhammad Shams Ul Haq

    2016-04-01

    Sandboxing is a mechanism to monitor and control the execution of malicious or untrusted programs. The memory overhead incurred by sandbox solutions is one of the bottlenecks for sandboxing most applications in a system. Memory reclamation techniques proposed for traditional full virtualization do not suit the sandbox environment due to the lack of a full-scale guest operating system in the sandbox. In this paper, we propose a memory reclamation technique for sandboxed applications. The proposed technique works indigenously in the virtual machine monitor layer, without installing any driver in VMX non-root mode and without a new communication channel with the host kernel. The proposed page reclamation algorithm is a simple modified form of the Least Recently Used and Working Set page reclamation algorithms. To efficiently collect the working set of an application, we use a hardware virtualization extension, Page Modification Logging, introduced by Intel. We implemented the proposed technique with an open-source sandbox to show the effectiveness of the proposed memory reclamation method. Experimental results show that the proposed technique successfully reclaims up to 11% of memory from sandboxed applications with negligible CPU overhead.

  18. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents applications in recommender systems, which make recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the book's approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  19. GEOLOGICAL MAPPING USING MACHINE LEARNING ALGORITHMS

    Directory of Open Access Journals (Sweden)

    A. S. Harvey

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rocktypes in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. Percent of correct classifications was used as indicators of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral images of bare rock, which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases required computational effort and time. With the achievable performance levels in this study, the technique is useful in identifying regions of interest and identifying general rocktype trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.

  1. Geological Mapping Using Machine Learning Algorithms

    Science.gov (United States)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rocktypes in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. Percent of correct classifications was used as indicators of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral images of bare rock which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases required computational effort and time. With the achievable performance levels in this study, the technique is useful in identifying regions of interest and identifying general rocktype trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.
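
    A sketch of the comparison described above: the same four supervised MLAs scored by percent of correct classifications on held-out samples; the random features below are stand-ins for spectral, magnetic, gravity, and elevation bands.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.naive_bayes import GaussianNB
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=3000, n_features=6, n_informative=5,
                                 n_redundant=1, n_classes=4, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      for name, clf in [("naive Bayes", GaussianNB()),
                        ("k-nearest neighbour", KNeighborsClassifier()),
                        ("random forest", RandomForestClassifier(random_state=0)),
                        ("support vector machine", SVC())]:
          clf.fit(X_tr, y_tr)
          print(f"{name}: {100 * clf.score(X_te, y_te):.1f}% correct")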

  2. Recognition of printed Arabic text using machine learning

    Science.gov (United States)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress, in both on-line and off-line recognition, has been achieved towards the automatic recognition of Arabic characters. This is a result of the lack of adequate support in terms of funding and other utilities, such as Arabic text databases, dictionaries, etc., and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of Arabic printed text using the machine learning algorithm C4.5. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods. It is fast in training and in recognition, generalizes well, is noise tolerant, and the symbolic representation is easy to understand. The technique can be divided into three major steps: the first step is pre-processing, in which the original image is transformed into a binary image utilizing a 300 dpi scanner and the connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters. Finally, C4.5 is used for character classification to generate a decision tree.
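
    A minimal sketch of the classification step only: symbolic features of the kind listed above feed a decision-tree learner (scikit-learn's entropy criterion stands in for C4.5, which is not in the library); the feature values and class labels are simulated.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.RandomState(0)
      n = 600
      X = np.column_stack([rng.randint(1, 5, n),    # number of subwords
                           rng.randint(0, 8, n),    # peaks within the subword
                           rng.randint(0, 3, n)])   # complementary characters
      y = (X[:, 0] * 8 + X[:, 1] + X[:, 2]) % 10    # stand-in character classes

      c45_like = DecisionTreeClassifier(criterion="entropy", random_state=0)
      print("accuracy:", cross_val_score(c45_like, X, y, cv=5).mean())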

  3. Machine Learning Optimization of Evolvable Artificial Cells

    DEFF Research Database (Denmark)

    Caschera, F.; Rasmussen, S.; Hanczyc, M.

    2011-01-01

    can be explored. A machine learning approach (Evo-DoE) could be applied to explore this experimental space and define optimal interactions according to a specific fitness function. Herein, an implementation of an evolutionary design of experiments to optimize chemical and biochemical systems based on a machine learning process is presented. The optimization proceeds over generations of experiments in an iterative loop until optimal compositions are discovered. The fitness function is experimentally measured every time the loop is closed. Two examples of complex systems, namely a liposomal drug formulation

  4. Paradigms for Realizing Machine Learning Algorithms.

    Science.gov (United States)

    Agneeswaran, Vijay Srinivas; Tonpay, Pranay; Tiwary, Jayati

    2013-12-01

    The article explains the three generations of machine learning algorithms, all three trying to operate on big data. The first generation tools are SAS, SPSS, etc., while second generation realizations include Mahout and RapidMiner (which work over Hadoop), and the third generation paradigms include Spark and GraphLab, among others. The essence of the article is that for a number of machine learning algorithms, it is important to look beyond Hadoop's Map-Reduce paradigm in order to make them work on big data. A number of promising contenders have emerged in the third generation that can be exploited to realize deep analytics on big data.

  5. A Machine Learning Approach to Automated Negotiation

    Institute of Scientific and Technical Information of China (English)

    Zhang Huaxiang(张化祥); Zhang Liang; Huang Shangteng; Ma Fanyuan

    2004-01-01

    Automated negotiation between two competitive agents is analyzed, and a multi-issue negotiation model based on machine learning, time belief, offer belief and state-action pair expected Q value is developed. Unlike the widely used approaches such as the game theory approach, heuristic approach and argumentation approach, this paper uses a machine learning method to compute agents' average Q values in each negotiation stage. The delayed reward is used to generate agents' offers and counteroffers for every issue. The effect of time and discount rate on the negotiation outcome is analyzed. Theoretical analysis and experimental data show this negotiation model is practical.
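
    A minimal sketch of the core mechanism: tabular Q-learning over (negotiation stage, action) pairs with a time-discounted delayed reward; the state space, action set, and reward scheme are illustrative assumptions, not the paper's model.

      import random

      stages, actions = 5, 3            # negotiation stages x concession levels
      Q = {(s, a): 0.0 for s in range(stages) for a in range(actions)}
      alpha, gamma = 0.1, 0.9           # learning rate and discount rate

      def reward(stage, action):
          # Toy delayed reward, paid only when the final stage is reached.
          return float(action + 1) if stage == stages - 1 else 0.0

      for episode in range(5000):
          for s in range(stages):
              a = random.randrange(actions)  # explore uniformly
              nxt = max(Q[(s + 1, b)] for b in range(actions)) if s + 1 < stages else 0.0
              Q[(s, a)] += alpha * (reward(s, a) + gamma * nxt - Q[(s, a)])

      policy = {s: max(range(actions), key=lambda a: Q[(s, a)]) for s in range(stages)}
      print(policy)  # learned offer policy per stage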

  6. Optimization of machining techniques – A retrospective and literature review

    Indian Academy of Sciences (India)

    Aman Aggarwal; Hari Singh

    2005-12-01

    In this paper an attempt is made to review the literature on optimizing machining parameters in turning processes. Various conventional techniques employed for machining optimization include geometric programming, geometric plus linear programming, goal programming, sequential unconstrained minimization technique, dynamic programming, etc. The latest techniques for optimization include fuzzy logic, scatter search technique, genetic algorithm, Taguchi technique and response surface methodology.

  7. A Machine Learning Based Framework for Adaptive Mobile Learning

    Science.gov (United States)

    Al-Hmouz, Ahmed; Shen, Jun; Yan, Jun

    Advances in wireless technology and handheld devices have created significant interest in mobile learning (m-learning) in recent years. Students nowadays are able to learn anywhere and at any time. Mobile learning environments must also cater for different user preferences and various devices with limited capability, where not all of the information is relevant and critical to each learning environment. To address this issue, this paper presents a framework that depicts the process of adapting learning content to satisfy individual learner characteristics by taking into consideration his/her learning style. We use a machine learning based algorithm for acquiring, representing, storing, reasoning about and updating each learner's acquired profile.

  8. Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.

    Science.gov (United States)

    Vock, David M; Wolfson, Julian; Bandyopadhyay, Sunayan; Adomavicius, Gediminas; Johnson, Paul E; Vazquez-Benitez, Gabriela; O'Connor, Patrick J

    2016-06-01

    Models for predicting the probability of experiencing various health outcomes or adverse events over a certain time frame (e.g., having a heart attack in the next 5 years) based on individual patient characteristics are important tools for managing patient care. Electronic health data (EHD) are appealing sources of training data because they provide access to large amounts of rich individual-level data from present-day patient populations. However, because EHD are derived by extracting information from administrative and clinical databases, some fraction of subjects will not be under observation for the entire time frame over which one wants to make predictions; this loss to follow-up is often due to disenrollment from the health system. For subjects without complete follow-up, whether or not they experienced the adverse event is unknown, and in statistical terms the event time is said to be right-censored. Most machine learning approaches to the problem have been relatively ad hoc; for example, common approaches for handling observations in which the event status is unknown include (1) discarding those observations, (2) treating them as non-events, or (3) splitting those observations into two: one where the event occurs and one where it does not. In this paper, we present a general-purpose approach to account for right-censored outcomes using inverse probability of censoring weighting (IPCW). We illustrate how IPCW can easily be incorporated into a number of existing machine learning algorithms used to mine big health care data including Bayesian networks, k-nearest neighbors, decision trees, and generalized additive models. We then show that our approach leads to better calibrated predictions than the three ad hoc approaches when applied to predicting the 5-year risk of experiencing a cardiovascular adverse event, using EHD from a large U.S. Midwestern healthcare system.
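
    A sketch of the IPCW recipe described above: estimate the censoring distribution with a Kaplan-Meier fit on the flipped event indicator, then pass inverse-probability weights to any off-the-shelf learner via sample weights. The lifelines dependency, the toy survival data, and the random covariates are assumptions for illustration.

      import numpy as np
      from lifelines import KaplanMeierFitter
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.RandomState(0)
      n, horizon = 1000, 5.0
      event_time = rng.exponential(8.0, n)    # time of the adverse event
      censor_time = rng.exponential(10.0, n)  # disenrollment time
      time = np.minimum(event_time, censor_time)
      observed = event_time <= censor_time

      # Censoring survival curve G(t): note the flipped event indicator.
      kmf = KaplanMeierFitter().fit(time, event_observed=~observed)

      # Keep subjects whose 5-year status is known; weight them by 1 / G.
      known = observed | (time >= horizon)
      g = kmf.survival_function_at_times(np.minimum(time[known], horizon)).values
      y = (observed & (time <= horizon))[known].astype(int)
      X = rng.randn(n, 4)[known]              # stand-in patient covariates

      DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=1.0 / g)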

  9. Stacking for machine learning redshifts applied to SDSS galaxies

    Science.gov (United States)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called `stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organizing maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9 per cent and 21 per cent on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost) the ratio of improvement shrinks, but still remains positive and is between 0.4 per cent and 2.5 per cent for the explored metrics and comes at almost no additional computational cost.
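
    A sketch of a single stacking round as described above: the base learner's out-of-fold redshift estimate is appended as an extra input feature for a second round of the same algorithm; the synthetic magnitudes and redshifts are stand-ins for SDSS data.

      import numpy as np
      from sklearn.model_selection import cross_val_predict, train_test_split
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.RandomState(0)
      X = rng.randn(5000, 5)                                # stand-in photometric magnitudes
      z = np.abs(X @ rng.rand(5) + 0.1 * rng.randn(5000))   # stand-in redshifts
      X_tr, X_te, z_tr, z_te = train_test_split(X, z, random_state=0)

      base = DecisionTreeRegressor(max_depth=6, random_state=0)
      z_hat_tr = cross_val_predict(base, X_tr, z_tr, cv=5)  # out-of-fold, avoids leakage
      base.fit(X_tr, z_tr)
      z_hat_te = base.predict(X_te)

      # Second layer: same algorithm, with the first-round estimate as a feature.
      stacked = DecisionTreeRegressor(max_depth=6, random_state=0)
      stacked.fit(np.column_stack([X_tr, z_hat_tr]), z_tr)
      print("base R^2:   ", base.score(X_te, z_te))
      print("stacked R^2:", stacked.score(np.column_stack([X_te, z_hat_te]), z_te))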

  10. Testing and Validating Machine Learning Classifiers by Metamorphic Testing

    Science.gov (United States)

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine Learning algorithms have provided core functionality to many application domains, such as bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique “metamorphic testing”, which has been shown to be effective in alleviating the oracle problem. Also presented are a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective in killing mutants, and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969
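
    One concrete metamorphic relation, as a hedged sketch: permuting the order of the training samples must not change a k-nearest-neighbour classifier's predictions, so two runs can check each other even without a test oracle. The relation and classifier are illustrative, not the paper's full suite.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.neighbors import KNeighborsClassifier

      X, y = make_classification(random_state=0)
      X_test = X[:10]

      clf_a = KNeighborsClassifier().fit(X, y)
      perm = np.random.RandomState(1).permutation(len(X))
      clf_b = KNeighborsClassifier().fit(X[perm], y[perm])

      # No oracle gives the "correct" labels, but the two runs must agree.
      assert (clf_a.predict(X_test) == clf_b.predict(X_test)).all()
      print("metamorphic relation holds")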

  11. Machine learning with quantum relative entropy

    Energy Technology Data Exchange (ETDEWEB)

    Tsuda, Koji [Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tuebingen, 72076 (Germany)], E-mail: koji.tsuda@tuebingen.mpg.de

    2009-12-01

    Density matrices are a central tool in quantum physics, but they are also used in machine learning. A positive definite matrix called the kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in a Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called the matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upper bound on the generalization error of online learning.
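
    A sketch of the matrix exponentiated gradient update, W <- exp(log W - eta * grad) renormalized to unit trace, which keeps W positive definite; the quadratic toy loss, step size, and target matrix are illustrative assumptions.

      import numpy as np

      def eigh_fn(A, fn):
          # Apply fn to the eigenvalues of a symmetric matrix A.
          vals, vecs = np.linalg.eigh(A)
          return (vecs * fn(vals)) @ vecs.T

      rng = np.random.RandomState(0)
      d = 4
      target = rng.randn(d, d)
      target = target @ target.T
      target /= np.trace(target)           # a density matrix to recover
      W = np.eye(d) / d                    # maximally mixed starting point

      eta = 0.5
      for _ in range(300):
          grad = W - target                # gradient of 0.5 * ||W - target||_F^2
          M = eigh_fn(eigh_fn(W, np.log) - eta * grad, np.exp)
          W = M / np.trace(M)              # renormalize: unit trace, stays positive definite
      print("final error:", np.linalg.norm(W - target))  # small after convergence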

  12. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and disease in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches that use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated with respect to sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with a classification accuracy of up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time, by developing novel expert systems capable of delivering real-time interventions.

  13. Data Mining and Machine Learning in Astronomy

    CERN Document Server

    Ball, Nicholas M

    2009-01-01

    We review the current state of data mining and machine learning in Astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. On the one hand, it is a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, which promises almost limitless scientific advances. On the other, it can be the application of black-box computing algorithms that at best give little physical insight, and at worst provide questionable results. Here, we give an overview of the entire data mining process, from data collection through the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of Astronomy, with an emphasis on those where data mining resulted in improved physical insights, and important current and future directions, including the construction of full probability density functions, parallel algorithm...

  14. Tracking by Machine Learning Methods

    CERN Document Server

    Jofrehei, Arash

    2015-01-01

    Current track reconstruction methods start with two points and then, for each layer, loop through all possible hits to find proper hits to add to that track. Another idea would be to use the large number of already reconstructed events and/or simulated data and train a machine on this data to find tracks given hit pixels. Training time could be long, but real-time tracking is really fast. Simulation might not be as realistic as real data, but tracking has been done for it with 100 percent efficiency, while by using real data we would probably be limited to the current efficiency.

  15. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional data.
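
    A sketch of one classic prototype-based model, LVQ1: prototypes are attracted to correctly classified samples and repelled from misclassified ones. This is a generic textbook variant under assumed toy data, not a specific model from the review.

      import numpy as np
      from sklearn.datasets import make_blobs

      X, y = make_blobs(n_samples=300, centers=2, random_state=0)
      rng = np.random.RandomState(0)
      protos = X[[rng.choice(np.where(y == c)[0]) for c in (0, 1)]].copy()
      labels = np.array([0, 1])

      lr = 0.05
      for epoch in range(20):
          for xi, yi in zip(X, y):
              j = np.argmin(((protos - xi) ** 2).sum(axis=1))  # nearest prototype
              sign = 1.0 if labels[j] == yi else -1.0          # attract or repel
              protos[j] += sign * lr * (xi - protos[j])

      pred = labels[((X[:, None, :] - protos) ** 2).sum(-1).argmin(1)]
      print("training accuracy:", (pred == y).mean())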

  16. Supporting visual quality assessment with machine learning

    NARCIS (Netherlands)

    Gastaldo, P.; Zunino, R.; Redi, J.

    2013-01-01

    Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms allow one to tackle the quality

  17. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression

  18. Machine learning a theoretical approach

    CERN Document Server

    Natarajan, Balas K

    2014-01-01

    This is the first comprehensive introduction to computational learning theory. The author's uniform presentation of fundamental results and their applications offers AI researchers a theoretical perspective on the problems they study. The book presents tools for the analysis of probabilistic models of learning, tools that crisply classify what is and is not efficiently learnable. After a general introduction to Valiant's PAC paradigm and the important notion of the Vapnik-Chervonenkis dimension, the author explores specific topics such as finite automata and neural networks. The presentation

  1. Stacking for machine learning redshifts applied to SDSS galaxies

    CERN Document Server

    Zitlau, Roman; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-01-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organising maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9% and 21% on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When appl...

  2. Interface Metaphors for Interactive Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Jasper, Robert J.; Blaha, Leslie M.

    2017-07-14

    To promote more interactive and dynamic machine learning, we revisit the notion of user-interface metaphors. User-interface metaphors provide intuitive constructs for supporting user needs through interface design elements. A user-interface metaphor provides a visual or action pattern that leverages a user’s knowledge of another domain. Metaphors suggest both the visual representations that should be used in a display as well as the interactions that should be afforded to the user. We argue that user-interface metaphors can also offer a method of extracting interaction-based user feedback for use in machine learning. Metaphors offer indirect, context-based information that can be used in addition to explicit user inputs, such as user-provided labels. Implicit information from user interactions with metaphors can augment explicit user input for active learning paradigms. Or it might be leveraged in systems where explicit user inputs are more challenging to obtain. Each interaction with the metaphor provides an opportunity to gather data and learn. We argue this approach is especially important in streaming applications, where we desire machine learning systems that can adapt to dynamic, changing data.

  3. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre eAbraham

    2014-02-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting-state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  4. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
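
    A sketch of the "decoding" setting with scikit-learn: predict an experimental condition from voxel activations. The data here is synthetic (200 trials by 2000 voxels); a real analysis would first load and mask NIfTI images.

      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import LinearSVC

      # 200 "trials" x 2000 "voxels", two experimental conditions.
      X, y = make_classification(n_samples=200, n_features=2000,
                                 n_informative=50, random_state=0)
      decoder = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
      print("decoding accuracy:", cross_val_score(decoder, X, y, cv=5).mean())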

  5. Financial signal processing and machine learning

    CERN Document Server

    Kulkarni, Sanjeev R; Malioutov, Dmitry M

    2016-01-01

    The modern financial industry has been required to deal with large and diverse portfolios in a variety of asset classes often with limited market data available. Financial Signal Processing and Machine Learning unifies a number of recent advances made in signal processing and machine learning for the design and management of investment portfolios and financial engineering. This book bridges the gap between these disciplines, offering the latest information on key topics including characterizing statistical dependence and correlation in high dimensions, constructing effective and robust risk measures, and their use in portfolio optimization and rebalancing. The book focuses on signal processing approaches to model return, momentum, and mean reversion, addressing theoretical and implementation aspects. It highlights the connections between portfolio theory, sparse learning and compressed sensing, sparse eigen-portfolios, robust optimization, non-Gaussian data-driven risk measures, graphical models, causal analy...

  6. Machine Learning for Vision-Based Motion Analysis

    CERN Document Server

    Wang, Liang; Cheng, Li; Pietikainen, Matti

    2011-01-01

    Techniques of vision-based motion analysis aim to detect, track, identify, and generally understand the behavior of objects in image sequences. With the growth of video data in a wide range of applications from visual surveillance to human-machine interfaces, the ability to automatically analyze and understand object motions from video footage is of increasing importance. Among the latest developments in this field is the application of statistical machine learning algorithms for object tracking, activity modeling, and recognition. Developed from expert contributions to the first and second In

  7. Predicting Increased Blood Pressure Using Machine Learning

    Science.gov (United States)

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Soares, Telma de Jesus; dos Reis, Luciana Araujo

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the highest pseudo R2 (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHR showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the highest pseudo R2 (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313
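
    A sketch of the study's approach: a classification tree on BMI, WC, and WHR with sensitivity and specificity computed on a held-out set; the simulated measurements below stand in for the 400-student sample.

      import numpy as np
      from sklearn.metrics import confusion_matrix
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.RandomState(0)
      n = 400
      X = np.column_stack([rng.normal(25, 4, n),        # BMI
                           rng.normal(80, 10, n),       # waist circumference
                           rng.normal(0.85, 0.07, n)])  # waist-hip ratio
      y = (X[:, 0] + 0.1 * X[:, 1] + 10 * X[:, 2] + rng.randn(n) > 42).astype(int)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
      tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
      tn, fp, fn, tp = confusion_matrix(y_te, tree.predict(X_te)).ravel()
      print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))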

  8. Image Segmentation for Connectomics Using Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Tasdizen, Tolga; Seyedhosseini, Mojtaba; Liu, Ting; Jones, Cory; Jurrus, Elizabeth R.

    2014-12-01

    Reconstruction of neural circuits at the microscopic scale of individual neurons and synapses, also known as connectomics, is an important challenge for neuroscience. While an important motivation of connectomics is providing anatomical ground truth for neural circuit models, the ability to decipher neural wiring maps at the individual cell level is also important in studies of many neurodegenerative diseases. Reconstruction of a neural circuit at the individual neuron level requires the use of electron microscopy images due to their extremely high resolution. Computational challenges include pixel-by-pixel annotation of these images into classes such as cell membrane, mitochondria and synaptic vesicles and the segmentation of individual neurons. State-of-the-art image analysis solutions are still far from the accuracy and robustness of human vision and biologists are still limited to studying small neural circuits using mostly manual analysis. In this chapter, we describe our image analysis pipeline that makes use of novel supervised machine learning techniques to tackle this problem.

  9. Predicting Increased Blood Pressure Using Machine Learning

    Directory of Open Access Journals (Sweden)

    Hudson Fernandes Golino

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the highest pseudo R2 (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHR showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the highest pseudo R2 (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power.

  10. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Full Text Available Abstract Background Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1, suggesting that redundancy is stable over long evolutionary periods. Conclusions Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  11. Intelligent Machine Learning Approaches for Aerospace Applications

    Science.gov (United States)

    Sathyan, Anoop

    Machine Learning is a type of artificial intelligence that provides machines or networks the ability to learn from data without the need to explicitly program them. There are different kinds of machine learning techniques. This thesis discusses the applications of two of these approaches: Genetic Fuzzy Logic and Convolutional Neural Networks (CNN). A Fuzzy Logic System (FLS) is a powerful tool that can be used for a wide variety of applications. An FLS is a universal approximator that reduces the need for complex mathematics and replaces it with expert knowledge of the system to produce an input-output mapping using If-Then rules. Expert knowledge of a system can help in obtaining the parameters for small-scale FLSs, but larger networks require sophisticated approaches that can automatically train the network to meet the design requirements. This is where Genetic Algorithms (GA) and EVE come into the picture. Both GA and EVE can tune the FLS parameters to minimize a cost function that is designed to meet the requirements of the specific problem. EVE is an artificial intelligence developed by Psibernetix that is trained to tune large-scale FLSs. The parameters of an FLS can include the membership functions and rulebase of the inherent Fuzzy Inference Systems (FISs). The main issue with using a genetic fuzzy system (GFS) is that the number of parameters in an FIS increases exponentially with the number of inputs, making the parameters increasingly harder to tune. To mitigate this issue, the FLSs discussed in this thesis consist of 2-input-1-output FISs in cascade (Chapter 4) or as a layer of parallel FISs (Chapter 7). We have obtained extremely good results using GFSs for different applications at a reduced computational cost compared to other algorithms that are commonly used to solve the corresponding problems. In this thesis, GFSs have been designed for controlling an inverted double pendulum, a task allocation problem of clustering targets amongst a set of UAVs, a fire

  12. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high-throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and d) Health Care and Management, with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The applications in the selected articles demonstrate the usefulness of extracting valuable knowledge, leading to new hypotheses targeting deeper understanding and further investigation in DM.

  13. An efficient learning procedure for deep Boltzmann machines.

    Science.gov (United States)

    Salakhutdinov, Ruslan; Hinton, Geoffrey

    2012-08-01

    We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer pretraining phase that initializes the weights sensibly. The pretraining also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects. We also show that the features discovered by deep Boltzmann machines are a very effective way to initialize the hidden layers of feedforward neural nets, which are then discriminatively fine-tuned.
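
    The layer-by-layer pretraining mentioned above stacks restricted Boltzmann machines. Below is a minimal numpy sketch of one such building block trained with one-step contrastive divergence (a simpler estimator than the variational and persistent-chain machinery the paper uses for the full DBM); sizes, learning rate, and data are hypothetical.

      # Minimal numpy sketch of one restricted Boltzmann machine trained
      # with one-step contrastive divergence (CD-1), the building block
      # that the layer-by-layer pretraining phase stacks.
      import numpy as np

      rng = np.random.default_rng(0)

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))

      n_visible, n_hidden, lr = 784, 256, 0.01
      W = rng.normal(0, 0.01, (n_visible, n_hidden))
      b_v = np.zeros(n_visible)   # visible biases
      b_h = np.zeros(n_hidden)    # hidden biases

      def cd1_update(v0):
          """One CD-1 gradient step on a batch of binary visible vectors."""
          global W, b_v, b_h
          ph0 = sigmoid(v0 @ W + b_h)                    # p(h=1 | v0)
          h0 = (rng.random(ph0.shape) < ph0).astype(float)
          pv1 = sigmoid(h0 @ W.T + b_v)                  # reconstruction
          ph1 = sigmoid(pv1 @ W + b_h)
          batch = v0.shape[0]
          W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
          b_v += lr * (v0 - pv1).mean(axis=0)
          b_h += lr * (ph0 - ph1).mean(axis=0)

      # Hypothetical binary data standing in for MNIST pixels:
      data = (rng.random((64, n_visible)) < 0.1).astype(float)
      for _ in range(10):
          cd1_update(data)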

  14. Reduced multiple empirical kernel learning machine.

    Science.gov (United States)

    Wang, Zhe; Lu, MingZhe; Gao, Daqi

    2015-02-01

    Multiple kernel learning (MKL) is demonstrated to be flexible and effective in depicting heterogeneous data sources since MKL can introduce multiple kernels rather than a single fixed kernel into applications. However, MKL incurs a high time and space complexity in contrast to single kernel learning, which is not desirable in real-world applications. Meanwhile, it is known that the kernel mappings of MKL generally take two forms, implicit kernel mapping and empirical kernel mapping (EKM), where the latter has received less attention. In this paper, we focus on MKL with EKM, and propose a reduced multiple empirical kernel learning machine, named RMEKLM for short. To the best of our knowledge, this is the first work to reduce both the time and space complexity of MKL with EKM. Different from existing MKL, the proposed RMEKLM adopts the Gauss Elimination technique to extract a set of feature vectors, and it is validated that doing so does not lose much information of the original feature space. RMEKLM then adopts the extracted feature vectors to span a reduced orthonormal subspace of the feature space, which is visualized in terms of its geometric structure. It can be demonstrated that the spanned subspace is isomorphic to the original feature space, which means that the dot product of two vectors in the original feature space is equal to that of the two corresponding vectors in the generated orthonormal subspace. More importantly, the proposed RMEKLM brings a simpler computation and meanwhile needs less storage space, especially in the processing of testing. Finally, the experimental results show that RMEKLM achieves an efficient and effective performance in terms of both complexity and classification. The contributions of this paper can be given as follows: (1) by mapping the input space into an orthonormal subspace, the geometry of the generated subspace is visualized; (2) this paper first reduces both the time and space complexity of the EKM-based MKL; (3

  15. Machine Learning for Computer Vision

    CERN Document Server

    Battiato, Sebastiano; Farinella, Giovanni

    2013-01-01

    Computer vision is the science and technology of making machines that see. It is concerned with the theory, design and implementation of algorithms that can automatically process visual data to recognize objects, track and recover their shape and spatial layout. The International Computer Vision Summer School - ICVSS was established in 2007 to provide both an objective and clear overview and an in-depth analysis of the state-of-the-art research in Computer Vision. The courses are delivered by world renowned experts in the field, from both academia and industry, and cover both theoretical and practical aspects of real Computer Vision problems. The school is organized every year by University of Cambridge (Computer Vision and Robotics Group) and University of Catania (Image Processing Lab). Different topics are covered each year. A summary of the past Computer Vision Summer Schools can be found at: http://www.dmi.unict.it/icvss This edited volume contains a selection of articles covering some of the talks and t...

  16. Study on Applying Hybrid Machine Learning into Family Apparel Expenditure

    Institute of Scientific and Technical Information of China (English)

    SHEN Lei

    2008-01-01

    Hybrid Machine Learning (HML) is a kind of advanced algorithm in the field of intelligent information processing. It combines induced learning based on decision trees with a blocking neural network, and it provides a useful intelligent knowledge-based data mining technique. Its core algorithms are ID3 and Field Theory based ART (FTART). The paper first introduces the principles of hybrid machine learning, and then applies it to systematically analyzing family apparel expenditures and their influencing factors. Finally, compared with those from traditional statistical methods, the results from HML are friendlier and easier to understand. Besides, the forecasting by HML is more accurate than by the traditional methods.

  17. Machine learning based Intelligent cognitive network using fog computing

    Science.gov (United States)

    Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

    2017-05-01

    In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send a periodic signal summary, which is much smaller than the original signal, to the cloud so that the overall system's spectrum allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens system security by reducing the communication burden on the communications network.

  18. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

    Science.gov (United States)

    Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao

    2017-09-04

    Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biologically active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. An Interactive Web-based Learning System for Assisting Machining Technology Education

    Directory of Open Access Journals (Sweden)

    Min Jou

    2008-05-01

    Full Text Available The key technique of manufacturing methods is machining. The degree of machining technique directly affects the quality of the product. Therefore, machining technique is of primary importance in promoting students' practical ability during the training process. Currently, practical training is conducted on the shop floor to develop students' practical ability. Much time and cost are spent teaching these techniques. In particular, computerized machines are continuously increasing in use, and educating engineers on computerized machines is much more difficult than on traditional machines because of the extremely expensive cost of teaching. The quality and quantity of teaching cannot always be promoted in this respect. Traditional teaching methods cannot respond well to the needs of the future. Therefore, this research aims at the following topics: (1) Propose teaching strategies for students to learn machining process planning through a web-based learning system. (2) Establish on-line teaching material for computer-aided manufacturing courses, including CNC coding methods and CNC simulation. (3) Develop a virtual machining laboratory to bring machining practical training to the web-based learning system. (4) Integrate multimedia and the virtual laboratory in the developed web-based e-learning system to enhance the effectiveness of machining education.

  20. Estimating extinction using unsupervised machine learning

    Science.gov (United States)

    Meingast, Stefan; Lombardi, Marco; Alves, João

    2017-05-01

    Dust extinction is the most robust tracer of the gas distribution in the interstellar medium, but measuring extinction is limited by the systematic uncertainties involved in estimating the intrinsic colors to background stars. In this paper we present a new technique, Pnicer, that estimates intrinsic colors and extinction for individual stars using unsupervised machine learning algorithms. This new method aims to be free from any priors with respect to the column density and intrinsic color distribution. It is applicable to any combination of parameters and works in arbitrary numbers of dimensions. Furthermore, it is not restricted to color space. Extinction toward single sources is determined by fitting Gaussian mixture models along the extinction vector to (extinction-free) control field observations. In this way it becomes possible to describe the extinction for observed sources with probability densities, rather than a single value. Pnicer effectively eliminates known biases found in similar methods and outperforms them in cases of deep observational data where the number of background galaxies is significant, or when a large number of parameters is used to break degeneracies in the intrinsic color distributions. This new method remains computationally competitive, making it possible to correctly de-redden millions of sources within a matter of seconds. With the ever-increasing number of large-scale high-sensitivity imaging surveys, Pnicer offers a fast and reliable way to efficiently calculate extinction for arbitrary parameter combinations without prior information on source characteristics. The Pnicer software package also offers access to the well-established Nicer technique in a simple unified interface and is capable of building extinction maps including the Nicest correction for cloud substructure. Pnicer is offered to the community as an open-source software solution and is entirely written in Python.
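
    The core idea can be sketched in a few lines: fit a Gaussian mixture to extinction-free control-field colors, then slide each science star back along the reddening vector and keep the extinction that maximizes the mixture likelihood. In the sketch below the reddening coefficients and colors are placeholders, not Pnicer's calibrated values.

      # Sketch of the core Pnicer idea in two color dimensions; the
      # reddening-vector coefficients below are placeholders.
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(1)
      control_colors = rng.normal([0.5, 0.2], [0.1, 0.05], (5000, 2))
      gmm = GaussianMixture(n_components=3, random_state=0).fit(control_colors)

      redden = np.array([0.35, 0.25])        # color excess per unit A_V (placeholder)
      a_grid = np.linspace(0.0, 10.0, 401)   # candidate extinctions A_V

      def best_extinction(observed_color):
          # De-redden the observed color for every candidate A_V at once,
          # then pick the A_V that maximizes the control-field likelihood.
          candidates = observed_color - np.outer(a_grid, redden)
          log_like = gmm.score_samples(candidates)
          return a_grid[np.argmax(log_like)]

      print(best_extinction(np.array([1.9, 1.2])))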

  1. Machine learning techniques in searches for $t\\bar{t}h$ in the $h \\rightarrow b\\bar{b}$ decay channel

    CERN Document Server

    Santos, Roberto; Ryu, Soo; Adelman, Jahred; Chekanov, Sergei; Zhou, Jie

    2016-01-01

    Study of the production of pairs of top quarks in association with a Higgs boson is one of the primary goals of the Large Hadron Collider over the next decade, as measurements of this process may help us to understand whether the uniquely large mass of the top quark plays a special role in electroweak symmetry breaking. Higgs bosons decay predominantly to $b\\bar{b}$, yielding signatures for the signal that are similar to $t\\bar{t}$ + jets with heavy flavor. Though particularly challenging to study due to the similar kinematics between signal and background events, such final states ($t\\bar{t} b\\bar{b}$) are an important channel for studying the top quark Yukawa coupling. This paper presents a systematic study of machine learning (ML) methods for detecting $t\\bar{t}h$ in the $h \\rightarrow b\\bar{b}$ decay channel. Among the seven ML methods tested, we show that neural network models outperform alternative methods. In addition, two neural models used in this paper outperform NeuroBayes, one of the standard algo...

  2. Application of Machine Learning to Rotorcraft Health Monitoring

    Science.gov (United States)

    Cody, Tyler; Dempsey, Paula J.

    2017-01-01

    Machine learning is a powerful tool for data exploration and model building with large data sets. This project aimed to use machine learning techniques to explore the inherent structure of data from rotorcraft gear tests, relationships between features and damage states, and to build a system for predicting gear health for future rotorcraft transmission applications. Classical machine learning techniques are difficult, if not irresponsible to apply to time series data because many make the assumption of independence between samples. To overcome this, Hidden Markov Models were used to create a binary classifier for identifying scuffing transitions and Recurrent Neural Networks were used to leverage long distance relationships in predicting discrete damage states. When combined in a workflow, where the binary classifier acted as a filter for the fatigue monitor, the system was able to demonstrate accuracy in damage state prediction and scuffing identification. The time dependent nature of the data restricted data exploration to collecting and analyzing data from the model selection process. The limited amount of available data was unable to give useful information, and the division of training and testing sets tended to heavily influence the scores of the models across combinations of features and hyper-parameters. This work built a framework for tracking scuffing and fatigue on streaming data and demonstrates that machine learning has much to offer rotorcraft health monitoring by using Bayesian learning and deep learning methods to capture the time dependent nature of the data. Suggested future work is to implement the framework developed in this project using a larger variety of data sets to test the generalization capabilities of the models and allow for data exploration.
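
    As a hedged illustration of the HMM stage, the sketch below fits a two-state Gaussian HMM (via the hmmlearn package) to a synthetic condition-indicator sequence and reads the inferred state sequence as a binary scuffing filter; the project's actual features and model configuration are not given in the abstract.

      # Hypothetical two-state Gaussian HMM as a scuffing/no-scuffing filter.
      import numpy as np
      from hmmlearn.hmm import GaussianHMM

      rng = np.random.default_rng(0)
      # Hypothetical 1-D condition-indicator sequence: quiet, then scuffing
      quiet = rng.normal(0.0, 1.0, (300, 1))
      scuff = rng.normal(4.0, 1.5, (200, 1))
      X = np.vstack([quiet, scuff])

      model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50,
                          random_state=0).fit(X)
      states = model.predict(X)      # hidden-state labels over time
      # First index where the inferred state changes marks the transition
      print("transition index:", int(np.argmax(states != states[0])))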

  3. Evaluating machine learning classification for financial trading: An empirical approach

    OpenAIRE

    Gerlein, EA; McGinnity, M; Belatreche, A; Coleman, S.

    2016-01-01

    Technical and quantitative analysis in financial trading use mathematical and statistical tools to help investors decide on the optimum moment to initiate and close orders. While these traditional approaches have served their purpose to some extent, new techniques arising from the field of computational intelligence such as machine learning and data mining have emerged to analyse financial information. While the main financial engineering research has focused on complex computational models s...

  4. Machine Learning and Cosmological Simulations I: Semi-Analytical Models

    OpenAIRE

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2015-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various so...

  5. The relevance vector machine technique for channel equalization application

    OpenAIRE

    Chen, S.; Gunn, S.R.; Harris, C J

    2001-01-01

    The recently introduced relevance vector machine (RVM) technique is applied to communication channel equalization. It is demonstrated that the RVM equalizer can closely match the optimal performance of the Bayesian equalizer, with a much sparser kernel representation than is achievable by the state-of-the-art support vector machine (SVM) technique.

  6. Machine Learning for ATLAS DDM Network Metrics

    CERN Document Server

    Lassnig, Mario; The ATLAS collaboration; Vamosi, Ralf

    2016-01-01

    The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from our ongoing automation efforts. First, we describe our framework for distributed data management and network metrics, which automatically extracts and aggregates data, trains models with various machine learning algorithms, and eventually scores the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  7. Discharge estimation based on machine learning

    Institute of Scientific and Technical Information of China (English)

    Zhu JIANG; Hui-yan WANG; Wen-wu SONG

    2013-01-01

    To overcome the limitations of traditional stage-discharge models in describing the dynamic characteristics of a river, a machine learning method for non-parametric regression, the locally weighted regression method, was used to estimate discharge. With the purpose of improving the precision and efficiency of river discharge estimation, a novel machine learning method is proposed: the clustering-tree weighted regression method. First, the training instances are clustered. Second, the k-nearest neighbor method is used to assign new stage samples to the best-fit cluster. Finally, the daily discharge is estimated. In the estimation process, the interference of irrelevant information can be avoided, so that the precision and efficiency of daily discharge estimation are improved. Observed data from the Luding Hydrological Station were used for testing. The simulation results demonstrate that the precision of this method is high. This provides a new effective method for discharge estimation.
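
    A minimal sketch of the clustering-then-local-regression idea, assuming synthetic stage-discharge data: cluster the training stages, route a new stage sample to its best-fit cluster, and estimate discharge by locally weighted linear regression within that cluster.

      # Clustering followed by locally weighted regression; all data synthetic.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      stage = rng.uniform(1.0, 6.0, 500)                    # river stage (m)
      discharge = 20 * stage**1.8 + rng.normal(0, 15, 500)  # discharge (m^3/s)

      km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(stage.reshape(-1, 1))

      def estimate(new_stage, tau=0.3):
          # Best-fit cluster for the new stage sample
          label = km.predict([[new_stage]])[0]
          mask = km.labels_ == label
          x, y = stage[mask], discharge[mask]
          # Locally weighted linear regression within the cluster
          w = np.exp(-((x - new_stage) ** 2) / (2 * tau**2))
          coeffs = np.polyfit(x, y, deg=1, w=w)
          return np.polyval(coeffs, new_stage)

      print(estimate(3.5))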

  8. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research, especially in Particle Physics, where classification problems such as the discrimination of Signals from Backgrounds arise from the collisions of particles. On the other hand, Monte Carlo simulations can be used to generate a known data set of Signals and Backgrounds based on theoretical physics. The aim of Machine Learning is to train algorithms on a known data set and then apply the trained algorithms to unknown data sets. The most common framework for data analysis in Particle Physics is ROOT. In order to use Machine Learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, especially Cross-Validation and BDT.

  9. Comparison of Machine Learning Approaches on Arabic Twitter Sentiment Analysis

    Directory of Open Access Journals (Sweden)

    Merfat.M. Altawaier

    2016-12-01

    Full Text Available With the dramatic expansion of information over the internet, users around the world express their opinions daily on social networks such as Facebook and Twitter. Large corporations nowadays invest in analyzing these opinions in order to assess their products or services by knowing the people's feedback toward such business. The process of determining whether users' opinions toward a particular product or service are positive or negative is called sentiment analysis. Arabic is one of the common languages that have been addressed regarding sentiment analysis. In the literature, several approaches have been proposed for Arabic sentiment analysis, and most of these approaches use machine learning techniques. Machine learning techniques are various and have different performances. Therefore, in this study, we try to identify a simple, but workable, approach for Arabic sentiment analysis on Twitter. Hence, this study aims to investigate machine learning techniques for Arabic sentiment analysis on Twitter. Three techniques have been used: Naïve Bayes, Decision Tree (DT), and Support Vector Machine (SVM). In addition, two simple pre-processing sub-tasks have also been used, Term Frequency-Inverse Document Frequency (TF-IDF) and Arabic stemming, to get the heaviest-weight terms as the features for tweet classification. TF-IDF aims to identify the most frequent words, whereas stemming aims to retrieve the stem of a word by removing its inflectional derivations. The dataset used is the Modern Arabic Corpus, which consists of Arabic tweets. The classification performance has been evaluated based on the information retrieval metrics precision, recall, and f-measure. The experimental results have shown that DT outperformed the other techniques, obtaining an f-measure of 78%.
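
    The pipeline translates naturally to scikit-learn, sketched below with toy English tweets standing in for the Arabic corpus (a real run would plug an Arabic stemmer into the tokenization step): TF-IDF features feed Naïve Bayes, a decision tree, and a linear SVM, each scored by f-measure.

      # TF-IDF features into the paper's three classifiers, f-measure scored.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.svm import LinearSVC
      from sklearn.pipeline import make_pipeline
      from sklearn.model_selection import cross_val_score

      # Toy stand-ins for the Arabic tweets; 1 = positive, 0 = negative
      tweets = ["great service", "terrible product", "love it", "worst ever",
                "very happy", "so disappointed", "excellent", "awful quality"]
      labels = [1, 0, 1, 0, 1, 0, 1, 0]

      for clf in (MultinomialNB(), DecisionTreeClassifier(), LinearSVC()):
          pipe = make_pipeline(TfidfVectorizer(), clf)
          f1 = cross_val_score(pipe, tweets, labels, cv=4, scoring="f1").mean()
          print(type(clf).__name__, round(f1, 2))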

  10. Micro machining techniques commonly used in manufacturing field

    Directory of Open Access Journals (Sweden)

    Adem Çiçek

    2011-06-01

    Full Text Available Developing technology and the need for high-precision parts in the manufacturing industry have given rise to micro-machining. Machine tools and workpieces are miniaturized through micro-machining, and material and power consumption are reduced to a minimum level. High productivity in the use of resources and time can be obtained through this rapidly growing industry around the world. In this paper, different micro-machining techniques are classified by reviewing recent investigations in the field of micro-machining, and their contributions to the manufacturing process are discussed.

  11. Network anomaly detection a machine learning perspective

    CERN Document Server

    Bhattacharyya, Dhruba Kumar

    2013-01-01

    With the rapid rise in the ubiquity and sophistication of Internet technology and the accompanying growth in the number of network attacks, network intrusion detection has become increasingly important. Anomaly-based network intrusion detection refers to finding exceptional or nonconforming patterns in network traffic data compared to normal behavior. Finding these anomalies has extensive applications in areas such as cyber security, credit card and insurance fraud detection, and military surveillance for enemy activities. Network Anomaly Detection: A Machine Learning Perspective presents mach

  12. Ozone ensemble forecast with machine learning algorithms

    OpenAIRE

    Mallet, Vivien; Stoltz, Gilles; Mauricette, Boris

    2009-01-01

    We apply machine learning algorithms to perform sequential aggregation of ozone forecasts. The latter rely on a multimodel ensemble built for ozone forecasting with the modeling system Polyphemus. The ensemble simulations are obtained by changes in the physical parameterizations, the numerical schemes, and the input data to the models. The simulations are carried out for summer 2001 over western Europe in order to forecast ozone daily peaks and ozone hourly concentrati...

  13. Bayesian Machine Learning Techniques for revealing complex interactions among genetic and clinical factors in association with extra-intestinal Manifestations in IBD patients

    Science.gov (United States)

    Menti, E; Lanera, C; Lorenzoni, G; Giachino, Daniela F.; Marchi, Mario De; Gregori, Dario; Berchialla, Paola

    2016-01-01

    The objective of the study is to assess the predictive performance of three different techniques as classifiers for extra-intestinal manifestations in 152 patients with Crohn’s disease. Naïve Bayes, Bayesian Additive Regression Trees, and Bayesian Networks (implemented using a Greedy Thick Thinning algorithm for learning dependencies among variables and the EM algorithm for learning the conditional probabilities associated with each variable) are taken into account. Three sets of variables were considered: (i) disease characteristics: presentation, behavior and location; (ii) risk factors: age, gender, smoking and familiarity; and (iii) genetic polymorphisms of the NOD2, CD14, TNFA, IL12B, and IL1RN genes, whose involvement in Crohn’s disease is known or suspected. Extra-intestinal manifestations occurred in 75 patients. Bayesian Networks achieved an accuracy of 82% when considering only clinical factors and 89% when also considering genetic information, outperforming the other techniques. CD14 has a small predictive capability. Adding TNFA and IL12B to the 3020insC NOD2 variant improved the accuracy.

  14. Quantum Loop Topography for Machine Learning

    Science.gov (United States)

    Zhang, Yi; Kim, Eun-Ah

    2017-05-01

    Despite rapidly growing interest in harnessing machine learning in the study of quantum many-body systems, training neural networks to identify quantum phases is a nontrivial challenge. The key challenge is in efficiently extracting essential information from the many-body Hamiltonian or wave function and turning the information into an image that can be fed into a neural network. When targeting topological phases, this task becomes particularly challenging as topological phases are defined in terms of nonlocal properties. Here, we introduce quantum loop topography (QLT): a procedure of constructing a multidimensional image from the "sample" Hamiltonian or wave function by evaluating two-point operators that form loops at independent Monte Carlo steps. The loop configuration is guided by the characteristic response for defining the phase, which is Hall conductivity for the cases at hand. Feeding QLT to a fully connected neural network with a single hidden layer, we demonstrate that the architecture can be effectively trained to distinguish the Chern insulator and the fractional Chern insulator from trivial insulators with high fidelity. In addition to establishing the first case of obtaining a phase diagram with a topological quantum phase transition with machine learning, the perspective of bridging traditional condensed matter theory with machine learning will be broadly valuable.

  15. Position Paper: Applying Machine Learning to Software Analysis to Achieve Trusted, Repeatable Scientific Computing

    Energy Technology Data Exchange (ETDEWEB)

    Prowell, Stacy J [ORNL; Symons, Christopher T [ORNL

    2015-01-01

    Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.

  16. Deep Extreme Learning Machine and Its Application in EEG Classification

    Directory of Open Access Journals (Sweden)

    Shifei Ding

    2015-01-01

    Full Text Available Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating the complicated function and alleviating the optimization difficulty associated with deep models. Multilayer extreme learning machine (MLELM is a learning algorithm of an artificial neural network which takes advantages of deep learning and extreme learning machine. Not only does MLELM approximate the complicated function but it also does not need to iterate during the training process. We combining with MLELM and extreme learning machine with kernel (KELM put forward deep extreme learning machine (DELM and apply it to EEG classification in this paper. This paper focuses on the application of DELM in the classification of the visual feedback experiment, using MATLAB and the second brain-computer interface (BCI competition datasets. By simulating and analyzing the results of the experiments, effectiveness of the application of DELM in EEG classification is confirmed.

  17. Scalable Machine Learning for Massive Astronomical Datasets

    Science.gov (United States)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  18. Experimental analysis of the performance of machine learning algorithms in the classification of navigation accident records

    Directory of Open Access Journals (Sweden)

    REIS, M V. S. de A.

    2017-06-01

    Full Text Available This paper aims to evaluate the use of machine learning techniques on a database of marine accidents. We analyzed and evaluated the main causes and types of marine accidents in the Northern Fluminense region using machine learning techniques. The study showed that the modeling can be done satisfactorily using different configurations of classification algorithms, varying the activation functions and training parameters. The SMO (Sequential Minimal Optimization) algorithm showed the best performance.

  19. Finding New Perovskite Halides via Machine learning

    Directory of Open Access Journals (Sweden)

    Ghanshyam ePilania

    2016-04-01

    Full Text Available Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  20. Finding New Perovskite Halides via Machine learning

    Science.gov (United States)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
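
    A sketch of the descriptor-plus-SVM idea, assuming hypothetical radii and labels: the Goldschmidt tolerance factor t = (rA + rX) / (sqrt(2) * (rB + rX)) and the octahedral factor mu = rB / rX, the features the study identifies as most important, feed a support vector classifier.

      # Descriptor construction plus an SVM classifier; radii and labels
      # below are hypothetical stand-ins for the 181-compound dataset.
      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      rA = rng.uniform(1.0, 1.8, 181)   # monovalent cation radii (Å)
      rB = rng.uniform(0.5, 1.1, 181)   # divalent cation radii (Å)
      rX = rng.uniform(1.3, 2.2, 181)   # halide anion radii (Å)

      t = (rA + rX) / (np.sqrt(2) * (rB + rX))   # tolerance factor
      mu = rB / rX                               # octahedral factor
      X = np.column_stack([t, mu])
      # Hypothetical stand-in label: perovskites tend to form near t ~ 0.8-1.0
      y = ((t > 0.8) & (t < 1.0) & (mu > 0.41)).astype(int)

      clf = SVC(kernel="rbf", C=10.0).fit(X, y)
      print(clf.predict([[0.95, 0.55], [1.2, 0.3]]))   # formable? 1/0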

  1. Machine Learning of the Reactor Core Loading Pattern Critical Parameters

    Directory of Open Access Journals (Sweden)

    Krešimir Trontl

    2008-01-01

    Full Text Available The usual approach to loading pattern optimization involves a high degree of engineering judgment, a set of heuristic rules, an optimization algorithm, and a computer code used for evaluating proposed loading patterns. The speed of the optimization process is highly dependent on the computer code used for the evaluation. In this paper, we investigate the applicability of a machine learning model which could be used for fast loading pattern evaluation. We employ a recently introduced machine learning technique, support vector regression (SVR), which is a data-driven, kernel-based, nonlinear modeling paradigm, in which model parameters are automatically determined by solving a quadratic optimization problem. The main objective of the work reported in this paper was to evaluate the possibility of applying the SVR method for reactor core loading pattern modeling. We illustrate the performance of the solution and discuss its applicability, that is, complexity, speed, and accuracy.
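
    A hedged sketch of SVR as a fast surrogate for the loading-pattern evaluation code; the flattened pattern encoding and the k-eff-like target below are simplifications invented for illustration, not the paper's actual setup.

      # SVR surrogate for loading-pattern evaluation; data are synthetic.
      import numpy as np
      from sklearn.svm import SVR
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      # Each row: enrichment values of fuel assemblies in a simplified core map
      patterns = rng.uniform(2.0, 5.0, (300, 12))
      k_eff = 0.9 + 0.02 * patterns.mean(axis=1) + rng.normal(0, 0.002, 300)

      X_tr, X_te, y_tr, y_te = train_test_split(patterns, k_eff, random_state=0)
      svr = SVR(kernel="rbf", C=100.0, epsilon=1e-3).fit(X_tr, y_tr)
      print("surrogate R^2:", round(svr.score(X_te, y_te), 3))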

  2. How the machine learning conquers reconstruction in neutrino experiments

    CERN Document Server

    CERN. Geneva

    2017-01-01

    An evolution from the purely algorithmic approaches towards the machine learning solutions started a few years ago in the neutrino experiments. Now, this process turns into a true boom, especially in the experiments based on the imaging technologies, such as LArTPC’s used in MicroBooNE and DUNE experiments or liquid scintillator detector implemented by the NOvA Collaboration. High resolution, image-like projections of events obtained with these detectors proved to be hard pattern recognition problems for the conventional reconstruction techniques. In the seminar, I will present why the neutrino events are so challenging and how the essential difficulties are now being attacked with the machine learning.

  3. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    CERN Document Server

    Hoyle, Ben; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-01-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantity. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed 'anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured stat...
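
    The preprocessing step maps directly onto scikit-learn's EllipticEnvelope, sketched below on synthetic magnitude-like features; the contamination rate and feature set are hypothetical.

      # Flag likely-anomalous training galaxies before fitting any
      # redshift model; magnitudes and contamination are hypothetical.
      import numpy as np
      from sklearn.covariance import EllipticEnvelope

      rng = np.random.default_rng(0)
      clean = rng.normal([19, 18, 17.5, 17], 0.5, (5000, 4))   # ugri-like mags
      bad = rng.normal([19, 18, 17.5, 17], 3.0, (50, 4))       # unreliable entries
      X = np.vstack([clean, bad])

      env = EllipticEnvelope(contamination=0.01, random_state=0).fit(X)
      keep = env.predict(X) == 1            # +1 inlier, -1 anomaly
      X_train = X[keep]                     # 'anomaly-removed' training sample
      print("removed:", int((~keep).sum()), "of", len(X))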

  4. Machine Learning and Cosmological Simulations II: Hydrodynamical Simulations

    CERN Document Server

    Kamdar, Harshil M; Brunner, Robert J

    2015-01-01

    We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris Simulation to train and test various sophisticated machine learning algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The promising reproduction of the listed galaxy properties demonstrably place ML as a promising and a significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydro...
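
    As a rough sketch of the halo-to-galaxy mapping, the code below trains a multi-output random forest from synthetic halo properties to stellar mass and g-r color; the feature and target lists echo the abstract, but the data and model choice are stand-ins for the paper's Illustris-trained algorithms.

      # Multi-output regression from halo properties to galaxy properties;
      # all data here are synthetic stand-ins for the Illustris catalogs.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 2000
      halo_mass = rng.uniform(10.5, 14.0, n)      # log10 halo mass
      spin = rng.uniform(0.01, 0.1, n)
      v_max = 100 + 80 * (halo_mass - 10.5) + rng.normal(0, 10, n)
      X = np.column_stack([halo_mass, spin, v_max])

      stellar_mass = 8.0 + 0.8 * (halo_mass - 10.5) + rng.normal(0, 0.2, n)
      g_minus_r = 0.3 + 0.1 * (halo_mass - 10.5) + rng.normal(0, 0.05, n)
      Y = np.column_stack([stellar_mass, g_minus_r])

      X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
      rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, Y_tr)
      print("averaged R^2 over targets:", rf.score(X_te, Y_te))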

  5. Machine Learning with Squared-Loss Mutual Information

    Directory of Open Access Journals (Sweden)

    Masashi Sugiyama

    2012-12-01

    Full Text Available Mutual information (MI) is useful for detecting statistical independence between random variables, and it has been successfully applied to solving various machine learning problems. Recently, an alternative to MI called squared-loss MI (SMI) was introduced. While ordinary MI is the Kullback–Leibler divergence from the joint distribution to the product of the marginal distributions, SMI is its Pearson divergence variant. Because both divergences belong to the f-divergence family, they share similar theoretical properties. However, a notable advantage of SMI is that it can be approximated from data in a computationally more efficient and numerically more stable way than ordinary MI. In this article, we review recent developments in SMI approximation based on direct density-ratio estimation and SMI-based machine learning techniques such as independence testing, dimensionality reduction, canonical dependency analysis, independent component analysis, object matching, clustering, and causal inference.

  6. Entanglement-based machine learning on a quantum computer.

    Science.gov (United States)

    Cai, X-D; Wu, D; Su, Z-E; Chen, M-C; Wang, X-L; Li, Li; Liu, N-L; Lu, C-Y; Pan, J-W

    2015-03-20

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

  7. Sparse extreme learning machine for classification.

    Science.gov (United States)

    Bai, Zuo; Huang, Guang-Bin; Wang, Danwei; Wang, Han; Westover, M Brandon

    2014-10-01

    Extreme learning machine (ELM) was initially proposed for single-hidden-layer feedforward neural networks (SLFNs). In the hidden layer (feature mapping), nodes are randomly generated independently of training data. Furthermore, a unified ELM was proposed, providing a single framework to simplify and unify different learning methods, such as SLFNs, least square support vector machines, proximal support vector machines, and so on. However, the solution of unified ELM is dense, and thus, usually plenty of storage space and testing time are required for large-scale applications. In this paper, a sparse ELM is proposed as an alternative solution for classification, reducing storage space and testing time. In addition, unified ELM obtains the solution by matrix inversion, whose computational complexity is between quadratic and cubic with respect to the training size. It still requires plenty of training time for large-scale problems, even though it is much faster than many other traditional methods. In this paper, an efficient training algorithm is specifically developed for sparse ELM. The quadratic programming problem involved in sparse ELM is divided into a series of smallest possible sub-problems, each of which is solved analytically. Compared with SVM, sparse ELM obtains better generalization performance with much faster training speed. Compared with unified ELM, sparse ELM achieves similar generalization performance for binary classification applications, and when dealing with large-scale binary classification problems, sparse ELM realizes even faster training speed than unified ELM.
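
    For contrast with the sparse variant, here is a minimal numpy sketch of the basic (dense) ELM the paper starts from: a random hidden layer followed by a closed-form regularized least-squares solve for the output weights. The sparse ELM of the paper replaces this dense solve with a quadratic program decomposed into tiny analytic sub-problems.

      # Basic (dense) ELM sketch; sizes, data, and labels are hypothetical.
      import numpy as np

      rng = np.random.default_rng(0)

      def elm_train(X, y, n_hidden=100, reg=1e-2):
          W = rng.normal(size=(X.shape[1], n_hidden))    # random input weights
          b = rng.normal(size=n_hidden)                  # random biases
          H = np.tanh(X @ W + b)                         # hidden feature map
          # Ridge-regularized least squares for the output weights
          beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
          return W, b, beta

      def elm_predict(X, W, b, beta):
          return np.sign(np.tanh(X @ W + b) @ beta)      # binary decision

      X = rng.normal(size=(200, 5))
      y = np.sign(X[:, 0] + 0.5 * X[:, 1])               # hypothetical labels
      params = elm_train(X, y)
      print("train accuracy:", (elm_predict(X, *params) == y).mean())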

  8. Machine learning strategies for systems with invariance properties

    Science.gov (United States)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.
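
    The two strategies can be contrasted on a toy rotation-invariant target f(x) = sin(||x||): method 1 trains on the invariant feature ||x|| directly, embedding the symmetry by construction; method 2 trains on raw coordinates augmented with random rotations until the model learns it. A hedged sketch under those assumptions:

      # Two ways to teach a model rotation invariance; toy 2-D example.
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 2))
      y = np.sin(np.linalg.norm(X, axis=1))     # rotation-invariant target

      # Method 1: invariant input basis (invariance is exact by design)
      r = np.linalg.norm(X, axis=1, keepdims=True)
      m1 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(r, y)

      # Method 2: augmentation with random rotations of the raw inputs
      Xa, ya = [X], [y]
      for t in rng.uniform(0, 2 * np.pi, 5):
          R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
          Xa.append(X @ R.T)
          ya.append(y)                          # label unchanged under rotation
      m2 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(np.vstack(Xa), np.concatenate(ya))

      # Check both models on a 90-degree rotated copy of the data
      Rt = np.array([[0.0, -1.0], [1.0, 0.0]])
      Xr = X @ Rt.T
      err1 = np.abs(m1.predict(np.linalg.norm(Xr, axis=1, keepdims=True)) - y).mean()
      err2 = np.abs(m2.predict(Xr) - y).mean()
      print("embedded:", err1, "augmented:", err2)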

  9. Weka machine learning for predicting the phospholipidosis inducing potential.

    Science.gov (United States)

    Ivanciuc, Ovidiu

    2008-01-01

    The drug discovery and development process is lengthy and expensive, and bringing a drug to market may take up to 18 years and may cost up to 2 billion $US. The extensive use of computer-assisted drug design techniques may considerably increase the chances of finding valuable drug candidates, thus decreasing the drug discovery time and costs. The most important computational approach is represented by structure-activity relationships that can discriminate between sets of chemicals that are active/inactive towards a certain biological receptor. An adverse effect of some cationic amphiphilic drugs is phospholipidosis that manifests as an intracellular accumulation of phospholipids and formation of concentric lamellar bodies. Here we present structure-activity relationships (SAR) computed with a wide variety of machine learning algorithms trained to identify drugs that have phospholipidosis inducing potential. All SAR models are developed with the machine learning software Weka, and include both classical algorithms, such as k-nearest neighbors and decision trees, as well as recently introduced methods, such as support vector machines and artificial immune systems. The best predictions are obtained with support vector machines, followed by perceptron artificial neural network, logistic regression, and k-nearest neighbors.

  10. Using Machine Learning in Adversarial Environments.

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Warren Leon [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-02-01

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approaches only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers reaction, with the goal of computing optimal moving target defense. One important challenge is to construct a realistic model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.

  11. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    Science.gov (United States)

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  13. An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge

    Science.gov (United States)

    Mivule, Kato

    2014-01-01

    The purpose of this investigation is to study and pursue a user-defined approach in preserving data privacy while maintaining an acceptable level of data utility using machine learning classification techniques as a gauge in the generation of synthetic data sets. This dissertation will deal with data privacy, data utility, machine learning…

  15. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    Science.gov (United States)

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-09-12

    Antimicrobial resistance (AMR) is a threat to global health, and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records makes applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake, with concerns regarding model interpretability and data quality. Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.
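
    One task the review names, susceptibility genotype/phenotype prediction, can be sketched as below; the binary gene panel, labels, and logistic regression model are assumptions, not any published AMR pipeline:

    # Hedged sketch: predicting a resistance phenotype from a hypothetical
    # binary matrix of resistance-gene presence/absence. Labels mark
    # resistant (1) versus susceptible (0) isolates.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(1)
    genes = rng.integers(0, 2, size=(300, 40))           # placeholder gene panel
    resistant = (genes[:, 0] | genes[:, 3]).astype(int)  # placeholder phenotype

    X_tr, X_te, y_tr, y_te = train_test_split(genes, resistant, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))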

  16. Machine Learning Interface for Medical Image Analysis.

    Science.gov (United States)

    Zhang, Yi C; Kagen, Alexander C

    2016-10-11

    TensorFlow is a second-generation open-source machine learning software library with a built-in framework for implementing neural networks in a wide variety of perceptual tasks. Although TensorFlow usage is well established with computer vision datasets, the TensorFlow interface with DICOM formats for medical imaging remains to be established. Our goal is to extend the TensorFlow API to accept raw DICOM images as input; 1513 DaTscan DICOM images were obtained from the Parkinson's Progression Markers Initiative (PPMI) database. DICOM pixel intensities were extracted and shaped into tensors, or n-dimensional arrays, to populate the training, validation, and test input datasets for machine learning. A simple neural network was constructed in TensorFlow to classify images into normal or Parkinson's disease groups. Training was executed over 1000 iterations for each cross-validation set. The gradient descent and Adagrad optimization algorithms were used to minimize cross-entropy between the predicted and ground-truth labels. Cross-validation was performed ten times to produce a mean accuracy of 0.938 ± 0.047 (95% CI 0.908-0.967). The mean sensitivity was 0.974 ± 0.043 (95% CI 0.947-1.00) and mean specificity was 0.822 ± 0.207 (95% CI 0.694-0.950). We extended the TensorFlow API to enable DICOM compatibility in the context of DaTscan image analysis. We implemented a neural network classifier that produces diagnostic accuracies on par with excellent results from previous machine learning models. These results indicate the potential role of TensorFlow as a useful adjunct diagnostic tool in the clinical setting.
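
    A minimal sketch of the described pipeline, assuming pydicom and TensorFlow 2.x are available and using hypothetical file names and labels: DICOM pixel data is read, shaped into tensors, and fed to a small network trained with the Adagrad optimizer mentioned above:

    # Minimal sketch (assumptions: pydicom and TensorFlow 2.x installed; the
    # file paths and labels below are hypothetical, and all images are assumed
    # to share the same dimensions).
    import numpy as np
    import pydicom
    import tensorflow as tf

    paths = ["scan_001.dcm", "scan_002.dcm"]   # hypothetical DaTscan files
    labels = np.array([0, 1])                  # 0 = normal, 1 = Parkinson's

    images = np.stack([
        pydicom.dcmread(p).pixel_array.astype("float32") for p in paths
    ])
    images /= images.max()                     # crude intensity normalization
    x = images.reshape(len(paths), -1)         # shape into (n, features) tensors

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(x.shape[1],)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adagrad", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, labels, epochs=10, verbose=0)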

  17. Machine Learning Algorithms in Web Page Classification

    Directory of Open Access Journals (Sweden)

    W.A.AWAD

    2012-11-01

    Full Text Available In this paper we use machine learning algorithms such as SVM, KNN and GIS to perform a behavior comparison on the web page classification problem. From the experiments we see that SVM with a small number of negative documents to build the centroids has the smallest storage requirement and the least online test computation cost, but almost all GIS variants with different numbers of nearest neighbors have an even higher storage requirement and online test computation cost than KNN. This suggests that future work should try to reduce the storage requirement and online test cost of GIS.

  18. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    OpenAIRE

    C. V. Subbulakshmi; Deepa, S. N.

    2015-01-01

    Medical data classification is a prime data mining problem being discussed about for a decade that has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learni...

  19. Predicting Networked Strategic Behavior via Machine Learning and Game Theory

    Science.gov (United States)

    2015-01-13

    Keywords: machine learning, game theory, microeconomics, behavioral data. From the report documentation page: the funding for this project was used to develop basic models and methodology for predicting networked strategic behavior via machine learning and game theory.

  20. Performance of machine learning methods for classification tasks

    OpenAIRE

    B. Krithika; Dr. V. Ramalingam; Rajan, K

    2013-01-01

    In this paper, the performance of various machine learning methods on pattern classification and recognition tasks is evaluated. The proposed method for evaluating performance is based on feature representation, feature selection and the setting of model parameters. The nature of the data and the methods of feature extraction and feature representation are discussed. The results of the Machine Learning algorithms on the classification task are analysed. The performance of Machine Learning meth...

  1. Formation enthalpies for transition metal alloys using machine learning

    Science.gov (United States)

    Ubaru, Shashanka; Miedlar, Agnieszka; Saad, Yousef; Chelikowsky, James R.

    2017-06-01

    The enthalpy of formation is an important thermodynamic property. Developing fast and accurate methods for its prediction is of practical interest in a variety of applications. Materials informatics techniques based on machine learning have recently been introduced in the literature as an inexpensive means of exploiting materials data, and can be used to examine a variety of thermodynamic properties. We investigate the use of such machine learning tools for predicting the formation enthalpies of binary intermetallic compounds that contain at least one transition metal. We consider certain easily available properties of the constituent elements, complemented by some basic properties of the compounds, to predict the formation enthalpies. We show how choosing these properties (input features) based on a literature study (using prior physics knowledge) seems to outperform machine-learning-based feature selection methods such as sensitivity analysis and LASSO (least absolute shrinkage and selection operator) based methods. A nonlinear kernel-based support vector regression method is employed to perform the predictions. The predictive ability of our model is illustrated via several experiments on a dataset containing 648 binary alloys. We train and validate the model using formation enthalpies calculated with the Miedema model, a popular semiempirical model used for the prediction of formation enthalpies of metal alloys.
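
    A minimal sketch of the modeling step under stated assumptions: nonlinear kernel support vector regression from placeholder elemental features to a synthetic enthalpy target (the study itself trains on Miedema-model enthalpies for 648 alloys):

    # Hedged sketch: RBF-kernel SVR mapping assumed elemental input features
    # (e.g. electronegativity, atomic radius) to a synthetic enthalpy target.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(2)
    X = rng.normal(size=(648, 12))                 # placeholder element features
    y = X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.1, size=648)

    svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
    r2 = cross_val_score(svr, X, y, cv=5, scoring="r2")
    print(f"cross-validated R^2: {r2.mean():.3f}")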

  2. A Machine Learning Framework for Plan Payment Risk Adjustment.

    Science.gov (United States)

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R^2. Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.
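
    The framework's core move, comparing candidate estimators by cross-validated R^2, can be sketched as below; the synthetic covariates stand in for the age, sex, geography, and diagnosis variables, and the simple loop stands in for the full super learner ensemble:

    # Hedged illustration: score several algorithms for expenditure prediction
    # by cross-validated R^2 on synthetic stand-ins for the claims variables.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LinearRegression, LassoCV
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1000, 20))     # age, sex, geography, diagnosis flags...
    y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=1000)

    for name, est in [("OLS", LinearRegression()),
                      ("Lasso", LassoCV()),
                      ("Random forest", RandomForestRegressor(n_estimators=100))]:
        r2 = cross_val_score(est, X, y, cv=5, scoring="r2")
        print(f"{name:>13}: CV R^2 = {r2.mean():.3f}")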

  3. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism called the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, which further improves the network generalization performance. The proposed method is evaluated on five benchmark datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  4. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier.

    Science.gov (United States)

    Subbulakshmi, C V; Deepa, S N

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism called the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, which further improves the network generalization performance. The proposed method is evaluated on five benchmark datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.
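
    A rough numpy sketch of the PSO-ELM combination (not the authors' implementation): a small ELM whose hidden-layer parameters are tuned by a bare-bones particle swarm while the output weights are solved in closed form; the data are synthetic stand-ins for the UCI datasets:

    # Assumption-laden sketch: an ELM with a small hidden layer; a minimal PSO
    # searches over the random input weights so that few hidden neurons suffice.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 8))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    n_hidden = 10

    def elm_accuracy(params):
        W = params[:8 * n_hidden].reshape(8, n_hidden)
        b = params[8 * n_hidden:]
        H = np.tanh(X @ W + b)                 # hidden-layer activations
        beta = np.linalg.pinv(H) @ y           # output weights, least squares
        return ((H @ beta > 0.5) == y).mean()  # training accuracy

    # Bare-bones particle swarm over the ELM's hidden-layer parameters.
    dim = 8 * n_hidden + n_hidden
    pos = rng.normal(size=(30, dim))
    vel = np.zeros((30, dim))
    pbest, pbest_f = pos.copy(), np.array([elm_accuracy(p) for p in pos])
    gbest = pbest[pbest_f.argmax()]
    for _ in range(50):
        r1, r2 = rng.random((2, 30, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos += vel
        f = np.array([elm_accuracy(p) for p in pos])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmax()]
    print(f"best training accuracy: {pbest_f.max():.3f}")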

  5. Machine Learning: developing an image recognition program : with Python, Scikit Learn and OpenCV

    OpenAIRE

    Nguyen, Minh

    2016-01-01

    Machine Learning is one of the most debated topics in the computer world these days, especially after the first computer Go program beat the human Go world champion. Among the endless applications of Machine Learning is image recognition, a problem that involves processing enormous amounts of data from dynamic input. This thesis will present the basic concepts of Machine Learning, Machine Learning algorithms, the Python programming language and Scikit Learn – a simple and efficient tool for data analysis in P...

  6. Development of techniques to enhance man/machine communication

    Science.gov (United States)

    Targ, R.; Cole, P.; Puthoff, H.

    1974-01-01

    A four-state random stimulus generator, considered to function as an ESP teaching machine, was used to investigate an approach to facilitating interactions between man and machines. A subject tries to guess which of four states the machine is in. The machine offers the user feedback and reinforcement as to the correctness of his choice. Using this machine, 148 volunteer subjects were screened under various protocols. Several whose learning slope and/or mean score departed significantly from chance expectation were identified. Direct physiological evidence of perception of remote stimuli, not presented to any known sense of the percipient, was also studied using electroencephalographic (EEG) output recorded while a light was flashed in a distant room.

  7. Machine-learning-assisted materials discovery using failed experiments

    Science.gov (United States)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted

  8. Machine-learning-assisted materials discovery using failed experiments.

    Science.gov (United States)

    Raccuglia, Paul; Elbert, Katherine C; Adler, Philip D F; Falk, Casey; Wenny, Malia B; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A; Schrier, Joshua; Norquist, Alexander J

    2016-05-05

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions

  9. Machine-Learning Approach for Design of Nanomagnetic-Based Antennas

    Science.gov (United States)

    Gianfagna, Carmine; Yu, Huan; Swaminathan, Madhavan; Pulugurtha, Raj; Tummala, Rao; Antonini, Giulio

    2017-08-01

    We propose a machine-learning approach for design of planar inverted-F antennas with a magneto-dielectric nanocomposite substrate. It is shown that machine-learning techniques can be efficiently used to characterize nanomagnetic-based antennas by accurately mapping the particle radius and volume fraction of the nanomagnetic material to antenna parameters such as gain, bandwidth, radiation efficiency, and resonant frequency. A modified mixing rule model is also presented. In addition, the inverse problem is addressed through machine learning as well, where given the antenna parameters, the corresponding design space of possible material parameters is identified.

  10. Machine Learning Based Diagnosis of Lithium Batteries

    Science.gov (United States)

    Ibe-Ekeocha, Chinemerem Christopher

    The depletion of the world's current petroleum reserve, coupled with the negative effects of carbon monoxide and other harmful petrochemical by-products on the environment, is the driving force behind the movement towards renewable and sustainable energy sources. Furthermore, the growing transportation sector consumes a significant portion of the total energy used in the United States. A complete electrification of this sector would require a significant development in electric vehicles (EVs) and hybrid electric vehicles (HEVs), thus translating to a reduction in the carbon footprint. As the market for EVs and HEVs grows, their battery management systems (BMS) need to be improved accordingly. The BMS is not only responsible for optimally charging and discharging the battery, but also for monitoring the battery's state of charge (SOC) and state of health (SOH). SOC, similar to an energy gauge, is a representation of a battery's remaining charge level as a percentage of its total possible charge at full capacity. Similarly, SOH is a measure of deterioration of a battery; thus it is a representation of the battery's age. Neither SOC nor SOH is directly measurable, so it is important that these quantities are estimated accurately. An inaccurate estimation could not only be inconvenient for EV consumers, but also potentially detrimental to the battery's performance and life. Such estimations could be implemented either online, while the battery is in use, or offline, when the battery is at rest. This thesis presents intelligent online SOC and SOH estimation methods using machine learning tools such as artificial neural networks (ANNs). ANNs are a powerful generalization tool if programmed and trained effectively. Unlike other estimation strategies, the techniques used require no battery modeling or knowledge of battery internal parameters but rather use the battery's voltage, charge/discharge current, and ambient temperature measurements to accurately estimate the battery's SOC and SOH. The developed
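
    An illustrative sketch of the estimation idea under stated assumptions: a small feedforward network regresses SOC directly from voltage, current, and ambient temperature, with no battery model; all measurements below are synthetic placeholders:

    # Hedged sketch: a neural-network regressor mapping measured voltage,
    # current, and temperature to SOC. The "ground truth" here is a toy rule.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    n = 2000
    volts = rng.uniform(3.0, 4.2, n)
    amps = rng.uniform(-2.0, 2.0, n)
    temp_c = rng.uniform(0.0, 45.0, n)
    soc = np.clip((volts - 3.0) / 1.2 + 0.02 * amps, 0, 1)   # toy ground truth

    X = np.column_stack([volts, amps, temp_c])
    X_tr, X_te, y_tr, y_te = train_test_split(X, soc, random_state=0)
    ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)
    print(f"test R^2: {ann.score(X_te, y_te):.3f}")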

  11. A Fast Reduced Kernel Extreme Learning Machine.

    Science.gov (United States)

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established based on a rigorous proof of universal learning involving a reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear function accurately given sufficient support vectors. Experimental results on a wide variety of real-world small- and large-instance-size applications in the contexts of binary classification, multi-class problems and regression are then reported to show that RKELM can perform at a level of generalization performance competitive with SVM/LS-SVM at only a fraction of the computational effort incurred.
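
    A compact numpy sketch of the RKELM recipe as described, with assumed kernel width and regularization: randomly chosen training samples serve as kernel mapping points, and the output weights follow from regularized least squares rather than an iterative SVM solver:

    # Hedged sketch of the reduced-kernel idea; gamma and the ridge term are
    # assumptions, and the data are synthetic.
    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(500, 5))
    y = np.sign(X[:, 0] * X[:, 1])            # synthetic labels in {-1, 1}

    m = 50                                    # number of random support vectors
    centers = X[rng.choice(len(X), m, replace=False)]

    def rbf(A, B, gamma=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K = rbf(X, centers)                       # reduced kernel matrix, (n, m)
    beta = np.linalg.solve(K.T @ K + 1e-3 * np.eye(m), K.T @ y)
    train_acc = (np.sign(K @ beta) == y).mean()
    print(f"training accuracy: {train_acc:.3f}")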

  12. Measure Transformer Semantics for Bayesian Machine Learning

    Science.gov (United States)

    Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen

    The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.

  13. Rule based systems for big data a machine learning approach

    CERN Document Server

    Liu, Han; Cocea, Mihaela

    2016-01-01

    The ideas introduced in this book explore the relationships among rule based systems, machine learning and big data. Rule based systems are seen as a special type of expert systems, which can be built by using expert knowledge or learning from real data. The book focuses on the development and evaluation of rule based systems in terms of accuracy, efficiency and interpretability. In particular, a unified framework for building rule based systems, which consists of the operations of rule generation, rule simplification and rule representation, is presented. Each of these operations is detailed using specific methods or techniques. In addition, this book also presents some ensemble learning frameworks for building ensemble rule based systems.

  14. Accelerating the BSM interpretation of LHC data with machine learning

    CERN Document Server

    Bertone, Gianfranco; Kim, Jong Soo; Liem, Sebastian; de Austri, Roberto Ruiz; Welling, Max

    2016-01-01

    The interpretation of Large Hadron Collider (LHC) data in the framework of Beyond the Standard Model (BSM) theories is hampered by the need to run computationally expensive event generators and detector simulators. Performing statistically convergent scans of high-dimensional BSM theories is consequently challenging, and in practice unfeasible for very high-dimensional BSM theories. We present here a new machine learning method that accelerates the interpretation of LHC data, by learning the relationship between BSM theory parameters and data. As a proof-of-concept, we demonstrate that this technique accurately predicts natural SUSY signal events in two signal regions at the High Luminosity LHC, up to four orders of magnitude faster than standard techniques. The new approach makes it possible to rapidly and accurately reconstruct the theory parameters of complex BSM theories, should an excess in the data be discovered at the LHC.

  15. Characterizing EMG data using machine-learning tools.

    Science.gov (United States)

    Yousefi, Jamileh; Hamilton-Wright, Andrew

    2014-08-01

    Effective electromyographic (EMG) signal characterization is critical in the diagnosis of neuromuscular disorders. Machine-learning based pattern classification algorithms are commonly used to produce such characterizations. Several classifiers have been investigated to develop accurate and computationally efficient strategies for EMG signal characterization. This paper provides a critical review of some of the classification methodologies used in EMG characterization, and presents the state-of-the-art accomplishments in this field, emphasizing neuromuscular pathology. The techniques studied are grouped by their methodology, and a summary of the salient findings associated with each method is presented.

  16. Machine learning, computer vision, and probabilistic models in jet physics

    CERN Document Server

    CERN. Geneva; NACHMAN, Ben

    2015-01-01

    In this talk we present recent developments in the application of machine learning, computer vision, and probabilistic models to the analysis and interpretation of LHC events. First, we will introduce the concept of jet-images and computer vision techniques for jet tagging. Jet images enabled the connection between jet substructure and tagging with the fields of computer vision and image processing for the first time, improving the performance to identify highly boosted W bosons with respect to state-of-the-art methods, and providing a new way to visualize the discriminant features of different classes of jets, adding a new capability to understand the physics within jets and to design more powerful jet tagging methods. Second, we will present Fuzzy jets: a new paradigm for jet clustering using machine learning methods. Fuzzy jets view jet clustering as an unsupervised learning task and incorporate a probabilistic assignment of particles to jets to learn new features of the jet structure. In particular, we wi...

  17. Geological applications of machine learning on hyperspectral remote sensing data

    Science.gov (United States)

    Tse, C. H.; Li, Yi-liang; Lam, Edmund Y.

    2015-02-01

    The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of detailed spectral resolution in addition to an ever-increasing level of spatial information. A major challenge brought about by the data is the burden of processing and interpreting these datasets and extracting the relevant information from them. This research aims at approaching the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately devising an efficient means of identifying minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The results demonstrated that this approach is capable of processing the huge amount of hyperspectral data and potentially providing guidance to scientists for more detailed studies.
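
    A hedged sketch of the unsupervised step: reshape a (synthetic) hyperspectral cube into per-pixel spectra and cluster them with k-means, a crude stand-in for the paper's CRISM pipeline:

    # Illustrative sketch: per-pixel spectra from a synthetic cube, clustered
    # into a small number of mineral-like classes.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(7)
    cube = rng.random((64, 64, 100))               # rows x cols x spectral bands
    spectra = cube.reshape(-1, cube.shape[-1])

    kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(spectra)
    cluster_map = kmeans.labels_.reshape(64, 64)   # per-pixel class map
    print(np.bincount(kmeans.labels_))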

  18. Mining the Galaxy Zoo Database: Machine Learning Applications

    Science.gov (United States)

    Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.

    2010-01-01

    The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunteer-provided classifications). Our second case study will focus on similar data mining analyses and machine learning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.

  19. Quantum machine learning what quantum computing means to data mining

    CERN Document Server

    Wittek, Peter

    2014-01-01

    Quantum Machine Learning bridges the gap between abstract developments in quantum computing and the applied research on machine learning. Paring down the complexity of the disciplines involved, it focuses on providing a synthesis that explains the most important machine learning algorithms in a quantum framework. Theoretical advances in quantum computing are hard to follow for computer scientists, and sometimes even for researchers involved in the field. The lack of a step-by-step guide hampers the broader understanding of this emergent interdisciplinary body of research. Quantum Machine L

  20. Fundamentals of Machine Learning for Neural Machine Translation

    OpenAIRE

    Kelleher, John

    2016-01-01

    This paper presents a short introduction to neural networks and how they are used for machine translation and concludes with some discussion on the current research challenges being addressed by neural machine translation (NMT) research. The primary goal of this paper is to give a no-tears introduction to NMT to readers that do not have a computer science or mathematical background. The secondary goal is to provide the reader with a deep enough understanding of NMT that they can appreciate th...

  1. Distributed Extreme Learning Machine for Nonlinear Learning over Network

    Directory of Open Access Journals (Sweden)

    Songyan Huang

    2015-02-01

    Full Text Available Distributed data collection and analysis over a network are ubiquitous, especially over a wireless sensor network (WSN). To our knowledge, the data model used in most distributed algorithms is linear. However, in real applications, the linearity of systems is not always guaranteed. In nonlinear cases, the single hidden layer feedforward neural network (SLFN) with radial basis function (RBF) hidden neurons has the ability to approximate any continuous function and thus may be used as the nonlinear learning system. However, constrained by the communication cost, using the distributed version of conventional algorithms to train the neural network directly is usually prohibitive. Fortunately, based on the theorems provided in the extreme learning machine (ELM) literature, we only need to compute the output weights of the SLFN. Computing the output weights is itself a linear learning problem, although the input-output mapping of the overall SLFN is still nonlinear. Using a distributed algorithm to cooperatively compute the output weights of the SLFN, we obtain in this paper a distributed extreme learning machine (dELM) for nonlinear learning. This dELM is applied to regression and classification problems to demonstrate its effectiveness and advantages.
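
    A toy sketch of the dELM insight: with the hidden layer fixed, each node only needs the sufficient statistics H^T H and H^T y of its local data, and summing them across the network recovers the centralized least-squares output weights. The network communication is simulated here by a plain Python sum:

    # Hedged sketch (not the paper's algorithm): distributed ELM output-weight
    # computation via summed local sufficient statistics.
    import numpy as np

    rng = np.random.default_rng(8)
    W = rng.normal(size=(3, 20))      # shared random hidden-layer weights
    b = rng.normal(size=20)           # shared random hidden-layer biases

    def local_stats(X, y):
        """Per-node sufficient statistics of the ELM least-squares problem."""
        H = np.tanh(X @ W + b)        # hidden-layer output on local data
        return H.T @ H, H.T @ y

    # Five simulated WSN nodes, each holding local data with a nonlinear target.
    nodes = []
    for _ in range(5):
        X = rng.normal(size=(100, 3))
        nodes.append((X, np.sin(X[:, 0])))

    stats = [local_stats(X, y) for X, y in nodes]
    HtH = sum(s[0] for s in stats)    # in a real WSN this sum is a consensus step
    Hty = sum(s[1] for s in stats)
    beta = np.linalg.solve(HtH + 1e-6 * np.eye(20), Hty)  # global output weights
    print(beta[:5])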

  2. 2nd Machine Learning School for High Energy Physics

    CERN Document Server

    2016-01-01

    The Second Machine Learning summer school organized by Yandex School of Data Analysis and Laboratory of Methods for Big Data Analysis of National Research University Higher School of Economics will be held in Lund, Sweden from 20 to 26 June 2016. It is hosted by Lund University. The school is intended to cover the relatively young area of data analysis and computational research that has started to emerge in High Energy Physics (HEP). It is known by several names including “Multivariate Analysis”, “Neural Networks”, “Classification/Clusterization techniques”. In more generic terms, these techniques belong to the field of “Machine Learning”, which is an area that is based on research performed in Statistics and has received a lot of attention from the Data Science community. There are plenty of essential problems in High energy Physics that can be solved using Machine Learning methods. These vary from online data filtering and reconstruction to offline data analysis. Students of the school w...

  3. Fall classification by machine learning using mobile phones.

    Directory of Open Access Journals (Sweden)

    Mark V Albert

    Full Text Available Fall prevention is a critical component of health care; falls are a common source of injury in the elderly and are associated with significant levels of mortality and morbidity. Automatically detecting falls can allow rapid response to potential emergencies; in addition, knowing the cause or manner of a fall can be beneficial for prevention studies or a more tailored emergency response. The purpose of this study is to demonstrate techniques to not only reliably detect a fall but also to automatically classify the type. We asked 15 subjects to simulate four different types of falls (left and right lateral, forward trips, and backward slips) while wearing mobile phones and previously validated, dedicated accelerometers. Nine subjects also wore the devices for ten days, to provide data for comparison with the simulated falls. We applied five machine learning classifiers to a large time-series feature set to detect falls. Support vector machines and regularized logistic regression were able to identify a fall with 98% accuracy and classify the type of fall with 99% accuracy. This work demonstrates how current machine learning approaches can simplify data collection for prevention in fall-related research as well as improve rapid response to potential injuries due to falls.
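
    A hedged sketch of such a pipeline: summarize each (here simulated) accelerometer recording with simple time-series features, then cross-validate the two classifiers the authors found best:

    # Illustrative sketch: signals, labels, and feature choices are simulated
    # placeholders, not the study's data or feature set.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(9)
    signals = rng.normal(size=(120, 500))            # 120 recordings, 500 samples
    labels = rng.integers(0, 4, size=120)            # four simulated fall types

    def features(sig):
        return [sig.mean(), sig.std(), np.abs(sig).max(),
                np.percentile(sig, 90), np.diff(sig).std()]

    X = np.array([features(s) for s in signals])
    for name, clf in [("SVM", SVC()),
                      ("logistic", LogisticRegression(C=0.5, max_iter=1000))]:
        acc = cross_val_score(clf, X, labels, cv=5).mean()
        print(f"{name}: {acc:.3f}")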

  4. Machine Learning Approaches: From Theory to Application in Schizophrenia

    Directory of Open Access Journals (Sweden)

    Elisa Veronese

    2013-01-01

    Full Text Available In recent years, machine learning approaches have been successfully applied to the analysis of neuroimaging data, to help in the context of disease diagnosis. We provide, in this paper, an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls, comparing them in terms of accuracy with other recently published studies. First, we give a description of the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, by comparing the results obtained with the different techniques described, conclusions are drawn in order to understand how useful automatic classification approaches can be in understanding the biological underpinnings of schizophrenia. We conclude by discussing the main implications of applying these methods in clinical practice.

  5. An active role for machine learning in drug development

    Science.gov (United States)

    Murphy, Robert F.

    2014-01-01

    Due to the complexity of biological systems, cutting-edge machine-learning methods will be critical for future drug development. In particular, machine-vision methods to extract detailed information from imaging assays and active-learning methods to guide experimentation will be required to overcome the dimensionality problem in drug development. PMID:21587249

  6. Newton Methods for Large Scale Problems in Machine Learning

    Science.gov (United States)

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  7. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    Science.gov (United States)

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  10. IEEE International Workshop on Machine Learning for Signal Processing: Preface

    DEFF Research Database (Denmark)

    Tao, Jianhua

    The 21st IEEE International Workshop on Machine Learning for Signal Processing will be held in Beijing, China, on September 18–21, 2011. The workshop series is the major annual technical event of the IEEE Signal Processing Society's Technical Committee on Machine Learning for Signal Processing...

  11. Proceedings of the IEEE Machine Learning for Signal Processing XVII

    DEFF Research Database (Denmark)

    The seventeenth of a series of workshops sponsored by the IEEE Signal Processing Society and organized by the Machine Learning for Signal Processing Technical Committee (MLSP-TC). The field of machine learning has matured considerably in both methodology and real-world application domains and has...

  13. Amp: A modular approach to machine learning in atomistic simulations

    Science.gov (United States)

    Khorshidi, Alireza; Peterson, Andrew A.

    2016-10-01

    Electronic structure calculations, such as those employing Kohn-Sham density functional theory or ab initio wavefunction theories, have allowed for atomistic-level understandings of a wide variety of phenomena and properties of matter at small scales. However, the computational cost of electronic structure methods drastically increases with length and time scales, which makes these methods difficult for long time-scale molecular dynamics simulations or large-sized systems. Machine-learning techniques can provide accurate potentials that can match the quality of electronic structure calculations, provided sufficient training data. These potentials can then be used to rapidly simulate large and long time-scale phenomena at similar quality to the parent electronic structure approach. Machine-learning potentials usually take a bias-free mathematical form and can be readily developed for a wide variety of systems. Electronic structure calculations have favorable properties, namely that they are noiseless and that targeted training data can be produced on demand, that make them particularly well-suited for machine learning. This paper discusses our modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems. Potentials developed through the atom-centered approach are simultaneously applicable for systems with various sizes. Interpolation can be enhanced by introducing custom descriptors of the local environment. We demonstrate this in the current work for Gaussian-type, bispectrum, and Zernike-type descriptors. Amp has an intuitive and modular structure with an interface through the Python scripting language yet has parallelizable Fortran components for demanding tasks; it is designed to integrate closely with the widely used Atomic Simulation Environment (ASE), which

  14. A Machine-Learning-Driven Sky Model.

    Science.gov (United States)

    Satylmys, Pynar; Bashford-Rogers, Thomas; Chalmers, Alan; Debattista, Kurt

    2017-01-01

    Sky illumination is responsible for much of the lighting in a virtual environment. A machine-learning-based approach can compactly represent sky illumination from both existing analytic sky models and from captured environment maps. The proposed approach can approximate the captured lighting at a significantly reduced memory cost and enable smooth transitions of sky lighting to be created from a small set of environment maps captured at discrete times of day. The authors' results demonstrate accuracy close to the ground truth for both analytical and capture-based methods. The approach has a low runtime overhead, so it can be used as a generic approach for both offline and real-time applications.

  15. Optimal sensor placement using machine learning

    CERN Document Server

    Semaan, Richard

    2016-01-01

    A new method for optimal sensor placement based on the variable importance of machine-learned models is proposed. With its simplicity, adaptivity, and low computational cost, the method offers many advantages over existing approaches. The new method is implemented on an airfoil equipped with a Coanda actuator. The analysis is based on flow field data obtained from 2D unsteady Reynolds-averaged Navier-Stokes (URANS) simulations with different actuation conditions. The optimal sensor locations are compared against the current de-facto standard of maximum POD modal amplitude location, and against a brute force approach that scans all possible sensor combinations. The results show that both the flow conditions and the type of sensor have an effect on the optimal sensor placement, whereas the choice of the response function appears to have limited influence.
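
    A sketch of the core idea under stated assumptions: fit a model from candidate sensor signals to a response of interest and rank candidate locations by the model's variable importance, keeping the top-ranked ones; the flow-field data are synthetic stand-ins for the URANS snapshots:

    # Hedged sketch: random-forest feature importances used as the variable
    # importance ranking over candidate probe locations.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(10)
    candidates = rng.normal(size=(400, 60))        # snapshots x candidate probes
    response = candidates[:, 7] - 0.5 * candidates[:, 42]   # quantity of interest

    forest = RandomForestRegressor(n_estimators=200, random_state=0)
    forest.fit(candidates, response)
    ranking = np.argsort(forest.feature_importances_)[::-1]
    print("top 5 sensor locations:", ranking[:5])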

  16. Machine learning research 1989-90

    Science.gov (United States)

    Porter, Bruce W.; Souther, Arthur

    1990-01-01

    Multifunctional knowledge bases offer a significant advance in artificial intelligence because they can support numerous expert tasks within a domain. As a result they amortize the costs of building a knowledge base over multiple expert systems and they reduce the brittleness of each system. Due to the inevitable size and complexity of multifunctional knowledge bases, their construction and maintenance require knowledge engineering and acquisition tools that can automatically identify interactions between new and existing knowledge. Furthermore, their use requires software for accessing those portions of the knowledge base that coherently answer questions. Considerable progress was made in developing software for building and accessing multifunctional knowledge bases. A language was developed for representing knowledge, along with software tools for editing and displaying knowledge, a machine learning program for integrating new information into existing knowledge, and a question answering system for accessing the knowledge base.

  17. Lane Detection Based on Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Chao Fan

    2013-09-01

    Full Text Available In order to improve the accuracy and robustness of lane detection in complex conditions, such as shadows and changing illumination, a novel detection algorithm was proposed based on machine learning. After pretreatment, a set of Haar-like filters were used to calculate eigenvalues in the gray image f(x,y) and edge image e(x,y). These features were then trained using an improved boosting algorithm to obtain the final classification function g(x), which is used to judge whether a point x belongs to the lane or not. To avoid the overfitting of traditional boosting, Fisher discriminant analysis was used to initialize the sample weights. Tests on many roads under challenging conditions showed that the algorithm has good robustness and real-time performance in recognizing lanes.

  18. Supervised Machine Learning Methods Applied to Predict Ligand-Binding Affinity.

    Science.gov (United States)

    Heck, Gabriela S; Pintro, Val O; Pereira, Richard R; de Ávila, Mauricio B; Levin, Nayara M B; de Azevedo, Walter F

    2017-01-01

    Calculation of ligand-binding affinity is an open problem in computational medicinal chemistry. The ability to computationally predict affinities has a beneficial impact in the early stages of drug development, since it allows a mathematical model to assess protein-ligand interactions. Due to the availability of structural and binding information, machine learning methods have been applied to generate scoring functions with good predictive power. Our goal here is to review recent developments in the application of machine learning methods to predict ligand-binding affinity. We focus our review on the application of computational methods to predict binding affinity for protein targets. In addition, we also describe the major available databases for experimental binding constants and protein structures. Furthermore, we explain the most successful methods to evaluate the predictive power of scoring functions. Association of structural information with ligand-binding affinity makes it possible to generate scoring functions targeted to a specific biological system. Through regression analysis, this data can be used as a base to generate mathematical models to predict ligand-binding affinities, such as inhibition constant, dissociation constant and binding energy. Experimental biophysical techniques were able to determine the structures of over 120,000 macromolecules. Considering also the evolution of binding affinity information, we may say that we have a promising scenario for development of scoring functions, making use of machine learning techniques. Recent developments in this area indicate that building scoring functions targeted to the biological systems of interest shows superior predictive performance, when compared with other approaches.
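
    A hedged example of the reviewed regression setting: a gradient-boosted model mapping synthetic protein-ligand interaction descriptors to a binding constant (real scoring functions train on curated databases such as PDBbind):

    # Illustrative sketch: descriptors and affinities below are synthetic
    # stand-ins (e.g. contact counts, hydrogen-bond terms -> pKd).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(11)
    desc = rng.normal(size=(500, 30))        # placeholder interaction descriptors
    pKd = 5 + desc[:, 0] - 0.3 * desc[:, 5] + rng.normal(scale=0.2, size=500)

    gbr = GradientBoostingRegressor(random_state=0)
    r = cross_val_score(gbr, desc, pKd, cv=5, scoring="r2")
    print(f"cross-validated R^2: {r.mean():.3f}")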

  19. Tracking medical genetic literature through machine learning.

    Science.gov (United States)

    Bornstein, Aaron T; McLoughlin, Matthew H; Aguilar, Jesus; Wong, Wendy S W; Solomon, Benjamin D

    2016-08-01

    There has been remarkable progress in identifying the causes of genetic conditions as well as understanding how changes in specific genes cause disease. Though difficult (and often superficial) to parse, an interesting tension involves emphasis on basic research aimed to dissect normal and abnormal biology versus more clearly clinical and therapeutic investigations. To examine one facet of this question and to better understand progress in Mendelian-related research, we developed an algorithm that classifies medical literature into three categories (Basic, Clinical, and Management) and conducted a retrospective analysis. We built a supervised machine learning classification model using the Azure Machine Learning (ML) Platform and analyzed the literature (1970-2014) from NCBI's Entrez Gene2Pubmed Database (http://www.ncbi.nlm.nih.gov/gene) using genes from the NHGRI's Clinical Genomics Database (http://research.nhgri.nih.gov/CGD/). We applied our model to 376,738 articles: 288,639 (76.6%) were classified as Basic, 54,178 (14.4%) as Clinical, and 24,569 (6.5%) as Management. The average classification accuracy was 92.2%. The rate of Clinical publication was significantly higher than Basic or Management. The rate of publication of article types differed significantly when divided into key eras: Human Genome Project (HGP) planning phase (1984-1990); HGP launch (1990) to publication (2001); following HGP completion to the "Next Generation" advent (2009); the era following 2009. In conclusion, in addition to the findings regarding the pace and focus of genetic progress, our algorithm produced a database that can be used in a variety of contexts including automating the identification of management-related literature.
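
    A minimal analogue of the classification task (the authors used the Azure ML platform; the sketch below swaps in scikit-learn with TF-IDF features and invented one-line abstracts):

    # Hedged sketch: sorting papers into Basic, Clinical, or Management with a
    # linear classifier over TF-IDF features. Texts are invented placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    docs = [
        "knockout mouse reveals gene function in embryonic development",
        "case series describing cardiac phenotype in affected patients",
        "enzyme replacement therapy dosing guidelines and follow-up",
    ] * 10
    labels = ["Basic", "Clinical", "Management"] * 10

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(docs, labels)
    print(clf.predict(["randomized trial of treatment outcomes"]))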

  20. Machine learning, medical diagnosis, and biomedical engineering research - commentary.

    Science.gov (United States)

    Foster, Kenneth R; Koprowski, Robert; Skufca, Joseph D

    2014-07-05

    A large number of papers are appearing in the biomedical engineering literature that describe the use of machine learning techniques to develop classifiers for detection or diagnosis of disease. However, the usefulness of this approach in developing clinically validated diagnostic techniques so far has been limited and the methods are prone to overfitting and other problems which may not be immediately apparent to the investigators. This commentary is intended to help sensitize investigators as well as readers and reviewers of papers to some potential pitfalls in the development of classifiers, and suggests steps that researchers can take to help avoid these problems. Building classifiers should be viewed not simply as an add-on statistical analysis, but as part and parcel of the experimental process. Validation of classifiers for diagnostic applications should be considered as part of a much larger process of establishing the clinical validity of the diagnostic technique.

  1. Machine Learning-Based Content Analysis: Automating the analysis of frames and agendas in political communication research

    NARCIS (Netherlands)

    Burscher, B.

    2016-01-01

    We used machine learning to study policy issues and frames in political messages. With regard to frames, we investigated the automation of two content-analytical tasks: frame coding and frame identification. We found that both tasks can be successfully automated by means of machine learning techniques.

  3. Combining Psychological Models with Machine Learning to Better Predict People’s Decisions

    Science.gov (United States)

    2012-03-09

    [Fragmentary extracted abstract] Computer scientists often model people's decisions through machine learning techniques (Russell & Norvig, 2003); these models are based on statistical methods. The remaining text is reference-list residue: Kaelbling, Littman, & Cassandra (1998); Neumann & Morgenstern (1944); Kraus, S. (2011), Using aspiration adaptation theory to improve learning, AAMAS, pp. 423-430; Russell, S. J., & Norvig, P. (2003).

  4. Machine learning in radiation oncology theory and applications

    CERN Document Server

    El Naqa, Issam; Murphy, Martin J

    2015-01-01

    ​This book provides a complete overview of the role of machine learning in radiation oncology and medical physics, covering basic theory, methods, and a variety of applications in medical physics and radiotherapy. An introductory section explains machine learning, reviews supervised and unsupervised learning methods, discusses performance evaluation, and summarizes potential applications in radiation oncology. Detailed individual sections are then devoted to the use of machine learning in quality assurance; computer-aided detection, including treatment planning and contouring; image-guided rad

  5. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    Reinforcement and Systemic Machine Learning for Decision Making. There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new and...

  6. Using Machine Learning to Search for MSSM Higgs Bosons

    CERN Document Server

    Diesing, Rebecca

    2016-01-01

    This paper examines the performance of machine learning in the identification of Minimally Supersymmetric Standard Model (MSSM) Higgs bosons, and compares this performance to that of traditional cut strategies. Two boosted decision tree implementations were tested, scikit-learn and XGBoost. These tests indicated that machine learning can perform significantly better than traditional cuts. However, since machine learning in this form cannot be directly implemented in a real MSSM Higgs analysis, this performance information was instead used to better understand the relationships between training variables. Further studies might use this information to construct an improved cut strategy.
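
    For illustration, the following sketch contrasts a simple rectangular cut with a gradient-boosted decision tree on synthetic two-variable "signal versus background" data; the variables, thresholds, and data are invented for the example and are not the MSSM analysis inputs.

        # Rectangular cut versus a gradient-boosted classifier on synthetic
        # two-variable signal/background data (illustrative, not the MSSM inputs).
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        n = 5000
        signal = rng.normal(loc=[1.0, 1.0], scale=0.8, size=(n, 2))
        background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
        X = np.vstack([signal, background])
        y = np.r_[np.ones(n), np.zeros(n)]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        cut_pred = (X_te[:, 0] > 0.5) & (X_te[:, 1] > 0.5)   # a traditional cut
        print(f"cut accuracy: {(cut_pred == y_te).mean():.3f}")

        bdt = GradientBoostingClassifier().fit(X_tr, y_tr)   # boosted decision trees
        print(f"BDT accuracy: {bdt.score(X_te, y_te):.3f}")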

  7. Improved detection of chemical substances from colorimetric sensor data using probabilistic machine learning

    DEFF Research Database (Denmark)

    Mølgaard, Lasse Lohilahti; Buus, Ole Thomsen; Larsen, Jan

    2017-01-01

    We present a data-driven machine learning approach to detect drug- and explosives-precursors using colorimetric sensor technology for air-sampling. The sensing technology has been developed in the context of the CRIM-TRACK project. At present a fully-integrated portable prototype for air sampling...... of the highly multi-variate data produced from the colorimetric chip a number of machine learning techniques are employed to provide reliable classification of target analytes from confounders found in the air streams. We demonstrate that a data-driven machine learning method using dimensionality reduction...... in combination with a probabilistic classifier makes it possible to produce informative features and a high detection rate of analytes. Furthermore, the probabilistic machine learning approach provides a means of automatically identifying unreliable measurements that could produce false predictions......
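
    The general pattern described here (dimensionality reduction feeding a probabilistic classifier, with low-confidence measurements flagged as unreliable) can be sketched as follows; the synthetic sensor readings and the 0.8 confidence threshold are assumptions for illustration, not the CRIM-TRACK implementation.

        # Dimensionality reduction + probabilistic classifier with a rejection
        # threshold (synthetic stand-in; the 0.8 threshold is an assumption).
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(2)
        X = rng.normal(size=(200, 100))              # 100 sensor channels
        y = (X[:, :5].sum(axis=1) > 0).astype(int)   # analyte present / absent

        clf = make_pipeline(PCA(n_components=10), LogisticRegression()).fit(X, y)

        for p in clf.predict_proba(X[:5]):           # class probabilities per sample
            verdict = f"class {p.argmax()}" if p.max() >= 0.8 else "unreliable"
            print(verdict, p.round(2))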

  8. Machine Learning Approaches for High-resolution Urban Land Cover Classification: A Comparative Study

    Energy Technology Data Exchange (ETDEWEB)

    Vatsavai, Raju [ORNL; Chandola, Varun [ORNL; Cheriyadat, Anil M [ORNL; Bright, Eddie A [ORNL; Bhaduri, Budhendra L [ORNL; Graesser, Jordan B [ORNL

    2011-01-01

    The proliferation of machine learning approaches makes it difficult to identify a suitable classification technique for analyzing high-resolution remote sensing images. In this study, ten classification techniques from five broad machine learning categories were compared. Surprisingly, the performance of simple statistical classification schemes like maximum likelihood and logistic regression is very close to that of more complex and recent techniques. Given that these two classifiers require little input from the user, they should still be considered for most classification tasks. Multiple classifier systems are a good choice if resources permit.
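
    A comparison of this kind reduces to scoring several classifier families under identical cross-validation. A minimal sketch, assuming scikit-learn, with synthetic features in place of the imagery; quadratic discriminant analysis stands in for the Gaussian maximum-likelihood classifier.

        # Same-split comparison of classifiers from several ML families;
        # QDA plays the role of the Gaussian maximum-likelihood classifier.
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                                   n_classes=3, random_state=0)
        classifiers = {
            "maximum likelihood (QDA)": QuadraticDiscriminantAnalysis(),
            "logistic regression": LogisticRegression(max_iter=1000),
            "support vector machine": SVC(),
            "random forest": RandomForestClassifier(),
        }
        for name, clf in classifiers.items():
            score = cross_val_score(clf, X, y, cv=5).mean()
            print(f"{name}: {score:.3f}")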

  9. Teaching an Old Log New Tricks with Machine Learning.

    Science.gov (United States)

    Schnell, Krista; Puri, Colin; Mahler, Paul; Dukatz, Carl

    2014-03-01

    To most people, the log file would not be considered an exciting area in technology today. However, these relatively benign, slowly growing data sources can drive large business transformations when combined with modern-day analytics. Accenture Technology Labs has built a new framework that helps to expand existing vendor solutions to create new methods of gaining insights from these benevolent information springs. This framework provides a systematic and effective machine-learning mechanism to understand, analyze, and visualize heterogeneous log files. These techniques enable an automated approach to analyzing log content in real time, learning relevant behaviors, and creating actionable insights applicable in traditionally reactive situations. Using this approach, companies can now tap into a wealth of knowledge residing in log file data that is currently being collected but underutilized because of its overwhelming variety and volume. By using log files as an important data input into the larger enterprise data supply chain, businesses have the opportunity to enhance their current operational log management solution and generate entirely new business insights: no longer limited to the realm of reactive IT management, but extending from proactive product improvement to defense from attacks. As we will discuss, this solution has immediate relevance in the telecommunications and security industries. However, the most forward-looking companies can take it even further. How? By thinking beyond the log file and applying the same machine-learning framework to other log file use cases (including logistics, social media, and consumer behavior) and any other transactional data source.

  10. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    The NLM Medical Text Indexer (MTI) is based on MetaMap and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI...... and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low-variance methods achieves better performance, and that each MeSH heading presents a different behavior......

  11. Geminivirus data warehouse: a database enriched with machine learning approaches.

    Science.gov (United States)

    Silva, Jose Cleydson F; Carvalho, Thales F M; Basso, Marcos F; Deguchi, Michihito; Pereira, Welison A; Sobrinho, Roberto R; Vidigal, Pedro M P; Brustolini, Otávio J B; Silva, Fabyano F; Dal-Bianco, Maximiller; Fontes, Renildes L F; Santos, Anésia A; Zerbini, Francisco Murilo; Cerqueira, Fabio R; Fontes, Elizabeth P B

    2017-05-05

    The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.

  12. Machine Learning and the Traveling Repairman

    CERN Document Server

    Tulabandhula, Theja; Jaillet, Patrick

    2011-01-01

    The goal of the Machine Learning and Traveling Repairman Problem (ML&TRP) is to determine a route for a "repair crew," which repairs nodes on a graph. The repair crew aims to minimize the cost of failures at the nodes, but as in many real situations, the failure probabilities are not known and must be estimated. We introduce two formulations for the ML&TRP, where the first formulation is sequential: failure probabilities are estimated at each node, and then a weighted version of the traveling repairman problem is used to construct the route from the failure cost. We develop two models for the failure cost, based on whether repeat failures are considered, or only the first failure on a node. Our second formulation is a multi-objective learning problem for ranking on graphs. Here, we are estimating failure probabilities simultaneously with determining the graph traversal route; the choice of route influences the estimated failure probabilities. This is in accordance with a prior belief that probabilities...

  13. Active Learning of Nondeterministic Finite State Machines

    Directory of Open Access Journals (Sweden)

    Warawoot Pacharoen

    2013-01-01

    Full Text Available We consider the problem of learning nondeterministic finite state machines (NFSMs) from systems whose internal structures are implicit and nondeterministic. Recently, an algorithm for inferring observable NFSMs (ONFSMs), the potentially learnable subclass of NFSMs, has been proposed based on the hypothesis that the complete testing assumption is satisfied. According to this assumption, with an input sequence (query), the complete set of all possible output sequences is given by the so-called Teacher, so the number of times the same query is asked is not taken into account in the algorithm. In this paper, we propose LNM*, a refined ONFSM learning algorithm that treats the number of repetitions of the same query as a parameter. Unlike the previous work, our approach does not require all possible output sequences in one answer. Instead, it tries to observe the possible output sequences by asking the same query many times of the Teacher. We have proved that LNM* can infer the corresponding ONFSMs of the unknown systems when the number of tries for the same query is adequate to guarantee the complete testing assumption. Moreover, the proof shows that our algorithm will eventually terminate whether or not the assumption is fulfilled. We also present a theoretical time complexity analysis of LNM*. In addition, experimental results demonstrate the practical efficiency of our approach.

  14. Predicting the Plant Root-Associated Ecological Niche of 21 Pseudomonas Species Using Machine Learning and Metabolic Modeling

    OpenAIRE

    Chien, Jennifer; Larsen, Peter

    2017-01-01

    Plants rarely occur in isolated systems. Bacteria can inhabit either the endosphere, the region inside the plant root, or the rhizosphere, the soil region just outside the plant root. Our goal is to understand whether genomic data and media-dependent metabolic model information are better for training machine learning models that predict bacterial ecological niche than media-independent models or purely genome-based species trees. We considered three machine learning techniques: support vector machines...

  15. MLBCD: a machine learning tool for big clinical data.

    Science.gov (United States)

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.
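
    The hyper-parameter challenge described above is commonly handled with an exhaustive cross-validated search. A minimal sketch, assuming scikit-learn and an invented SVM parameter grid; this illustrates the general pattern, not the MLBCD system itself.

        # Cross-validated grid search picks the hyper-parameter values instead
        # of the user (illustrative grid; not the MLBCD system itself).
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=300, random_state=0)
        grid = GridSearchCV(SVC(),
                            param_grid={"C": [0.1, 1, 10],
                                        "gamma": ["scale", 0.01, 0.1]},
                            cv=5)
        grid.fit(X, y)
        print(grid.best_params_, f"accuracy={grid.best_score_:.3f}")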

  16. Leveraging Experiential Learning Techniques for Transfer

    Science.gov (United States)

    Furman, Nate; Sibthorp, Jim

    2013-01-01

    Experiential learning techniques can be helpful in fostering learning transfer. Techniques such as project-based learning, reflective learning, and cooperative learning provide authentic platforms for developing rich learning experiences. In contrast to more didactic forms of instruction, experiential learning techniques foster a depth of learning…

  17. Electrical test prediction using hybrid metrology and machine learning

    Science.gov (United States)

    Breton, Mary; Chao, Robin; Muthinti, Gangadhara Raja; de la Peña, Abraham A.; Simon, Jacques; Cepler, Aron J.; Sendelbach, Matthew; Gaudiello, John; Emans, Susan; Shifrin, Michael; Etzioni, Yoav; Urenski, Ronen; Lee, Wei Ti

    2017-03-01

    Electrical test measurement in the back-end of line (BEOL) is crucial for wafer and die sorting as well as comparing intended process splits. Any in-line, nondestructive technique in the process flow to accurately predict these measurements can significantly improve mean-time-to-detect (MTTD) of defects and improve cycle times for yield and process learning. Measuring after BEOL metallization is commonly done for process control and learning, particularly with scatterometry (also called OCD, Optical Critical Dimension), which can solve for multiple profile parameters such as metal line height or sidewall angle and does so within patterned regions. This gives scatterometry an advantage over inline microscopy-based techniques, which provide top-down information, since such techniques can be insensitive to sidewall variations hidden under the metal fill of the trench. But when faced with correlation to electrical test measurements that are specific to the BEOL processing, both techniques face the additional challenge of sampling. Microscopy-based techniques are sampling-limited by their small probe size, while scatterometry is traditionally limited (for microprocessors) to scribe targets that mimic device ground rules but are not necessarily designed to be electrically testable. A solution to this sampling challenge lies in a fast reference-based machine learning capability that allows for OCD measurement directly of the electrically-testable structures, even when they are not OCD-compatible. By incorporating such direct OCD measurements, correlation to, and therefore prediction of, resistance of BEOL electrical test structures is significantly improved. Improvements in prediction capability for multiple types of in-die electrically-testable device structures are demonstrated. To further improve the quality of the prediction of the electrical resistance measurements, hybrid metrology using the OCD measurements as well as X-ray metrology (XRF) is used. Hybrid metrology...

  18. Housing Value Forecasting Based on Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Jingyi Mu

    2014-01-01

    Full Text Available In the era of big data, many urgent issues in all walks of life can be tackled with big data techniques. Compared with the Internet, economy, industry, and aerospace fields, applications of big data in architecture are relatively rare. In this paper, on the basis of actual data, the values of Boston suburb houses are forecast by several machine learning methods. Based on the predictions, the government and developers can decide whether to develop real estate in the corresponding regions. In this paper, support vector machine (SVM), least squares support vector machine (LSSVM), and partial least squares (PLS) methods are used to forecast the home values, and these algorithms are compared according to the predicted results. Experiments show that although the data set exhibits serious nonlinearity, SVM and LSSVM are superior to PLS in dealing with this nonlinearity. SVM finds the global optimal solution and achieves the best forecasting performance because it solves a quadratic programming problem. The computational efficiencies of the algorithms are also compared according to their computing times.
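
    A minimal sketch of the SVR-versus-PLS comparison, using synthetic nonlinear data in place of the Boston housing set (LSSVM has no standard scikit-learn implementation and is omitted here):

        # SVR versus PLS on synthetic nonlinear data (stand-in for the Boston set).
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.metrics import r2_score
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVR

        rng = np.random.default_rng(3)
        X = rng.uniform(-2, 2, size=(400, 5))
        y = np.sin(X[:, 0]) * X[:, 1] ** 2 + X[:, 2] + rng.normal(0, 0.1, size=400)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        svr = SVR(C=10).fit(X_tr, y_tr)                  # kernel SVM regression
        pls = PLSRegression(n_components=3).fit(X_tr, y_tr)  # linear latent-variable model
        print(f"SVR R^2: {r2_score(y_te, svr.predict(X_te)):.3f}")
        print(f"PLS R^2: {r2_score(y_te, pls.predict(X_te).ravel()):.3f}")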

  19. Application of machine learning in SNP discovery

    Directory of Open Access Journals (Sweden)

    Cregan Perry B

    2006-01-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. The polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes and improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems such as predicting genes, promoters, transcription factor binding sites and protein structures. Results The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites; 12 Mb) in both directions from 6 diverse homozygous soybean cultivars followed by PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training the ML classifier, it agreed with the experts on 97.3% of the test data, compared with 7.8% agreement between PolyBayes and the experts. The PolyBayes positive predictive values (PPV) (i.e., the fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to the high false SNP predictions by PolyBayes, and hence results may differ for other genomes. Conclusion A machine learning (ML) method was developed as a supplementary feature to polymorphism detection software to improve prediction accuracies. The results from this study...
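
    The core idea, a decision tree trained on features of candidate SNPs against expert True/False calls, can be sketched as follows; scikit-learn's CART stands in for C4.5, and the feature names and values are invented placeholders.

        # Decision tree separating real SNPs from false positives (sketch; CART
        # stands in for C4.5, and the toy features are invented placeholders).
        from sklearn.tree import DecisionTreeClassifier, export_text

        # toy features per candidate SNP: [posterior probability, read depth, base quality]
        X_train = [[0.99, 20, 38], [0.95, 3, 20], [0.60, 15, 35],
                   [0.99, 25, 40], [0.50, 2, 15], [0.90, 18, 36]]
        y_train = [1, 0, 0, 1, 0, 1]   # expert call: 1 = real SNP, 0 = false positive

        tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
        print(export_text(tree, feature_names=["posterior", "depth", "quality"]))
        print(tree.predict([[0.97, 22, 39]]))   # classify a new candidate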

  20. Studying depression using imaging and machine learning methods.

    Science.gov (United States)

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  1. Machine learning for the automatic detection of anomalous events

    Science.gov (United States)

    Fisher, Wendy D.

    In this dissertation, we describe our research contributions for a novel approach to the application of machine learning for the automatic detection of anomalous events. We work in two different domains to ensure a robust data-driven workflow that could be generalized for monitoring other systems. Specifically, in our first domain, we begin with the identification of internal erosion events in earth dams and levees (EDLs) using geophysical data collected from sensors located on the surface of the levee. As EDLs across the globe reach the end of their design lives, effectively monitoring their structural integrity is of critical importance. The second domain of interest is related to mobile telecommunications, where we investigate a system for automatically detecting non-commercial base station routers (BSRs) operating in protected frequency space. The presence of non-commercial BSRs can disrupt the connectivity of end users, cause service issues for the commercial providers, and introduce significant security concerns. We provide our motivation, experimentation, and results from investigating a generalized novel data-driven workflow using several machine learning techniques. In Chapter 2, we present results from our performance study that uses popular unsupervised clustering algorithms to gain insights into our real-world problems, and evaluate our results using internal and external validation techniques. Using EDL passive seismic data from an experimental laboratory earth embankment, results consistently show a clear separation of events from non-events in four of the five clustering algorithms applied. Chapter 3 uses a multivariate Gaussian machine learning model to identify anomalies in our experimental data sets. For the EDL work, we used experimental data from two different laboratory earth embankments. Additionally, we explore five wavelet transform methods for signal denoising. The best performance is achieved with the Haar wavelets. We achieve up to 97% accuracy.
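
    The multivariate Gaussian model of Chapter 3 can be sketched in a few lines: fit a Gaussian density to feature vectors from normal operation and flag low-density points as anomalies. The synthetic vectors and the 1% density threshold below are assumptions for illustration.

        # Multivariate Gaussian anomaly detection (sketch; synthetic features
        # stand in for the denoised seismic data, threshold is an assumption).
        import numpy as np
        from scipy.stats import multivariate_normal

        rng = np.random.default_rng(4)
        normal_data = rng.normal(loc=0.0, scale=1.0, size=(500, 3))  # non-events

        mu = normal_data.mean(axis=0)
        cov = np.cov(normal_data, rowvar=False)
        density = multivariate_normal(mean=mu, cov=cov)

        epsilon = np.quantile(density.pdf(normal_data), 0.01)  # density threshold
        test_points = np.array([[0.1, -0.2, 0.3], [6.0, 6.0, 6.0]])
        print(density.pdf(test_points) < epsilon)   # True -> flagged as anomaly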

  2. Memory Based Machine Intelligence Techniques in VLSI hardware

    OpenAIRE

    James, Alex Pappachen

    2012-01-01

    We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high ...

  3. Memory Based Machine Intelligence Techniques in VLSI hardware

    CERN Document Server

    James, Alex Pappachen

    2012-01-01

    We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high level intelligence problems such as sparse coding and contextual processing.

  4. On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products.

    Science.gov (United States)

    Varshney, Kush R; Alemzadeh, Homa

    2017-09-01

    Machine learning algorithms increasingly influence our decisions and interact with us in all parts of our daily lives. Therefore, just as we consider the safety of power plants, highways, and a variety of other engineered socio-technical systems, we must also take into account the safety of systems involving machine learning. Heretofore, the definition of safety has not been formalized in a machine learning context. In this article, we do so by defining machine learning safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. We then use this definition to examine safety in all sorts of applications in cyber-physical systems, decision sciences, and data products. We find that the foundational principle of modern statistical machine learning, empirical risk minimization, is not always a sufficient objective. We discuss how four different categories of strategies for achieving safety in engineering, including inherently safe design, safety reserves, safe fail, and procedural safeguards can be mapped to a machine learning context. We then discuss example techniques that can be adopted in each category, such as considering interpretability and causality of predictive models, objective functions beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software and open data.

  5. DR-predictor: incorporating flexible docking with specialized electronic reactivity and machine learning techniques to predict CYP-mediated sites of metabolism.

    Science.gov (United States)

    Huang, Tao-wei; Zaretzki, Jed; Bergeron, Charles; Bennett, Kristin P; Breneman, Curt M

    2013-12-23

    Computational methods that can identify CYP-mediated sites of metabolism (SOMs) of drug-like compounds have become required tools for early stage lead optimization. In recent years, methods that combine CYP binding site features with CYP/ligand binding information have been sought in order to increase the prediction accuracy of such hybrid models over those that use only one representation. Two challenges that any hybrid ligand/structure-based method must overcome are (1) identification of the best binding pose for a specific ligand with a given CYP and (2) appropriately incorporating the results of docking with ligand reactivity. To address these challenges we have created Docking-Regioselectivity-Predictor (DR-Predictor)--a method that incorporates flexible docking-derived information with specialized electronic reactivity and multiple-instance-learning methods to predict CYP-mediated SOMs. In this study, the hybrid ligand-structure-based DR-Predictor method was tested on substrate sets for CYP 1A2 and CYP 2A6. For these data, the DR-Predictor model was found to identify the experimentally observed SOM within the top two predicted rank-positions for 86% of the 261 1A2 substrates and 83% of the 100 2A6 substrates. Given the accuracy and extendibility of the DR-Predictor method, we anticipate that it will further facilitate the prediction of CYP metabolism liabilities and aid in in-silico ADMET assessment of novel structures.

  6. Graphs in machine learning: an introduction

    CERN Document Server

    Latouche, Pierre

    2015-01-01

    Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally...

  7. Optimal interference code based on machine learning

    Science.gov (United States)

    Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

    2016-10-01

    In this paper, we analyze the characteristics of pseudo-random codes, taking the m-sequence as a case study. Drawing on coding theory, we introduce jamming methods and simulate the interference effect and probability model in MATLAB. Based on the length of decoding time the adversary spends, we find the optimal formula and optimal coefficients using machine learning, and thereby obtain a new optimal interference code. First, in the recognition phase, this study judges the effect of interference by simulating the length of time over the decoding period of a laser seeker. Then, we use laser active deception jamming to simulate the interference process in the tracking phase. To improve the performance of the interference, this paper simulates the model in MATLAB. We find the least number of pulse intervals that must be received, and from this conclude the precise interval number of the laser pointer for m-sequence encoding. To find the shortest space, we choose the greatest-common-divisor method. Then, combining this with the coding regularity found earlier, we restore the pulse interval of the pseudo-random code that has already been received. Finally, we can control the time period of laser interference, obtain the optimal interference code, and increase the probability of interference.

  8. Perspective: Machine learning potentials for atomistic simulations

    Science.gov (United States)

    Behler, Jörg

    2016-11-01

    Nowadays, computer simulations have become a standard tool in essentially all fields of chemistry, condensed matter physics, and materials science. In order to keep up with state-of-the-art experiments and the ever growing complexity of the investigated problems, there is a constantly increasing need for simulations of more realistic, i.e., larger, model systems with improved accuracy. In many cases, the availability of sufficiently efficient interatomic potentials providing reliable energies and forces has become a serious bottleneck for performing these simulations. To address this problem, currently a paradigm change is taking place in the development of interatomic potentials. Since the early days of computer simulations simplified potentials have been derived using physical approximations whenever the direct application of electronic structure methods has been too demanding. Recent advances in machine learning (ML) now offer an alternative approach for the representation of potential-energy surfaces by fitting large data sets from electronic structure calculations. In this perspective, the central ideas underlying these ML potentials, solved problems and remaining challenges are reviewed along with a discussion of their current applicability and limitations.

  9. Machine Learning and Cosmological Simulations I: Semi-Analytical Models

    CERN Document Server

    Kamdar, Harshil M; Brunner, Robert J

    2016-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated machine learning algorithms (k-Nearest Neighbors, decision trees, random forests and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of $M>10^{12} M_{\\odot}$ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore...

  10. Acceleration of saddle-point searches with machine learning.

    Science.gov (United States)

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.
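
    The surrogate loop described above can be sketched in one dimension, where the saddle point is the top of the barrier between two minima; a quartic double-well and kernel ridge regression stand in for the ab initio surface and the ML model (all names and parameters here are illustrative assumptions).

        # Sketch of the surrogate loop: fit kernel ridge regression to the points
        # sampled so far, locate the barrier top on the cheap surrogate, verify with
        # the "expensive" function, and retrain where the surrogate was wrong.
        import numpy as np
        from sklearn.kernel_ridge import KernelRidge

        def true_pes(x):                       # stand-in for an expensive ab initio call
            return x**4 - 2 * x**2             # double well; barrier top at x = 0

        X = np.array([-1.5, -0.5, 0.7, 1.5])   # initially sampled geometries
        grid = np.linspace(-1.2, 1.2, 400)

        for _ in range(20):
            surrogate = KernelRidge(kernel="rbf", gamma=2.0, alpha=1e-6)
            surrogate.fit(X.reshape(-1, 1), true_pes(X))
            x_guess = grid[np.argmax(surrogate.predict(grid.reshape(-1, 1)))]
            e_true = true_pes(x_guess)         # one verification call per iteration
            if abs(surrogate.predict([[x_guess]])[0] - e_true) < 1e-4:
                break                          # surrogate agrees with the true PES here
            X = np.append(X, x_guess)          # add training data where it failed

        print(f"barrier-top estimate: x = {x_guess:.3f}, E = {e_true:.3f}")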

  11. Acceleration of saddle-point searches with machine learning

    Science.gov (United States)

    Peterson, Andrew A.

    2016-08-01

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  12. A Protein Classification Benchmark collection for machine learning

    NARCIS (Netherlands)

    Sonego, P.; Pacurar, M.; Dhir, S.; Kertész-Farkas, A.; Kocsor, A.; Gáspári, Z.; Leunissen, J.A.M.; Pongor, S.

    2007-01-01

    Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection (http://hydra.icgeb.trieste.it/benchmark) was created in order to provide standard datasets on which the performance of machine learning methods can be compared.

  13. Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

    OpenAIRE

    Kurbatsky, Victor G.; Spiryaev, Vadim A.; Tomin, Nikita V.; Leahy, Paul G.; Sidorov, Denis N.; Zhukov, Alexei V.

    2014-01-01

    A novel hybrid data-driven approach is developed for forecasting power system parameters with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time-series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank...

  14. Probabilistic models and machine learning in structural bioinformatics

    DEFF Research Database (Denmark)

    Hamelryck, Thomas

    2009-01-01

    Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis...

  15. Sparse Machine Learning Methods for Understanding Large Text Corpora

    Data.gov (United States)

    National Aeronautics and Space Administration — Sparse machine learning has recently emerged as a powerful tool for obtaining models of high-dimensional data with a high degree of interpretability, at low computational...

  16. Parameter Identifiability in Statistical Machine Learning: A Review.

    Science.gov (United States)

    Ran, Zhi-Yong; Hu, Bao-Gang

    2017-05-01

    This review examines the relevance of parameter identifiability for statistical models used in machine learning. In addition to defining main concepts, we address several issues of identifiability closely related to machine learning, showing the advantages and disadvantages of state-of-the-art research and demonstrating recent progress. First, we review criteria for determining the parameter structure of models from the literature. This has three related issues: parameter identifiability, parameter redundancy, and reparameterization. Second, we review the deep influence of identifiability on various aspects of machine learning from theoretical and application viewpoints. In addition to illustrating the utility and influence of identifiability, we emphasize the interplay among identifiability theory, machine learning, mathematical statistics, information theory, optimization theory, information geometry, Riemann geometry, symbolic computation, Bayesian inference, algebraic geometry, and others. Finally, we present a new perspective together with the associated challenges.

  17. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    National Research Council Canada - National Science Library

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-01-01

    .... The polymers are 'fingerprinted' as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand...

  18. A Machine Learning System for Recognizing Subclasses (Demo)

    Energy Technology Data Exchange (ETDEWEB)

    Vatsavai, Raju [ORNL

    2012-01-01

    Thematic information extraction from remote sensing images is a complex task. In this demonstration, we present the *Miner machine learning system. In particular, we demonstrate an advanced subclass recognition algorithm that is specifically designed to extract finer classes from aggregate classes.

  19. Performance of machine learning methods for classification tasks

    Directory of Open Access Journals (Sweden)

    B. Krithika

    2013-06-01

    Full Text Available In this paper, the performance of various machine learning methods on pattern classification and recognition tasks is evaluated. The proposed evaluation is based on feature representation, feature selection, and the setting of model parameters. The nature of the data and the methods of feature extraction and feature representation are discussed. The results of the machine learning algorithms on the classification task are analysed. The performance of machine learning methods on classifying Tamil word patterns, i.e., classification of nouns and verbs, is analysed. The software WEKA (a data mining tool) is used for evaluating the performance. WEKA provides several machine learning algorithms, including Bayesian, tree-based, lazy, and rule-based classifiers.

  20. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.

  1. Double/Debiased/Neyman Machine Learning of Treatment Effects

    OpenAIRE

    Chernozhukov, Victor; Chetverikov, Denis; Demirer, Mert; Duflo, Esther; Hansen, Christian; Newey, Whitney

    2017-01-01

    Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) provide a generic double/de-biased machine learning (DML) approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using a new generation of nonparametric fitting methods for high-dimensional data, called machine learning methods. In this note, we illustrate the application of this method in the context of ...
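
    The core of the approach, cross-fitted partialling-out, can be sketched as follows: nuisance functions E[Y|X] and E[D|X] are fit by an ML method on one fold, residuals are formed on the held-out fold, and the focal parameter comes from a residual-on-residual regression. Simulated data with a known effect of 0.5 are used for illustration; random forests are an illustrative choice of ML method, not the authors' specification.

        # Cross-fitted partialling-out (the DML recipe) on simulated data
        # with a known treatment effect of 0.5.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import KFold

        rng = np.random.default_rng(5)
        n = 2000
        X = rng.normal(size=(n, 5))                                 # confounders
        D = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=n)         # treatment
        Y = 0.5 * D + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)  # outcome

        res_Y, res_D = np.empty(n), np.empty(n)
        for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
            # Fit nuisance functions E[Y|X] and E[D|X] on one fold,
            # residualize on the held-out fold (cross-fitting).
            res_Y[test] = Y[test] - RandomForestRegressor().fit(X[train], Y[train]).predict(X[test])
            res_D[test] = D[test] - RandomForestRegressor().fit(X[train], D[train]).predict(X[test])

        theta = (res_D @ res_Y) / (res_D @ res_D)   # residual-on-residual regression
        print(f"estimated treatment effect: {theta:.3f} (true value 0.5)")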

  2. Designing Contestability: Interaction Design, Machine Learning, and Mental Health.

    Science.gov (United States)

    Hirsch, Tad; Merced, Kritzia; Narayanan, Shrikanth; Imel, Zac E; Atkins, David C

    2017-06-01

    We describe the design of an automated assessment and training tool for psychotherapists to illustrate challenges with creating interactive machine learning (ML) systems, particularly in contexts where human life, livelihood, and wellbeing are at stake. We explore how existing theories of interaction design and machine learning apply to the psychotherapy context, and identify "contestability" as a new principle for designing systems that evaluate human behavior. Finally, we offer several strategies for making ML systems more accountable to human actors.

  3. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.

  4. A machine learning-based automatic currency trading system

    OpenAIRE

    Brvar, Anže

    2012-01-01

    The main goal of this thesis was to develop an automated trading system for Forex trading that uses machine learning methods and their prediction models to decide on trading actions. A training data set was obtained from exchange rates and values of technical indicators, which describe conditions on the currency market. We evaluated selected machine learning algorithms and their parameters using validation with sampling. We have prepared a set of automated trading systems with various...

  5. Processing of rock core microtomography images: Using seven different machine learning algorithms

    Science.gov (United States)

    Chauhan, Swarup; Rühaak, Wolfram; Khan, Faisal; Enzmann, Frieder; Mielke, Philipp; Kersten, Michael; Sass, Ingo

    2016-01-01

    The abilities of machine learning algorithms to process X-ray microtomographic rock images were determined. The study focused on the use of unsupervised, supervised, and ensemble clustering techniques to segment X-ray computed microtomography rock images and to estimate the pore spaces and pore size diameters in the rocks. The unsupervised k-means technique gave the fastest processing time, and the supervised least squares support vector machine technique gave the slowest processing time. Multiphase assemblages of solid phases (minerals and finely grained minerals) and the pore phase were found on visual inspection of the images. In general, the accuracy in terms of porosity values and pore size distribution was found to be strongly affected by the feature vectors selected. The average relative porosity of 15.92±1.77% retrieved from all seven machine learning algorithms is in very good agreement with the experimental result of 17±2% obtained using a gas pycnometer. Of the supervised techniques, the least squares support vector machine technique is superior to the feed-forward artificial neural network because of its ability to identify a generalized pattern. Among the ensemble classification techniques, the boosting technique converged faster than the bagging technique. The k-means technique outperformed the fuzzy c-means and self-organizing maps techniques in terms of accuracy and speed.
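
    The unsupervised segmentation step can be sketched as follows: k-means clusters voxel grey values into pore and solid phases, and porosity is estimated as the pore-cluster fraction. The synthetic grey-value distribution is an assumption for illustration, not the authors' data.

        # k-means segmentation of voxel grey values into pore and solid phases;
        # porosity = fraction of voxels in the darker (pore) cluster.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(6)
        pores = rng.normal(60, 10, size=4000)      # dark voxels (~20% of the image)
        solid = rng.normal(180, 15, size=16000)    # bright voxels
        image = np.concatenate([pores, solid])

        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(image.reshape(-1, 1))
        pore_label = int(np.argmin(km.cluster_centers_))   # darker centre = pores
        porosity = (km.labels_ == pore_label).mean()
        print(f"estimated porosity: {porosity:.1%}")       # ~20% by construction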

  6. MLSA15 - Proceedings of "Machine Learning and Data Mining for Sports Analytics", workshop @ ECML/PKDD 2015

    OpenAIRE

    Van Haaren, Jan; Zimmermann, Albrecht; Davis, Jesse

    2016-01-01

    This document contains the papers presented at MLSA15, which is the second workshop on Machine Learning and Data Mining for Sports Analytics, organized at ECML/PKDD 2015, held in Porto. The application of analytical techniques is rapidly gaining traction in both professional and amateur sports circles. The majority of techniques used in the field so far are statistical. While there has been some interest in the Machine Learning and Data Mining community, it has been somewhat muted so far. The...

  7. Hydrological data assimilation using Extreme Learning Machines

    Science.gov (United States)

    Boucher, Marie-Amélie; Quilty, John; Adamowski, Jan

    2017-04-01

    Data assimilation refers to any process that allows for updating state variables in a model to represent reality more accurately than the initial (open loop) simulation. In hydrology, data assimilation is often a pre-requisite for forecasting. In practice, many operational agencies rely on "manual" data assimilation: perturbations are added manually to meteorological inputs or directly to state variables based on "expert knowledge" until the simulated streamflow matches the observed streamflow closely. The corrected state variables are then considered as representative of the "true", unknown, state of the watershed just before the forecasting period. However, manual data assimilation raises concerns, mainly regarding reproducibility and high reliance on "expert knowledge". For those reasons, automatic data assimilation methods have been proposed in the literature. Automatic data assimilation also allows for the assessment and reduction of state variable uncertainty, which is predominant for short-term streamflow forecasts (e.g. Thiboult et al. 2016). The goal of this project is to explore the potential of Extreme Learning Machines (ELM, Zang and Liu 2015) for data assimilation. ELMs are an emerging type of neural network that does not require iterative optimisation of weights and biases and is therefore much faster to calibrate than typical feed-forward backpropagation neural networks. We explore ELMs for updating the state variables of the lumped conceptual hydrological model GR4J. The GR4J model has two state variables: the levels of water in the production and routing reservoirs. Although these two variables are sufficient to describe the state of a snow-free watershed, they are modelling artifices that are not measurable. Consequently, their "true" values can only be verified indirectly through a comparison of simulated and observed streamflow, and their values are highly uncertain. GR4J can also be coupled with the snow model CemaNeige, which adds two other state variables...
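
    An ELM is small enough to sketch directly: hidden-layer weights are random and fixed, and the output weights are obtained in one shot by least squares, so there is no iterative training. The synthetic input-output pair below is an invented stand-in for the state-variable update problem.

        # Minimal Extreme Learning Machine: random fixed hidden weights,
        # output weights from a single least-squares solve (no iteration).
        import numpy as np

        rng = np.random.default_rng(7)
        X = rng.uniform(-1, 1, size=(300, 4))             # inputs (invented)
        y = np.sin(X @ np.array([1.0, -2.0, 0.5, 1.5]))   # target (invented)

        n_hidden = 50
        W = rng.normal(size=(4, n_hidden))                # never trained
        b = rng.normal(size=n_hidden)
        H = np.tanh(X @ W + b)                            # hidden activations
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # one-shot solve

        rmse = np.sqrt(np.mean((H @ beta - y) ** 2))
        print(f"training RMSE: {rmse:.4f}")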

  8. Generative Modeling for Machine Learning on the D-Wave

    Energy Technology Data Exchange (ETDEWEB)

    Thulasidasan, Sunil [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Information Sciences Group

    2016-11-15

    These are slides on Generative Modeling for Machine Learning on the D-Wave. The following topics are detailed: generative models; Boltzmann machines: a generative model; restricted Boltzmann machines; learning parameters: RBM training; practical ways to train RBM; D-Wave as a Boltzmann sampler; mapping RBM onto the D-Wave; Chimera restricted RBM; mapping binary RBM to Ising model; experiments; data; D-Wave effective temperature, parameters noise, etc.; experiments: contrastive divergence (CD) 1 step; after 50 steps of CD; after 100 steps of CD; D-Wave (experiments 1, 2, 3); D-Wave observations.
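
    A single contrastive-divergence (CD-1) update for a small binary RBM, the classical training step these slides compare against D-Wave sampling, can be sketched as follows (biases omitted for brevity; the mini-batch is random placeholder data).

        # One contrastive-divergence (CD-1) update for a tiny binary RBM
        # (biases omitted for brevity; the mini-batch is random placeholder data).
        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        rng = np.random.default_rng(8)
        n_visible, n_hidden, lr = 6, 3, 0.1
        W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        v0 = rng.integers(0, 2, size=(10, n_visible)).astype(float)  # mini-batch

        ph0 = sigmoid(v0 @ W)                        # positive phase: P(h=1|v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = sigmoid(h0 @ W.T)                       # one Gibbs step back to visibles
        ph1 = sigmoid(v1 @ W)                        # negative phase: P(h=1|v1)

        W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)  # CD-1 gradient estimate
        print("updated weight matrix:\n", W.round(3))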

  9. Machine learning methods for metabolic pathway prediction

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2010-01-01

    Full Text Available Abstract Background A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.

  10. Machine Learning Assessments of Soil Drying

    Science.gov (United States)

    Coopersmith, E. J.; Minsker, B. S.; Wenzel, C.; Gilmore, B. J.

    2011-12-01

    Agricultural activities require the use of heavy equipment and vehicles on unpaved farmlands. When soil conditions are wet, equipment can cause substantial damage, leaving deep ruts. In extreme cases, implements can sink and become mired, causing considerable delays and expense to extricate the equipment. Farm managers, who are often located remotely, cannot assess sites before allocating equipment, making it difficult to assess the conditions of countless sites with any reliability or frequency. For example, farmers often trace serpentine paths of over one hundred miles each day to assess the overall status of various tracts of land spanning thirty, forty, or fifty miles in each direction. One means of assessing the moisture content of a field lies in the strategic positioning of remotely-monitored in situ sensors. Unfortunately, land owners are often reluctant to place sensors across their properties due to the significant monetary cost and complexity. This work aspires to overcome these limitations by modeling the process of wetting and drying statistically, remotely assessing field readiness using only information that is publicly accessible. Such data include Nexrad radar and state climate network sensors, as well as Twitter-based reports of field conditions for validation. Three algorithms, classification trees, k-nearest-neighbors, and boosted perceptrons, are deployed to deliver statistical field readiness assessments of an agricultural site located in Urbana, IL. Two of the three algorithms performed with 92-94% accuracy, with the majority of misclassifications falling within the calculated margins of error. This demonstrates the feasibility of using a machine learning framework with only public data, knowledge of system memory from previous conditions, and statistical tools to assess "readiness" without the need for real-time, on-site physical observation. Future efforts will produce a workflow assimilating Nexrad, climate network...

  11. Pulsar Search Using Supervised Machine Learning

    Science.gov (United States)

    Ford, John M.

    2017-05-01

    Pulsars are rapidly rotating neutron stars which emit a strong beam of energy through mechanisms that are not entirely clear to physicists. These very dense stars are used by astrophysicists to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, the behavior of pulsar-black hole pairs, and tests of general relativity. Many of these tasks require a large ensemble of pulsars to provide enough statistical information to answer the scientific questions posed by physicists. In order to provide more pulsars to study, there are several large-scale pulsar surveys underway, which are generating a huge backlog of unprocessed data. Searching for pulsars is a very labor-intensive process, currently requiring skilled people to examine and interpret plots of data output by analysis programs. An automated system for screening the plots will speed up the search for pulsars by a very large factor. Research to date on using machine learning and pattern recognition has not yielded a completely satisfactory system, as systems with the desired near 100% recall have false positive rates that are higher than desired, causing more manual labor in the classification of pulsars. This work proposed to research, identify, and develop methods to overcome the barriers to building an improved classification system with a false positive rate of less than 1% and a recall of near 100% that would be useful for the current and next generation of large pulsar surveys. The results show that it is possible to generate classifiers that perform as needed from the available training data. While a false positive rate of 1% was not reached, recall of over 99% was achieved with a false positive rate of less than 2%. Methods of mitigating the imbalanced training and test data were explored and found to be highly effective in enhancing classification accuracy.

  12. Machine Learning Predictions of Flash Floods

    Science.gov (United States)

    Clark, R. A., III; Flamig, Z.; Gourley, J. J.; Hong, Y.

    2016-12-01

    This study concerns the development, assessment, and use of machine learning (ML) algorithms to automatically generate predictions of flash floods around the world from numerical weather prediction (NWP) output. Using an archive of NWP outputs from the Global Forecast System (GFS) model and a historical archive of reports of flash floods across the U.S. and Europe, we developed a set of ML models that output forecasts of the probability of a flash flood given a certain set of atmospheric conditions. Using these ML models, real-time global flash flood predictions from NWP data have been generated in research mode since February 2016. These ML models provide information about which atmospheric variables are most important in the flash flood prediction process. The raw ML predictions can be calibrated against historical events to generate reliable flash flood probabilities. The automatic system was tested in a research-to-operations testbed environment with National Weather Service forecasters. The ML models are quite successful at incorporating large amounts of information in a computationally efficient manner and result in reasonably skillful predictions. The system is largely successful at identifying flash floods resulting from synoptically forced events, but struggles with isolated flash floods that arise from weather systems largely unresolvable at the coarse resolution of a global NWP system. The results from this collection of studies suggest that automatic probabilistic predictions of flash floods are a plausible way forward in operational forecasting, but that future research could focus on applying these methods to finer-scale NWP guidance, to NWP ensembles, and to forecast lead times beyond 24 hours.
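
    A hedged sketch of the calibration step the abstract describes, turning raw classifier scores into reliable event probabilities; the synthetic features below stand in for GFS atmospheric variables, and none of the settings are the study's.

        # Calibrating raw scores into event probabilities with isotonic
        # regression, which fits a monotone map from scores to observed
        # event frequencies on held-out folds.
        from sklearn.calibration import CalibratedClassifierCV
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split

        # Synthetic "atmospheric variables" and flood/no-flood labels.
        X, y = make_classification(n_samples=5000, n_features=8,
                                   weights=[0.9], random_state=1)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                                  random_state=1)

        model = CalibratedClassifierCV(
            GradientBoostingClassifier(random_state=1),
            method="isotonic", cv=3)
        model.fit(X_tr, y_tr)
        print(model.predict_proba(X_te[:5])[:, 1])  # calibrated probabilities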

  13. Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems

    Science.gov (United States)

    Kuusisto, Finn; Dutra, Inês; Elezaby, Mai; Mendonça, Eneida A.; Shavlik, Jude; Burnside, Elizabeth S.

    2015-01-01

    While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy. PMID:26306246
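
    The abstract does not spell out ABLe's mechanics. One generic way to inject expert advice into a learned model is to append rule-derived "advice features" to the raw inputs, sketched below with entirely synthetic data and a hypothetical rule; this is an illustration of the general idea, not the authors' framework.

        # Generic advice-as-features sketch (not ABLe itself): the model
        # sees the raw inputs plus the output of a hypothetical expert rule.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(2)
        X = rng.normal(size=(300, 4))        # synthetic clinical features
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

        # Hypothetical expert rule: "flag cases where feature 0 is elevated".
        advice = (X[:, 0] > 0.5).astype(float).reshape(-1, 1)
        X_advised = np.hstack([X, advice])

        clf = LogisticRegression().fit(X_advised, y)
        print(f"training accuracy with advice feature: "
              f"{clf.score(X_advised, y):.2f}")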

  15. Application of Machine Learning to Proteomics Data: Classification and Biomarker Identification in Postgenomics Biology

    Science.gov (United States)

    Swan, Anna Louise; Mobasheri, Ali; Allaway, David; Liddell, Susan

    2013-01-01

    Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high-throughput capabilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks, and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also covers the challenges faced by such investigations, such as prediction of the proteins present, protein quantification, planning for the use of machine learning, and small sample sizes. PMID:24116388
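
    As a schematic of the sample-classification use case (not taken from the review), the sketch below cross-validates a linear classifier on a synthetic peak-intensity matrix with the few-samples/many-features shape the review warns about; with so few samples, cross-validation is essential.

        # Classifying samples from a matrix of mass spectral peak
        # intensities (rows = samples, columns = m/z peaks). All data are
        # synthetic placeholders.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(3)
        n_samples, n_peaks = 40, 500               # few samples, many features
        X = rng.lognormal(size=(n_samples, n_peaks))  # synthetic intensities
        y = rng.integers(0, 2, size=n_samples)        # disease vs. control

        clf = make_pipeline(StandardScaler(), LinearSVC(C=0.1))
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"cross-validated accuracy: {scores.mean():.2f} "
              f"+/- {scores.std():.2f}")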

  16. Combining satellite imagery and machine learning to predict poverty.

    Science.gov (United States)

    Jean, Neal; Burke, Marshall; Xie, Michael; Davis, W Matthew; Lobell, David B; Ermon, Stefano

    2016-08-19

    Reliable data on economic livelihoods remain scarce in the developing world, hampering efforts to study these outcomes and to design policies that improve them. Here we demonstrate an accurate, inexpensive, and scalable method for estimating consumption expenditure and asset wealth from high-resolution satellite imagery. Using survey and satellite data from five African countries--Nigeria, Tanzania, Uganda, Malawi, and Rwanda--we show how a convolutional neural network can be trained to identify image features that can explain up to 75% of the variation in local-level economic outcomes. Our method, which requires only publicly available data, could transform efforts to track and target poverty in developing countries. It also demonstrates how powerful machine learning techniques can be applied in a setting with limited training data, suggesting broad potential application across many scientific domains.
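
    A hedged sketch of the transfer-learning idea: a CNN serves as a fixed feature extractor for image tiles, and a linear model maps those features to survey-measured wealth. The tiles and targets below are random placeholders, and the untrained network weights would be swapped for pretrained ones in practice.

        # CNN features -> linear regression on economic outcomes.
        # Assumes torch, torchvision, and scikit-learn are available.
        import numpy as np
        import torch
        import torchvision.models as models
        from sklearn.linear_model import Ridge

        # weights=None keeps this sketch download-free; the real approach
        # would use pretrained weights (e.g., ResNet18_Weights.DEFAULT).
        cnn = models.resnet18(weights=None)
        cnn.fc = torch.nn.Identity()           # drop the classifier head
        cnn.eval()

        tiles = torch.rand(32, 3, 224, 224)    # placeholder satellite tiles
        with torch.no_grad():
            feats = cnn(tiles).numpy()         # 512-d feature per tile

        rng = np.random.default_rng(0)
        wealth = feats @ rng.normal(size=512)  # placeholder survey targets
        reg = Ridge(alpha=1.0).fit(feats, wealth)
        print(f"R^2 on training tiles: {reg.score(feats, wealth):.2f}")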

  17. Machine Learning of Calabi-Yau Volumes

    CERN Document Server

    Krefl, Daniel

    We employ machine learning techniques to investigate the volume minimum of Sasaki-Einstein base manifolds of non-compact toric Calabi-Yau 3-folds. We find that the minimum volume can be approximated via a second order multiple linear regression on standard topological quantities obtained from the corresponding toric diagram. The approximation improves further after invoking a convolutional neural network with the full toric diagram of the Calabi-Yau 3-folds as the input. We are thereby able to circumvent any minimization procedure that was previously necessary and find an explicit mapping between the minimum volume and the topological quantities of the toric diagram. Under the AdS/CFT correspondence, the minimum volumes of Sasaki-Einstein manifolds correspond to central charges of a class of 4d N=1 superconformal field theories. We therefore find empirical evidence for a function that gives values of central charges without the usual extremization procedure.
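
    The first model in the abstract is an ordinary second-order (degree-2) multiple linear regression; a generic scikit-learn version is sketched below with invented stand-ins for the topological quantities of the toric diagram.

        # Degree-2 polynomial regression from "topological quantities" to a
        # toy "minimum volume". Inputs and target are synthetic.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import PolynomialFeatures

        rng = np.random.default_rng(4)
        X = rng.uniform(1, 10, size=(200, 3))   # stand-in toric-diagram data
        vol = 1.0 / (X[:, 0] + 0.2 * X[:, 1] * X[:, 2])  # toy target

        model = make_pipeline(PolynomialFeatures(degree=2),
                              LinearRegression())
        model.fit(X, vol)
        print(f"R^2: {model.score(X, vol):.3f}")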

  18. Neural architecture design based on extreme learning machine.

    Science.gov (United States)

    Bueno-Crespo, Andrés; García-Laencina, Pedro J; Sancho-Gómez, José-Luis

    2013-12-01

    Selection of the optimal neural architecture to solve a pattern classification problem entails choosing the relevant input units, the number of hidden neurons, and their corresponding interconnection weights. This problem has been widely studied, but existing solutions usually involve excessive computational cost and do not provide a unique solution. This paper proposes a new technique to efficiently design the MultiLayer Perceptron (MLP) architecture for classification using the Extreme Learning Machine (ELM) algorithm. The proposed method provides a high generalization capability and a unique solution for the architecture design. Moreover, the selected final network retains only those input connections that are relevant for the classification task. Experimental results show these advantages. Copyright © 2013 Elsevier Ltd. All rights reserved.
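
    The ELM idea that makes this kind of architecture design cheap is that hidden-layer weights are drawn at random and only the output weights are fit, by least squares. A minimal sketch of the basic algorithm (not the paper's design procedure) follows, with a toy dataset.

        # Minimal Extreme Learning Machine: random hidden layer, analytic
        # least-squares output weights.
        import numpy as np

        rng = np.random.default_rng(5)
        X = rng.normal(size=(200, 4))              # inputs
        y = (X[:, 0] * X[:, 1] > 0).astype(float)  # toy binary target

        n_hidden = 50
        W = rng.normal(size=(4, n_hidden))         # random input weights
        b = rng.normal(size=n_hidden)              # random biases
        H = np.tanh(X @ W + b)                     # hidden activations

        beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights
        pred = (H @ beta > 0.5).astype(float)
        print(f"training accuracy: {(pred == y).mean():.2f}")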

  19. METAPHOR: Probability density estimation for machine learning based photometric redshifts

    Science.gov (United States)

    Amaro, V.; Cavuoti, S.; Brescia, M.; Vellucci, C.; Tortora, C.; Longo, G.

    2017-06-01

    We present METAPHOR (Machine-learning Estimation Tool for Accurate PHOtometric Redshifts), a method able to provide a reliable PDF for photometric galaxy redshifts estimated through empirical techniques. METAPHOR is a modular workflow, mainly based on the MLPQNA neural network as the internal engine to derive photometric galaxy redshifts, but allowing MLPQNA to be easily replaced by any other method for predicting photo-z's and their PDFs. We present the results of a validation test of the workflow on galaxies from SDSS-DR9, also showing the universality of the method by replacing MLPQNA with KNN and Random Forest models. The validation test also includes a comparison with the PDFs derived from a traditional SED template-fitting method (Le Phare).
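
    METAPHOR's PDF recipe, perturbing the photometry and re-running the estimator, can be sketched with a KNN regressor, one of the engines the authors swap in. The magnitudes, noise level, and binning below are all illustrative, not the published settings.

        # Photo-z PDF from perturbed photometry: predict once per perturbed
        # copy of a galaxy's magnitudes, then histogram the predictions.
        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        rng = np.random.default_rng(6)
        mags = rng.uniform(18, 24, size=(1000, 5))   # synthetic magnitudes
        z = 0.1 * (mags.mean(axis=1) - 18) + rng.normal(0, 0.02, 1000)

        knn = KNeighborsRegressor(n_neighbors=10).fit(mags, z)

        galaxy = mags[:1]                            # one test galaxy
        perturbed = galaxy + rng.normal(0, 0.05, size=(500, 5))
        z_samples = knn.predict(perturbed)           # photo-z per perturbation
        pdf, edges = np.histogram(z_samples, bins=20, density=True)
        print(f"photo-z mean {z_samples.mean():.3f}, "
              f"std {z_samples.std():.3f}")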

  20. A Machine Learning Approach to Test Data Generation

    DEFF Research Database (Denmark)

    Christiansen, Henning; Dahmcke, Christina Mackeprang

    2007-01-01

    Programs for gene prediction in computational biology are examples of systems for which the acquisition of authentic test data is difficult, as these require years of extensive research. This has led to test methods based on semi-artificially produced test data, often produced by ad hoc techniques complemented by statistical models such as Hidden Markov Models (HMM). The quality of such a test method depends on how well the test data reflect the regularities in known data and how well they generalize these regularities. So far only very simplified and generalized, artificial data sets have been tested, and a more thorough statistical foundation is required. We propose to use logic-statistical modelling methods for machine learning to analyze existing and manually marked-up data, integrated with the generation of new, artificial data. More specifically, we suggest to use the PRISM
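
    As a small illustration of the semi-artificial data idea (not the PRISM system itself), the sketch below samples DNA-like test sequences from a hand-specified two-state HMM; all states and probabilities are invented for illustration.

        # Sampling test sequences from a toy two-state HMM
        # ("coding"/"noncoding") with made-up parameters.
        import numpy as np

        rng = np.random.default_rng(7)
        trans = np.array([[0.9, 0.1],    # P(next state | current state)
                          [0.2, 0.8]])
        emit = np.array([[0.4, 0.1, 0.1, 0.4],       # base probs, "coding"
                         [0.25, 0.25, 0.25, 0.25]])  # uniform, "noncoding"
        bases = np.array(list("ACGT"))

        s, seq = 0, []
        for _ in range(60):
            seq.append(rng.choice(bases, p=emit[s]))
            s = rng.choice(2, p=trans[s])
        print("".join(seq))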