WorldWideScience

Sample records for machine learning based

  1. Model-based machine learning.

    Science.gov (United States)

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

  2. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of

  3. Prototype-based models in machine learning.

    Science.gov (United States)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We discuss basic schemes of competitive vector quantization as well as the so-called neural gas approach and Kohonen's topology-preserving self-organizing map. Supervised learning in prototype systems is exemplified in terms of learning vector quantization. Most frequently, the familiar Euclidean distance serves as a dissimilarity measure. We present extensions of the framework to nonstandard measures and give an introduction to the use of adaptive distances in relevance learning. © 2016 Wiley Periodicals, Inc.

  4. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  5. Research on machine learning framework based on random forest algorithm

    Science.gov (United States)

    Ren, Qiong; Cheng, Hui; Han, Hai

    2017-03-01

    With the continuous development of machine learning, industry and academia have released a lot of machine learning frameworks based on distributed computing platform, and have been widely used. However, the existing framework of machine learning is limited by the limitations of machine learning algorithm itself, such as the choice of parameters and the interference of noises, the high using threshold and so on. This paper introduces the research background of machine learning framework, and combined with the commonly used random forest algorithm in machine learning classification algorithm, puts forward the research objectives and content, proposes an improved adaptive random forest algorithm (referred to as ARF), and on the basis of ARF, designs and implements the machine learning framework.

  6. BEBP: An Poisoning Method Against Machine Learning Based IDSs

    OpenAIRE

    Li, Pan; Liu, Qiang; Zhao, Wentao; Wang, Dongxu; Wang, Siqi

    2018-01-01

    In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). However, practical IDSs generally update their decision module by feeding new data then retraining learning models in a periodical way. Hence, some attacks that comprise the data for training or testing classifiers significantly challenge the detecting capability of machine learning-based IDSs. Poisoning attack, which is one of the most recognized security threats towards machine learning...

  7. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.

  8. IoT Security Techniques Based on Machine Learning

    OpenAIRE

    Xiao, Liang; Wan, Xiaoyue; Lu, Xiaozhen; Zhang, Yanyong; Wu, Di

    2018-01-01

    Internet of things (IoT) that integrate a variety of devices into networks to provide advanced and intelligent services have to protect user privacy and address attacks such as spoofing attacks, denial of service attacks, jamming and eavesdropping. In this article, we investigate the attack model for IoT systems, and review the IoT security solutions based on machine learning techniques including supervised learning, unsupervised learning and reinforcement learning. We focus on the machine le...

  9. Optimal interference code based on machine learning

    Science.gov (United States)

    Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

    2016-10-01

    In this paper, we analyze the characteristics of pseudo-random code, by the case of m sequence. Depending on the description of coding theory, we introduce the jamming methods. We simulate the interference effect or probability model by the means of MATLAB to consolidate. In accordance with the length of decoding time the adversary spends, we find out the optimal formula and optimal coefficients based on machine learning, then we get the new optimal interference code. First, when it comes to the phase of recognition, this study judges the effect of interference by the way of simulating the length of time over the decoding period of laser seeker. Then, we use laser active deception jamming simulate interference process in the tracking phase in the next block. In this study we choose the method of laser active deception jamming. In order to improve the performance of the interference, this paper simulates the model by MATLAB software. We find out the least number of pulse intervals which must be received, then we can make the conclusion that the precise interval number of the laser pointer for m sequence encoding. In order to find the shortest space, we make the choice of the greatest common divisor method. Then, combining with the coding regularity that has been found before, we restore pulse interval of pseudo-random code, which has been already received. Finally, we can control the time period of laser interference, get the optimal interference code, and also increase the probability of interference as well.

  10. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  11. Machine Learning.

    Science.gov (United States)

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  12. Machine Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  13. Machine learning based analysis of cardiovascular images

    NARCIS (Netherlands)

    Wolterink, JM

    2017-01-01

    Cardiovascular diseases (CVDs), including coronary artery disease (CAD) and congenital heart disease (CHD) are the global leading cause of death. Computed tomography (CT) and magnetic resonance imaging (MRI) allow non-invasive imaging of cardiovascular structures. This thesis presents machine

  14. Machine Learning Based Diagnosis of Lithium Batteries

    Science.gov (United States)

    Ibe-Ekeocha, Chinemerem Christopher

    The depletion of the world's current petroleum reserve, coupled with the negative effects of carbon monoxide and other harmful petrochemical by-products on the environment, is the driving force behind the movement towards renewable and sustainable energy sources. Furthermore, the growing transportation sector consumes a significant portion of the total energy used in the United States. A complete electrification of this sector would require a significant development in electric vehicles (EVs) and hybrid electric vehicles (HEVs), thus translating to a reduction in the carbon footprint. As the market for EVs and HEVs grows, their battery management systems (BMS) need to be improved accordingly. The BMS is not only responsible for optimally charging and discharging the battery, but also monitoring battery's state of charge (SOC) and state of health (SOH). SOC, similar to an energy gauge, is a representation of a battery's remaining charge level as a percentage of its total possible charge at full capacity. Similarly, SOH is a measure of deterioration of a battery; thus it is a representation of the battery's age. Both SOC and SOH are not measurable, so it is important that these quantities are estimated accurately. An inaccurate estimation could not only be inconvenient for EV consumers, but also potentially detrimental to battery's performance and life. Such estimations could be implemented either online, while battery is in use, or offline when battery is at rest. This thesis presents intelligent online SOC and SOH estimation methods using machine learning tools such as artificial neural network (ANN). ANNs are a powerful generalization tool if programmed and trained effectively. Unlike other estimation strategies, the techniques used require no battery modeling or knowledge of battery internal parameters but rather uses battery's voltage, charge/discharge current, and ambient temperature measurements to accurately estimate battery's SOC and SOH. The developed

  15. Machine learning versus knowledge based classification of legal texts

    NARCIS (Netherlands)

    de Maat, E.; Krabben, K.; Winkels, R.; Winkels, R.G.F.

    2010-01-01

    This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurate (>90%) as the pattern based one, but

  16. Machine Learning Based Classifier for Falsehood Detection

    Science.gov (United States)

    Mallikarjun, H. M.; Manimegalai, P., Dr.; Suresh, H. N., Dr.

    2017-08-01

    The investigation of physiological techniques for Falsehood identification tests utilizing the enthusiastic aggravations started as a part of mid 1900s. The need of Falsehood recognition has been a piece of our general public from hundreds of years back. Different requirements drifted over the general public raising the need to create trick evidence philosophies for Falsehood identification. The established similar addressing tests have been having a tendency to gather uncertain results against which new hearty strategies are being explored upon for acquiring more productive Falsehood discovery set up. Electroencephalography (EEG) is a non-obtrusive strategy to quantify the action of mind through the anodes appended to the scalp of a subject. Electroencephalogram is a record of the electric signs produced by the synchronous activity of mind cells over a timeframe. The fundamental goal is to accumulate and distinguish the important information through this action which can be acclimatized for giving surmising to Falsehood discovery in future analysis. This work proposes a strategy for Falsehood discovery utilizing EEG database recorded on irregular people of various age gatherings and social organizations. The factual investigation is directed utilizing MATLAB v-14. It is a superior dialect for specialized registering which spares a considerable measure of time with streamlined investigation systems. In this work center is made on Falsehood Classification by Support Vector Machine (SVM). 72 Samples are set up by making inquiries from standard poll with a Wright and wrong replies in a diverse era from the individual in wearable head unit. 52 samples are trained and 20 are tested. By utilizing Bluetooth based Neurosky’s Mindwave kit, brain waves are recorded and qualities are arranged appropriately. In this work confusion matrix is derived by matlab programs and accuracy of 56.25 % is achieved.

  17. Machine Learning Based Localization and Classification with Atomic Magnetometers

    Science.gov (United States)

    Deans, Cameron; Griffin, Lewis D.; Marmugi, Luca; Renzoni, Ferruccio

    2018-01-01

    We demonstrate identification of position, material, orientation, and shape of objects imaged by a Rb 85 atomic magnetometer performing electromagnetic induction imaging supported by machine learning. Machine learning maximizes the information extracted from the images created by the magnetometer, demonstrating the use of hidden data. Localization 2.6 times better than the spatial resolution of the imaging system and successful classification up to 97% are obtained. This circumvents the need of solving the inverse problem and demonstrates the extension of machine learning to diffusive systems, such as low-frequency electrodynamics in media. Automated collection of task-relevant information from quantum-based electromagnetic imaging will have a relevant impact from biomedicine to security.

  18. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of the today’s Internet, bringing financial damage to companies and annoying individual users. Spam emails are invading users without their consent and filling their mail boxes. They consume more network capacity as well as time in checking and deleting spam mails. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most of the users want to do right think to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, they are not yet eradicated. Also when the counter measures are over sensitive, even legitimate emails will be eliminated. Among the approaches developed to stop spam, filtering is the one of the most important technique. Many researches in spam filtering have been centered on the more sophisticated classifier-related issues. In recent days, Machine learning for spam classification is an important research issue. The effectiveness of the proposed work is explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms has also been presented.

  19. Runtime Optimizations for Tree-Based Machine Learning Models

    NARCIS (Netherlands)

    N. Asadi; J.J.P. Lin (Jimmy); A.P. de Vries (Arjen)

    2014-01-01

    htmlabstractTree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression

  20. Machine learning for network-based malware detection

    DEFF Research Database (Denmark)

    Stevanovic, Matija

    and based on different, mutually complementary, principles of traffic analysis. The proposed approaches rely on machine learning algorithms (MLAs) for automated and resource-efficient identification of the patterns of malicious network traffic. We evaluated the proposed methods through extensive evaluations...

  1. Improved Extreme Learning Machine based on the Sensitivity Analysis

    Science.gov (United States)

    Cui, Licheng; Zhai, Huawei; Wang, Benchao; Qu, Zengtang

    2018-03-01

    Extreme learning machine and its improved ones is weak in some points, such as computing complex, learning error and so on. After deeply analyzing, referencing the importance of hidden nodes in SVM, an novel analyzing method of the sensitivity is proposed which meets people’s cognitive habits. Based on these, an improved ELM is proposed, it could remove hidden nodes before meeting the learning error, and it can efficiently manage the number of hidden nodes, so as to improve the its performance. After comparing tests, it is better in learning time, accuracy and so on.

  2. Machine learning-based dual-energy CT parametric mapping.

    Science.gov (United States)

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Al Helo, Rose; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C; Rassouli, Negin; Gilkeson, Robert C; Traughber, Bryan J; Cheng, Chee-Wai; Muzic, Raymond F

    2018-05-22

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Zeff), relative electron density (ρe), mean excitation energy (Ix), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 seconds. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency. . © 2018 Institute of Physics and Engineering in

  3. Rule based systems for big data a machine learning approach

    CERN Document Server

    Liu, Han; Cocea, Mihaela

    2016-01-01

    The ideas introduced in this book explore the relationships among rule based systems, machine learning and big data. Rule based systems are seen as a special type of expert systems, which can be built by using expert knowledge or learning from real data. The book focuses on the development and evaluation of rule based systems in terms of accuracy, efficiency and interpretability. In particular, a unified framework for building rule based systems, which consists of the operations of rule generation, rule simplification and rule representation, is presented. Each of these operations is detailed using specific methods or techniques. In addition, this book also presents some ensemble learning frameworks for building ensemble rule based systems.

  4. Machine learning based Intelligent cognitive network using fog computing

    Science.gov (United States)

    Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

    2017-05-01

    In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send the periodic signal summary which is much smaller than the original signal to the cloud so that the overall system spectrum source allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens the system security by reducing the communication burden of the communications network.

  5. An Android malware detection system based on machine learning

    Science.gov (United States)

    Wen, Long; Yu, Haiyang

    2017-08-01

    The Android smartphone, with its open source character and excellent performance, has attracted many users. However, the convenience of the Android platform also has motivated the development of malware. The traditional method which detects the malware based on the signature is unable to detect unknown applications. The article proposes a machine learning-based lightweight system that is capable of identifying malware on Android devices. In this system we extract features based on the static analysis and the dynamitic analysis, then a new feature selection approach based on principle component analysis (PCA) and relief are presented in the article to decrease the dimensions of the features. After that, a model will be constructed with support vector machine (SVM) for classification. Experimental results show that our system provides an effective method in Android malware detection.

  6. Sample-Based Extreme Learning Machine with Missing Data

    Directory of Open Access Journals (Sweden)

    Hang Gao

    2015-01-01

    Full Text Available Extreme learning machine (ELM has been extensively studied in machine learning community during the last few decades due to its high efficiency and the unification of classification, regression, and so forth. Though bearing such merits, existing ELM algorithms cannot efficiently handle the issue of missing data, which is relatively common in practical applications. The problem of missing data is commonly handled by imputation (i.e., replacing missing values with substituted values according to available information. However, imputation methods are not always effective. In this paper, we propose a sample-based learning framework to address this issue. Based on this framework, we develop two sample-based ELM algorithms for classification and regression, respectively. Comprehensive experiments have been conducted in synthetic data sets, UCI benchmark data sets, and a real world fingerprint image data set. As indicated, without introducing extra computational complexity, the proposed algorithms do more accurate and stable learning than other state-of-the-art ones, especially in the case of higher missing ratio.

  7. NMF-Based Image Quality Assessment Using Extreme Learning Machine.

    Science.gov (United States)

    Wang, Shuigen; Deng, Chenwei; Lin, Weisi; Huang, Guang-Bin; Zhao, Baojun

    2017-01-01

    Numerous state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage process: distortion description followed by distortion effects pooling. As for the first stage, the distortion descriptors or measurements are expected to be effective representatives of human visual variations, while the second stage should well express the relationship among quality descriptors and the perceptual visual quality. However, most of the existing quality descriptors (e.g., luminance, contrast, and gradient) do not seem to be consistent with human perception, and the effects pooling is often done in ad-hoc ways. In this paper, we propose a novel full-reference IQA metric. It applies non-negative matrix factorization (NMF) to measure image degradations by making use of the parts-based representation of NMF. On the other hand, a new machine learning technique [extreme learning machine (ELM)] is employed to address the limitations of the existing pooling techniques. Compared with neural networks and support vector regression, ELM can achieve higher learning accuracy with faster learning speed. Extensive experimental results demonstrate that the proposed metric has better performance and lower computational complexity in comparison with the relevant state-of-the-art approaches.

  8. Machine Learning-based Intelligent Formal Reasoning and Proving System

    Science.gov (United States)

    Chen, Shengqing; Huang, Xiaojian; Fang, Jiaze; Liang, Jia

    2018-03-01

    The reasoning system can be used in many fields. How to improve reasoning efficiency is the core of the design of system. Through the formal description of formal proof and the regular matching algorithm, after introducing the machine learning algorithm, the system of intelligent formal reasoning and verification has high efficiency. The experimental results show that the system can verify the correctness of propositional logic reasoning and reuse the propositional logical reasoning results, so as to obtain the implicit knowledge in the knowledge base and provide the basic reasoning model for the construction of intelligent system.

  9. Summary of vulnerability related technologies based on machine learning

    Science.gov (United States)

    Zhao, Lei; Chen, Zhihao; Jia, Qiong

    2018-04-01

    As the scale of information system increases by an order of magnitude, the complexity of system software is getting higher. The vulnerability interaction from design, development and deployment to implementation stages greatly increases the risk of the entire information system being attacked successfully. Considering the limitations and lags of the existing mainstream security vulnerability detection techniques, this paper summarizes the development and current status of related technologies based on the machine learning methods applied to deal with massive and irregular data, and handling security vulnerabilities.

  10. Component Pin Recognition Using Algorithms Based on Machine Learning

    Science.gov (United States)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  11. Alumina Concentration Detection Based on the Kernel Extreme Learning Machine.

    Science.gov (United States)

    Zhang, Sen; Zhang, Tao; Yin, Yixin; Xiao, Wendong

    2017-09-01

    The concentration of alumina in the electrolyte is of great significance during the production of aluminum. The amount of the alumina concentration may lead to unbalanced material distribution and low production efficiency and affect the stability of the aluminum reduction cell and current efficiency. The existing methods cannot meet the needs for online measurement because industrial aluminum electrolysis has the characteristics of high temperature, strong magnetic field, coupled parameters, and high nonlinearity. Currently, there are no sensors or equipment that can detect the alumina concentration on line. Most companies acquire the alumina concentration from the electrolyte samples which are analyzed through an X-ray fluorescence spectrometer. To solve the problem, the paper proposes a soft sensing model based on a kernel extreme learning machine algorithm that takes the kernel function into the extreme learning machine. K-fold cross validation is used to estimate the generalization error. The proposed soft sensing algorithm can detect alumina concentration by the electrical signals such as voltages and currents of the anode rods. The predicted results show that the proposed approach can give more accurate estimations of alumina concentration with faster learning speed compared with the other methods such as the basic ELM, BP, and SVM.

  12. Airline Passenger Profiling Based on Fuzzy Deep Machine Learning.

    Science.gov (United States)

    Zheng, Yu-Jun; Sheng, Wei-Guo; Sun, Xing-Ming; Chen, Sheng-Yong

    2017-12-01

    Passenger profiling plays a vital part of commercial aviation security, but classical methods become very inefficient in handling the rapidly increasing amounts of electronic records. This paper proposes a deep learning approach to passenger profiling. The center of our approach is a Pythagorean fuzzy deep Boltzmann machine (PFDBM), whose parameters are expressed by Pythagorean fuzzy numbers such that each neuron can learn how a feature affects the production of the correct output from both the positive and negative sides. We propose a hybrid algorithm combining a gradient-based method and an evolutionary algorithm for training the PFDBM. Based on the novel learning model, we develop a deep neural network (DNN) for classifying normal passengers and potential attackers, and further develop an integrated DNN for identifying group attackers whose individual features are insufficient to reveal the abnormality. Experiments on data sets from Air China show that our approach provides much higher learning ability and classification accuracy than existing profilers. It is expected that the fuzzy deep learning approach can be adapted for a variety of complex pattern analysis tasks.

  13. Cosmic String Detection with Tree-Based Machine Learning

    Science.gov (United States)

    Vafaei Sadr, A.; Farhang, M.; Movahed, S. M. S.; Bassett, B.; Kunz, M.

    2018-05-01

    We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of the processed CMB maps that boost cosmic string detectability. Our proposed classifiers, after training, give results similar to or better than claimed detectability levels from other methods for string tension, Gμ. They can make 3σ detection of strings with Gμ ≳ 2.1 × 10-10 for noise-free, 0.9΄-resolution CMB observations. The minimum detectable tension increases to Gμ ≳ 3.0 × 10-8 for a more realistic, CMB S4-like (II) strategy, improving over previous results.

  14. Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches

    National Research Council Canada - National Science Library

    Qi, Yuan

    2000-01-01

    In this thesis, we propose two new machine learning schemes, a subband-based Independent Component Analysis scheme and a hybrid Independent Component Analysis/Support Vector Machine scheme, and apply...

  15. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  16. Machine learning and radiology.

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  17. Functional networks inference from rule-based machine learning models.

    Science.gov (United States)

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables, that often are different/complementary to the similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes, that genes used together within a rule-based machine learning model to classify the samples, might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in a context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The

  18. Machine-Learning-Based No Show Prediction in Outpatient Visits

    Directory of Open Access Journals (Sweden)

    Carlos Elvira

    2018-03-01

    Full Text Available A recurring problem in healthcare is the high percentage of patients who miss their appointment, be it a consultation or a hospital test. The present study seeks patient’s behavioural patterns that allow predicting the probability of no- shows. We explore the convenience of using Big Data Machine Learning models to accomplish this task. To begin with, a predictive model based only on variables associated with the target appointment is built. Then the model is improved by considering the patient’s history of appointments. In both cases, the Gradient Boosting algorithm was the predictor of choice. Our numerical results are considered promising given the small amount of information available. However, there seems to be plenty of room to improve the model if we manage to collect additional data for both patients and appointments.

  19. Machine learning based switching model for electricity load forecasting

    Energy Technology Data Exchange (ETDEWEB)

    Fan, Shu; Lee, Wei-Jen [Energy Systems Research Center, The University of Texas at Arlington, 416 S. College Street, Arlington, TX 76019 (United States); Chen, Luonan [Department of Electronics, Information and Communication Engineering, Osaka Sangyo University, 3-1-1 Nakagaito, Daito, Osaka 574-0013 (Japan)

    2008-06-15

    In deregulated power markets, forecasting electricity loads is one of the most essential tasks for system planning, operation and decision making. Based on an integration of two machine learning techniques: Bayesian clustering by dynamics (BCD) and support vector regression (SVR), this paper proposes a novel forecasting model for day ahead electricity load forecasting. The proposed model adopts an integrated architecture to handle the non-stationarity of time series. Firstly, a BCD classifier is applied to cluster the input data set into several subsets by the dynamics of the time series in an unsupervised manner. Then, groups of SVRs are used to fit the training data of each subset in a supervised way. The effectiveness of the proposed model is demonstrated with actual data taken from the New York ISO and the Western Farmers Electric Cooperative in Oklahoma. (author)

  20. Machine learning based switching model for electricity load forecasting

    Energy Technology Data Exchange (ETDEWEB)

    Fan Shu [Energy Systems Research Center, University of Texas at Arlington, 416 S. College Street, Arlington, TX 76019 (United States); Chen Luonan [Department of Electronics, Information and Communication Engineering, Osaka Sangyo University, 3-1-1 Nakagaito, Daito, Osaka 574-0013 (Japan); Lee, Weijen [Energy Systems Research Center, University of Texas at Arlington, 416 S. College Street, Arlington, TX 76019 (United States)], E-mail: wlee@uta.edu

    2008-06-15

    In deregulated power markets, forecasting electricity loads is one of the most essential tasks for system planning, operation and decision making. Based on an integration of two machine learning techniques: Bayesian clustering by dynamics (BCD) and support vector regression (SVR), this paper proposes a novel forecasting model for day ahead electricity load forecasting. The proposed model adopts an integrated architecture to handle the non-stationarity of time series. Firstly, a BCD classifier is applied to cluster the input data set into several subsets by the dynamics of the time series in an unsupervised manner. Then, groups of SVRs are used to fit the training data of each subset in a supervised way. The effectiveness of the proposed model is demonstrated with actual data taken from the New York ISO and the Western Farmers Electric Cooperative in Oklahoma.

  1. A Machine Learning Based Intrusion Impact Analysis Scheme for Clouds

    Directory of Open Access Journals (Sweden)

    Junaid Arshad

    2012-01-01

    Full Text Available Clouds represent a major paradigm shift, inspiring the contemporary approach to computing. They present fascinating opportunities to address dynamic user requirements with the provision of on demand expandable computing infrastructures. However, Clouds introduce novel security challenges which need to be addressed to facilitate widespread adoption. This paper is focused on one such challenge - intrusion impact analysis. In particular, we highlight the significance of intrusion impact analysis for the overall security of Clouds. Additionally, we present a machine learning based scheme to address this challenge in accordance with the specific requirements of Clouds for intrusion impact analysis. We also present rigorous evaluation performed to assess the effectiveness and feasibility of the proposed method to address this challenge for Clouds. The evaluation results demonstrate high degree of effectiveness to correctly determine the impact of an intrusion along with significant reduction with respect to the intrusion response time.

  2. METAPHOR: Probability density estimation for machine learning based photometric redshifts

    Science.gov (United States)

    Amaro, V.; Cavuoti, S.; Brescia, M.; Vellucci, C.; Tortora, C.; Longo, G.

    2017-06-01

    We present METAPHOR (Machine-learning Estimation Tool for Accurate PHOtometric Redshifts), a method able to provide a reliable PDF for photometric galaxy redshifts estimated through empirical techniques. METAPHOR is a modular workflow, mainly based on the MLPQNA neural network as internal engine to derive photometric galaxy redshifts, but giving the possibility to easily replace MLPQNA with any other method to predict photo-z's and their PDF. We present here the results about a validation test of the workflow on the galaxies from SDSS-DR9, showing also the universality of the method by replacing MLPQNA with KNN and Random Forest models. The validation test include also a comparison with the PDF's derived from a traditional SED template fitting method (Le Phare).

  3. Machine learning based switching model for electricity load forecasting

    International Nuclear Information System (INIS)

    Fan Shu; Chen Luonan; Lee, Weijen

    2008-01-01

    In deregulated power markets, forecasting electricity loads is one of the most essential tasks for system planning, operation and decision making. Based on an integration of two machine learning techniques: Bayesian clustering by dynamics (BCD) and support vector regression (SVR), this paper proposes a novel forecasting model for day ahead electricity load forecasting. The proposed model adopts an integrated architecture to handle the non-stationarity of time series. Firstly, a BCD classifier is applied to cluster the input data set into several subsets by the dynamics of the time series in an unsupervised manner. Then, groups of SVRs are used to fit the training data of each subset in a supervised way. The effectiveness of the proposed model is demonstrated with actual data taken from the New York ISO and the Western Farmers Electric Cooperative in Oklahoma

  4. Machine learning based global particle indentification algorithms at LHCb experiment

    CERN Multimedia

    Derkach, Denis; Likhomanenko, Tatiana; Rogozhnikov, Aleksei; Ratnikov, Fedor

    2017-01-01

    One of the most important aspects of data processing at LHC experiments is the particle identification (PID) algorithm. In LHCb, several different sub-detector systems provide PID information: the Ring Imaging CHerenkov (RICH) detector, the hadronic and electromagnetic calorimeters, and the muon chambers. To improve charged particle identification, several neural networks including a deep architecture and gradient boosting have been applied to data. These new approaches provide higher identification efficiencies than existing implementations for all charged particle types. It is also necessary to achieve a flat dependency between efficiencies and spectator variables such as particle momentum, in order to reduce systematic uncertainties during later stages of data analysis. For this purpose, "flat” algorithms that guarantee the flatness property for efficiencies have also been developed. This talk presents this new approach based on machine learning and its performance.

  5. Thutmose - Investigation of Machine Learning-Based Intrusion Detection Systems

    Science.gov (United States)

    2016-06-01

    monitoring. This analyzed payload is within the application layer of the OSI model . The analysis tries to establish whether or not the payload is...24 3.2.5 Model Drift Experiments...ADVERSARIAL ENVIRONMENTS (SPIE DSS 2014) .................................................. 58 APPENDIX C - EVALUATING MODEL DRIFT IN MACHINE LEARNING

  6. Fifty years of computer analysis in chest imaging: rule-based, machine learning, deep learning.

    Science.gov (United States)

    van Ginneken, Bram

    2017-03-01

    Half a century ago, the term "computer-aided diagnosis" (CAD) was introduced in the scientific literature. Pulmonary imaging, with chest radiography and computed tomography, has always been one of the focus areas in this field. In this study, I describe how machine learning became the dominant technology for tackling CAD in the lungs, generally producing better results than do classical rule-based approaches, and how the field is now rapidly changing: in the last few years, we have seen how even better results can be obtained with deep learning. The key differences among rule-based processing, machine learning, and deep learning are summarized and illustrated for various applications of CAD in the chest.

  7. Housing Value Forecasting Based on Machine Learning Methods

    OpenAIRE

    Mu, Jingyi; Wu, Fang; Zhang, Aihua

    2014-01-01

    In the era of big data, many urgent issues to tackle in all walks of life all can be solved via big data technique. Compared with the Internet, economy, industry, and aerospace fields, the application of big data in the area of architecture is relatively few. In this paper, on the basis of the actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can make decisions about whether developing...

  8. Using machine learning to accelerate sampling-based inversion

    Science.gov (United States)

    Valentine, A. P.; Sambridge, M.

    2017-12-01

    In most cases, a complete solution to a geophysical inverse problem (including robust understanding of the uncertainties associated with the result) requires a sampling-based approach. However, the computational burden is high, and proves intractable for many problems of interest. There is therefore considerable value in developing techniques that can accelerate sampling procedures.The main computational cost lies in evaluation of the forward operator (e.g. calculation of synthetic seismograms) for each candidate model. Modern machine learning techniques-such as Gaussian Processes-offer a route for constructing a computationally-cheap approximation to this calculation, which can replace the accurate solution during sampling. Importantly, the accuracy of the approximation can be refined as inversion proceeds, to ensure high-quality results.In this presentation, we describe and demonstrate this approach-which can be seen as an extension of popular current methods, such as the Neighbourhood Algorithm, and bridges the gap between prior- and posterior-sampling frameworks.

  9. Recent machine learning advancements in sensor-based mobility analysis: Deep learning for Parkinson's disease assessment.

    Science.gov (United States)

    Eskofier, Bjoern M; Lee, Sunghoon I; Daneault, Jean-Francois; Golabchi, Fatemeh N; Ferreira-Carvalho, Gabriela; Vergara-Diaz, Gloria; Sapienza, Stefano; Costante, Gianluca; Klucken, Jochen; Kautz, Thomas; Bonato, Paolo

    2016-08-01

    The development of wearable sensors has opened the door for long-term assessment of movement disorders. However, there is still a need for developing methods suitable to monitor motor symptoms in and outside the clinic. The purpose of this paper was to investigate deep learning as a method for this monitoring. Deep learning recently broke records in speech and image classification, but it has not been fully investigated as a potential approach to analyze wearable sensor data. We collected data from ten patients with idiopathic Parkinson's disease using inertial measurement units. Several motor tasks were expert-labeled and used for classification. We specifically focused on the detection of bradykinesia. For this, we compared standard machine learning pipelines with deep learning based on convolutional neural networks. Our results showed that deep learning outperformed other state-of-the-art machine learning algorithms by at least 4.6 % in terms of classification rate. We contribute a discussion of the advantages and disadvantages of deep learning for sensor-based movement assessment and conclude that deep learning is a promising method for this field.

  10. Towards a Standard-based Domain-specific Platform to Solve Machine Learning-based Problems

    Directory of Open Access Journals (Sweden)

    Vicente García-Díaz

    2015-12-01

    Full Text Available Machine learning is one of the most important subfields of computer science and can be used to solve a variety of interesting artificial intelligence problems. There are different languages, framework and tools to define the data needed to solve machine learning-based problems. However, there is a great number of very diverse alternatives which makes it difficult the intercommunication, portability and re-usability of the definitions, designs or algorithms that any developer may create. In this paper, we take the first step towards a language and a development environment independent of the underlying technologies, allowing developers to design solutions to solve machine learning-based problems in a simple and fast way, automatically generating code for other technologies. That can be considered a transparent bridge among current technologies. We rely on Model-Driven Engineering approach, focusing on the creation of models to abstract the definition of artifacts from the underlying technologies.

  11. An Interactive Web-based Learning System for Assisting Machining Technology Education

    Directory of Open Access Journals (Sweden)

    Min Jou

    2008-05-01

    Full Text Available The key technique of manufacturing methods is machining. The degree of technique of machining directly affects the quality of the product. Therefore, the machining technique is of primary importance in promoting student practice ability during the training process. Currently, practical training is applied in shop floor to discipline student’s practice ability. Much time and cost are used to teach these techniques. Particularly, computerized machines are continuously increasing in use. The development of educating engineers on computerized machines becomes much more difficult than with traditional machines. This is because of the limitation of the extremely expensive cost of teaching. The quality and quantity of teaching cannot always be promoted in this respect. The traditional teaching methods can not respond well to the needs of the future. Therefore, this research aims to the following topics; (1.Propose the teaching strategies for the students to learning machining processing planning through web-based learning system. (2.Establish on-line teaching material for the computer-aided manufacturing courses including CNC coding method, CNC simulation. (3.Develop the virtual machining laboratory to bring the machining practical training to web-based learning system. (4.Integrate multi-media and virtual laboratory in the developed e-learning web-based system to enhance the effectiveness of machining education through web-based system.

  12. Housing Value Forecasting Based on Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Jingyi Mu

    2014-01-01

    Full Text Available In the era of big data, many urgent issues to tackle in all walks of life all can be solved via big data technique. Compared with the Internet, economy, industry, and aerospace fields, the application of big data in the area of architecture is relatively few. In this paper, on the basis of the actual data, the values of Boston suburb houses are forecast by several machine learning methods. According to the predictions, the government and developers can make decisions about whether developing the real estate on corresponding regions or not. In this paper, support vector machine (SVM, least squares support vector machine (LSSVM, and partial least squares (PLS methods are used to forecast the home values. And these algorithms are compared according to the predicted results. Experiment shows that although the data set exists serious nonlinearity, the experiment result also show SVM and LSSVM methods are superior to PLS on dealing with the problem of nonlinearity. The global optimal solution can be found and best forecasting effect can be achieved by SVM because of solving a quadratic programming problem. In this paper, the different computation efficiencies of the algorithms are compared according to the computing times of relevant algorithms.

  13. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  14. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users or consumers are now demanding smarter devices, intelligent systems are revolutionizing by utilizing machine learning. Machine learning as part of intelligent systems is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. You can train machines to perform some things, so that they can automatically detect, diagnose, and solve a variety of problems. The intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, the intelligent systems make widely applications in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims at introducing how to treat a substantial amount of data, to teach machines and to improve decision making models. And this book specializes in the developments of advanced intelligent systems through machine learning. It...

  15. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  16. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    Science.gov (United States)

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progresses in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility to improve the performance of machine learning methods in screening large libraries is discussed.

  17. The Value Simulation-Based Learning Added to Machining Technology in Singapore

    Science.gov (United States)

    Fang, Linda; Tan, Hock Soon; Thwin, Mya Mya; Tan, Kim Cheng; Koh, Caroline

    2011-01-01

    This study seeks to understand the value simulation-based learning (SBL) added to the learning of Machining Technology in a 15-week core subject course offered to university students. The research questions were: (1) How did SBL enhance classroom learning? (2) How did SBL help participants in their test? (3) How did SBL prepare participants for…

  18. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

    OpenAIRE

    Weng, Wei-Hung; Wagholikar, Kavishwar B.; McCray, Alexa T.; Szolovits, Peter; Chueh, Henry C.

    2017-01-01

    Background The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. Methods We constructed the pipeline using the clinical ...

  19. Machine Learning for Security

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques identifying failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  20. Trip Travel Time Forecasting Based on Selective Forgetting Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Zhiming Gui

    2014-01-01

    Full Text Available Travel time estimation on road networks is a valuable traffic metric. In this paper, we propose a machine learning based method for trip travel time estimation in road networks. The method uses the historical trip information extracted from taxis trace data as the training data. An optimized online sequential extreme machine, selective forgetting extreme learning machine, is adopted to make the prediction. Its selective forgetting learning ability enables the prediction algorithm to adapt to trip conditions changes well. Experimental results using real-life taxis trace data show that the forecasting model provides an effective and practical way for the travel time forecasting.

  1. Learning Algorithm of Boltzmann Machine Based on Spatial Monte Carlo Integration Method

    Directory of Open Access Journals (Sweden)

    Muneki Yasuda

    2018-04-01

    Full Text Available The machine learning techniques for Markov random fields are fundamental in various fields involving pattern recognition, image processing, sparse modeling, and earth science, and a Boltzmann machine is one of the most important models in Markov random fields. However, the inference and learning problems in the Boltzmann machine are NP-hard. The investigation of an effective learning algorithm for the Boltzmann machine is one of the most important challenges in the field of statistical machine learning. In this paper, we study Boltzmann machine learning based on the (first-order spatial Monte Carlo integration method, referred to as the 1-SMCI learning method, which was proposed in the author’s previous paper. In the first part of this paper, we compare the method with the maximum pseudo-likelihood estimation (MPLE method using a theoretical and a numerical approaches, and show the 1-SMCI learning method is more effective than the MPLE. In the latter part, we compare the 1-SMCI learning method with other effective methods, ratio matching and minimum probability flow, using a numerical experiment, and show the 1-SMCI learning method outperforms them.

  2. Neural architecture design based on extreme learning machine.

    Science.gov (United States)

    Bueno-Crespo, Andrés; García-Laencina, Pedro J; Sancho-Gómez, José-Luis

    2013-12-01

    Selection of the optimal neural architecture to solve a pattern classification problem entails to choose the relevant input units, the number of hidden neurons and its corresponding interconnection weights. This problem has been widely studied in many research works but their solutions usually involve excessive computational cost in most of the problems and they do not provide a unique solution. This paper proposes a new technique to efficiently design the MultiLayer Perceptron (MLP) architecture for classification using the Extreme Learning Machine (ELM) algorithm. The proposed method provides a high generalization capability and a unique solution for the architecture design. Moreover, the selected final network only retains those input connections that are relevant for the classification task. Experimental results show these advantages. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. An efficient flow-based botnet detection using supervised machine learning

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2014-01-01

    Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper...... introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs...... to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates...

  4. Spike sorting based upon machine learning algorithms (SOMA).

    Science.gov (United States)

    Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F

    2007-02-15

    We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb, and from the sheep infero-temporal cortex, and using simulated data. The SOMA sofware is available at http://www.sussex.ac.uk/Users/pmh20/spikes.

  5. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide that covers all of the need to know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks.Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or

  6. Deep learning: Using machine learning to study biological vision

    OpenAIRE

    Majaj, Najib; Pelli, Denis

    2017-01-01

    Today most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a reference of attainable performance based on learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

  7. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  8. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    Science.gov (United States)

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.

  9. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artifical intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. Basic for various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  10. Introduction to machine learning

    OpenAIRE

    Baştanlar, Yalın; Özuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning app...

  11. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  12. Machine-Learning Research

    OpenAIRE

    Dietterich, Thomas G.

    1997-01-01

    Machine-learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (1) the improvement of classification accuracy by learning ensembles of classifiers, (2) methods for scaling up supervised learning algorithms, (3) reinforcement learning, and (4) the learning of complex stochastic models.

  13. Online transfer learning with extreme learning machine

    Science.gov (United States)

    Yin, Haibo; Yang, Yun-an

    2017-05-01

    In this paper, we propose a new transfer learning algorithm for online training. The proposed algorithm, which is called Online Transfer Extreme Learning Machine (OTELM), is based on Online Sequential Extreme Learning Machine (OSELM) while it introduces Semi-Supervised Extreme Learning Machine (SSELM) to transfer knowledge from the source to the target domain. With the manifold regularization, SSELM picks out instances from the source domain that are less relevant to those in the target domain to initialize the online training, so as to improve the classification performance. Experimental results demonstrate that the proposed OTELM can effectively use instances in the source domain to enhance the learning performance.

  14. Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model

    Science.gov (United States)

    Pathak, Jaideep; Wikner, Alexander; Fussell, Rebeckah; Chandra, Sarthak; Hunt, Brian R.; Girvan, Michelle; Ott, Edward

    2018-04-01

    A model-based approach to forecasting chaotic dynamical systems utilizes knowledge of the mechanistic processes governing the dynamics to build an approximate mathematical model of the system. In contrast, machine learning techniques have demonstrated promising results for forecasting chaotic systems purely from past time series measurements of system state variables (training data), without prior knowledge of the system dynamics. The motivation for this paper is the potential of machine learning for filling in the gaps in our underlying mechanistic knowledge that cause widely-used knowledge-based models to be inaccurate. Thus, we here propose a general method that leverages the advantages of these two approaches by combining a knowledge-based model and a machine learning technique to build a hybrid forecasting scheme. Potential applications for such an approach are numerous (e.g., improving weather forecasting). We demonstrate and test the utility of this approach using a particular illustrative version of a machine learning known as reservoir computing, and we apply the resulting hybrid forecaster to a low-dimensional chaotic system, as well as to a high-dimensional spatiotemporal chaotic system. These tests yield extremely promising results in that our hybrid technique is able to accurately predict for a much longer period of time than either its machine-learning component or its model-based component alone.

  15. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  16. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

  17. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning-a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  18. Automated Bug Assignment: Ensemble-based Machine Learning in Large Scale Industrial Contexts

    OpenAIRE

    Jonsson, Leif; Borg, Markus; Broman, David; Sandahl, Kristian; Eldh, Sigrid; Runeson, Per

    2016-01-01

    Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learni...

  19. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that the INSPIRE is developing and integrating in order to automatize as much as possible content selection and curation in a subject based repository.

  20. Exploring machine-learning-based control plane intrusion detection techniques in software defined optical networks

    Science.gov (United States)

    Zhang, Huibin; Wang, Yuqiao; Chen, Haoran; Zhao, Yongli; Zhang, Jie

    2017-12-01

    In software defined optical networks (SDON), the centralized control plane may encounter numerous intrusion threatens which compromise the security level of provisioned services. In this paper, the issue of control plane security is studied and two machine-learning-based control plane intrusion detection techniques are proposed for SDON with properly selected features such as bandwidth, route length, etc. We validate the feasibility and efficiency of the proposed techniques by simulations. Results show an accuracy of 83% for intrusion detection can be achieved with the proposed machine-learning-based control plane intrusion detection techniques.

  1. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    Science.gov (United States)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  2. Research into Financial Position of Listed Companies following Classification via Extreme Learning Machine Based upon DE Optimization

    OpenAIRE

    Fu Yu; Mu Jiong; Duan Xu Liang

    2016-01-01

    By means of the model of extreme learning machine based upon DE optimization, this article particularly centers on the optimization thinking of such a model as well as its application effect in the field of listed company’s financial position classification. It proves that the improved extreme learning machine algorithm based upon DE optimization eclipses the traditional extreme learning machine algorithm following comparison. Meanwhile, this article also intends to introduce certain research...

  3. Supervised machine learning algorithms to diagnose stress for vehicle drivers based on physiological sensor signals.

    Science.gov (United States)

    Barua, Shaibal; Begum, Shahina; Ahmed, Mobyen Uddin

    2015-01-01

    Machine learning algorithms play an important role in computer science research. Recent advancement in sensor data collection in clinical sciences lead to a complex, heterogeneous data processing, and analysis for patient diagnosis and prognosis. Diagnosis and treatment of patients based on manual analysis of these sensor data are difficult and time consuming. Therefore, development of Knowledge-based systems to support clinicians in decision-making is important. However, it is necessary to perform experimental work to compare performances of different machine learning methods to help to select appropriate method for a specific characteristic of data sets. This paper compares classification performance of three popular machine learning methods i.e., case-based reasoning, neutral networks and support vector machine to diagnose stress of vehicle drivers using finger temperature and heart rate variability. The experimental results show that case-based reasoning outperforms other two methods in terms of classification accuracy. Case-based reasoning has achieved 80% and 86% accuracy to classify stress using finger temperature and heart rate variability. On contrary, both neural network and support vector machine have achieved less than 80% accuracy by using both physiological signals.

  4. Robust Visual Knowledge Transfer via Extreme Learning Machine Based Domain Adaptation.

    Science.gov (United States)

    Zhang, Lei; Zhang, David

    2016-08-10

    We address the problem of visual knowledge adaptation by leveraging labeled patterns from source domain and a very limited number of labeled instances in target domain to learn a robust classifier for visual categorization. This paper proposes a new extreme learning machine based cross-domain network learning framework, that is called Extreme Learning Machine (ELM) based Domain Adaptation (EDA). It allows us to learn a category transformation and an ELM classifier with random projection by minimizing the -norm of the network output weights and the learning error simultaneously. The unlabeled target data, as useful knowledge, is also integrated as a fidelity term to guarantee the stability during cross domain learning. It minimizes the matching error between the learned classifier and a base classifier, such that many existing classifiers can be readily incorporated as base classifiers. The network output weights cannot only be analytically determined, but also transferrable. Additionally, a manifold regularization with Laplacian graph is incorporated, such that it is beneficial to semi-supervised learning. Extensively, we also propose a model of multiple views, referred as MvEDA. Experiments on benchmark visual datasets for video event recognition and object recognition, demonstrate that our EDA methods outperform existing cross-domain learning methods.

  5. Creativity in Machine Learning

    OpenAIRE

    Thoma, Martin

    2016-01-01

    Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; it is not a trivial combination of the data which was fed into the machine learning system. The obtained results come in multiple forms: As images, as text and as audio. This paper gives a high level overview of how they are created and gives some examples. It is meant to be a summary of the current work and give people who are new to machine learning some starting points.

  6. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach.

    Science.gov (United States)

    Pasupa, Kitsuchart; Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets-Maximum Unbiased Validation Dataset-which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6.

  7. Fault Diagnosis for Engine Based on Single-Stage Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Fei Gao

    2016-01-01

    Full Text Available Single-Stage Extreme Learning Machine (SS-ELM is presented to dispose of the mechanical fault diagnosis in this paper. Based on it, the traditional mapping type of extreme learning machine (ELM has been changed and the eigenvectors extracted from signal processing methods are directly regarded as outputs of the network’s hidden layer. Then the uncertainty that training data transformed from the input space to the ELM feature space with the ELM mapping and problem of the selection of the hidden nodes are avoided effectively. The experiment results of diesel engine fault diagnosis show good performance of the SS-ELM algorithm.

  8. Research into Financial Position of Listed Companies following Classification via Extreme Learning Machine Based upon DE Optimization

    Directory of Open Access Journals (Sweden)

    Fu Yu

    2016-01-01

    Full Text Available By means of the model of extreme learning machine based upon DE optimization, this article particularly centers on the optimization thinking of such a model as well as its application effect in the field of listed company’s financial position classification. It proves that the improved extreme learning machine algorithm based upon DE optimization eclipses the traditional extreme learning machine algorithm following comparison. Meanwhile, this article also intends to introduce certain research thinking concerning extreme learning machine into the economics classification area so as to fulfill the purpose of computerizing the speedy but effective evaluation of massive financial statements of listed companies pertain to different classes

  9. DROUGHT FORECASTING BASED ON MACHINE LEARNING OF REMOTE SENSING AND LONG-RANGE FORECAST DATA

    Directory of Open Access Journals (Sweden)

    J. Rhee

    2016-06-01

    Full Text Available The reduction of drought impacts may be achieved through sustainable drought management and proactive measures against drought disaster. Accurate and timely provision of drought information is essential. In this study, drought forecasting models to provide high-resolution drought information based on drought indicators for ungauged areas were developed. The developed models predict drought indices of the 6-month Standardized Precipitation Index (SPI6 and the 6-month Standardized Precipitation Evapotranspiration Index (SPEI6. An interpolation method based on multiquadric spline interpolation method as well as three machine learning models were tested. Three machine learning models of Decision Tree, Random Forest, and Extremely Randomized Trees were tested to enhance the provision of drought initial conditions based on remote sensing data, since initial conditions is one of the most important factors for drought forecasting. Machine learning-based methods performed better than interpolation methods for both classification and regression, and the methods using climatology data outperformed the methods using long-range forecast. The model based on climatological data and the machine learning method outperformed overall.

  10. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    OpenAIRE

    Hamed Hassanzadeh; MohammadReza Keyvanpour

    2011-01-01

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as ...

  11. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning.Organized into 15 chapters, this book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credi

  12. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

    Science.gov (United States)

    Tiun, Sabrina; AL-Dhief, Fahad Taha; Sammour, Mahmoud A. M.

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%. PMID:29672546

  13. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach.

    Science.gov (United States)

    Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%.

  14. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    Science.gov (United States)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  15. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn.If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this the book for you. No previous experience with machine-learning algorithms is required.

  16. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    Science.gov (United States)

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is overwhelmingly turning into a bottleneck for the effectiveness of large biological data. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach of data analysis using microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties were investigated using a finely selected panel of 11 microsatellite markers. We organized data in two '4-targeted' and '16-targeted' experiments. A strategy of assaying different machine based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed microsatellite markers with the highest differentiating capacity and proved efficiency for our method of clustering olive accessions to reflect upon their regions of origin. A distinguished highlight of this study was the discovery of the best combination of markers for better differentiating of populations via machine learning models, which can be exploited to distinguish among other biological populations.

  17. Improved method for SNR prediction in machine-learning-based test

    NARCIS (Netherlands)

    Sheng, Xiaoqin; Kerkhoff, Hans G.

    2010-01-01

    This paper applies an improved method for testing the signal-to-noise ratio (SNR) of Analogue-to-Digital Converters (ADC). In previous work, a noisy and nonlinear pulse signal is exploited as the input stimulus to obtain the signature results of ADC. By applying a machine-learning-based approach,

  18. VATE: VAlidation of high TEchnology based on large database analysis by learning machine

    NARCIS (Netherlands)

    Meldolesi, E; Van Soest, J; Alitto, A R; Autorino, R; Dinapoli, N; Dekker, A; Gambacorta, M A; Gatta, R; Tagliaferri, L; Damiani, A; Valentini, V

    2014-01-01

    The interaction between implementation of new technologies and different outcomes can allow a broad range of researches to be expanded. The purpose of this paper is to introduce the VAlidation of high TEchnology based on large database analysis by learning machine (VATE) project that aims to combine

  19. Quantum Machine Learning

    OpenAIRE

    Romero García, Cristian

    2017-01-01

    [EN] In a world in which accessible information grows exponentially, the selection of the appropriate information turns out to be an extremely relevant problem. In this context, the idea of Machine Learning (ML), a subfield of Artificial Intelligence, emerged to face problems in data mining, pattern recognition, automatic prediction, among others. Quantum Machine Learning is an interdisciplinary research area combining quantum mechanics with methods of ML, in which quantum properties allow fo...

  20. Performance of machine learning methods for ligand-based virtual screening.

    Science.gov (United States)

    Plewczynski, Dariusz; Spieser, Stéphane A H; Koch, Uwe

    2009-05-01

    Computational screening of compound databases has become increasingly popular in pharmaceutical research. This review focuses on the evaluation of ligand-based virtual screening using active compounds as templates in the context of drug discovery. Ligand-based screening techniques are based on comparative molecular similarity analysis of compounds with known and unknown activity. We provide an overview of publications that have evaluated different machine learning methods, such as support vector machines, decision trees, ensemble methods such as boosting, bagging and random forests, clustering methods, neuronal networks, naïve Bayesian, data fusion methods and others.

  1. Automatic vetting of planet candidates from ground based surveys: Machine learning with NGTS

    Science.gov (United States)

    Armstrong, David J.; Günther, Maximilian N.; McCormac, James; Smith, Alexis M. S.; Bayliss, Daniel; Bouchy, François; Burleigh, Matthew R.; Casewell, Sarah; Eigmüller, Philipp; Gillen, Edward; Goad, Michael R.; Hodgkin, Simon T.; Jenkins, James S.; Louden, Tom; Metrailler, Lionel; Pollacco, Don; Poppenhaeger, Katja; Queloz, Didier; Raynard, Liam; Rauer, Heike; Udry, Stéphane; Walker, Simon R.; Watson, Christopher A.; West, Richard G.; Wheatley, Peter J.

    2018-05-01

    State of the art exoplanet transit surveys are producing ever increasing quantities of data. To make the best use of this resource, in detecting interesting planetary systems or in determining accurate planetary population statistics, requires new automated methods. Here we describe a machine learning algorithm that forms an integral part of the pipeline for the NGTS transit survey, demonstrating the efficacy of machine learning in selecting planetary candidates from multi-night ground based survey data. Our method uses a combination of random forests and self-organising-maps to rank planetary candidates, achieving an AUC score of 97.6% in ranking 12368 injected planets against 27496 false positives in the NGTS data. We build on past examples by using injected transit signals to form a training set, a necessary development for applying similar methods to upcoming surveys. We also make the autovet code used to implement the algorithm publicly accessible. autovet is designed to perform machine learned vetting of planetary candidates, and can utilise a variety of methods. The apparent robustness of machine learning techniques, whether on space-based or the qualitatively different ground-based data, highlights their importance to future surveys such as TESS and PLATO and the need to better understand their advantages and pitfalls in an exoplanetary context.

  2. Evolving Neural Turing Machines for Reward-based Learning

    DEFF Research Database (Denmark)

    Greve, Rasmus Boll; Jacobsen, Emil Juul; Risi, Sebastian

    2016-01-01

    An unsolved problem in neuroevolution (NE) is to evolve artificial neural networks (ANN) that can store and use information to change their behavior online. While plastic neural networks have shown promise in this context, they have difficulties retaining information over longer periods of time...... version of the double T-Maze, a complex reinforcement-like learning problem. In the T-Maze learning task the agent uses the memory bank to display adaptive behavior that normally requires a plastic ANN, thereby suggesting a complementary and effective mechanism for adaptive behavior in NE....

  3. Machine learning-based methods for prediction of linear B-cell epitopes.

    Science.gov (United States)

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.

  4. Machine learning systems

    Energy Technology Data Exchange (ETDEWEB)

    Forsyth, R

    1984-05-01

    With the dramatic rise of expert systems has come a renewed interest in the fuel that drives them-knowledge. For it is specialist knowledge which gives expert systems their power. But extracting knowledge from human experts in symbolic form has proved arduous and labour-intensive. So the idea of machine learning is enjoying a renaissance. Machine learning is any automatic improvement in the performance of a computer system over time, as a result of experience. Thus a learning algorithm seeks to do one or more of the following: cover a wider range of problems, deliver more accurate solutions, obtain answers more cheaply, and simplify codified knowledge. 6 references.

  5. Extreme Learning Machine and Moving Least Square Regression Based Solar Panel Vision Inspection

    Directory of Open Access Journals (Sweden)

    Heng Liu

    2017-01-01

    Full Text Available In recent years, learning based machine intelligence has aroused a lot of attention across science and engineering. Particularly in the field of automatic industry inspection, the machine learning based vision inspection plays a more and more important role in defect identification and feature extraction. Through learning from image samples, many features of industry objects, such as shapes, positions, and orientations angles, can be obtained and then can be well utilized to determine whether there is defect or not. However, the robustness and the quickness are not easily achieved in such inspection way. In this work, for solar panel vision inspection, we present an extreme learning machine (ELM and moving least square regression based approach to identify solder joint defect and detect the panel position. Firstly, histogram peaks distribution (HPD and fractional calculus are applied for image preprocessing. Then an ELM-based defective solder joints identification is discussed in detail. Finally, moving least square regression (MLSR algorithm is introduced for solar panel position determination. Experimental results and comparisons show that the proposed ELM and MLSR based inspection method is efficient not only in detection accuracy but also in processing speed.

  6. Differentiation of Enhancing Glioma and Primary Central Nervous System Lymphoma by Texture-Based Machine Learning.

    Science.gov (United States)

    Alcaide-Leon, P; Dufort, P; Geraldo, A F; Alshafai, L; Maralani, P J; Spears, J; Bharatha, A

    2017-06-01

    Accurate preoperative differentiation of primary central nervous system lymphoma and enhancing glioma is essential to avoid unnecessary neurosurgical resection in patients with primary central nervous system lymphoma. The purpose of the study was to evaluate the diagnostic performance of a machine-learning algorithm by using texture analysis of contrast-enhanced T1-weighted images for differentiation of primary central nervous system lymphoma and enhancing glioma. Seventy-one adult patients with enhancing gliomas and 35 adult patients with primary central nervous system lymphomas were included. The tumors were manually contoured on contrast-enhanced T1WI, and the resulting volumes of interest were mined for textural features and subjected to a support vector machine-based machine-learning protocol. Three readers classified the tumors independently on contrast-enhanced T1WI. Areas under the receiver operating characteristic curves were estimated for each reader and for the support vector machine classifier. A noninferiority test for diagnostic accuracy based on paired areas under the receiver operating characteristic curve was performed with a noninferiority margin of 0.15. The mean areas under the receiver operating characteristic curve were 0.877 (95% CI, 0.798-0.955) for the support vector machine classifier; 0.878 (95% CI, 0.807-0.949) for reader 1; 0.899 (95% CI, 0.833-0.966) for reader 2; and 0.845 (95% CI, 0.757-0.933) for reader 3. The mean area under the receiver operating characteristic curve of the support vector machine classifier was significantly noninferior to the mean area under the curve of reader 1 ( P = .021), reader 2 ( P = .035), and reader 3 ( P = .007). Support vector machine classification based on textural features of contrast-enhanced T1WI is noninferior to expert human evaluation in the differentiation of primary central nervous system lymphoma and enhancing glioma. © 2017 by American Journal of Neuroradiology.

  7. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles.

    Science.gov (United States)

    Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua

    2018-04-25

    Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy  for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave better way forward for the application of machine learning towards species identification and food safety.

  8. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  9. The influence of negative training set size on machine learning-based virtual screening.

    Science.gov (United States)

    Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J

    2014-01-01

    The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.

  10. A Comparison Study of Machine Learning Based Algorithms for Fatigue Crack Growth Calculation.

    Science.gov (United States)

    Wang, Hongxun; Zhang, Weifang; Sun, Fuqiang; Zhang, Wei

    2017-05-18

    The relationships between the fatigue crack growth rate ( d a / d N ) and stress intensity factor range ( Δ K ) are not always linear even in the Paris region. The stress ratio effects on fatigue crack growth rate are diverse in different materials. However, most existing fatigue crack growth models cannot handle these nonlinearities appropriately. The machine learning method provides a flexible approach to the modeling of fatigue crack growth because of its excellent nonlinear approximation and multivariable learning ability. In this paper, a fatigue crack growth calculation method is proposed based on three different machine learning algorithms (MLAs): extreme learning machine (ELM), radial basis function network (RBFN) and genetic algorithms optimized back propagation network (GABP). The MLA based method is validated using testing data of different materials. The three MLAs are compared with each other as well as the classical two-parameter model ( K * approach). The results show that the predictions of MLAs are superior to those of K * approach in accuracy and effectiveness, and the ELM based algorithms show overall the best agreement with the experimental data out of the three MLAs, for its global optimization and extrapolation ability.

  11. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  12. Machine Learning in Medicine

    Science.gov (United States)

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  13. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.

  14. Designing “Theory of Machines and Mechanisms” course on Project Based Learning approach

    DEFF Research Database (Denmark)

    Shinde, Vikas

    2013-01-01

    by the industry and the learning outcomes specified by the National Board of Accreditation (NBA), India; this course is restructured on Project Based Learning approach. A mini project is designed to suit course objectives. An objective of this paper is to discuss the rationale of this course design......Theory of Machines and Mechanisms course is one of the essential courses of Mechanical Engineering undergraduate curriculum practiced at Indian Institute. Previously, this course was taught by traditional instruction based pedagogy. In order to achieve profession specific skills demanded...... and the process followed to design a project which meets diverse objectives....

  15. Clojure for machine learning

    CERN Document Server

    Wali, Akhil

    2014-01-01

    A book that brings out the strengths of Clojure programming that have to facilitate machine learning. Each topic is described in substantial detail, and examples and libraries in Clojure are also demonstrated.This book is intended for Clojure developers who want to explore the area of machine learning. Basic understanding of the Clojure programming language is required, but thorough acquaintance with the standard Clojure library or any libraries are not required. Familiarity with theoretical concepts and notation of mathematics and statistics would be an added advantage.

  16. Towards large-scale FAME-based bacterial species identification using machine learning techniques.

    Science.gov (United States)

    Slabbinck, Bram; De Baets, Bernard; Dawyndt, Peter; De Vos, Paul

    2009-05-01

    In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values, respectively, 0.847, 0.901 and 0.708. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species

  17. Use of machine learning methods to classify Universities based on the income structure

    Science.gov (United States)

    Terlyga, Alexandra; Balk, Igor

    2017-10-01

    In this paper we discuss use of machine learning methods such as self organizing maps, k-means and Ward’s clustering to perform classification of universities based on their income. This classification will allow us to quantitate classification of universities as teaching, research, entrepreneur, etc. which is important tool for government, corporations and general public alike in setting expectation and selecting universities to achieve different goals.

  18. PredPsych: A toolbox for predictive machine learning based approach in experimental psychology research

    OpenAIRE

    Cavallo, Andrea; Becchio, Cristina; Koul, Atesh

    2016-01-01

    Recent years have seen an increased interest in machine learning based predictive methods for analysing quantitative behavioural data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible software framework. The goal of this work was to build an open-source toolbox – “PredPsych” – that could make these methods readily available to all psychologists. PredPsych is a...

  19. Machine learning of radial basis function neural network based on Kalman filter: Introduction

    Directory of Open Access Journals (Sweden)

    Vuković Najdan L.

    2014-01-01

    Full Text Available This paper analyzes machine learning of radial basis function neural network based on Kalman filtering. Three algorithms are derived: linearized Kalman filter, linearized information filter and unscented Kalman filter. We emphasize basic properties of these estimation algorithms, demonstrate how their advantages can be used for optimization of network parameters, derive mathematical models and show how they can be applied to model problems in engineering practice.

  20. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  1. Accelerated Monte Carlo system reliability analysis through machine-learning-based surrogate models of network connectivity

    International Nuclear Information System (INIS)

    Stern, R.E.; Song, J.; Work, D.B.

    2017-01-01

    The two-terminal reliability problem in system reliability analysis is known to be computationally intractable for large infrastructure graphs. Monte Carlo techniques can estimate the probability of a disconnection between two points in a network by selecting a representative sample of network component failure realizations and determining the source-terminal connectivity of each realization. To reduce the runtime required for the Monte Carlo approximation, this article proposes an approximate framework in which the connectivity check of each sample is estimated using a machine-learning-based classifier. The framework is implemented using both a support vector machine (SVM) and a logistic regression based surrogate model. Numerical experiments are performed on the California gas distribution network using the epicenter and magnitude of the 1989 Loma Prieta earthquake as well as randomly-generated earthquakes. It is shown that the SVM and logistic regression surrogate models are able to predict network connectivity with accuracies of 99% for both methods, and are 1–2 orders of magnitude faster than using a Monte Carlo method with an exact connectivity check. - Highlights: • Surrogate models of network connectivity are developed by machine-learning algorithms. • Developed surrogate models can reduce the runtime required for Monte Carlo simulations. • Support vector machine and logistic regressions are employed to develop surrogate models. • Numerical example of California gas distribution network demonstrate the proposed approach. • The developed models have accuracies 99%, and are 1–2 orders of magnitude faster than MCS.

  2. Mlifdect: Android Malware Detection Based on Parallel Machine Learning and Information Fusion

    Directory of Open Access Journals (Sweden)

    Xin Wang

    2017-01-01

    Full Text Available In recent years, Android malware has continued to grow at an alarming rate. More recent malicious apps’ employing highly sophisticated detection avoidance techniques makes the traditional machine learning based malware detection methods far less effective. More specifically, they cannot cope with various types of Android malware and have limitation in detection by utilizing a single classification algorithm. To address this limitation, we propose a novel approach in this paper that leverages parallel machine learning and information fusion techniques for better Android malware detection, which is named Mlifdect. To implement this approach, we first extract eight types of features from static analysis on Android apps and build two kinds of feature sets after feature selection. Then, a parallel machine learning detection model is developed for speeding up the process of classification. Finally, we investigate the probability analysis based and Dempster-Shafer theory based information fusion approaches which can effectively obtain the detection results. To validate our method, other state-of-the-art detection works are selected for comparison with real-world Android apps. The experimental results demonstrate that Mlifdect is capable of achieving higher detection accuracy as well as a remarkable run-time efficiency compared to the existing malware detection solutions.

  3. A machine learning-based framework to identify type 2 diabetes through electronic health records.

    Science.gov (United States)

    Zheng, Tao; Xie, Wei; Xu, Liling; He, Xiaoying; Zhang, Ya; You, Mingrong; Yang, Gong; Chen, You

    2017-01-01

    To discover diverse genotype-phenotype associations affiliated with Type 2 Diabetes Mellitus (T2DM) via genome-wide association study (GWAS) and phenome-wide association study (PheWAS), more cases (T2DM subjects) and controls (subjects without T2DM) are required to be identified (e.g., via Electronic Health Records (EHR)). However, existing expert based identification algorithms often suffer in a low recall rate and could miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop a semi-automated framework based on machine learning as a pilot study to liberalize filtering criteria to improve recall rate with a keeping of low false positive rate. We propose a data informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely-used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. Our framework was conducted on 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from 23,281 diabetes related cohort retrieved from a regional distributed EHR repository ranging from 2012 to 2014. We apply top-performing machine learning algorithms on the engineered features. We benchmark and contrast the accuracy, precision, AUC, sensitivity and specificity of classification models against the state-of-the-art expert algorithm for identification of T2DM subjects. Our results indicate that the framework achieved high identification performances (∼0.98 in average AUC), which are much higher than the state-of-the-art algorithm (0.71 in AUC). Expert algorithm-based identification of T2DM subjects from EHR is often hampered by the high missing rates due to their conservative selection criteria. Our framework leverages machine learning and feature

  4. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focussed on building models. We nearly process all information we perceive to a model. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called Machine Learning. Many research is devoted to develop

  5. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    Science.gov (United States)

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    Abstract The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie

  6. PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research.

    Science.gov (United States)

    Koul, Atesh; Becchio, Cristina; Cavallo, Andrea

    2017-12-12

    Recent years have seen an increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible implementation. The aim of current work was to build an open-source R toolbox - "PredPsych" - that could make these methods readily available to all psychologists. PredPsych is a user-friendly, R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of possible research questions that can be addressed with the machine-learning algorithms implemented in PredPsych and cannot be easily addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience not only in the field of psychology, but also in that of clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.

  7. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection......* Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection...

  8. Reverse hypothesis machine learning a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge- acquisition -based learning is constrained by large volumes of data and is time consuming. Hence Knowledge innovation based learning is the need of time. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change the paradigm remains the same—the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  9. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    Science.gov (United States)

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders ( n = 30) and typical development ( n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.

  10. Pressure Prediction of Coal Slurry Transportation Pipeline Based on Particle Swarm Optimization Kernel Function Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Xue-cun Yang

    2015-01-01

    Full Text Available For coal slurry pipeline blockage prediction problem, through the analysis of actual scene, it is determined that the pressure prediction from each measuring point is the premise of pipeline blockage prediction. Kernel function of support vector machine is introduced into extreme learning machine, the parameters are optimized by particle swarm algorithm, and blockage prediction method based on particle swarm optimization kernel function extreme learning machine (PSOKELM is put forward. The actual test data from HuangLing coal gangue power plant are used for simulation experiments and compared with support vector machine prediction model optimized by particle swarm algorithm (PSOSVM and kernel function extreme learning machine prediction model (KELM. The results prove that mean square error (MSE for the prediction model based on PSOKELM is 0.0038 and the correlation coefficient is 0.9955, which is superior to prediction model based on PSOSVM in speed and accuracy and superior to KELM prediction model in accuracy.

  11. Classification of follicular lymphoma images: a holistic approach with symbol-based machine learning methods.

    Science.gov (United States)

    Zorman, Milan; Sánchez de la Rosa, José Luis; Dinevski, Dejan

    2011-12-01

    It is not very often to see a symbol-based machine learning approach to be used for the purpose of image classification and recognition. In this paper we will present such an approach, which we first used on the follicular lymphoma images. Lymphoma is a broad term encompassing a variety of cancers of the lymphatic system. Lymphoma is differentiated by the type of cell that multiplies and how the cancer presents itself. It is very important to get an exact diagnosis regarding lymphoma and to determine the treatments that will be most effective for the patient's condition. Our work was focused on the identification of lymphomas by finding follicles in microscopy images provided by the Laboratory of Pathology in the University Hospital of Tenerife, Spain. We divided our work in two stages: in the first stage we did image pre-processing and feature extraction, and in the second stage we used different symbolic machine learning approaches for pixel classification. Symbolic machine learning approaches are often neglected when looking for image analysis tools. They are not only known for a very appropriate knowledge representation, but also claimed to lack computational power. The results we got are very promising and show that symbolic approaches can be successful in image analysis applications.

  12. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols......, learning from weak labels, and interpretation and evaluation of results....

  13. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down task driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse grained global features and a larger set of low...

  14. Can machine learning explain human learning?

    NARCIS (Netherlands)

    Vahdat, M.; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, G.W.M.

    2016-01-01

    Learning Analytics (LA) has a major interest in exploring and understanding the learning process of humans and, for this purpose, benefits from both Cognitive Science, which studies how humans learn, and Machine Learning, which studies how algorithms learn from data. Usually, Machine Learning is

  15. Discrimination of Rock Fracture and Blast Events Based on Signal Complexity and Machine Learning

    Directory of Open Access Journals (Sweden)

    Zilong Zhou

    2018-01-01

    Full Text Available The automatic discrimination of rock fracture and blast events is complex and challenging due to the similar waveform characteristics. To solve this problem, a new method based on the signal complexity analysis and machine learning has been proposed in this paper. First, the permutation entropy values of signals at different scale factors are calculated to reflect complexity of signals and constructed into a feature vector set. Secondly, based on the feature vector set, back-propagation neural network (BPNN as a means of machine learning is applied to establish a discriminator for rock fracture and blast events. Then to evaluate the classification performances of the new method, the classifying accuracies of support vector machine (SVM, naive Bayes classifier, and the new method are compared, and the receiver operating characteristic (ROC curves are also analyzed. The results show the new method obtains the best classification performances. In addition, the influence of different scale factor q and number of training samples n on discrimination results is discussed. It is found that the classifying accuracy of the new method reaches the highest value when q = 8–15 or 8–20 and n=140.

  16. An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.

    Science.gov (United States)

    Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein

    2017-12-22

    The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.

  17. Radar HRRP Target Recognition Based on Stacked Autoencoder and Extreme Learning Machine.

    Science.gov (United States)

    Zhao, Feixiang; Liu, Yongxiang; Huo, Kai; Zhang, Shuanghui; Zhang, Zhongshuai

    2018-01-10

    A novel radar high-resolution range profile (HRRP) target recognition method based on a stacked autoencoder (SAE) and extreme learning machine (ELM) is presented in this paper. As a key component of deep structure, the SAE does not only learn features by making use of data, it also obtains feature expressions at different levels of data. However, with the deep structure, it is hard to achieve good generalization performance with a fast learning speed. ELM, as a new learning algorithm for single hidden layer feedforward neural networks (SLFNs), has attracted great interest from various fields for its fast learning speed and good generalization performance. However, ELM needs more hidden nodes than conventional tuning-based learning algorithms due to the random set of input weights and hidden biases. In addition, the existing ELM methods cannot utilize the class information of targets well. To solve this problem, a regularized ELM method based on the class information of the target is proposed. In this paper, SAE and the regularized ELM are combined to make full use of their advantages and make up for each of their shortcomings. The effectiveness of the proposed method is demonstrated by experiments with measured radar HRRP data. The experimental results show that the proposed method can achieve good performance in the two aspects of real-time and accuracy, especially when only a few training samples are available.

  18. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

    Science.gov (United States)

    Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C

    2017-12-01

    The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. Our study shows that a supervised

  19. A Machine Learning-based Rainfall System for GPM Dual-frequency Radar

    Science.gov (United States)

    Tan, H.; Chandrasekar, V.; Chen, H.

    2017-12-01

    Precipitation measurement produced by the Global Precipitation Measurement (GPM) Dual-frequency Precipitation Radar (DPR) plays an important role in researching the water circle and forecasting extreme weather event. Compare with its predecessor - Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR), GRM DPR measures precipitation in two different frequencies (i.e., Ku and Ka band), which can provide detailed information on the microphysical properties of precipitation particles, quantify particle size distribution and quantitatively measure light rain and falling snow. This paper presents a novel Machine Learning system for ground-based and space borne radar rainfall estimation. The system first trains ground radar data for rainfall estimation using rainfall measurements from gauges and subsequently uses the ground radar based rainfall estimates to train GPM DPR data in order to get space based rainfall product. Therein, data alignment between space DPR and ground radar is conducted using the methodology proposed by Bolen and Chandrasekar (2013), which can minimize the effects of potential geometric distortion of GPM DPR observations. For demonstration purposes, rainfall measurements from three rain gauge networks near Melbourne, Florida, are used for training and validation purposes. These three gauge networks, which are located in Kennedy Space Center (KSC), South Florida Water Management District (SFL), and St. Johns Water Management District (STJ), include 33, 46, and 99 rain gauge stations, respectively. Collocated ground radar observations from the National Weather Service (NWS) Weather Surveillance Radar - 1988 Doppler (WSR-88D) in Melbourne (i.e., KMLB radar) are trained with the gauge measurements. The trained model is then used to derive KMLB radar based rainfall product, which is used to train GPM DPR data collected from coincident overpasses events. The machine learning based rainfall product is compared against the GPM standard products

  20. Student Modeling and Machine Learning

    OpenAIRE

    Sison , Raymund; Shimura , Masamichi

    1998-01-01

    After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine le...

  1. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning

    International Nuclear Information System (INIS)

    Zhu Xiaofeng; Ge Yaorong; Li Taoran; Thongphiew, Danthai; Yin Fangfang; Wu, Q Jackie

    2011-01-01

    Purpose: To ensure plan quality for adaptive IMRT of the prostate, we developed a quantitative evaluation tool using a machine learning approach. This tool generates dose volume histograms (DVHs) of organs-at-risk (OARs) based on prior plans as a reference, to be compared with the adaptive plan derived from fluence map deformation. Methods: Under the same configuration using seven-field 15 MV photon beams, DVHs of OARs (bladder and rectum) were estimated based on anatomical information of the patient and a model learned from a database of high quality prior plans. In this study, the anatomical information was characterized by the organ volumes and distance-to-target histogram (DTH). The database consists of 198 high quality prostate plans and was validated with 14 cases outside the training pool. Principal component analysis (PCA) was applied to DVHs and DTHs to quantify their salient features. Then, support vector regression (SVR) was implemented to establish the correlation between the features of the DVH and the anatomical information. Results: DVH/DTH curves could be characterized sufficiently just using only two or three truncated principal components, thus, patient anatomical information was quantified with reduced numbers of variables. The evaluation of the model using the test data set demonstrated its accuracy ∼80% in prediction and effectiveness in improving ART planning quality. Conclusions: An adaptive IMRT plan quality evaluation tool based on machine learning has been developed, which estimates OAR sparing and provides reference in evaluating ART.

  2. Quantum Machine Learning

    Science.gov (United States)

    Biswas, Rupak

    2018-01-01

    Quantum computing promises an unprecedented ability to solve intractable problems by harnessing quantum mechanical effects such as tunneling, superposition, and entanglement. The Quantum Artificial Intelligence Laboratory (QuAIL) at NASA Ames Research Center is the space agency's primary facility for conducting research and development in quantum information sciences. QuAIL conducts fundamental research in quantum physics but also explores how best to exploit and apply this disruptive technology to enable NASA missions in aeronautics, Earth and space sciences, and space exploration. At the same time, machine learning has become a major focus in computer science and captured the imagination of the public as a panacea to myriad big data problems. In this talk, we will discuss how classical machine learning can take advantage of quantum computing to significantly improve its effectiveness. Although we illustrate this concept on a quantum annealer, other quantum platforms could be used as well. If explored fully and implemented efficiently, quantum machine learning could greatly accelerate a wide range of tasks leading to new technologies and discoveries that will significantly change the way we solve real-world problems.

  3. A User-Oriented Splog Filtering Based on a Machine Learning

    Science.gov (United States)

    Yoshinaka, Takayuki; Ishii, Soichi; Fukuhara, Tomohiro; Masuda, Hidetaka; Nakagawa, Hiroshi

    A method for filtering spam blogs (splogs) based on a machine learning technique, and its evaluation results are described. Today, spam blogs (splogs) became one of major issues on the Web. The problem of splogs is that values of blog sites are different by people. We propose a novel user-oriented splog filtering method that can adapt each user's preference for valuable blogs. We use the SVM(Support Vector Machine) for creating a personalized splog filter for each user. We had two experiments: (1) an experiment of individual splog judgement, and (2) an experiment for user oriented splog filtering. From the former experiment, we found existence of 'gray' blogs that are needed to treat by persons. From the latter experiment, we found that we can provide appropriate personalized filters by choosing the best feature set for each user. An overview of proposed method, and evaluation results are described.

  4. An Extreme Learning Machine Based on the Mixed Kernel Function of Triangular Kernel and Generalized Hermite Dirichlet Kernel

    Directory of Open Access Journals (Sweden)

    Senyue Zhang

    2016-01-01

    Full Text Available According to the characteristics that the kernel function of extreme learning machine (ELM and its performance have a strong correlation, a novel extreme learning machine based on a generalized triangle Hermitian kernel function was proposed in this paper. First, the generalized triangle Hermitian kernel function was constructed by using the product of triangular kernel and generalized Hermite Dirichlet kernel, and the proposed kernel function was proved as a valid kernel function of extreme learning machine. Then, the learning methodology of the extreme learning machine based on the proposed kernel function was presented. The biggest advantage of the proposed kernel is its kernel parameter values only chosen in the natural numbers, which thus can greatly shorten the computational time of parameter optimization and retain more of its sample data structure information. Experiments were performed on a number of binary classification, multiclassification, and regression datasets from the UCI benchmark repository. The experiment results demonstrated that the robustness and generalization performance of the proposed method are outperformed compared to other extreme learning machines with different kernels. Furthermore, the learning speed of proposed method is faster than support vector machine (SVM methods.

  5. Figure of merit for macrouniformity based on image quality ruler evaluation and machine learning framework

    Science.gov (United States)

    Wang, Weibao; Overall, Gary; Riggs, Travis; Silveston-Keith, Rebecca; Whitney, Julie; Chiu, George; Allebach, Jan P.

    2013-01-01

    Assessment of macro-uniformity is a capability that is important for the development and manufacture of printer products. Our goal is to develop a metric that will predict macro-uniformity, as judged by human subjects, by scanning and analyzing printed pages. We consider two different machine learning frameworks for the metric: linear regression and the support vector machine. We have implemented the image quality ruler, based on the recommendations of the INCITS W1.1 macro-uniformity team. Using 12 subjects at Purdue University and 20 subjects at Lexmark, evenly balanced with respect to gender, we conducted subjective evaluations with a set of 35 uniform b/w prints from seven different printers with five levels of tint coverage. Our results suggest that the image quality ruler method provides a reliable means to assess macro-uniformity. We then defined and implemented separate features to measure graininess, mottle, large area variation, jitter, and large-scale non-uniformity. The algorithms that we used are largely based on ISO image quality standards. Finally, we used these features computed for a set of test pages and the subjects' image quality ruler assessments of these pages to train the two different predictors - one based on linear regression and the other based on the support vector machine (SVM). Using five-fold cross-validation, we confirmed the efficacy of our predictor.

  6. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

    Science.gov (United States)

    Ain, Qurrat Ul; Aleksandrova, Antoniya; Roessler, Florian D; Ballester, Pedro J

    2015-01-01

    Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405-424. doi: 10.1002/wcms.1225 For further resources related to this article, please visit the WIREs website.

  7. Use of machine learning to shorten observation-based screening and diagnosis of autism

    OpenAIRE

    Wall, D P; Kosmicki, J; DeLuca, T F; Harstad, E; Fusaro, V A

    2012-01-01

    The Autism Diagnostic Observation Schedule-Generic (ADOS) is one of the most widely used instruments for behavioral evaluation of autism spectrum disorders. It is composed of four modules, each tailored for a specific group of individuals based on their language and developmental level. On average, a module takes between 30 and 60 min to deliver. We used a series of machine-learning algorithms to study the complete set of scores from Module 1 of the ADOS available at the Autism Genetic Resour...

  8. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  9. Machine Learning Based Single-Frame Super-Resolution Processing for Lensless Blood Cell Counting

    Directory of Open Access Journals (Sweden)

    Xiwei Huang

    2016-11-01

    Full Text Available A lensless blood cell counting system integrating microfluidic channel and a complementary metal oxide semiconductor (CMOS image sensor is a promising technique to miniaturize the conventional optical lens based imaging system for point-of-care testing (POCT. However, such a system has limited resolution, making it imperative to improve resolution from the system-level using super-resolution (SR processing. Yet, how to improve resolution towards better cell detection and recognition with low cost of processing resources and without degrading system throughput is still a challenge. In this article, two machine learning based single-frame SR processing types are proposed and compared for lensless blood cell counting, namely the Extreme Learning Machine based SR (ELMSR and Convolutional Neural Network based SR (CNNSR. Moreover, lensless blood cell counting prototypes using commercial CMOS image sensors and custom designed backside-illuminated CMOS image sensors are demonstrated with ELMSR and CNNSR. When one captured low-resolution lensless cell image is input, an improved high-resolution cell image will be output. The experimental results show that the cell resolution is improved by 4×, and CNNSR has 9.5% improvement over the ELMSR on resolution enhancing performance. The cell counting results also match well with a commercial flow cytometer. Such ELMSR and CNNSR therefore have the potential for efficient resolution improvement in lensless blood cell counting systems towards POCT applications.

  10. Convolutional Neural Network Based on Extreme Learning Machine for Maritime Ships Recognition in Infrared Images.

    Science.gov (United States)

    Khellal, Atmane; Ma, Hongbin; Fei, Qing

    2018-05-09

    The success of Deep Learning models, notably convolutional neural networks (CNNs), makes them the favorable solution for object recognition systems in both visible and infrared domains. However, the lack of training data in the case of maritime ships research leads to poor performance due to the problem of overfitting. In addition, the back-propagation algorithm used to train CNN is very slow and requires tuning many hyperparameters. To overcome these weaknesses, we introduce a new approach fully based on Extreme Learning Machine (ELM) to learn useful CNN features and perform a fast and accurate classification, which is suitable for infrared-based recognition systems. The proposed approach combines an ELM based learning algorithm to train CNN for discriminative features extraction and an ELM based ensemble for classification. The experimental results on VAIS dataset, which is the largest dataset of maritime ships, confirm that the proposed approach outperforms the state-of-the-art models in term of generalization performance and training speed. For instance, the proposed model is up to 950 times faster than the traditional back-propagation based training of convolutional neural networks, primarily for low-level features extraction.

  11. Prediction of Air Pollutants Concentration Based on an Extreme Learning Machine: The Case of Hong Kong

    Directory of Open Access Journals (Sweden)

    Jiangshe Zhang

    2017-01-01

    Full Text Available With the development of the economy and society all over the world, most metropolitan cities are experiencing elevated concentrations of ground-level air pollutants. It is urgent to predict and evaluate the concentration of air pollutants for some local environmental or health agencies. Feed-forward artificial neural networks have been widely used in the prediction of air pollutants concentration. However, there are some drawbacks, such as the low convergence rate and the local minimum. The extreme learning machine for single hidden layer feed-forward neural networks tends to provide good generalization performance at an extremely fast learning speed. The major sources of air pollutants in Hong Kong are mobile, stationary, and from trans-boundary sources. We propose predicting the concentration of air pollutants by the use of trained extreme learning machines based on the data obtained from eight air quality parameters in two monitoring stations, including Sham Shui Po and Tap Mun in Hong Kong for six years. The experimental results show that our proposed algorithm performs better on the Hong Kong data both quantitatively and qualitatively. Particularly, our algorithm shows better predictive ability, with R 2 increased and root mean square error values decreased respectively.

  12. Application of machine learning methodology for pet-based definition of lung cancer

    Science.gov (United States)

    Kerhet, A.; Small, C.; Quon, H.; Riauka, T.; Schrader, L.; Greiner, R.; Yee, D.; McEwan, A.; Roa, W.

    2010-01-01

    We applied a learning methodology framework to assist in the threshold-based segmentation of non-small-cell lung cancer (nsclc) tumours in positron-emission tomography–computed tomography (pet–ct) imaging for use in radiotherapy planning. Gated and standard free-breathing studies of two patients were independently analysed (four studies in total). Each study had a pet–ct and a treatment-planning ct image. The reference gross tumour volume (gtv) was identified by two experienced radiation oncologists who also determined reference standardized uptake value (suv) thresholds that most closely approximated the gtv contour on each slice. A set of uptake distribution-related attributes was calculated for each pet slice. A machine learning algorithm was trained on a subset of the pet slices to cope with slice-to-slice variation in the optimal suv threshold: that is, to predict the most appropriate suv threshold from the calculated attributes for each slice. The algorithm’s performance was evaluated using the remainder of the pet slices. A high degree of geometric similarity was achieved between the areas outlined by the predicted and the reference suv thresholds (Jaccard index exceeding 0.82). No significant difference was found between the gated and the free-breathing results in the same patient. In this preliminary work, we demonstrated the potential applicability of a machine learning methodology as an auxiliary tool for radiation treatment planning in nsclc. PMID:20179802

  13. Rapid tomographic reconstruction based on machine learning for time-resolved combustion diagnostics

    Science.gov (United States)

    Yu, Tao; Cai, Weiwei; Liu, Yingzheng

    2018-04-01

    Optical tomography has attracted surged research efforts recently due to the progress in both the imaging concepts and the sensor and laser technologies. The high spatial and temporal resolutions achievable by these methods provide unprecedented opportunity for diagnosis of complicated turbulent combustion. However, due to the high data throughput and the inefficiency of the prevailing iterative methods, the tomographic reconstructions which are typically conducted off-line are computationally formidable. In this work, we propose an efficient inversion method based on a machine learning algorithm, which can extract useful information from the previous reconstructions and build efficient neural networks to serve as a surrogate model to rapidly predict the reconstructions. Extreme learning machine is cited here as an example for demonstrative purpose simply due to its ease of implementation, fast learning speed, and good generalization performance. Extensive numerical studies were performed, and the results show that the new method can dramatically reduce the computational time compared with the classical iterative methods. This technique is expected to be an alternative to existing methods when sufficient training data are available. Although this work is discussed under the context of tomographic absorption spectroscopy, we expect it to be useful also to other high speed tomographic modalities such as volumetric laser-induced fluorescence and tomographic laser-induced incandescence which have been demonstrated for combustion diagnostics.

  14. Prediction of Air Pollutants Concentration Based on an Extreme Learning Machine: The Case of Hong Kong.

    Science.gov (United States)

    Zhang, Jiangshe; Ding, Weifu

    2017-01-24

    With the development of the economy and society all over the world, most metropolitan cities are experiencing elevated concentrations of ground-level air pollutants. It is urgent to predict and evaluate the concentration of air pollutants for some local environmental or health agencies. Feed-forward artificial neural networks have been widely used in the prediction of air pollutants concentration. However, there are some drawbacks, such as the low convergence rate and the local minimum. The extreme learning machine for single hidden layer feed-forward neural networks tends to provide good generalization performance at an extremely fast learning speed. The major sources of air pollutants in Hong Kong are mobile, stationary, and from trans-boundary sources. We propose predicting the concentration of air pollutants by the use of trained extreme learning machines based on the data obtained from eight air quality parameters in two monitoring stations, including Sham Shui Po and Tap Mun in Hong Kong for six years. The experimental results show that our proposed algorithm performs better on the Hong Kong data both quantitatively and qualitatively. Particularly, our algorithm shows better predictive ability, with R 2 increased and root mean square error values decreased respectively.

  15. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

    Science.gov (United States)

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-06-19

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.

  16. Machine learning a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  17. Identifying seizure onset zone from electrocorticographic recordings: A machine learning approach based on phase locking value.

    Science.gov (United States)

    Elahian, Bahareh; Yeasin, Mohammed; Mudigoudar, Basanagoud; Wheless, James W; Babajani-Feremi, Abbas

    2017-10-01

    Using a novel technique based on phase locking value (PLV), we investigated the potential for features extracted from electrocorticographic (ECoG) recordings to serve as biomarkers to identify the seizure onset zone (SOZ). We computed the PLV between the phase of the amplitude of high gamma activity (80-150Hz) and the phase of lower frequency rhythms (4-30Hz) from ECoG recordings obtained from 10 patients with epilepsy (21 seizures). We extracted five features from the PLV and used a machine learning approach based on logistic regression to build a model that classifies electrodes as SOZ or non-SOZ. More than 96% of electrodes identified as the SOZ by our algorithm were within the resected area in six seizure-free patients. In four non-seizure-free patients, more than 31% of the identified SOZ electrodes by our algorithm were outside the resected area. In addition, we observed that the seizure outcome in non-seizure-free patients correlated with the number of non-resected SOZ electrodes identified by our algorithm. This machine learning approach, based on features extracted from the PLV, effectively identified electrodes within the SOZ. The approach has the potential to assist clinicians in surgical decision-making when pre-surgical intracranial recordings are utilized. Copyright © 2017 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.

  18. Gamma/hadron segregation for a ground based imaging atmospheric Cherenkov telescope using machine learning methods: Random Forest leads

    International Nuclear Information System (INIS)

    Sharma Mradul; Koul Maharaj Krishna; Mitra Abhas; Nayak Jitadeepa; Bose Smarajit

    2014-01-01

    A detailed case study of γ-hadron segregation for a ground based atmospheric Cherenkov telescope is presented. We have evaluated and compared various supervised machine learning methods such as the Random Forest method, Artificial Neural Network, Linear Discriminant method, Naive Bayes Classifiers, Support Vector Machines as well as the conventional dynamic supercut method by simulating triggering events with the Monte Carlo method and applied the results to a Cherenkov telescope. It is demonstrated that the Random Forest method is the most sensitive machine learning method for γ-hadron segregation. (research papers)

  19. Web-based newborn screening system for metabolic diseases: machine learning versus clinicians.

    Science.gov (United States)

    Chen, Wei-Hsin; Hsieh, Sheau-Ling; Hsu, Kai-Ping; Chen, Han-Ping; Su, Xing-Yu; Tseng, Yi-Ju; Chien, Yin-Hsiu; Hwu, Wuh-Liang; Lai, Feipei

    2013-05-23

    A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in year 2011 were used as

  20. Parameters optimization for wavelet denoising based on normalized spectral angle and threshold constraint machine learning

    Science.gov (United States)

    Li, Hao; Ma, Yong; Liang, Kun; Tian, Yong; Wang, Rui

    2012-01-01

    Wavelet parameters (e.g., wavelet type, level of decomposition) affect the performance of the wavelet denoising algorithm in hyperspectral applications. Current studies select the best wavelet parameters for a single spectral curve by comparing similarity criteria such as spectral angle (SA). However, the method to find the best parameters for a spectral library that contains multiple spectra has not been studied. In this paper, a criterion named normalized spectral angle (NSA) is proposed. By comparing NSA, the best combination of parameters for a spectral library can be selected. Moreover, a fast algorithm based on threshold constraint and machine learning is developed to reduce the time of a full search. After several iterations of learning, the combination of parameters that constantly surpasses a threshold is selected. The experiments proved that by using the NSA criterion, the SA values decreased significantly, and the fast algorithm could save 80% time consumption, while the denoising performance was not obviously impaired.

  1. Fingerprint-Based Machine Learning Approach to Identify Potent and Selective 5-HT2BR Ligands

    Directory of Open Access Journals (Sweden)

    Krzysztof Rataj

    2018-05-01

    Full Text Available The identification of subtype-selective GPCR (G-protein coupled receptor ligands is a challenging task. In this study, we developed a computational protocol to find compounds with 5-HT2BR versus 5-HT1BR selectivity. Our approach employs the hierarchical combination of machine learning methods, docking, and multiple scoring methods. First, we applied machine learning tools to filter a large database of druglike compounds by the new Neighbouring Substructures Fingerprint (NSFP. This two-dimensional fingerprint contains information on the connectivity of the substructural features of a compound. Preselected subsets of the database were then subjected to docking calculations. The main indicators of compounds’ selectivity were their different interactions with the secondary binding pockets of both target proteins, while binding modes within the orthosteric binding pocket were preserved. The combined methodology of ligand-based and structure-based methods was validated prospectively, resulting in the identification of hits with nanomolar affinity and ten-fold to ten thousand-fold selectivities.

  2. Machine learning-based prediction of adverse drug effects: An example of seizure-inducing compounds

    Directory of Open Access Journals (Sweden)

    Mengxuan Gao

    2017-02-01

    Full Text Available Various biological factors have been implicated in convulsive seizures, involving side effects of drugs. For the preclinical safety assessment of drug development, it is difficult to predict seizure-inducing side effects. Here, we introduced a machine learning-based in vitro system designed to detect seizure-inducing side effects. We recorded local field potentials from the CA1 alveus in acute mouse neocortico-hippocampal slices, while 14 drugs were bath-perfused at 5 different concentrations each. For each experimental condition, we collected seizure-like neuronal activity and merged their waveforms as one graphic image, which was further converted into a feature vector using Caffe, an open framework for deep learning. In the space of the first two principal components, the support vector machine completely separated the vectors (i.e., doses of individual drugs that induced seizure-like events and identified diphenhydramine, enoxacin, strychnine and theophylline as “seizure-inducing” drugs, which indeed were reported to induce seizures in clinical situations. Thus, this artificial intelligence-based classification may provide a new platform to detect the seizure-inducing side effects of preclinical drugs.

  3. Machine Learning and Applied Linguistics

    OpenAIRE

    Vajjala, Sowmya

    2018-01-01

    This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.

  4. The Simulation Computer Based Learning (SCBL) for Short Circuit Multi Machine Power System Analysis

    Science.gov (United States)

    Rahmaniar; Putri, Maharani

    2018-03-01

    Strengthening Competitiveness of human resources become the reply of college as a conductor of high fomal education. Electrical Engineering Program UNPAB (Prodi TE UNPAB) as one of the department of electrical engineering that manages the field of electrical engineering expertise has a very important part in preparing human resources (HR), Which is required by where graduates are produced by DE UNPAB, Is expected to be able to compete globally, especially related to the implementation of Asean Economic Community (AEC) which requires the active participation of graduates with competence and quality of human resource competitiveness. Preparation of HR formation Competitive is done with the various strategies contained in the Seven (7) Higher Education Standard, one part of which is the implementation of teaching and learning process in Electrical system analysis with short circuit analysis (SCA) This course is a course The core of which is the basis for the competencies of other subjects in the advanced semester at Development of Computer Based Learning model (CBL) is done in the learning of interference analysis of multi-machine short circuit which includes: (a) Short-circuit One phase, (B) Two-phase Short Circuit Disruption, (c) Ground Short Circuit Disruption, (d) Short Circuit Disruption One Ground Floor Development of CBL learning model for Electrical System Analysis course provides space for students to be more active In learning in solving complex (complicated) problems, so it is thrilling Ilkan flexibility of student learning how to actively solve the problem of short-circuit analysis and to form the active participation of students in learning (Student Center Learning, in the course of electrical power system analysis.

  5. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. The book provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity and the depth and breath of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  6. Considerations upon the Machine Learning Technologies

    OpenAIRE

    Alin Munteanu; Cristina Ofelia Sofran

    2006-01-01

    Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  7. Considerations upon the Machine Learning Technologies

    Directory of Open Access Journals (Sweden)

    Alin Munteanu

    2006-01-01

    Full Text Available Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to “learn”. Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach.

  8. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  9. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them.This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  10. Machine learning and medical imaging

    CERN Document Server

    Shen, Dinggang; Sabuncu, Mert

    2016-01-01

    Machine Learning and Medical Imaging presents state-of- the-art machine learning methods in medical image analysis. It first summarizes cutting-edge machine learning algorithms in medical imaging, including not only classical probabilistic modeling and learning methods, but also recent breakthroughs in deep learning, sparse representation/coding, and big data hashing. In the second part leading research groups around the world present a wide spectrum of machine learning methods with application to different medical imaging modalities, clinical domains, and organs. The biomedical imaging modalities include ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), histology, and microscopy images. The targeted organs span the lung, liver, brain, and prostate, while there is also a treatment of examining genetic associations. Machine Learning and Medical Imaging is an ideal reference for medical imaging researchers, industry scientists and engineers, advanced undergraduate and graduate students, a...

  11. Mixed Integer Linear Programming based machine learning approach identifies regulators of telomerase in yeast.

    Science.gov (United States)

    Poos, Alexandra M; Maicher, André; Dieckmann, Anna K; Oswald, Marcus; Eils, Roland; Kupiec, Martin; Luke, Brian; König, Rainer

    2016-06-02

    Understanding telomere length maintenance mechanisms is central in cancer biology as their dysregulation is one of the hallmarks for immortalization of cancer cells. Important for this well-balanced control is the transcriptional regulation of the telomerase genes. We integrated Mixed Integer Linear Programming models into a comparative machine learning based approach to identify regulatory interactions that best explain the discrepancy of telomerase transcript levels in yeast mutants with deleted regulators showing aberrant telomere length, when compared to mutants with normal telomere length. We uncover novel regulators of telomerase expression, several of which affect histone levels or modifications. In particular, our results point to the transcription factors Sum1, Hst1 and Srb2 as being important for the regulation of EST1 transcription, and we validated the effect of Sum1 experimentally. We compiled our machine learning method leading to a user friendly package for R which can straightforwardly be applied to similar problems integrating gene regulator binding information and expression profiles of samples of e.g. different phenotypes, diseases or treatments. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Ricardo Andres Pizarro

    2016-12-01

    Full Text Available High-resolution three-dimensional magnetic resonance imaging (3D-MRI is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM algorithm in the quality assessment of structural brain images, using global and region of interest (ROI automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  13. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm.

    Science.gov (United States)

    Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S

    2016-01-01

    High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  14. A Fast SVD-Hidden-nodes based Extreme Learning Machine for Large-Scale Data Analytics.

    Science.gov (United States)

    Deng, Wan-Yu; Bai, Zuo; Huang, Guang-Bin; Zheng, Qing-Hua

    2016-05-01

    Big dimensional data is a growing trend that is emerging in many real world contexts, extending from web mining, gene expression analysis, protein-protein interaction to high-frequency financial data. Nowadays, there is a growing consensus that the increasing dimensionality poses impeding effects on the performances of classifiers, which is termed as the "peaking phenomenon" in the field of machine intelligence. To address the issue, dimensionality reduction is commonly employed as a preprocessing step on the Big dimensional data before building the classifiers. In this paper, we propose an Extreme Learning Machine (ELM) approach for large-scale data analytic. In contrast to existing approaches, we embed hidden nodes that are designed using singular value decomposition (SVD) into the classical ELM. These SVD nodes in the hidden layer are shown to capture the underlying characteristics of the Big dimensional data well, exhibiting excellent generalization performances. The drawback of using SVD on the entire dataset, however, is the high computational complexity involved. To address this, a fast divide and conquer approximation scheme is introduced to maintain computational tractability on high volume data. The resultant algorithm proposed is labeled here as Fast Singular Value Decomposition-Hidden-nodes based Extreme Learning Machine or FSVD-H-ELM in short. In FSVD-H-ELM, instead of identifying the SVD hidden nodes directly from the entire dataset, SVD hidden nodes are derived from multiple random subsets of data sampled from the original dataset. Comprehensive experiments and comparisons are conducted to assess the FSVD-H-ELM against other state-of-the-art algorithms. The results obtained demonstrated the superior generalization performance and efficiency of the FSVD-H-ELM. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Machine learning topological states

    Science.gov (United States)

    Deng, Dong-Ling; Li, Xiaopeng; Das Sarma, S.

    2017-11-01

    Artificial neural networks and machine learning have now reached a new era after several decades of improvement where applications are to explode in many fields of science, industry, and technology. Here, we use artificial neural networks to study an intriguing phenomenon in quantum physics—the topological phases of matter. We find that certain topological states, either symmetry-protected or with intrinsic topological order, can be represented with classical artificial neural networks. This is demonstrated by using three concrete spin systems, the one-dimensional (1D) symmetry-protected topological cluster state and the 2D and 3D toric code states with intrinsic topological orders. For all three cases, we show rigorously that the topological ground states can be represented by short-range neural networks in an exact and efficient fashion—the required number of hidden neurons is as small as the number of physical spins and the number of parameters scales only linearly with the system size. For the 2D toric-code model, we find that the proposed short-range neural networks can describe the excited states with Abelian anyons and their nontrivial mutual statistics as well. In addition, by using reinforcement learning we show that neural networks are capable of finding the topological ground states of nonintegrable Hamiltonians with strong interactions and studying their topological phase transitions. Our results demonstrate explicitly the exceptional power of neural networks in describing topological quantum states, and at the same time provide valuable guidance to machine learning of topological phases in generic lattice models.

  16. Adaptive Machine Aids to Learning.

    Science.gov (United States)

    Starkweather, John A.

    With emphasis on man-machine relationships and on machine evolution, computer-assisted instruction (CAI) is examined in this paper. The discussion includes the background of machine assistance to learning, the current status of CAI, directions of development, the development of criteria for successful instruction, meeting the needs of users,…

  17. Machine learning based cloud mask algorithm driven by radiative transfer modeling

    Science.gov (United States)

    Chen, N.; Li, W.; Tanikawa, T.; Hori, M.; Shimada, R.; Stamnes, K. H.

    2017-12-01

    Cloud detection is a critically important first step required to derive many satellite data products. Traditional threshold based cloud mask algorithms require a complicated design process and fine tuning for each sensor, and have difficulty over snow/ice covered areas. With the advance of computational power and machine learning techniques, we have developed a new algorithm based on a neural network classifier driven by extensive radiative transfer modeling. Statistical validation results obtained by using collocated CALIOP and MODIS data show that its performance is consistent over different ecosystems and significantly better than the MODIS Cloud Mask (MOD35 C6) during the winter seasons over mid-latitude snow covered areas. Simulations using a reduced number of satellite channels also show satisfactory results, indicating its flexibility to be configured for different sensors.

  18. Alchemical and structural distribution based representation for universal quantum machine learning

    Science.gov (United States)

    Faber, Felix A.; Christensen, Anders S.; Huang, Bing; von Lilienfeld, O. Anatole

    2018-06-01

    We introduce a representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space. The representation is based on Gaussian distribution functions, scaled by power laws and explicitly accounting for structural as well as elemental degrees of freedom. The elemental components help us to lower the QML model's learning curve, and, through interpolation across the periodic table, even enable "alchemical extrapolation" to covalent bonding between elements not part of training. This point is demonstrated for the prediction of covalent binding in single, double, and triple bonds among main-group elements as well as for atomization energies in organic molecules. We present numerical evidence that resulting QML energy models, after training on a few thousand random training instances, reach chemical accuracy for out-of-sample compounds. Compound datasets studied include thousands of structurally and compositionally diverse organic molecules, non-covalently bonded protein side-chains, (H2O)40-clusters, and crystalline solids. Learning curves for QML models also indicate competitive predictive power for various other electronic ground state properties of organic molecules, calculated with hybrid density functional theory, including polarizability, heat-capacity, HOMO-LUMO eigenvalues and gap, zero point vibrational energy, dipole moment, and highest vibrational fundamental frequency.

  19. Novel Machine Learning-Based Techniques for Efficient Resource Allocation in Next Generation Wireless Networks

    KAUST Repository

    AlQuerm, Ismail A.

    2018-02-21

    resources management in diverse wireless networks. The core operation of the proposed architecture is decision-making for resource allocation and system’s parameters adaptation. Thus, we develop the decision-making mechanism using different artificial intelligence techniques, evaluate the performance achieved and determine the tradeoff of using one technique over the others. The techniques include decision-trees, genetic algorithm, hybrid engine based on decision-trees and case based reasoning, and supervised engine with machine learning contribution to determine the ultimate technique that suits the current environment conditions. All the proposed techniques are evaluated using testbed implementation in different topologies and scenarios. LTE networks have been considered as a potential environment for demonstration of our proposed cognitive based resource allocation techniques as they lack of radio resource management. In addition, we explore the use of enhanced online learning to perform efficient resource allocation in the upcoming 5G networks to maximize energy efficiency and data rate. The considered 5G structures are heterogeneous multi-tier networks with device to device communication and heterogeneous cloud radio access networks. We propose power and resource blocks allocation schemes to maximize energy efficiency and data rate in heterogeneous 5G networks. Moreover, traffic offloading from large cells to small cells in 5G heterogeneous networks is investigated and an online learning based traffic offloading strategy is developed to enhance energy efficiency. Energy efficiency problem in heterogeneous cloud radio access networks is tackled using online learning in centralized and distributed fashions. The proposed online learning comprises improvement features that reduce the algorithms complexities and enhance the performance achieved.

  20. Machine Learning for Robotic Vision

    OpenAIRE

    Drummond, Tom

    2018-01-01

    Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.

  1. A Novel Machine Learning Strategy Based on Two-Dimensional Numerical Models in Financial Engineering

    Directory of Open Access Journals (Sweden)

    Qingzhen Xu

    2013-01-01

    Full Text Available Machine learning is the most commonly used technique to address larger and more complex tasks by analyzing the most relevant information already present in databases. In order to better predict the future trend of the index, this paper proposes a two-dimensional numerical model for machine learning to simulate major U.S. stock market index and uses a nonlinear implicit finite-difference method to find numerical solutions of the two-dimensional simulation model. The proposed machine learning method uses partial differential equations to predict the stock market and can be extensively used to accelerate large-scale data processing on the history database. The experimental results show that the proposed algorithm reduces the prediction error and improves forecasting precision.

  2. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Full Text Available Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects from a given dataset. The techniques for detecting outliers have a lot of applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for the CB outlier detection (DACB. On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. At last, the effectiveness and efficiency of the proposed approaches are verified through a plenty of simulation experiments.

  3. Predicting Essential Genes and Proteins Based on Machine Learning and Network Topological Features: A Comprehensive Review

    Science.gov (United States)

    Zhang, Xue; Acencio, Marcio Luis; Lemke, Ney

    2016-01-01

    Essential proteins/genes are indispensable to the survival or reproduction of an organism, and the deletion of such essential proteins will result in lethality or infertility. The identification of essential genes is very important not only for understanding the minimal requirements for survival of an organism, but also for finding human disease genes and new drug targets. Experimental methods for identifying essential genes are costly, time-consuming, and laborious. With the accumulation of sequenced genomes data and high-throughput experimental data, many computational methods for identifying essential proteins are proposed, which are useful complements to experimental methods. In this review, we show the state-of-the-art methods for identifying essential genes and proteins based on machine learning and network topological features, point out the progress and limitations of current methods, and discuss the challenges and directions for further research. PMID:27014079

  4. Big data analytics for early detection of breast cancer based on machine learning

    Science.gov (United States)

    Ivanova, Desislava

    2017-12-01

    This paper presents the concept and the modern advances in personalized medicine that rely on technology and review the existing tools for early detection of breast cancer. The breast cancer types and distribution worldwide is discussed. It is spent time to explain the importance of identifying the normality and to specify the main classes in breast cancer, benign or malignant. The main purpose of the paper is to propose a conceptual model for early detection of breast cancer based on machine learning for processing and analysis of medical big dataand further knowledge discovery for personalized treatment. The proposed conceptual model is realized by using Naive Bayes classifier. The software is written in python programming language and for the experiments the Wisconsin breast cancer database is used. Finally, the experimental results are presented and discussed.

  5. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  6. Machine learning-based patient specific prompt-gamma dose monitoring in proton therapy

    Science.gov (United States)

    Gueth, P.; Dauvergne, D.; Freud, N.; Létang, J. M.; Ray, C.; Testa, E.; Sarrut, D.

    2013-07-01

    Online dose monitoring in proton therapy is currently being investigated with prompt-gamma (PG) devices. PG emission was shown to be correlated with dose deposition. This relationship is mostly unknown under real conditions. We propose a machine learning approach based on simulations to create optimized treatment-specific classifiers that detect discrepancies between planned and delivered dose. Simulations were performed with the Monte-Carlo platform Gate/Geant4 for a spot-scanning proton therapy treatment and a PG camera prototype currently under investigation. The method first builds a learning set of perturbed situations corresponding to a range of patient translation. This set is then used to train a combined classifier using distal falloff and registered correlation measures. Classifier performances were evaluated using receiver operating characteristic curves and maximum associated specificity and sensitivity. A leave-one-out study showed that it is possible to detect discrepancies of 5 mm with specificity and sensitivity of 85% whereas using only distal falloff decreases the sensitivity down to 77% on the same data set. The proposed method could help to evaluate performance and to optimize the design of PG monitoring devices. It is generic: other learning sets of deviations, other measures and other types of classifiers could be studied to potentially reach better performance. At the moment, the main limitation lies in the computation time needed to perform the simulations.

  7. Machine learning-based patient specific prompt-gamma dose monitoring in proton therapy

    International Nuclear Information System (INIS)

    Gueth, P; Freud, N; Létang, J M; Sarrut, D; Dauvergne, D; Ray, C; Testa, E

    2013-01-01

    Online dose monitoring in proton therapy is currently being investigated with prompt-gamma (PG) devices. PG emission was shown to be correlated with dose deposition. This relationship is mostly unknown under real conditions. We propose a machine learning approach based on simulations to create optimized treatment-specific classifiers that detect discrepancies between planned and delivered dose. Simulations were performed with the Monte-Carlo platform Gate/Geant4 for a spot-scanning proton therapy treatment and a PG camera prototype currently under investigation. The method first builds a learning set of perturbed situations corresponding to a range of patient translation. This set is then used to train a combined classifier using distal falloff and registered correlation measures. Classifier performances were evaluated using receiver operating characteristic curves and maximum associated specificity and sensitivity. A leave-one-out study showed that it is possible to detect discrepancies of 5 mm with specificity and sensitivity of 85% whereas using only distal falloff decreases the sensitivity down to 77% on the same data set. The proposed method could help to evaluate performance and to optimize the design of PG monitoring devices. It is generic: other learning sets of deviations, other measures and other types of classifiers could be studied to potentially reach better performance. At the moment, the main limitation lies in the computation time needed to perform the simulations. (paper)

  8. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    Science.gov (United States)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product that facilitates cut-out magnetograms of solar active regions (AR) in the Sun in near-realtime (NRT), taken over a five-year interval (2012 - 2016). Our approach utilizes a set of thirteen predictors, which are not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude {>} M1 and {>} C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for {>} M1 flare prediction with probability threshold 15% and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for {>} C1 flare prediction with probability threshold 35%.

  9. Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.

    Directory of Open Access Journals (Sweden)

    Lei Jia

    Full Text Available Thermostability issue of protein point mutations is a common occurrence in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding decision making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html is a public database that consists of thousands of protein mutants' experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG and melting temperature change (dTm were obtained from this database. Folding free energy change calculation from Rosetta, structural information of the point mutations as well as amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor and partial least squares regression are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. Rosetta calculated folding free energy change ranked as the most influential features in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models.

  10. Post hoc support vector machine learning for impedimetric biosensors based on weak protein-ligand interactions.

    Science.gov (United States)

    Rong, Y; Padron, A V; Hagerty, K J; Nelson, N; Chi, S; Keyhani, N O; Katz, J; Datta, S P A; Gomes, C; McLamore, E S

    2018-04-30

    Impedimetric biosensors for measuring small molecules based on weak/transient interactions between bioreceptors and target analytes are a challenge for detection electronics, particularly in field studies or in the analysis of complex matrices. Protein-ligand binding sensors have enormous potential for biosensing, but achieving accuracy in complex solutions is a major challenge. There is a need for simple post hoc analytical tools that are not computationally expensive, yet provide near real time feedback on data derived from impedance spectra. Here, we show the use of a simple, open source support vector machine learning algorithm for analyzing impedimetric data in lieu of using equivalent circuit analysis. We demonstrate two different protein-based biosensors to show that the tool can be used for various applications. We conclude with a mobile phone-based demonstration focused on the measurement of acetone, an important biomarker related to the onset of diabetic ketoacidosis. In all conditions tested, the open source classifier was capable of performing as well as, or better, than the equivalent circuit analysis for characterizing weak/transient interactions between a model ligand (acetone) and a small chemosensory protein derived from the tsetse fly. In addition, the tool has a low computational requirement, facilitating use for mobile acquisition systems such as mobile phones. The protocol is deployed through Jupyter notebook (an open source computing environment available for mobile phone, tablet or computer use) and the code was written in Python. For each of the applications, we provide step-by-step instructions in English, Spanish, Mandarin and Portuguese to facilitate widespread use. All codes were based on scikit-learn, an open source software machine learning library in the Python language, and were processed in Jupyter notebook, an open-source web application for Python. The tool can easily be integrated with the mobile biosensor equipment for rapid

  11. A New Perspective for the Training Assessment: Machine Learning-Based Neurometric for Augmented User's Evaluation

    Directory of Open Access Journals (Sweden)

    Gianluca Borghini

    2017-06-01

    Full Text Available Inappropriate training assessment might have either high social costs and economic impacts, especially in high risks categories, such as Pilots, Air Traffic Controllers, or Surgeons. One of the current limitations of the standard training assessment procedures is the lack of information about the amount of cognitive resources requested by the user for the correct execution of the proposed task. In fact, even if the task is accomplished achieving the maximum performance, by the standard training assessment methods, it would not be possible to gather and evaluate information about cognitive resources available for dealing with unexpected events or emergency conditions. Therefore, a metric based on the brain activity (neurometric able to provide the Instructor such a kind of information should be very important. As a first step in this direction, the Electroencephalogram (EEG and the performance of 10 participants were collected along a training period of 3 weeks, while learning the execution of a new task. Specific indexes have been estimated from the behavioral and EEG signal to objectively assess the users' training progress. Furthermore, we proposed a neurometric based on a machine learning algorithm to quantify the user's training level within each session by considering the level of task execution, and both the behavioral and cognitive stabilities between consecutive sessions. The results demonstrated that the proposed methodology and neurometric could quantify and track the users' progresses, and provide the Instructor information for a more objective evaluation and better tailoring of training programs.

  12. A New Perspective for the Training Assessment: Machine Learning-Based Neurometric for Augmented User's Evaluation.

    Science.gov (United States)

    Borghini, Gianluca; Aricò, Pietro; Di Flumeri, Gianluca; Sciaraffa, Nicolina; Colosimo, Alfredo; Herrero, Maria-Trinidad; Bezerianos, Anastasios; Thakor, Nitish V; Babiloni, Fabio

    2017-01-01

    Inappropriate training assessment might have either high social costs and economic impacts, especially in high risks categories, such as Pilots, Air Traffic Controllers, or Surgeons. One of the current limitations of the standard training assessment procedures is the lack of information about the amount of cognitive resources requested by the user for the correct execution of the proposed task. In fact, even if the task is accomplished achieving the maximum performance, by the standard training assessment methods, it would not be possible to gather and evaluate information about cognitive resources available for dealing with unexpected events or emergency conditions. Therefore, a metric based on the brain activity ( neurometric ) able to provide the Instructor such a kind of information should be very important. As a first step in this direction, the Electroencephalogram (EEG) and the performance of 10 participants were collected along a training period of 3 weeks, while learning the execution of a new task. Specific indexes have been estimated from the behavioral and EEG signal to objectively assess the users' training progress. Furthermore, we proposed a neurometric based on a machine learning algorithm to quantify the user's training level within each session by considering the level of task execution, and both the behavioral and cognitive stabilities between consecutive sessions. The results demonstrated that the proposed methodology and neurometric could quantify and track the users' progresses, and provide the Instructor information for a more objective evaluation and better tailoring of training programs.

  13. Estimation of the applicability domain of kernel-based machine learning models for virtual screening

    Directory of Open Access Journals (Sweden)

    Fechner Nikolas

    2010-03-01

    Full Text Available Abstract Background The virtual screening of large compound databases is an important application of structural-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model. Results We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of the training of a kernel-based QSAR model using support vector regression and the ranking of a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model for the respective compound is quantitatively described using a score obtained by an applicability domain formulation. The suitability of the applicability domain estimation is evaluated by comparing the model performance on the subsets of the screening data sets obtained by different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemspace, in which the model gives reliable predictions, from the part consisting of structures too dissimilar to the training set to apply the model successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if half of the molecules, those with the lowest applicability scores, are omitted from the screening

  14. Estimation of the applicability domain of kernel-based machine learning models for virtual screening.

    Science.gov (United States)

    Fechner, Nikolas; Jahn, Andreas; Hinselmann, Georg; Zell, Andreas

    2010-03-11

    The virtual screening of large compound databases is an important application of structural-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model. We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of the training of a kernel-based QSAR model using support vector regression and the ranking of a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model for the respective compound is quantitatively described using a score obtained by an applicability domain formulation. The suitability of the applicability domain estimation is evaluated by comparing the model performance on the subsets of the screening data sets obtained by different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemspace, in which the model gives reliable predictions, from the part consisting of structures too dissimilar to the training set to apply the model successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if half of the molecules, those with the lowest applicability scores, are omitted from the screening. The proposed applicability domain formulations

  15. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules.

    Science.gov (United States)

    Vyas, Renu; Bapat, Sanket; Jain, Esha; Tambe, Sanjeev S; Karthikeyan, Muthukumarasamy; Kulkarni, Bhaskar D

    2015-01-01

    The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods for the diversity oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built using three types of inputs namely structure based descriptors, molecular fingerprints and therapeutic category for performing virtual screening. The activity and affinity descriptors of a set of inhibitors of four target classes DHFR, COX, LOX and NMDA have been utilized to train a total of six classifiers viz. Artificial Neural Network (ANN), k nearest neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree--(DT) and Random Forest--(RF). Among these classifiers, the ANN was found as the best classifier with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC), were used to build the ANN models for each dataset. A good accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors compared to pharmacophoric (67.82%) and toxicophoric (70.64%). The methodology was validated on the classical Ames mutagenecity dataset of 4337 molecules. To evaluate it further, selectivity and promiscuity of molecules from five drug classes viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic were studied. The TPC fingerprints computed for each category were able to capture the drug-class specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.

  16. Machine Learning-based Virtual Screening and Its Applications to Alzheimer's Drug Discovery: A Review.

    Science.gov (United States)

    Carpenter, Kristy A; Huang, Xudong

    2018-06-07

    Virtual Screening (VS) has emerged as an important tool in the drug development process, as it conducts efficient in silico searches over millions of compounds, ultimately increasing yields of potential drug leads. As a subset of Artificial Intelligence (AI), Machine Learning (ML) is a powerful way of conducting VS for drug leads. ML for VS generally involves assembling a filtered training set of compounds, comprised of known actives and inactives. After training the model, it is validated and, if sufficiently accurate, used on previously unseen databases to screen for novel compounds with desired drug target binding activity. The study aims to review ML-based methods used for VS and applications to Alzheimer's disease (AD) drug discovery. To update the current knowledge on ML for VS, we review thorough backgrounds, explanations, and VS applications of the following ML techniques: Naïve Bayes (NB), k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Random Forests (RF), and Artificial Neural Networks (ANN). All techniques have found success in VS, but the future of VS is likely to lean more heavily toward the use of neural networks - and more specifically, Convolutional Neural Networks (CNN), which are a subset of ANN that utilize convolution. We additionally conceptualize a work flow for conducting ML-based VS for potential therapeutics of for AD, a complex neurodegenerative disease with no known cure and prevention. This both serves as an example of how to apply the concepts introduced earlier in the review and as a potential workflow for future implementation. Different ML techniques are powerful tools for VS, and they have advantages and disadvantages albeit. ML-based VS can be applied to AD drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. Comparison of four machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    Science.gov (United States)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2016-03-01

    Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.

  18. Machine Learning an algorithmic perspective

    CERN Document Server

    Marsland, Stephen

    2009-01-01

    Traditional books on machine learning can be divided into two groups - those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.Theory Backed up by Practical ExamplesThe book covers neural networks, graphical models, reinforcement le

  19. Quasilinear Extreme Learning Machine Model Based Internal Model Control for Nonlinear Process

    Directory of Open Access Journals (Sweden)

    Dazi Li

    2015-01-01

    Full Text Available A new strategy for internal model control (IMC is proposed using a regression algorithm of quasilinear model with extreme learning machine (QL-ELM. Aimed at the chemical process with nonlinearity, the learning process of the internal model and inverse model is derived. The proposed QL-ELM is constructed as a linear ARX model with a complicated nonlinear coefficient. It shows some good approximation ability and fast convergence. The complicated coefficients are separated into two parts. The linear part is determined by recursive least square (RLS, while the nonlinear part is identified through extreme learning machine. The parameters of linear part and the output weights of ELM are estimated iteratively. The proposed internal model control is applied to CSTR process. The effectiveness and accuracy of the proposed method are extensively verified through numerical results.

  20. Support vector machine learning-based fMRI data group analysis.

    Science.gov (United States)

    Wang, Ze; Childress, Anna R; Wang, Jiongjiong; Detre, John A

    2007-07-15

    To explore the multivariate nature of fMRI data and to consider the inter-subject brain response discrepancies, a multivariate and brain response model-free method is fundamentally required. Two such methods are presented in this paper by integrating a machine learning algorithm, the support vector machine (SVM), and the random effect model. Without any brain response modeling, SVM was used to extract a whole brain spatial discriminance map (SDM), representing the brain response difference between the contrasted experimental conditions. Population inference was then obtained through the random effect analysis (RFX) or permutation testing (PMU) on the individual subjects' SDMs. Applied to arterial spin labeling (ASL) perfusion fMRI data, SDM RFX yielded lower false-positive rates in the null hypothesis test and higher detection sensitivity for synthetic activations with varying cluster size and activation strengths, compared to the univariate general linear model (GLM)-based RFX. For a sensory-motor ASL fMRI study, both SDM RFX and SDM PMU yielded similar activation patterns to GLM RFX and GLM PMU, respectively, but with higher t values and cluster extensions at the same significance level. Capitalizing on the absence of temporal noise correlation in ASL data, this study also incorporated PMU in the individual-level GLM and SVM analyses accompanied by group-level analysis through RFX or group-level PMU. Providing inferences on the probability of being activated or deactivated at each voxel, these individual-level PMU-based group analysis methods can be used to threshold the analysis results of GLM RFX, SDM RFX or SDM PMU.

  1. GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.

    Science.gov (United States)

    Maciukiewicz, Malgorzata; Marshe, Victoria S; Hauschild, Anne-Christin; Foster, Jane A; Rotzinger, Susan; Kennedy, James L; Kennedy, Sidney H; Müller, Daniel J; Geraci, Joseph

    2018-04-01

    Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients received treatment with duloxetine for up to 8 weeks were categorized as "responders" based on a MADRS change >50% from baseline; or "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regards to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, and 0.51 for all coefficients for CRT. The best performing SVM fold was characterized by an accuracy = 0.66 (p = .071), sensitivity = 0.70 and a sensitivity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction. Copyright © 2017. Published by Elsevier Ltd.

  2. A machine learning framework involving EEG-based functional connectivity to diagnose major depressive disorder (MDD).

    Science.gov (United States)

    Mumtaz, Wajid; Ali, Syed Saad Azhar; Yasin, Mohd Azhar Mohd; Malik, Aamir Saeed

    2018-02-01

    Major depressive disorder (MDD), a debilitating mental illness, could cause functional disabilities and could become a social problem. An accurate and early diagnosis for depression could become challenging. This paper proposed a machine learning framework involving EEG-derived synchronization likelihood (SL) features as input data for automatic diagnosis of MDD. It was hypothesized that EEG-based SL features could discriminate MDD patients and healthy controls with an acceptable accuracy better than measures such as interhemispheric coherence and mutual information. In this work, classification models such as support vector machine (SVM), logistic regression (LR) and Naïve Bayesian (NB) were employed to model relationship between the EEG features and the study groups (MDD patient and healthy controls) and ultimately achieved discrimination of study participants. The results indicated that the classification rates were better than chance. More specifically, the study resulted into SVM classification accuracy = 98%, sensitivity = 99.9%, specificity = 95% and f-measure = 0.97; LR classification accuracy = 91.7%, sensitivity = 86.66%, specificity = 96.6% and f-measure = 0.90; NB classification accuracy = 93.6%, sensitivity = 100%, specificity = 87.9% and f-measure = 0.95. In conclusion, SL could be a promising method for diagnosing depression. The findings could be generalized to develop a robust CAD-based tool that may help for clinical purposes.

  3. Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data

    Directory of Open Access Journals (Sweden)

    Simpson David

    2006-03-01

    Full Text Available Abstract Background Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor enriched genes based on expression data. Results Based on the analysis of publicly-available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE, this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was firstly performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model performed significantly better than relatively more complex algorithms, e.g. neural networks. To deal with the problem of functional class imbalance occurring in the dataset, two data re

  4. Prediction of recombinant protein overexpression in Escherichia coli using a machine learning based model (RPOLP).

    Science.gov (United States)

    Habibi, Narjeskhatoon; Norouzi, Alireza; Mohd Hashim, Siti Z; Shamsir, Mohd Shahir; Samian, Razip

    2015-11-01

    Recombinant protein overexpression, an important biotechnological process, is ruled by complex biological rules which are mostly unknown, is in need of an intelligent algorithm so as to avoid resource-intensive lab-based trial and error experiments in order to determine the expression level of the recombinant protein. The purpose of this study is to propose a predictive model to estimate the level of recombinant protein overexpression for the first time in the literature using a machine learning approach based on the sequence, expression vector, and expression host. The expression host was confined to Escherichia coli which is the most popular bacterial host to overexpress recombinant proteins. To provide a handle to the problem, the overexpression level was categorized as low, medium and high. A set of features which were likely to affect the overexpression level was generated based on the known facts (e.g. gene length) and knowledge gathered from related literature. Then, a representative sub-set of features generated in the previous objective was determined using feature selection techniques. Finally a predictive model was developed using random forest classifier which was able to adequately classify the multi-class imbalanced small dataset constructed. The result showed that the predictive model provided a promising accuracy of 80% on average, in estimating the overexpression level of a recombinant protein. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. A Machine-Learning and Filtering Based Data Assimilation Framework for Geologic Carbon Sequestration Monitoring Optimization

    Science.gov (United States)

    Chen, B.; Harp, D. R.; Lin, Y.; Keating, E. H.; Pawar, R.

    2017-12-01

    Monitoring is a crucial aspect of geologic carbon sequestration (GCS) risk management. It has gained importance as a means to ensure CO2 is safely and permanently stored underground throughout the lifecycle of a GCS project. Three issues are often involved in a monitoring project: (i) where is the optimal location to place the monitoring well(s), (ii) what type of data (pressure, rate and/or CO2 concentration) should be measured, and (iii) What is the optimal frequency to collect the data. In order to address these important issues, a filtering-based data assimilation procedure is developed to perform the monitoring optimization. The optimal monitoring strategy is selected based on the uncertainty reduction of the objective of interest (e.g., cumulative CO2 leak) for all potential monitoring strategies. To reduce the computational cost of the filtering-based data assimilation process, two machine-learning algorithms: Support Vector Regression (SVR) and Multivariate Adaptive Regression Splines (MARS) are used to develop the computationally efficient reduced-order-models (ROMs) from full numerical simulations of CO2 and brine flow. The proposed framework for GCS monitoring optimization is demonstrated with two examples: a simple 3D synthetic case and a real field case named Rock Spring Uplift carbon storage site in Southwestern Wyoming.

  6. Tensor manifold-based extreme learning machine for 2.5-D face recognition

    Science.gov (United States)

    Chong, Lee Ying; Ong, Thian Song; Teoh, Andrew Beng Jin

    2018-01-01

    We explore the use of the Gabor regional covariance matrix (GRCM), a flexible matrix-based descriptor that embeds the Gabor features in the covariance matrix, as a 2.5-D facial descriptor and an effective means of feature fusion for 2.5-D face recognition problems. Despite its promise, matching is not a trivial problem for GRCM since it is a special instance of a symmetric positive definite (SPD) matrix that resides in non-Euclidean space as a tensor manifold. This implies that GRCM is incompatible with the existing vector-based classifiers and distance matchers. Therefore, we bridge the gap of the GRCM and extreme learning machine (ELM), a vector-based classifier for the 2.5-D face recognition problem. We put forward a tensor manifold-compliant ELM and its two variants by embedding the SPD matrix randomly into reproducing kernel Hilbert space (RKHS) via tensor kernel functions. To preserve the pair-wise distance of the embedded data, we orthogonalize the random-embedded SPD matrix. Hence, classification can be done using a simple ridge regressor, an integrated component of ELM, on the random orthogonal RKHS. Experimental results show that our proposed method is able to improve the recognition performance and further enhance the computational efficiency.

  7. Bioinformatics algorithm based on a parallel implementation of a machine learning approach using transducers

    International Nuclear Information System (INIS)

    Roche-Lima, Abiel; Thulasiram, Ruppa K

    2012-01-01

    Finite automata, in which each transition is augmented with an output label in addition to the familiar input label, are considered finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed to pairwise alignments of DNA and protein sequences; as well as to develop kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation. It is calculated by using techniques, such as pair-database creation, normalization (with Maximum-Likelihood normalization) and parameters optimization (with Expectation-Maximization - EM). These techniques are intrinsically costly for computation, even worse when are applied to bioinformatics, because the databases sizes are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Indeed, several experiences were developed using the parallel and sequential algorithm on Westgrid (specifically, on the Breeze cluster). As results, we obtain that our parallel algorithm is scalable, because execution times are reduced considerably when the data size parameter is increased. Another experience is developed by changing precision parameter. In this case, we obtain smaller execution times using the parallel algorithm. Finally, number of threads used to execute the parallel algorithm on the Breezy cluster is changed. In this last experience, we obtain as result that speedup is considerably increased when more threads are used; however there is a convergence for number of threads equal to or greater than 16.

  8. Robust Machine Learning-Based Correction on Automatic Segmentation of the Cerebellum and Brainstem.

    Science.gov (United States)

    Wang, Jun Yi; Ngo, Michael M; Hessl, David; Hagerman, Randi J; Rivera, Susan M

    2016-01-01

    Automated segmentation is a useful method for studying large brain structures such as the cerebellum and brainstem. However, automated segmentation may lead to inaccuracy and/or undesirable boundary. The goal of the present study was to investigate whether SegAdapter, a machine learning-based method, is useful for automatically correcting large segmentation errors and disagreement in anatomical definition. We further assessed the robustness of the method in handling size of training set, differences in head coil usage, and amount of brain atrophy. High resolution T1-weighted images were acquired from 30 healthy controls scanned with either an 8-channel or 32-channel head coil. Ten patients, who suffered from brain atrophy because of fragile X-associated tremor/ataxia syndrome, were scanned using the 32-channel head coil. The initial segmentations of the cerebellum and brainstem were generated automatically using Freesurfer. Subsequently, Freesurfer's segmentations were both manually corrected to serve as the gold standard and automatically corrected by SegAdapter. Using only 5 scans in the training set, spatial overlap with manual segmentation in Dice coefficient improved significantly from 0.956 (for Freesurfer segmentation) to 0.978 (for SegAdapter-corrected segmentation) for the cerebellum and from 0.821 to 0.954 for the brainstem. Reducing the training set size to 2 scans only decreased the Dice coefficient ≤0.002 for the cerebellum and ≤ 0.005 for the brainstem compared to the use of training set size of 5 scans in corrective learning. The method was also robust in handling differences between the training set and the test set in head coil usage and the amount of brain atrophy, which reduced spatial overlap only by segmentation and corrective learning provides a valuable method for accurate and efficient segmentation of the cerebellum and brainstem, particularly in large-scale neuroimaging studies, and potentially for segmenting other neural regions as

  9. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This  book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The  multidisciplinary nature of machine learning makes it a very fascinating and popular area for research.  The book is aiming at students, practitioners and researchers and captures the diversity and richness of the field of machine learning and intelligent systems.  Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems, spam classification including an extremely well-written survey on Bayesian networks shed light on the strengths and weaknesses of the methods. Practical studies yielding insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and on-line mining of non-stationary ...

  10. Machine learning for healthcare technologies

    CERN Document Server

    Clifton, David A

    2016-01-01

    This book brings together chapters on the state-of-the-art in machine learning (ML) as it applies to the development of patient-centred technologies, with a special emphasis on 'big data' and mobile data.

  11. Machine Learning via Mathematical Programming

    National Research Council Canada - National Science Library

    Mamgasarian, Olivi

    1999-01-01

    Mathematical programming approaches were applied to a variety of problems in machine learning in order to gain deeper understanding of the problems and to come up with new and more efficient computational algorithms...

  12. Arctic Sea Ice Thickness Estimation from CryoSat-2 Satellite Data Using Machine Learning-Based Lead Detection

    Directory of Open Access Journals (Sweden)

    Sanggyun Lee

    2016-08-01

    Full Text Available Satellite altimeters have been used to monitor Arctic sea ice thickness since the early 2000s. In order to estimate sea ice thickness from satellite altimeter data, leads (i.e., cracks between ice floes should first be identified for the calculation of sea ice freeboard. In this study, we proposed novel approaches for lead detection using two machine learning algorithms: decision trees and random forest. CryoSat-2 satellite data collected in March and April of 2011–2014 over the Arctic region were used to extract waveform parameters that show the characteristics of leads, ice floes and ocean, including stack standard deviation, stack skewness, stack kurtosis, pulse peakiness and backscatter sigma-0. The parameters were used to identify leads in the machine learning models. Results show that the proposed approaches, with overall accuracy >90%, produced much better performance than existing lead detection methods based on simple thresholding approaches. Sea ice thickness estimated based on the machine learning-detected leads was compared to the averaged Airborne Electromagnetic (AEM-bird data collected over two days during the CryoSat Validation experiment (CryoVex field campaign in April 2011. This comparison showed that the proposed machine learning methods had better performance (up to r = 0.83 and Root Mean Square Error (RMSE = 0.29 m compared to thickness estimation based on existing lead detection methods (RMSE = 0.86–0.93 m. Sea ice thickness based on the machine learning approaches showed a consistent decline from 2011–2013 and rebounded in 2014.

  13. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  14. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Louppe, Gilles; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu

    2012-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings....

  15. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at its earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing objectively the clinical decision-making process. This manuscript presents a review in multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be present. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allows creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionally (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  16. Machine Learning of Musical Gestures

    OpenAIRE

    Caramiaux, Baptiste; Tanaka, Atau

    2013-01-01

    We present an overview of machine learning (ML) techniques and theirapplication in interactive music and new digital instruments design. We firstgive to the non-specialist reader an introduction to two ML tasks,classification and regression, that are particularly relevant for gesturalinteraction. We then present a review of the literature in current NIMEresearch that uses ML in musical gesture analysis and gestural sound control.We describe the ways in which machine learning is useful for cre...

  17. Transportation Mode Detection Based on Permutation Entropy and Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2015-01-01

    Full Text Available With the increasing prevalence of GPS devices and mobile phones, transportation mode detection based on GPS data has been a hot topic in GPS trajectory data analysis. Transportation modes such as walking, driving, bus, and taxi denote an important characteristic of the mobile user. Longitude, latitude, speed, acceleration, and direction are usually used as features in transportation mode detection. In this paper, first, we explore the possibility of using Permutation Entropy (PE of speed, a measure of complexity and uncertainty of GPS trajectory segment, as a feature for transportation mode detection. Second, we employ Extreme Learning Machine (ELM to distinguish GPS trajectory segments of different transportation. Finally, to evaluate the performance of the proposed method, we make experiments on GeoLife dataset. Experiments results show that we can get more than 50% accuracy when only using PE as a feature to characterize trajectory sequence. PE can indeed be effectively used to detect transportation mode from GPS trajectory. The proposed method has much better accuracy and faster running time than the methods based on the other features and SVM classifier.

  18. Building an asynchronous web-based tool for machine learning classification.

    Science.gov (United States)

    Weber, Griffin; Vinterbo, Staal; Ohno-Machado, Lucila

    2002-01-01

    Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

  19. Machine Learning-based discovery of closures for reduced models of dynamical systems

    Science.gov (United States)

    Pan, Shaowu; Duraisamy, Karthik

    2017-11-01

    Despite the successful application of machine learning (ML) in fields such as image processing and speech recognition, only a few attempts has been made toward employing ML to represent the dynamics of complex physical systems. Previous attempts mostly focus on parameter calibration or data-driven augmentation of existing models. In this work we present a ML framework to discover closure terms in reduced models of dynamical systems and provide insights into potential problems associated with data-driven modeling. Based on exact closure models for linear system, we propose a general linear closure framework from viewpoint of optimization. The framework is based on trapezoidal approximation of convolution term. Hyperparameters that need to be determined include temporal length of memory effect, number of sampling points, and dimensions of hidden states. To circumvent the explicit specification of memory effect, a general framework inspired from neural networks is also proposed. We conduct both a priori and posteriori evaluations of the resulting model on a number of non-linear dynamical systems. This work was supported in part by AFOSR under the project ``LES Modeling of Non-local effects using Statistical Coarse-graining'' with Dr. Jean-Luc Cambier as the technical monitor.

  20. Error Correction of Meteorological Data Obtained with Mini-AWSs Based on Machine Learning

    Directory of Open Access Journals (Sweden)

    Ji-Hun Ha

    2018-01-01

    Full Text Available Severe weather events occur more frequently due to climate change; therefore, accurate weather forecasts are necessary, in addition to the development of numerical weather prediction (NWP of the past several decades. A method to improve the accuracy of weather forecasts based on NWP is the collection of more meteorological data by reducing the observation interval. However, in many areas, it is economically and locally difficult to collect observation data by installing automatic weather stations (AWSs. We developed a Mini-AWS, much smaller than AWSs, to complement the shortcomings of AWSs. The installation and maintenance costs of Mini-AWSs are lower than those of AWSs; Mini-AWSs have fewer spatial constraints with respect to the installation than AWSs. However, it is necessary to correct the data collected with Mini-AWSs because they might be affected by the external environment depending on the installation area. In this paper, we propose a novel error correction of atmospheric pressure data observed with a Mini-AWS based on machine learning. Using the proposed method, we obtained corrected atmospheric pressure data, reaching the standard of the World Meteorological Organization (WMO; ±0.1 hPa, and confirmed the potential of corrected atmospheric pressure data as an auxiliary resource for AWSs.

  1. A New Profile Learning Model for Recommendation System based on Machine Learning Technique

    Directory of Open Access Journals (Sweden)

    Shereen H. Ali

    2016-03-01

    Full Text Available Recommender systems (RSs have been used to successfully address the information overload problem by providing personalized and targeted recommendations to the end users. RSs are software tools and techniques providing suggestions for items to be of use to a user, hence, they typically apply techniques and methodologies from Data Mining. The main contribution of this paper is to introduce a new user profile learning model to promote the recommendation accuracy of vertical recommendation systems. The proposed profile learning model employs the vertical classifier that has been used in multi classification module of the Intelligent Adaptive Vertical Recommendation (IAVR system to discover the user’s area of interest, and then build the user’s profile accordingly. Experimental results have proven the effectiveness of the proposed profile learning model, which accordingly will promote the recommendation accuracy.

  2. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vectors Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such a

  3. Identifying Green Infrastructure from Social Media and Crowdsourcing- An Image Based Machine-Learning Approach.

    Science.gov (United States)

    Rai, A.; Minsker, B. S.

    2016-12-01

    In this work we introduce a novel dataset GRID: GReen Infrastructure Detection Dataset and a framework for identifying urban green storm water infrastructure (GI) designs (wetlands/ponds, urban trees, and rain gardens/bioswales) from social media and satellite aerial images using computer vision and machine learning methods. Along with the hydrologic benefits of GI, such as reducing runoff volumes and urban heat islands, GI also provides important socio-economic benefits such as stress recovery and community cohesion. However, GI is installed by many different parties and cities typically do not know where GI is located, making study of its impacts or siting new GI difficult. We use object recognition learning methods (template matching, sliding window approach, and Random Hough Forest method) and supervised machine learning algorithms (e.g., support vector machines) as initial screening approaches to detect potential GI sites, which can then be investigated in more detail using on-site surveys. Training data were collected from GPS locations of Flickr and Instagram image postings and Amazon Mechanical Turk identification of each GI type. Sliding window method outperformed other methods and achieved an average F measure, which is combined metric for precision and recall performance measure of 0.78.

  4. Novel nonlinear knowledge-based mean force potentials based on machine learning.

    Science.gov (United States)

    Dong, Qiwen; Zhou, Shuigeng

    2011-01-01

    The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost in the form of weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based

  5. An IoT-Enabled Stroke Rehabilitation System Based on Smart Wearable Armband and Machine Learning.

    Science.gov (United States)

    Yang, Geng; Deng, Jia; Pang, Gaoyang; Zhang, Hao; Li, Jiayi; Deng, Bin; Pang, Zhibo; Xu, Juan; Jiang, Mingzhe; Liljeberg, Pasi; Xie, Haibo; Yang, Huayong

    2018-01-01

    Surface electromyography signal plays an important role in hand function recovery training. In this paper, an IoT-enabled stroke rehabilitation system was introduced which was based on a smart wearable armband (SWA), machine learning (ML) algorithms, and a 3-D printed dexterous robot hand. User comfort is one of the key issues which should be addressed for wearable devices. The SWA was developed by integrating a low-power and tiny-sized IoT sensing device with textile electrodes, which can measure, pre-process, and wirelessly transmit bio-potential signals. By evenly distributing surface electrodes over user's forearm, drawbacks of classification accuracy poor performance can be mitigated. A new method was put forward to find the optimal feature set. ML algorithms were leveraged to analyze and discriminate features of different hand movements, and their performances were appraised by classification complexity estimating algorithms and principal components analysis. According to the verification results, all nine gestures can be successfully identified with an average accuracy up to 96.20%. In addition, a 3-D printed five-finger robot hand was implemented for hand rehabilitation training purpose. Correspondingly, user's hand movement intentions were extracted and converted into a series of commands which were used to drive motors assembled inside the dexterous robot hand. As a result, the dexterous robot hand can mimic the user's gesture in a real-time manner, which shows the proposed system can be used as a training tool to facilitate rehabilitation process for the patients after stroke.

  6. Automatic detection of ischemic stroke based on scaling exponent electroencephalogram using extreme learning machine

    Science.gov (United States)

    Adhi, H. A.; Wijaya, S. K.; Prawito; Badri, C.; Rezal, M.

    2017-03-01

    Stroke is one of cerebrovascular diseases caused by the obstruction of blood flow to the brain. Stroke becomes the leading cause of death in Indonesia and the second in the world. Stroke also causes of the disability. Ischemic stroke accounts for most of all stroke cases. Obstruction of blood flow can cause tissue damage which results the electrical changes in the brain that can be observed through the electroencephalogram (EEG). In this study, we presented the results of automatic detection of ischemic stroke and normal subjects based on the scaling exponent EEG obtained through detrended fluctuation analysis (DFA) using extreme learning machine (ELM) as the classifier. The signal processing was performed with 18 channels of EEG in the range of 0-30 Hz. Scaling exponents of the subjects were used as the input for ELM to classify the ischemic stroke. The performance of detection was observed by the value of accuracy, sensitivity and specificity. The result showed, performance of the proposed method to classify the ischemic stroke was 84 % for accuracy, 82 % for sensitivity and 87 % for specificity with 120 hidden neurons and sine as the activation function of ELM.

  7. Node Detection and Internode Length Estimation of Tomato Seedlings Based on Image Analysis and Machine Learning

    Directory of Open Access Journals (Sweden)

    Kyosuke Yamamoto

    2016-07-01

    Full Text Available Seedling vigor in tomatoes determines the quality and growth of fruits and total plant productivity. It is well known that the salient effects of environmental stresses appear on the internode length; the length between adjoining main stem node (henceforth called node. In this study, we develop a method for internode length estimation using image processing technology. The proposed method consists of three steps: node detection, node order estimation, and internode length estimation. This method has two main advantages: (i as it uses machine learning approaches for node detection, it does not require adjustment of threshold values even though seedlings are imaged under varying timings and lighting conditions with complex backgrounds; and (ii as it uses affinity propagation for node order estimation, it can be applied to seedlings with different numbers of nodes without prior provision of the node number as a parameter. Our node detection results show that the proposed method can detect 72% of the 358 nodes in time-series imaging of three seedlings (recall = 0.72, precision = 0.78. In particular, the application of a general object recognition approach, Bag of Visual Words (BoVWs, enabled the elimination of many false positives on leaves occurring in the image segmentation based on pixel color, significantly improving the precision. The internode length estimation results had a relative error of below 15.4%. These results demonstrate that our method has the ability to evaluate the vigor of tomato seedlings quickly and accurately.

  8. Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes?

    Science.gov (United States)

    Drier, Yotam; Domany, Eytan

    2011-03-14

    The fact that there is very little if any overlap between the genes of different prognostic signatures for early-discovery breast cancer is well documented. The reasons for this apparent discrepancy have been explained by the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists was questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study places under scrutiny the validity of this claim, for two important gene lists that are at the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner which takes into account the nested dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value was well known long before gene expression profiling. We found that the claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, were found either to be lacking statistical rigor or were in fact based on addressing some other question.

  9. Discovering mammography-based machine learning classifiers for breast cancer diagnosis.

    Science.gov (United States)

    Ramos-Pollán, Raúl; Guevara-López, Miguel Angel; Suárez-Ortega, Cesar; Díaz-Herrero, Guillermo; Franco-Valiente, Jose Miguel; Rubio-Del-Solar, Manuel; González-de-Posada, Naimy; Vaz, Mario Augusto Pires; Loureiro, Joana; Ramos, Isabel

    2012-08-01

    This work explores the design of mammography-based machine learning classifiers (MLC) and proposes a new method to build MLC for breast cancer diagnosis. We massively evaluated MLC configurations to classify features vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnosis. Previously, appropriate combinations of image processing and normalization techniques were applied to reduce image artifacts and increase mammograms details. The method can be used under different data acquisition circumstances and exploits computer clusters to select well performing MLC configurations. We evaluated 286 cases extracted from the repository owned by HSJ-FMUP, where specialized radiologists segmented regions on CC and/or MLO images (biopsies provided the golden standard). Around 20,000 MLC configurations were evaluated, obtaining classifiers achieving an area under the ROC curve of 0.996 when combining features vectors extracted from CC and MLO views of the same case.

  10. Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes?

    Directory of Open Access Journals (Sweden)

    Yotam Drier

    2011-03-01

    Full Text Available The fact that there is very little if any overlap between the genes of different prognostic signatures for early-discovery breast cancer is well documented. The reasons for this apparent discrepancy have been explained by the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists was questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study places under scrutiny the validity of this claim, for two important gene lists that are at the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner which takes into account the nested dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value was well known long before gene expression profiling. We found that the claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, were found either to be lacking statistical rigor or were in fact based on addressing some other question.

  11. A multilevel-ROI-features-based machine learning method for detection of morphometric biomarkers in Parkinson's disease.

    Science.gov (United States)

    Peng, Bo; Wang, Suhong; Zhou, Zhiyong; Liu, Yan; Tong, Baotong; Zhang, Tao; Dai, Yakang

    2017-06-09

    Machine learning methods have been widely used in recent years for detection of neuroimaging biomarkers in regions of interest (ROIs) and assisting diagnosis of neurodegenerative diseases. The innovation of this study is to use multilevel-ROI-features-based machine learning method to detect sensitive morphometric biomarkers in Parkinson's disease (PD). Specifically, the low-level ROI features (gray matter volume, cortical thickness, etc.) and high-level correlative features (connectivity between ROIs) are integrated to construct the multilevel ROI features. Filter- and wrapper- based feature selection method and multi-kernel support vector machine (SVM) are used in the classification algorithm. T1-weighted brain magnetic resonance (MR) images of 69 PD patients and 103 normal controls from the Parkinson's Progression Markers Initiative (PPMI) dataset are included in the study. The machine learning method performs well in classification between PD patients and normal controls with an accuracy of 85.78%, a specificity of 87.79%, and a sensitivity of 87.64%. The most sensitive biomarkers between PD patients and normal controls are mainly distributed in frontal lobe, parental lobe, limbic lobe, temporal lobe, and central region. The classification performance of our method with multilevel ROI features is significantly improved comparing with other classification methods using single-level features. The proposed method shows promising identification ability for detecting morphometric biomarkers in PD, thus confirming the potentiality of our method in assisting diagnosis of the disease. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predi......Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility...... of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We...... suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate...

  13. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning has been used to develop decision-support tools applied in medical imaging informatics. In order to take advantages of both conventional and deep learning approach, this study aims to investigate feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, initially computed 44 image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LLP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applying to this dataset, LLP-generated feature vector reduced the number of features from 44 to 4. Using a leave-onecase-out validation method, area under ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p breast cancer detected in the next subsequent mammography screening.

  14. Local-learning-based neuron selection for grasping gesture prediction in motor brain machine interfaces

    Science.gov (United States)

    Xu, Kai; Wang, Yiwen; Wang, Yueming; Wang, Fang; Hao, Yaoyao; Zhang, Shaomin; Zhang, Qiaosheng; Chen, Weidong; Zheng, Xiaoxiang

    2013-04-01

    Objective. The high-dimensional neural recordings bring computational challenges to movement decoding in motor brain machine interfaces (mBMI), especially for portable applications. However, not all recorded neural activities relate to the execution of a certain movement task. This paper proposes to use a local-learning-based method to perform neuron selection for the gesture prediction in a reaching and grasping task. Approach. Nonlinear neural activities are decomposed into a set of linear ones in a weighted feature space. A margin is defined to measure the distance between inter-class and intra-class neural patterns. The weights, reflecting the importance of neurons, are obtained by minimizing a margin-based exponential error function. To find the most dominant neurons in the task, 1-norm regularization is introduced to the objective function for sparse weights, where near-zero weights indicate irrelevant neurons. Main results. The signals of only 10 neurons out of 70 selected by the proposed method could achieve over 95% of the full recording's decoding accuracy of gesture predictions, no matter which different decoding methods are used (support vector machine and K-nearest neighbor). The temporal activities of the selected neurons show visually distinguishable patterns associated with various hand states. Compared with other algorithms, the proposed method can better eliminate the irrelevant neurons with near-zero weights and provides the important neuron subset with the best decoding performance in statistics. The weights of important neurons converge usually within 10-20 iterations. In addition, we study the temporal and spatial variation of neuron importance along a period of one and a half months in the same task. A high decoding performance can be maintained by updating the neuron subset. Significance. The proposed algorithm effectively ascertains the neuronal importance without assuming any coding model and provides a high performance with different

  15. A multi-label learning based kernel automatic recommendation method for support vector machine.

    Science.gov (United States)

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is very important and critical when classifying a new problem with Support Vector Machine. So far, more attention has been paid on constructing new kernels and choosing suitable parameter values for a specific kernel function, but less on kernel selection. Furthermore, most of current kernel selection methods focus on seeking a best kernel with the highest classification accuracy via cross-validation, they are time consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.

  16. Gaussian processes for machine learning.

    Science.gov (United States)

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations.13,78,31 The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.

  17. Machine Learning in Medical Imaging.

    Science.gov (United States)

    Giger, Maryellen L

    2018-03-01

    Advances in both imaging and computers have synergistically led to a rapid rise in the potential use of artificial intelligence in various radiological imaging tasks, such as risk assessment, detection, diagnosis, prognosis, and therapy response, as well as in multi-omics disease discovery. A brief overview of the field is given here, allowing the reader to recognize the terminology, the various subfields, and components of machine learning, as well as the clinical potential. Radiomics, an expansion of computer-aided diagnosis, has been defined as the conversion of images to minable data. The ultimate benefit of quantitative radiomics is to (1) yield predictive image-based phenotypes of disease for precision medicine or (2) yield quantitative image-based phenotypes for data mining with other -omics for discovery (ie, imaging genomics). For deep learning in radiology to succeed, note that well-annotated large data sets are needed since deep networks are complex, computer software and hardware are evolving constantly, and subtle differences in disease states are more difficult to perceive than differences in everyday objects. In the future, machine learning in radiology is expected to have a substantial clinical impact with imaging examinations being routinely obtained in clinical practice, providing an opportunity to improve decision support in medical image interpretation. The term of note is decision support, indicating that computers will augment human decision making, making it more effective and efficient. The clinical impact of having computers in the routine clinical practice may allow radiologists to further integrate their knowledge with their clinical colleagues in other medical specialties and allow for precision medicine. Copyright © 2018. Published by Elsevier Inc.

  18. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    Directory of Open Access Journals (Sweden)

    Liu Qingzhong

    2011-12-01

    Full Text Available Abstract Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

  19. Lamb wave based automatic damage detection using matching pursuit and machine learning

    International Nuclear Information System (INIS)

    Agarwal, Sushant; Mitra, Mira

    2014-01-01

    In this study, matching pursuit (MP) has been tested with machine learning algorithms such as artificial neural networks (ANNs) and support vector machines (SVMs) to automate the process of damage detection in metallic plates. Here, damage detection is done using the Lamb wave response in a thin aluminium plate simulated using a finite element (FE) method. To reduce the complexity of the Lamb wave response, only the A 0 mode is excited and sensed. The procedure adopted for damage detection consists of three major steps, involving signal processing and machine learning (ML). In the first step, MP is used for de-noising and enhancing the sparsity of the database. In the existing literature, MP is used to decompose any signal into a linear combination of waveforms that are selected from a redundant dictionary. In this work, MP is deployed in two stages to make the database sparse as well as to de-noise it. After using MP on the database, it is then passed as input data for ML classifiers. ANN and SVM are used to detect the location of the potential damage from the reduced data. The study demonstrates that the SVM is a robust classifier in the presence of noise and is more efficient than the ANN. Out-of-sample data are used for the validation of the trained and tested classifier. Trained classifiers are found to be successful in the detection of damage with a detection rate of more than 95%. (paper)

  20. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors

    Directory of Open Access Journals (Sweden)

    Roland Zemp

    2016-01-01

    Full Text Available Occupational musculoskeletal disorders, particularly chronic low back pain (LBP, are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user’s sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest. Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples. Sixteen force sensor values and the backrest angle were used as the explanatory variables (features for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  1. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors.

    Science.gov (United States)

    Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  2. Teraflop-scale Incremental Machine Learning

    OpenAIRE

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-u...

  3. Robust Machine Learning-Based Correction on Automatic Segmentation of the Cerebellum and Brainstem.

    Directory of Open Access Journals (Sweden)

    Jun Yi Wang

    Full Text Available Automated segmentation is a useful method for studying large brain structures such as the cerebellum and brainstem. However, automated segmentation may lead to inaccuracy and/or undesirable boundary. The goal of the present study was to investigate whether SegAdapter, a machine learning-based method, is useful for automatically correcting large segmentation errors and disagreement in anatomical definition. We further assessed the robustness of the method in handling size of training set, differences in head coil usage, and amount of brain atrophy. High resolution T1-weighted images were acquired from 30 healthy controls scanned with either an 8-channel or 32-channel head coil. Ten patients, who suffered from brain atrophy because of fragile X-associated tremor/ataxia syndrome, were scanned using the 32-channel head coil. The initial segmentations of the cerebellum and brainstem were generated automatically using Freesurfer. Subsequently, Freesurfer's segmentations were both manually corrected to serve as the gold standard and automatically corrected by SegAdapter. Using only 5 scans in the training set, spatial overlap with manual segmentation in Dice coefficient improved significantly from 0.956 (for Freesurfer segmentation to 0.978 (for SegAdapter-corrected segmentation for the cerebellum and from 0.821 to 0.954 for the brainstem. Reducing the training set size to 2 scans only decreased the Dice coefficient ≤0.002 for the cerebellum and ≤ 0.005 for the brainstem compared to the use of training set size of 5 scans in corrective learning. The method was also robust in handling differences between the training set and the test set in head coil usage and the amount of brain atrophy, which reduced spatial overlap only by <0.01. These results suggest that the combination of automated segmentation and corrective learning provides a valuable method for accurate and efficient segmentation of the cerebellum and brainstem, particularly in large

  4. Quality prediction modeling for sintered ores based on mechanism models of sintering and extreme learning machine based error compensation

    Science.gov (United States)

    Tiebin, Wu; Yunlian, Liu; Xinjun, Li; Yi, Yu; Bin, Zhang

    2018-06-01

    Aiming at the difficulty in quality prediction of sintered ores, a hybrid prediction model is established based on mechanism models of sintering and time-weighted error compensation on the basis of the extreme learning machine (ELM). At first, mechanism models of drum index, total iron, and alkalinity are constructed according to the chemical reaction mechanism and conservation of matter in the sintering process. As the process is simplified in the mechanism models, these models are not able to describe high nonlinearity. Therefore, errors are inevitable. For this reason, the time-weighted ELM based error compensation model is established. Simulation results verify that the hybrid model has a high accuracy and can meet the requirement for industrial applications.

  5. Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

    Science.gov (United States)

    Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

    2017-12-01

    The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3

  6. Advanced Cell Classifier: User-Friendly Machine-Learning-Based Software for Discovering Phenotypes in High-Content Imaging Data.

    Science.gov (United States)

    Piccinini, Filippo; Balassa, Tamas; Szkalisity, Abel; Molnar, Csaba; Paavolainen, Lassi; Kujala, Kaisa; Buzas, Krisztina; Sarazova, Marie; Pietiainen, Vilja; Kutay, Ulrike; Smith, Kevin; Horvath, Peter

    2017-06-28

    High-content, imaging-based screens now routinely generate data on a scale that precludes manual verification and interrogation. Software applying machine learning has become an essential tool to automate analysis, but these methods require annotated examples to learn from. Efficiently exploring large datasets to find relevant examples remains a challenging bottleneck. Here, we present Advanced Cell Classifier (ACC), a graphical software package for phenotypic analysis that addresses these difficulties. ACC applies machine-learning and image-analysis methods to high-content data generated by large-scale, cell-based experiments. It features methods to mine microscopic image data, discover new phenotypes, and improve recognition performance. We demonstrate that these features substantially expedite the training process, successfully uncover rare phenotypes, and improve the accuracy of the analysis. ACC is extensively documented, designed to be user-friendly for researchers without machine-learning expertise, and distributed as a free open-source tool at www.cellclassifier.org. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Perspective: Web-based machine learning models for real-time screening of thermoelectric materials properties

    Science.gov (United States)

    Gaultois, Michael W.; Oliynyk, Anton O.; Mar, Arthur; Sparks, Taylor D.; Mulholland, Gregory J.; Meredig, Bryce

    2016-05-01

    The experimental search for new thermoelectric materials remains largely confined to a limited set of successful chemical and structural families, such as chalcogenides, skutterudites, and Zintl phases. In principle, computational tools such as density functional theory (DFT) offer the possibility of rationally guiding experimental synthesis efforts toward very different chemistries. However, in practice, predicting thermoelectric properties from first principles remains a challenging endeavor [J. Carrete et al., Phys. Rev. X 4, 011019 (2014)], and experimental researchers generally do not directly use computation to drive their own synthesis efforts. To bridge this practical gap between experimental needs and computational tools, we report an open machine learning-based recommendation engine (http://thermoelectrics.citrination.com) for materials researchers that suggests promising new thermoelectric compositions based on pre-screening about 25 000 known materials and also evaluates the feasibility of user-designed compounds. We show this engine can identify interesting chemistries very different from known thermoelectrics. Specifically, we describe the experimental characterization of one example set of compounds derived from our engine, RE12Co5Bi (RE = Gd, Er), which exhibits surprising thermoelectric performance given its unprecedentedly high loading with metallic d and f block elements and warrants further investigation as a new thermoelectric material platform. We show that our engine predicts this family of materials to have low thermal and high electrical conductivities, but modest Seebeck coefficient, all of which are confirmed experimentally. We note that the engine also predicts materials that may simultaneously optimize all three properties entering into zT; we selected RE12Co5Bi for this study due to its interesting chemical composition and known facile synthesis.

  8. A Mobile Health Application to Predict Postpartum Depression Based on Machine Learning.

    Science.gov (United States)

    Jiménez-Serrano, Santiago; Tortajada, Salvador; García-Gómez, Juan Miguel

    2015-07-01

    Postpartum depression (PPD) is a disorder that often goes undiagnosed. The development of a screening program requires considerable and careful effort, where evidence-based decisions have to be taken in order to obtain an effective test with a high level of sensitivity and an acceptable specificity that is quick to perform, easy to interpret, culturally sensitive, and cost-effective. The purpose of this article is twofold: first, to develop classification models for detecting the risk of PPD during the first week after childbirth, thus enabling early intervention; and second, to develop a mobile health (m-health) application (app) for the Android(®) (Google, Mountain View, CA) platform based on the model with best performance for both mothers who have just given birth and clinicians who want to monitor their patient's test. A set of predictive models for estimating the risk of PPD was trained using machine learning techniques and data about postpartum women collected from seven Spanish hospitals. An internal evaluation was carried out using a hold-out strategy. An easy flowchart and architecture for designing the graphical user interface of the m-health app was followed. Naive Bayes showed the best balance between sensitivity and specificity as a predictive model for PPD during the first week after delivery. It was integrated into the clinical decision support system for Android mobile apps. This approach can enable the early prediction and detection of PPD because it fulfills the conditions of an effective screening test with a high level of sensitivity and specificity that is quick to perform, easy to interpret, culturally sensitive, and cost-effective.

  9. A SEMI-AUTOMATIC RULE SET BUILDING METHOD FOR URBAN LAND COVER CLASSIFICATION BASED ON MACHINE LEARNING AND HUMAN KNOWLEDGE

    Directory of Open Access Journals (Sweden)

    H. Y. Gu

    2017-09-01

    Full Text Available Classification rule set is important for Land Cover classification, which refers to features and decision rules. The selection of features and decision are based on an iterative trial-and-error approach that is often utilized in GEOBIA, however, it is time-consuming and has a poor versatility. This study has put forward a rule set building method for Land cover classification based on human knowledge and machine learning. The use of machine learning is to build rule sets effectively which will overcome the iterative trial-and-error approach. The use of human knowledge is to solve the shortcomings of existing machine learning method on insufficient usage of prior knowledge, and improve the versatility of rule sets. A two-step workflow has been introduced, firstly, an initial rule is built based on Random Forest and CART decision tree. Secondly, the initial rule is analyzed and validated based on human knowledge, where we use statistical confidence interval to determine its threshold. The test site is located in Potsdam City. We utilised the TOP, DSM and ground truth data. The results show that the method could determine rule set for Land Cover classification semi-automatically, and there are static features for different land cover classes.

  10. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features.

    Science.gov (United States)

    Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-07-18

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization.

  11. Discrimination of plant root zone water status in greenhouse production based on phenotyping and machine learning techniques

    OpenAIRE

    Guo, Doudou; Juan, Jiaxiang; Chang, Liying; Zhang, Jingjin; Huang, Danfeng

    2017-01-01

    Plant-based sensing on water stress can provide sensitive and direct reference for precision irrigation system in greenhouse. However, plant information acquisition, interpretation, and systematical application remain insufficient. This study developed a discrimination method for plant root zone water status in greenhouse by integrating phenotyping and machine learning techniques. Pakchoi plants were used and treated by three root zone moisture levels, 40%, 60%, and 80% relative water content...

  12. Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications

    OpenAIRE

    Okamoto, Hironao; Nishio, Takayuki; Nakashima, Kota; Koda, Yusuke; Yamamoto, Koji; Morikura, Masahiro; Asai, Yusuke; Miyatake, Ryo

    2018-01-01

    This paper discusses a machine-learning (ML)-based future received signal strength (RSS) prediction scheme using depth camera images for millimeter-wave (mmWave) networks. The scheme provides the future RSS prediction of any mmWave links within the camera's view, including links where nodes are not transmitting frames. This enables network controllers to conduct network operations before line-of-sight path blockages degrade the RSS. Using the ML techniques, the prediction scheme automatically...

  13. Classification of HTTP traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, Tahir; Pedersen, Jens Myrup

    2012-01-01

    streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguish various types of HTTP traffic based on the content: distributed among volunteers' machines and centralized running in the core of the network. We also assess the accuracy of the centralized classifier...

  14. Higgs Machine Learning Challenge 2014

    CERN Document Server

    Olivier, A-P; Bourdarios, C ; LAL / Orsay; Goldfarb, S ; University of Michigan

    2014-01-01

    High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees (paper) and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics. We are now exploring new ways to improve the cross-fertilization of the two fields by setting up a data challenge, following the footsteps of, among others, the astrophysics community (dark matter and galaxy zoo challenges) and neurobiology (connectomics and decoding the human brain). The organization committee consists of ATLAS physicists and machine learning researchers. The Challenge will run from Monday 12th to September 2014.

  15. Neuromorphic Deep Learning Machines

    OpenAIRE

    Neftci, E; Augustine, C; Paul, S; Detorakis, G

    2017-01-01

    An ongoing challenge in neuromorphic computing is to devise general and computationally efficient models of inference and learning which are compatible with the spatial and temporal constraints of the brain. One increasingly popular and successful approach is to take inspiration from inference and learning algorithms used in deep neural networks. However, the workhorse of deep learning, the gradient descent Back Propagation (BP) rule, often relies on the immediate availability of network-wide...

  16. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data.

    Directory of Open Access Journals (Sweden)

    Enrico Glaab

    Full Text Available Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

  17. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data.

    Science.gov (United States)

    Glaab, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M; Krasnogor, Natalio

    2012-01-01

    Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

  18. Automated Classification of Radiology Reports for Acute Lung Injury: Comparison of Keyword and Machine Learning Based Natural Language Processing Approaches.

    Science.gov (United States)

    Solti, Imre; Cooke, Colin R; Xia, Fei; Wurfel, Mark M

    2009-11-01

    This paper compares the performance of keyword and machine learning-based chest x-ray report classification for Acute Lung Injury (ALI). ALI mortality is approximately 30 percent. High mortality is, in part, a consequence of delayed manual chest x-ray classification. An automated system could reduce the time to recognize ALI and lead to reductions in mortality. For our study, 96 and 857 chest x-ray reports in two corpora were labeled by domain experts for ALI. We developed a keyword and a Maximum Entropy-based classification system. Word unigram and character n-grams provided the features for the machine learning system. The Maximum Entropy algorithm with character 6-gram achieved the highest performance (Recall=0.91, Precision=0.90 and F-measure=0.91) on the 857-report corpus. This study has shown that for the classification of ALI chest x-ray reports, the machine learning approach is superior to the keyword based system and achieves comparable results to highest performing physician annotators.

  19. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...

  20. Machine learning methods for credibility assessment of interviewees based on posturographic data.

    Science.gov (United States)

    Saripalle, Sashi K; Vemulapalli, Spandana; King, Gregory W; Burgoon, Judee K; Derakhshani, Reza

    2015-01-01

    This paper discusses the advantages of using posturographic signals from force plates for non-invasive credibility assessment. The contributions of our work are two fold: first, the proposed method is highly efficient and non invasive. Second, feasibility for creating an autonomous credibility assessment system using machine-learning algorithms is studied. This study employs an interview paradigm that includes subjects responding with truthful and deceptive intent while their center of pressure (COP) signal is being recorded. Classification models utilizing sets of COP features for deceptive responses are derived and best accuracy of 93.5% for test interval is reported.

  1. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales

    Directory of Open Access Journals (Sweden)

    Jihoon Oh

    2017-09-01

    Full Text Available Classification and prediction of suicide attempts in high-risk groups is important for preventing suicide. The purpose of this study was to investigate whether the information from multiple clinical scales has classification power for identifying actual suicide attempts. Patients with depression and anxiety disorders (N = 573 were included, and each participant completed 31 self-report psychiatric scales and questionnaires about their history of suicide attempts. We then trained an artificial neural network classifier with 41 variables (31 psychiatric scales and 10 sociodemographic elements and ranked the contribution of each variable for the classification of suicide attempts. To evaluate the clinical applicability of our model, we measured classification performance with top-ranked predictors. Our model had an overall accuracy of 93.7% in 1-month, 90.8% in 1-year, and 87.4% in lifetime suicide attempts detection. The area under the receiver operating characteristic curve (AUROC was the highest for 1-month suicide attempts detection (0.93, followed by lifetime (0.89, and 1-year detection (0.87. Among all variables, the Emotion Regulation Questionnaire had the highest contribution, and the positive and negative characteristics of the scales similarly contributed to classification performance. Performance on suicide attempts classification was largely maintained when we only used the top five ranked variables for training (AUROC; 1-month, 0.75, 1-year, 0.85, lifetime suicide attempts detection, 0.87. Our findings indicate that information from self-report clinical scales can be useful for the classification of suicide attempts. Based on the reliable performance of the top five predictors alone, this machine learning approach could help clinicians identify high-risk patients in clinical settings.

  2. Machine learning modeling of plant phenology based on coupling satellite and gridded meteorological dataset

    Science.gov (United States)

    Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna

    2018-04-01

    Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not give any coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, not many studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing, predicting, and improving quality of phenological phases monitoring with the use of satellite and meteorological products. A quality-controlled dataset of the 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, lasso, principal component regression, generalized boosted models, and random forest. The quality of the models was estimated using a k-fold cross-validation. The obtained results showed varying potential for coupling meteorological derived indices with remote sensing products in terms of phenological modeling; however, application of both data sources improves models' accuracy from 0.6 to 4.6 day in terms of obtained RMSE. It is shown that a robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases, there is a stronger information signal provided by satellite-derived vegetation metrics. Choosing a specific set of predictors and applying a robust preprocessing procedures is more important for final results than the selection of a particular statistical model. The average RMSE for the best models of all phenophases is 6.3, while the individual RMSE vary seasonally from 3.5 to 10 days. Models give reliable proxy for ground observations with RMSE below 5 days for early spring and late spring phenophases. For

  3. Designing Green Stormwater Infrastructure for Hydrologic and Human Benefits: An Image Based Machine Learning Approach

    Science.gov (United States)

    Rai, A.; Minsker, B. S.

    2014-12-01

    Urbanization over the last century has degraded our natural water resources by increasing storm-water runoff, reducing nutrient retention, and creating poor ecosystem health downstream. The loss of tree canopy and expansion of impervious area and storm sewer systems have significantly decreased infiltration and evapotranspiration, increased stream-flow velocities, and increased flood risk. These problems have brought increasing attention to catchment-wide implementation of green infrastructure (e.g., decentralized green storm water management practices such as bioswales, rain gardens, permeable pavements, tree box filters, cisterns, urban wetlands, urban forests, stream buffers, and green roofs) to replace or supplement conventional storm water management practices and create more sustainable urban water systems. Current green infrastructure (GI) practice aims at mitigating the negative effects of urbanization by restoring pre-development hydrology and ultimately addressing water quality issues at an urban catchment scale. The benefits of green infrastructure extend well beyond local storm water management, as urban green spaces are also major contributors to human health. Considerable research in the psychological sciences have shown significant human health benefits from appropriately designed green spaces, yet impacts on human wellbeing have not yet been formally considered in GI design frameworks. This research is developing a novel computational green infrastructure (GI) design framework that integrates hydrologic requirements with criteria for human wellbeing. A supervised machine learning model is created to identify specific patterns in urban green spaces that promote human wellbeing; the model is linked to RHESSYS model to evaluate GI designs in terms of both hydrologic and human health benefits. An application of the models to Dead Run Watershed in Baltimore showed that image mining methods were able to capture key elements of human preferences that could

  4. Use of machine learning to shorten observation-based screening and diagnosis of autism.

    Science.gov (United States)

    Wall, D P; Kosmicki, J; Deluca, T F; Harstad, E; Fusaro, V A

    2012-04-10

    The Autism Diagnostic Observation Schedule-Generic (ADOS) is one of the most widely used instruments for behavioral evaluation of autism spectrum disorders. It is composed of four modules, each tailored for a specific group of individuals based on their language and developmental level. On average, a module takes between 30 and 60 min to deliver. We used a series of machine-learning algorithms to study the complete set of scores from Module 1 of the ADOS available at the Autism Genetic Resource Exchange (AGRE) for 612 individuals with a classification of autism and 15 non-spectrum individuals from both AGRE and the Boston Autism Consortium (AC). Our analysis indicated that 8 of the 29 items contained in Module 1 of the ADOS were sufficient to classify autism with 100% accuracy. We further validated the accuracy of this eight-item classifier against complete sets of scores from two independent sources, a collection of 110 individuals with autism from AC and a collection of 336 individuals with autism from the Simons Foundation. In both cases, our classifier performed with nearly 100% sensitivity, correctly classifying all but two of the individuals from these two resources with a diagnosis of autism, and with 94% specificity on a collection of observed and simulated non-spectrum controls. The classifier contained several elements found in the ADOS algorithm, demonstrating high test validity, and also resulted in a quantitative score that measures classification confidence and extremeness of the phenotype. With incidence rates rising, the ability to classify autism effectively and quickly requires careful design of assessment and diagnostic tools. Given the brevity, accuracy and quantitative nature of the classifier, results from this study may prove valuable in the development of mobile tools for preliminary evaluation and clinical prioritization-in particular those focused on assessment of short home videos of children--that speed the pace of initial evaluation

  5. A Machine Learning-Based Analysis of Game Data for Attention Deficit Hyperactivity Disorder Assessment.

    Science.gov (United States)

    Heller, Monika D; Roots, Kurt; Srivastava, Sanjana; Schumann, Jennifer; Srivastava, Jaideep; Hale, T Sigi

    2013-10-01

    Attention deficit hyperactivity disorder (ADHD) is found in 9.5 percent of the U.S. population and poses lifelong challenges. Current diagnostic approaches rely on evaluation forms completed by teachers and/or parents, although they are not specifically trained to recognize cognitive disorders. The most accurate diagnosis is by a psychiatrist, often only available to children with severe symptoms. Development of a tool that is engaging and objective and aids medical providers is needed in the diagnosis of ADHD. The goal of this research is to work toward the development of such a tool. The proposed approach takes advantage of two trends: The rapid adoption of tangible user interface devices and the popularity of interactive videogames. CogCubed Inc. (Minneapolis, MN) has created "Groundskeeper," a game on the Sifteo Cubes (Sifteo, Inc., San Francisco, CA) game system with elements that exercise skills affected by ADHD. "Groundskeeper" was evaluated for 52 patients, with and without ADHD. Gameplay data were mathematically transformed into ADHD-indicative feature variables and subjected to machine learning algorithms to develop diagnostic models to aid psychiatric clinical assessments of ADHD. The effectiveness of the developed model was evaluated against the diagnostic impressions of two licensed child/adolescent psychiatrists using semistructured interviews. Our predictive algorithms were highly accurate in correctly predicting diagnoses based on gameplay of "Groundskeeper." The F-measure, a measure of diagnosis accuracy, from the predictive models gave values as follows: ADHD, inattentive type, 78 percent (P>0.05); ADHD, combined type, 75 percent (Pdepressive disorders, 76%. This represents a promising new approach to screening tools for ADHD.

  6. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales.

    Science.gov (United States)

    Oh, Jihoon; Yun, Kyongsik; Hwang, Ji-Hyun; Chae, Jeong-Ho

    2017-01-01

    Classification and prediction of suicide attempts in high-risk groups is important for preventing suicide. The purpose of this study was to investigate whether the information from multiple clinical scales has classification power for identifying actual suicide attempts. Patients with depression and anxiety disorders ( N  = 573) were included, and each participant completed 31 self-report psychiatric scales and questionnaires about their history of suicide attempts. We then trained an artificial neural network classifier with 41 variables (31 psychiatric scales and 10 sociodemographic elements) and ranked the contribution of each variable for the classification of suicide attempts. To evaluate the clinical applicability of our model, we measured classification performance with top-ranked predictors. Our model had an overall accuracy of 93.7% in 1-month, 90.8% in 1-year, and 87.4% in lifetime suicide attempts detection. The area under the receiver operating characteristic curve (AUROC) was the highest for 1-month suicide attempts detection (0.93), followed by lifetime (0.89), and 1-year detection (0.87). Among all variables, the Emotion Regulation Questionnaire had the highest contribution, and the positive and negative characteristics of the scales similarly contributed to classification performance. Performance on suicide attempts classification was largely maintained when we only used the top five ranked variables for training (AUROC; 1-month, 0.75, 1-year, 0.85, lifetime suicide attempts detection, 0.87). Our findings indicate that information from self-report clinical scales can be useful for the classification of suicide attempts. Based on the reliable performance of the top five predictors alone, this machine learning approach could help clinicians identify high-risk patients in clinical settings.

  7. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  8. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  9. Application of Machine Learning Techniques in Aquaculture

    OpenAIRE

    Rahman, Akhlaqur; Tasnim, Sumaira

    2014-01-01

    In this paper we present applications of different machine learning algorithms in aquaculture. Machine learning algorithms learn models from historical data. In aquaculture historical data are obtained from farm practices, yields, and environmental data sources. Associations between these different variables can be obtained by applying machine learning algorithms to historical data. In this paper we present applications of different machine learning algorithms in aquaculture applications.

  10. An automated, high-throughput plant phenotyping system using machine learning-based plant segmentation and image analysis.

    Science.gov (United States)

    Lee, Unseok; Chang, Sungyul; Putra, Gian Anantrio; Kim, Hyoungseok; Kim, Dong Hwan

    2018-01-01

    A high-throughput plant phenotyping system automatically observes and grows many plant samples. Many plant sample images are acquired by the system to determine the characteristics of the plants (populations). Stable image acquisition and processing is very important to accurately determine the characteristics. However, hardware for acquiring plant images rapidly and stably, while minimizing plant stress, is lacking. Moreover, most software cannot adequately handle large-scale plant imaging. To address these problems, we developed a new, automated, high-throughput plant phenotyping system using simple and robust hardware, and an automated plant-imaging-analysis pipeline consisting of machine-learning-based plant segmentation. Our hardware acquires images reliably and quickly and minimizes plant stress. Furthermore, the images are processed automatically. In particular, large-scale plant-image datasets can be segmented precisely using a classifier developed using a superpixel-based machine-learning algorithm (Random Forest), and variations in plant parameters (such as area) over time can be assessed using the segmented images. We performed comparative evaluations to identify an appropriate learning algorithm for our proposed system, and tested three robust learning algorithms. We developed not only an automatic analysis pipeline but also a convenient means of plant-growth analysis that provides a learning data interface and visualization of plant growth trends. Thus, our system allows end-users such as plant biologists to analyze plant growth via large-scale plant image data easily.

  11. Using Blood Indexes to Predict Overweight Statuses: An Extreme Learning Machine-Based Approach.

    Directory of Open Access Journals (Sweden)

    Huiling Chen

    Full Text Available The number of the overweight people continues to rise across the world. Studies have shown that being overweight can increase health risks, such as high blood pressure, diabetes mellitus, coronary heart disease, and certain forms of cancer. Therefore, identifying the overweight status in people is critical to prevent and decrease health risks. This study explores a new technique that uses blood and biochemical measurements to recognize the overweight condition. A new machine learning technique, an extreme learning machine, was developed to accurately detect the overweight status from a pool of 225 overweight and 251 healthy subjects. The group included 179 males and 297 females. The detection method was rigorously evaluated against the real-life dataset for accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic (ROC curve criterion. Additionally, the feature selection was investigated to identify correlating factors for the overweight status. The results demonstrate that there are significant differences in blood and biochemical indexes between healthy and overweight people (p-value < 0.01. According to the feature selection, the most important correlated indexes are creatinine, hemoglobin, hematokrit, uric Acid, red blood cells, high density lipoprotein, alanine transaminase, triglyceride, and γ-glutamyl transpeptidase. These are consistent with the results of Spearman test analysis. The proposed method holds promise as a new, accurate method for identifying the overweight status in subjects.

  12. Machine Learning-Based Content Analysis: Automating the analysis of frames and agendas in political communication research

    NARCIS (Netherlands)

    Burscher, B.

    2016-01-01

    We used machine learning to study policy issues and frames in political messages. With regard to frames, we investigated the automation of two content-analytical tasks: frame coding and frame identification. We found that both tasks can be successfully automated by means of machine learning

  13. Machine Learning applications in CMS

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine Learning is used in many aspects of CMS data taking, monitoring, processing and analysis. We review a few of these use cases and the most recent developments, with an outlook to future applications in the LHC Run III and for the High-Luminosity phase.

  14. Visible Machine Learning for Biomedicine.

    Science.gov (United States)

    Yu, Michael K; Ma, Jianzhu; Fisher, Jasmin; Kreisberg, Jason F; Raphael, Benjamin J; Ideker, Trey

    2018-06-14

    A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology. Copyright © 2018. Published by Elsevier Inc.

  15. Markerless gating for lung cancer radiotherapy based on machine learning techniques

    International Nuclear Information System (INIS)

    Lin Tong; Li Ruijiang; Tang Xiaoli; Jiang, Steve B; Dy, Jennifer G

    2009-01-01

    In lung cancer radiotherapy, radiation to a mobile target can be delivered by respiratory gating, for which we need to know whether the target is inside or outside a predefined gating window at any time point during the treatment. This can be achieved by tracking one or more fiducial markers implanted inside or near the target, either fluoroscopically or electromagnetically. However, the clinical implementation of marker tracking is limited for lung cancer radiotherapy mainly due to the risk of pneumothorax. Therefore, gating without implanted fiducial markers is a promising clinical direction. We have developed several template-matching methods for fluoroscopic marker-less gating. Recently, we have modeled the gating problem as a binary pattern classification problem, in which principal component analysis (PCA) and support vector machine (SVM) are combined to perform the classification task. Following the same framework, we investigated different combinations of dimensionality reduction techniques (PCA and four nonlinear manifold learning methods) and two machine learning classification methods (artificial neural networks-ANN and SVM). Performance was evaluated on ten fluoroscopic image sequences of nine lung cancer patients. We found that among all combinations of dimensionality reduction techniques and classification methods, PCA combined with either ANN or SVM achieved a better performance than the other nonlinear manifold learning methods. ANN when combined with PCA achieves a better performance than SVM in terms of classification accuracy and recall rate, although the target coverage is similar for the two classification methods. Furthermore, the running time for both ANN and SVM with PCA is within tolerance for real-time applications. Overall, ANN combined with PCA is a better candidate than other combinations we investigated in this work for real-time gated radiotherapy.

  16. System modeling based on machine learning for anomaly detection and predictive maintenance in industrial plants

    OpenAIRE

    Kroll, Björn; Schaffranek, David; Schriegel, Sebastian; Niggemann, Oliver

    2014-01-01

    Electricity, water or air are some Industrial energy carriers which are struggling under the prices of primary energy carriers. The European Union for example used more 20.000.000 GWh electricity in 2011 based on the IEA Report [1]. Cyber Physical Production Systems (CPPS) are able to reduce this amount, but they also help to increase the efficiency of machines above expectations which results in a more cost efficient production. Especially in the field of improving industrial plants, one of ...

  17. Natural Language-based Machine Learning Models for the Annotation of Clinical Radiology Reports.

    Science.gov (United States)

    Zech, John; Pain, Margaret; Titano, Joseph; Badgeley, Marcus; Schefflein, Javin; Su, Andres; Costa, Anthony; Bederson, Joshua; Lehar, Joseph; Oermann, Eric Karl

    2018-05-01

    Purpose To compare different methods for generating features from radiology reports and to develop a method to automatically identify findings in these reports. Materials and Methods In this study, 96 303 head computed tomography (CT) reports were obtained. The linguistic complexity of these reports was compared with that of alternative corpora. Head CT reports were preprocessed, and machine-analyzable features were constructed by using bag-of-words (BOW), word embedding, and Latent Dirichlet allocation-based approaches. Ultimately, 1004 head CT reports were manually labeled for findings of interest by physicians, and a subset of these were deemed critical findings. Lasso logistic regression was used to train models for physician-assigned labels on 602 of 1004 head CT reports (60%) using the constructed features, and the performance of these models was validated on a held-out 402 of 1004 reports (40%). Models were scored by area under the receiver operating characteristic curve (AUC), and aggregate AUC statistics were reported for (a) all labels, (b) critical labels, and (c) the presence of any critical finding in a report. Sensitivity, specificity, accuracy, and F1 score were reported for the best performing model's (a) predictions of all labels and (b) identification of reports containing critical findings. Results The best-performing model (BOW with unigrams, bigrams, and trigrams plus average word embeddings vector) had a held-out AUC of 0.966 for identifying the presence of any critical head CT finding and an average 0.957 AUC across all head CT findings. Sensitivity and specificity for identifying the presence of any critical finding were 92.59% (175 of 189) and 89.67% (191 of 213), respectively. Average sensitivity and specificity across all findings were 90.25% (1898 of 2103) and 91.72% (18 351 of 20 007), respectively. Simpler BOW methods achieved results competitive with those of more sophisticated approaches, with an average AUC for presence of any

  18. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents Applications in Recommender Systems which are making recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the books approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  19. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    . Due to promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn a special attention of the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as the mean of inferring botnet......-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool of identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview on the field by summarizing current scientific efforts....... The contribution of the paper is three-fold. First, the paper provides a detailed insight on the existing detection methods by investigating which bot-related heuristic were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  20. Parsimonious Wavelet Kernel Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Wang Qin

    2015-11-01

    Full Text Available In this study, a parsimonious scheme for wavelet kernel extreme learning machine (named PWKELM was introduced by combining wavelet theory and a parsimonious algorithm into kernel extreme learning machine (KELM. In the wavelet analysis, bases that were localized in time and frequency to represent various signals effectively were used. Wavelet kernel extreme learning machine (WELM maximized its capability to capture the essential features in “frequency-rich” signals. The proposed parsimonious algorithm also incorporated significant wavelet kernel functions via iteration in virtue of Householder matrix, thus producing a sparse solution that eased the computational burden and improved numerical stability. The experimental results achieved from the synthetic dataset and a gas furnace instance demonstrated that the proposed PWKELM is efficient and feasible in terms of improving generalization accuracy and real time performance.

  1. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening.

    Science.gov (United States)

    Cang, Zixuan; Mu, Lin; Wei, Guo-Wei

    2018-01-01

    This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

  2. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    Science.gov (United States)

    Mu, Lin

    2018-01-01

    This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination. PMID:29309403

  3. Machine Learning Optimization of Evolvable Artificial Cells

    DEFF Research Database (Denmark)

    Caschera, F.; Rasmussen, S.; Hanczyc, M.

    2011-01-01

    can be explored. A machine learning approach (Evo-DoE) could be applied to explore this experimental space and define optimal interactions according to a specific fitness function. Herein an implementation of an evolutionary design of experiments to optimize chemical and biochemical systems based...... on a machine learning process is presented. The optimization proceeds over generations of experiments in iterative loop until optimal compositions are discovered. The fitness function is experimentally measured every time the loop is closed. Two examples of complex systems, namely a liposomal drug formulation...

  4. DNA-based machines.

    Science.gov (United States)

    Wang, Fuan; Willner, Bilha; Willner, Itamar

    2014-01-01

    The base sequence in nucleic acids encodes substantial structural and functional information into the biopolymer. This encoded information provides the basis for the tailoring and assembly of DNA machines. A DNA machine is defined as a molecular device that exhibits the following fundamental features. (1) It performs a fuel-driven mechanical process that mimics macroscopic machines. (2) The mechanical process requires an energy input, "fuel." (3) The mechanical operation is accompanied by an energy consumption process that leads to "waste products." (4) The cyclic operation of the DNA devices, involves the use of "fuel" and "anti-fuel" ingredients. A variety of DNA-based machines are described, including the construction of "tweezers," "walkers," "robots," "cranes," "transporters," "springs," "gears," and interlocked cyclic DNA structures acting as reconfigurable catenanes, rotaxanes, and rotors. Different "fuels", such as nucleic acid strands, pH (H⁺/OH⁻), metal ions, and light, are used to trigger the mechanical functions of the DNA devices. The operation of the devices in solution and on surfaces is described, and a variety of optical, electrical, and photoelectrochemical methods to follow the operations of the DNA machines are presented. We further address the possible applications of DNA machines and the future perspectives of molecular DNA devices. These include the application of DNA machines as functional structures for the construction of logic gates and computing, for the programmed organization of metallic nanoparticle structures and the control of plasmonic properties, and for controlling chemical transformations by DNA machines. We further discuss the future applications of DNA machines for intracellular sensing, controlling intracellular metabolic pathways, and the use of the functional nanostructures for drug delivery and medical applications.

  5. Sitting Posture Monitoring System Based on a Low-Cost Load Cell Using Machine Learning

    Directory of Open Access Journals (Sweden)

    Jongryun Roh

    2018-01-01

    Full Text Available Sitting posture monitoring systems (SPMSs help assess the posture of a seated person in real-time and improve sitting posture. To date, SPMS studies reported have required many sensors mounted on the backrest plate and seat plate of a chair. The present study, therefore, developed a system that measures a total of six sitting postures including the posture that applied a load to the backrest plate, with four load cells mounted only on the seat plate. Various machine learning algorithms were applied to the body weight ratio measured by the developed SPMS to identify the method that most accurately classified the actual sitting posture of the seated person. After classifying the sitting postures using several classifiers, average and maximum classification rates of 97.20% and 97.94%, respectively, were obtained from nine subjects with a support vector machine using the radial basis function kernel; the results obtained by this classifier showed a statistically significant difference from the results of multiple classifications using other classifiers. The proposed SPMS was able to classify six sitting postures including the posture with loading on the backrest and showed the possibility of classifying the sitting posture even though the number of sensors is reduced.

  6. Using Five Machine Learning for Breast Cancer Biopsy Predictions Based on Mammographic Diagnosis

    OpenAIRE

    Oyewola, David; Hakimi, Danladi; Adeboye, Kayode; Shehu, Musa Danjuma

    2017-01-01

    Breast cancer is one of thecauses of female death in the world. Mammography  is commonly  used for  distinguishing  malignant tumors  from benign  ones. In this research,  a mammographic  diagnostic method  is  presented for breast  cancer  biopsy outcome  predictions  using  fivemachine learning which includes: Logistic Regression(LR), Linear DiscriminantAnalysis(LDA), Quadratic Discriminant Analysis(QDA), Random Forest(RF) andSupport  Vector Machine(SVM)  classification.  The testing result...

  7. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  8. Performance of machine-learning scoring functions in structure-based virtual screening.

    Science.gov (United States)

    Wójcikowski, Maciej; Ballester, Pedro J; Siedlecki, Pawel

    2017-04-25

    Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: RF-Score-VS top 1% provides 55.6% hit rate, whereas that of Vina only 16.2% (for smaller percent the difference is even more encouraging: RF-Score-VS top 0.1% achieves 88.6% hit rate for 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and -0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).

  9. Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels

    Science.gov (United States)

    Dral, Pavlo O.; Owens, Alec; Yurchenko, Sergei N.; Thiel, Walter

    2017-06-01

    We present an efficient approach for generating highly accurate molecular potential energy surfaces (PESs) using self-correcting, kernel ridge regression (KRR) based machine learning (ML). We introduce structure-based sampling to automatically assign nuclear configurations from a pre-defined grid to the training and prediction sets, respectively. Accurate high-level ab initio energies are required only for the points in the training set, while the energies for the remaining points are provided by the ML model with negligible computational cost. The proposed sampling procedure is shown to be superior to random sampling and also eliminates the need for training several ML models. Self-correcting machine learning has been implemented such that each additional layer corrects errors from the previous layer. The performance of our approach is demonstrated in a case study on a published high-level ab initio PES of methyl chloride with 44 819 points. The ML model is trained on sets of different sizes and then used to predict the energies for tens of thousands of nuclear configurations within seconds. The resulting datasets are utilized in variational calculations of the vibrational energy levels of CH3Cl. By using both structure-based sampling and self-correction, the size of the training set can be kept small (e.g., 10% of the points) without any significant loss of accuracy. In ab initio rovibrational spectroscopy, it is thus possible to reduce the number of computationally costly electronic structure calculations through structure-based sampling and self-correcting KRR-based machine learning by up to 90%.

  10. Machine learning enhanced optical distance sensor

    Science.gov (United States)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized training model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces a training set and testing set distance measurement errors of learning. Applications for the proposed sensor include industrial scenario distance sensing where target material specific training models can be generated to realize low <1% measurement error distance measurements.

  11. Supporting visual quality assessment with machine learning

    NARCIS (Netherlands)

    Gastaldo, P.; Zunino, R.; Redi, J.

    2013-01-01

    Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms allow to tackle the quality

  12. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods

    Directory of Open Access Journals (Sweden)

    Pontil Massimiliano

    2009-10-01

    Full Text Available Abstract Background Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino-acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots" at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. Results We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall respectively of 56% and 65%. Conclusion We have developed an hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been

  13. A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine.

    Directory of Open Access Journals (Sweden)

    Qiang Shang

    Full Text Available Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport system (ITS. Because of the uncertainty and nonlinearity, short-term traffic flow prediction is a challenging task. In order to improve the accuracy of short-time traffic flow prediction, a hybrid model (SSA-KELM is proposed based on singular spectrum analysis (SSA and kernel extreme learning machine (KELM. SSA is used to filter out the noise of traffic flow time series. Then, the filtered traffic flow data is used to train KELM model, the optimal input form of the proposed model is determined by phase space reconstruction, and parameters of the model are optimized by gravitational search algorithm (GSA. Finally, case validation is carried out using the measured data of an expressway in Xiamen, China. And the SSA-KELM model is compared with several well-known prediction models, including support vector machine, extreme learning machine, and single KLEM model. The experimental results demonstrate that performance of the proposed model is superior to that of the comparison models. Apart from accuracy improvement, the proposed model is more robust.

  14. Discrimination of plant root zone water status in greenhouse production based on phenotyping and machine learning techniques.

    Science.gov (United States)

    Guo, Doudou; Juan, Jiaxiang; Chang, Liying; Zhang, Jingjin; Huang, Danfeng

    2017-08-15

    Plant-based sensing on water stress can provide sensitive and direct reference for precision irrigation system in greenhouse. However, plant information acquisition, interpretation, and systematical application remain insufficient. This study developed a discrimination method for plant root zone water status in greenhouse by integrating phenotyping and machine learning techniques. Pakchoi plants were used and treated by three root zone moisture levels, 40%, 60%, and 80% relative water content. Three classification models, Random Forest (RF), Neural Network (NN), and Support Vector Machine (SVM) were developed and validated in different scenarios with overall accuracy over 90% for all. SVM model had the highest value, but it required the longest training time. All models had accuracy over 85% in all scenarios, and more stable performance was observed in RF model. Simplified SVM model developed by the top five most contributing traits had the largest accuracy reduction as 29.5%, while simplified RF and NN model still maintained approximately 80%. For real case application, factors such as operation cost, precision requirement, and system reaction time should be synthetically considered in model selection. Our work shows it is promising to discriminate plant root zone water status by implementing phenotyping and machine learning techniques for precision irrigation management.

  15. Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression.

    Science.gov (United States)

    Nouretdinov, Ilia; Costafreda, Sergi G; Gammerman, Alexander; Chervonenkis, Alexey; Vovk, Vladimir; Vapnik, Vladimir; Fu, Cynthia H Y

    2011-05-15

    There is rapidly accumulating evidence that the application of machine learning classification to neuroimaging measurements may be valuable for the development of diagnostic and prognostic prediction tools in psychiatry. However, current methods do not produce a measure of the reliability of the predictions. Knowing the risk of the error associated with a given prediction is essential for the development of neuroimaging-based clinical tools. We propose a general probabilistic classification method to produce measures of confidence for magnetic resonance imaging (MRI) data. We describe the application of transductive conformal predictor (TCP) to MRI images. TCP generates the most likely prediction and a valid measure of confidence, as well as the set of all possible predictions for a given confidence level. We present the theoretical motivation for TCP, and we have applied TCP to structural and functional MRI data in patients and healthy controls to investigate diagnostic and prognostic prediction in depression. We verify that TCP predictions are as accurate as those obtained with more standard machine learning methods, such as support vector machine, while providing the additional benefit of a valid measure of confidence for each prediction. Copyright © 2010 Elsevier Inc. All rights reserved.

  16. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  17. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    Science.gov (United States)

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  18. A comparison of rule-based and machine learning approaches for classifying patient portal messages.

    Science.gov (United States)

    Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Rosenbloom, S Trent; Jackson, Gretchen Purcell

    2017-09-01

    Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard

  19. What is the machine learning?

    Science.gov (United States)

    Chang, Spencer; Cohen, Timothy; Ostdiek, Bryan

    2018-03-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables—aided by physical intuition—that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus nonlinear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

  20. Statistical and machine learning approaches for network analysis

    CERN Document Server

    Dehmer, Matthias

    2012-01-01

    Explore the multidisciplinary nature of complex networks through machine learning techniques Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by internation

  1. Galaxy Classification using Machine Learning

    Science.gov (United States)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  2. Scaling up graph-based semisupervised learning via prototype vector machines.

    Science.gov (United States)

    Zhang, Kai; Lan, Liang; Kwok, James T; Vucetic, Slobodan; Parvin, Bahram

    2015-03-01

    When the amount of labeled data are limited, semisupervised learning can improve the learner's performance by also using the often easily available unlabeled data. In particular, a popular approach requires the learned function to be smooth on the underlying data manifold. By approximating this manifold as a weighted graph, such graph-based techniques can often achieve state-of-the-art performance. However, their high time and space complexities make them less attractive on large data sets. In this paper, we propose to scale up graph-based semisupervised learning using a set of sparse prototypes derived from the data. These prototypes serve as a small set of data representatives, which can be used to approximate the graph-based regularizer and to control model complexity. Consequently, both training and testing become much more efficient. Moreover, when the Gaussian kernel is used to define the graph affinity, a simple and principled method to select the prototypes can be obtained. Experiments on a number of real-world data sets demonstrate encouraging performance and scaling properties of the proposed approach. It also compares favorably with models learned via l1 -regularization at the same level of model sparsity. These results demonstrate the efficacy of the proposed approach in producing highly parsimonious and accurate models for semisupervised learning.

  3. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (DeltaDeltaG) and denaturant (DeltaDeltaG(H2O)) denaturations, as well as mutant thermal stability (DeltaT(m)), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and it's ordered six nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  4. Development of a Late-Life Dementia Prediction Index with Supervised Machine Learning in the Population-Based CAIDE Study.

    Science.gov (United States)

    Pekkala, Timo; Hall, Anette; Lötjönen, Jyrki; Mattila, Jussi; Soininen, Hilkka; Ngandu, Tiia; Laatikainen, Tiina; Kivipelto, Miia; Solomon, Alina

    2017-01-01

    This study aimed to develop a late-life dementia prediction model using a novel validated supervised machine learning method, the Disease State Index (DSI), in the Finnish population-based CAIDE study. The CAIDE study was based on previous population-based midlife surveys. CAIDE participants were re-examined twice in late-life, and the first late-life re-examination was used as baseline for the present study. The main study population included 709 cognitively normal subjects at first re-examination who returned to the second re-examination up to 10 years later (incident dementia n = 39). An extended population (n = 1009, incident dementia 151) included non-participants/non-survivors (national registers data). DSI was used to develop a dementia index based on first re-examination assessments. Performance in predicting dementia was assessed as area under the ROC curve (AUC). AUCs for DSI were 0.79 and 0.75 for main and extended populations. Included predictors were cognition, vascular factors, age, subjective memory complaints, and APOE genotype. The supervised machine learning method performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. DSI could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions.

  5. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    Science.gov (United States)

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.

  6. Feedback for reinforcement learning based brain-machine interfaces using confidence metrics

    Science.gov (United States)

    Prins, Noeline W.; Sanchez, Justin C.; Prasad, Abhishek

    2017-06-01

    Objective. For brain-machine interfaces (BMI) to be used in activities of daily living by paralyzed individuals, the BMI should be as autonomous as possible. One of the challenges is how the feedback is extracted and utilized in the BMI. Our long-term goal is to create autonomous BMIs that can utilize an evaluative feedback from the brain to update the decoding algorithm and use it intelligently in order to adapt the decoder. In this study, we show how to extract the necessary evaluative feedback from a biologically realistic (synthetic) source, use both the quantity and the quality of the feedback, and how that feedback information can be incorporated into a reinforcement learning (RL) controller architecture to maximize its performance. Approach. Motivated by the perception-action-reward cycle (PARC) in the brain which links reward for cognitive decision making and goal-directed behavior, we used a reward-based RL architecture named Actor-Critic RL as the model. Instead of using an error signal towards building an autonomous BMI, we envision to use a reward signal from the nucleus accumbens (NAcc) which plays a key role in the linking of reward to motor behaviors. To deal with the complexity and non-stationarity of biological reward signals, we used a confidence metric which was used to indicate the degree of feedback accuracy. This confidence was added to the Actor’s weight update equation in the RL controller architecture. If the confidence was high (>0.2), the BMI decoder used this feedback to update its parameters. However, when the confidence was low, the BMI decoder ignored the feedback and did not update its parameters. The range between high confidence and low confidence was termed as the ‘ambiguous’ region. When the feedback was within this region, the BMI decoder updated its weight at a lower rate than when fully confident, which was decided by the confidence. We used two biologically realistic models to generate synthetic data for MI (Izhikevich

  7. Surface electromyography based muscle fatigue detection using high-resolution time-frequency methods and machine learning algorithms.

    Science.gov (United States)

    Karthick, P A; Ghosh, Diptasree Maitra; Ramakrishnan, S

    2018-02-01

    Surface electromyography (sEMG) based muscle fatigue research is widely preferred in sports science and occupational/rehabilitation studies due to its noninvasiveness. However, these signals are complex, multicomponent and highly nonstationary with large inter-subject variations, particularly during dynamic contractions. Hence, time-frequency based machine learning methodologies can improve the design of automated system for these signals. In this work, the analysis based on high-resolution time-frequency methods, namely, Stockwell transform (S-transform), B-distribution (BD) and extended modified B-distribution (EMBD) are proposed to differentiate the dynamic muscle nonfatigue and fatigue conditions. The nonfatigue and fatigue segments of sEMG signals recorded from the biceps brachii of 52 healthy volunteers are preprocessed and subjected to S-transform, BD and EMBD. Twelve features are extracted from each method and prominent features are selected using genetic algorithm (GA) and binary particle swarm optimization (BPSO). Five machine learning algorithms, namely, naïve Bayes, support vector machine (SVM) of polynomial and radial basis kernel, random forest and rotation forests are used for the classification. The results show that all the proposed time-frequency distributions (TFDs) are able to show the nonstationary variations of sEMG signals. Most of the features exhibit statistically significant difference in the muscle fatigue and nonfatigue conditions. The maximum number of features (66%) is reduced by GA and BPSO for EMBD and BD-TFD respectively. The combination of EMBD- polynomial kernel based SVM is found to be most accurate (91% accuracy) in classifying the conditions with the features selected using GA. The proposed methods are found to be capable of handling the nonstationary and multicomponent variations of sEMG signals recorded in dynamic fatiguing contractions. Particularly, the combination of EMBD- polynomial kernel based SVM could be used to

  8. Machine-learning-based calving prediction from activity, lying, and ruminating behaviors in dairy cattle.

    Science.gov (United States)

    Borchers, M R; Chang, Y M; Proudfoot, K L; Wadsworth, B A; Stone, A E; Bewley, J M

    2017-07-01

    The objective of this study was to use automated activity, lying, and rumination monitors to characterize prepartum behavior and predict calving in dairy cattle. Data were collected from 20 primiparous and 33 multiparous Holstein dairy cattle from September 2011 to May 2013 at the University of Kentucky Coldstream Dairy. The HR Tag (SCR Engineers Ltd., Netanya, Israel) automatically collected neck activity and rumination data in 2-h increments. The IceQube (IceRobotics Ltd., South Queensferry, United Kingdom) automatically collected number of steps, lying time, standing time, number of transitions from standing to lying (lying bouts), and total motion, summed in 15-min increments. IceQube data were summed in 2-h increments to match HR Tag data. All behavioral data were collected for 14 d before the predicted calving date. Retrospective data analysis was performed using mixed linear models to examine behavioral changes by day in the 14 d before calving. Bihourly behavioral differences from baseline values over the 14 d before calving were also evaluated using mixed linear models. Changes in daily rumination time, total motion, lying time, and lying bouts occurred in the 14 d before calving. In the bihourly analysis, extreme values for all behaviors occurred in the final 24 h, indicating that the monitored behaviors may be useful in calving prediction. To determine whether technologies were useful at predicting calving, random forest, linear discriminant analysis, and neural network machine-learning techniques were constructed and implemented using R version 3.1.0 (R Foundation for Statistical Computing, Vienna, Austria). These methods were used on variables from each technology and all combined variables from both technologies. A neural network analysis that combined variables from both technologies at the daily level yielded 100.0% sensitivity and 86.8% specificity. A neural network analysis that combined variables from both technologies in bihourly increments was

  9. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    Science.gov (United States)

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-12-01

    Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records make applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality.Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.

  10. An Improved Bacterial-Foraging Optimization-Based Machine Learning Framework for Predicting the Severity of Somatization Disorder

    Directory of Open Access Journals (Sweden)

    Xinen Lv

    2018-02-01

    Full Text Available It is of great clinical significance to establish an accurate intelligent model to diagnose the somatization disorder of community correctional personnel. In this study, a novel machine learning framework is proposed to predict the severity of somatization disorder in community correction personnel. The core of this framework is to adopt the improved bacterial foraging optimization (IBFO to optimize two key parameters (penalty coefficient and the kernel width of a kernel extreme learning machine (KELM and build an IBFO-based KELM (IBFO-KELM for the diagnosis of somatization disorder patients. The main innovation point of the IBFO-KELM model is the introduction of opposition-based learning strategies in traditional bacteria foraging optimization, which increases the diversity of bacterial species, keeps a uniform distribution of individuals of initial population, and improves the convergence rate of the BFO optimization process as well as the probability of escaping from the local optimal solution. In order to verify the effectiveness of the method proposed in this study, a 10-fold cross-validation method based on data from a symptom self-assessment scale (SCL-90 is used to make comparison among IBFO-KELM, BFO-KELM (model based on the original bacterial foraging optimization model, GA-KELM (model based on genetic algorithm, PSO-KELM (model based on particle swarm optimization algorithm and Grid-KELM (model based on grid search method. The experimental results show that the proposed IBFO-KELM prediction model has better performance than other methods in terms of classification accuracy, Matthews correlation coefficient (MCC, sensitivity and specificity. It can distinguish very well between severe somatization disorder and mild somatization and assist the psychological doctor with clinical diagnosis.

  11. Fifth Graders' Learning about Simple Machines through Engineering Design-Based Instruction Using LEGO™ Materials

    Science.gov (United States)

    Marulcu, Ismail; Barnett, Mike

    2013-01-01

    This study is part of a 5-year National Science Foundation-funded project, Transforming Elementary Science Learning Through LEGO™ Engineering Design. In this study, we report on the successes and challenges of implementing an engineering design-based and LEGO™-oriented unit in an urban classroom setting and we focus on the impact of the unit on…

  12. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM...

  13. Machine Learning of Fault Friction

    Science.gov (United States)

    Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.

    2017-12-01

    We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025

  14. Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization – Extreme learning machine approach

    International Nuclear Information System (INIS)

    Salcedo-Sanz, S.; Pastor-Sánchez, A.; Prieto, L.; Blanco-Aguilera, A.; García-Herrera, R.

    2014-01-01

    Highlights: • A novel approach for short-term wind speed prediction is presented. • The system is formed by a coral reefs optimization algorithm and an extreme learning machine. • Feature selection is carried out with the CRO to improve the ELM performance. • The method is tested in real wind farm data in USA, for the period 2007–2008. - Abstract: This paper presents a novel approach for short-term wind speed prediction based on a Coral Reefs Optimization algorithm (CRO) and an Extreme Learning Machine (ELM), using meteorological predictive variables from a physical model (the Weather Research and Forecast model, WRF). The approach is based on a Feature Selection Problem (FSP) carried out with the CRO, that must obtain a reduced number of predictive variables out of the total available from the WRF. This set of features will be the input of an ELM, that finally provides the wind speed prediction. The CRO is a novel bio-inspired approach, based on the simulation of reef formation and coral reproduction, able to obtain excellent results in optimization problems. On the other hand, the ELM is a new paradigm in neural networks’ training, that provides a robust and extremely fast training of the network. Together, these algorithms are able to successfully solve this problem of feature selection in short-term wind speed prediction. Experiments in a real wind farm in the USA show the excellent performance of the CRO–ELM approach in this FSP wind speed prediction problem

  15. A Framework for Final Drive Simultaneous Failure Diagnosis Based on Fuzzy Entropy and Sparse Bayesian Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Qing Ye

    2015-01-01

    Full Text Available This research proposes a novel framework of final drive simultaneous failure diagnosis containing feature extraction, training paired diagnostic models, generating decision threshold, and recognizing simultaneous failure modes. In feature extraction module, adopt wavelet package transform and fuzzy entropy to reduce noise interference and extract representative features of failure mode. Use single failure sample to construct probability classifiers based on paired sparse Bayesian extreme learning machine which is trained only by single failure modes and have high generalization and sparsity of sparse Bayesian learning approach. To generate optimal decision threshold which can convert probability output obtained from classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes and Grid search method which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machine and probability neural networks, experiment results based on F1-measure value verify that the diagnostic accuracy and efficiency of the proposed framework which are crucial for simultaneous failure diagnosis are superior to the existing approach.

  16. Machine learning-based kinetic modeling: a robust and reproducible solution for quantitative analysis of dynamic PET data.

    Science.gov (United States)

    Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia

    2017-05-07

    A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.

  17. Machine learning-based kinetic modeling: a robust and reproducible solution for quantitative analysis of dynamic PET data

    Science.gov (United States)

    Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia

    2017-05-01

    A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.

  18. Diagnosis of Alzheimer’s Disease Based on Structural MRI Images Using a Regularized Extreme Learning Machine and PCA Features

    Directory of Open Access Journals (Sweden)

    Ramesh Kumar Lama

    2017-01-01

    Full Text Available Alzheimer’s disease (AD is a progressive, neurodegenerative brain disorder that attacks neurotransmitters, brain cells, and nerves, affecting brain functions, memory, and behaviors and then finally causing dementia on elderly people. Despite its significance, there is currently no cure for it. However, there are medicines available on prescription that can help delay the progress of the condition. Thus, early diagnosis of AD is essential for patient care and relevant researches. Major challenges in proper diagnosis of AD using existing classification schemes are the availability of a smaller number of training samples and the larger number of possible feature representations. In this paper, we present and compare AD diagnosis approaches using structural magnetic resonance (sMR images to discriminate AD, mild cognitive impairment (MCI, and healthy control (HC subjects using a support vector machine (SVM, an import vector machine (IVM, and a regularized extreme learning machine (RELM. The greedy score-based feature selection technique is employed to select important feature vectors. In addition, a kernel-based discriminative approach is adopted to deal with complex data distributions. We compare the performance of these classifiers for volumetric sMR image data from Alzheimer’s disease neuroimaging initiative (ADNI datasets. Experiments on the ADNI datasets showed that RELM with the feature selection approach can significantly improve classification accuracy of AD from MCI and HC subjects.

  19. Leveraging Machine Learning to Estimate Soil Salinity through Satellite-Based Remote Sensing

    Science.gov (United States)

    Welle, P.; Ravanbakhsh, S.; Póczos, B.; Mauter, M.

    2016-12-01

    Human-induced salinization of agricultural soils is a growing problem which now affects an estimated 76 million hectares and causes billions of dollars of lost agricultural revenues annually. While there are indications that soil salinization is increasing in extent, current assessments of global salinity levels are outdated and rely heavily on expert opinion due to the prohibitive cost of a worldwide sampling campaign. A more practical alternative to field sampling may be earth observation through remote sensing, which takes advantage of the distinct spectral signature of salts in order to estimate soil conductivity. Recent efforts to map salinity using remote sensing have been met with limited success due to tractability issues of managing the computational load associated with large amounts of satellite data. In this study, we use Google Earth Engine to create composite satellite soil datasets, which combine data from multiple sources and sensors. These composite datasets contain pixel-level surface reflectance values for dates in which the algorithm is most confident that the surface contains bare soil. We leverage the detailed soil maps created and updated by the United States Geological Survey as label data and apply machine learning regression techniques such as Gaussian processes to learn a smooth mapping from surface reflection to noisy estimates of salinity. We also explore a semi-supervised approach using deep generative convolutional networks to leverage the abundance of unlabeled satellite images in producing better estimates for salinity values where we have relatively fewer measurements across the globe. The general method results in two significant contributions: (1) an algorithm that can be used to predict levels of soil salinity in regions without detailed soil maps and (2) a general framework that serves as an example for how remote sensing can be paired with extensive label data to generate methods for prediction of physical phenomenon.

  20. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance and that each MeSH heading presents a different behavior......Map and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI...

  1. Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment.

    Science.gov (United States)

    Li, Yang; Li, Guoqing; Wang, Zhenhao

    2015-01-01

    In order to overcome the problems of poor understandability of the pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of ELM and Ant-miner algorithm are respectively introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. And finally, a set of classification rules are obtained by IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted from an example sample set generated by the trained ELM-based transient stability assessment model by using IAM algorithm. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.

  2. A Pathological Brain Detection System based on Extreme Learning Machine Optimized by Bat Algorithm.

    Science.gov (United States)

    Lu, Siyuan; Qiu, Xin; Shi, Jianping; Li, Na; Lu, Zhi-Hai; Chen, Peng; Yang, Meng-Meng; Liu, Fang-Yuan; Jia, Wen-Juan; Zhang, Yudong

    2017-01-01

    It is beneficial to classify brain images as healthy or pathological automatically, because 3D brain images can generate so much information which is time consuming and tedious for manual analysis. Among various 3D brain imaging techniques, magnetic resonance (MR) imaging is the most suitable for brain, and it is now widely applied in hospitals, because it is helpful in the four ways of diagnosis, prognosis, pre-surgical, and postsurgical procedures. There are automatic detection methods; however they suffer from low accuracy. Therefore, we proposed a novel approach which employed 2D discrete wavelet transform (DWT), and calculated the entropies of the subbands as features. Then, a bat algorithm optimized extreme learning machine (BA-ELM) was trained to identify pathological brains from healthy controls. A 10x10-fold cross validation was performed to evaluate the out-of-sample performance. The method achieved a sensitivity of 99.04%, a specificity of 93.89%, and an overall accuracy of 98.33% over 132 MR brain images. The experimental results suggest that the proposed approach is accurate and robust in pathological brain detection. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  3. Remote sensing-based measurement of Living Environment Deprivation: Improving classical approaches with machine learning.

    Directory of Open Access Journals (Sweden)

    Daniel Arribas-Bel

    Full Text Available This paper provides evidence on the usefulness of very high spatial resolution (VHR imagery in gathering socioeconomic information in urban settlements. We use land cover, spectral, structure and texture features extracted from a Google Earth image of Liverpool (UK to evaluate their potential to predict Living Environment Deprivation at a small statistical area level. We also contribute to the methodological literature on the estimation of socioeconomic indices with remote-sensing data by introducing elements from modern machine learning. In addition to classical approaches such as Ordinary Least Squares (OLS regression and a spatial lag model, we explore the potential of the Gradient Boost Regressor and Random Forests to improve predictive performance and accuracy. In addition to novel predicting methods, we also introduce tools for model interpretation and evaluation such as feature importance and partial dependence plots, or cross-validation. Our results show that Random Forest proved to be the best model with an R2 of around 0.54, followed by Gradient Boost Regressor with 0.5. Both the spatial lag model and the OLS fall behind with significantly lower performances of 0.43 and 0.3, respectively.

  4. Extreme learning machine based optimal embedding location finder for image steganography.

    Directory of Open Access Journals (Sweden)

    Hayfaa Abdulzahra Atee

    Full Text Available In image steganography, determining the optimum location for embedding the secret message precisely with minimum distortion of the host medium remains a challenging issue. Yet, an effective approach for the selection of the best embedding location with least deformation is far from being achieved. To attain this goal, we propose a novel approach for image steganography with high-performance, where extreme learning machine (ELM algorithm is modified to create a supervised mathematical model. This ELM is first trained on a part of an image or any host medium before being tested in the regression mode. This allowed us to choose the optimal location for embedding the message with best values of the predicted evaluation metrics. Contrast, homogeneity, and other texture features are used for training on a new metric. Furthermore, the developed ELM is exploited for counter over-fitting while training. The performance of the proposed steganography approach is evaluated by computing the correlation, structural similarity (SSIM index, fusion matrices, and mean square error (MSE. The modified ELM is found to outperform the existing approaches in terms of imperceptibility. Excellent features of the experimental results demonstrate that the proposed steganographic approach is greatly proficient for preserving the visual information of an image. An improvement in the imperceptibility as much as 28% is achieved compared to the existing state of the art methods.

  5. Subgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures

    Science.gov (United States)

    Vollant, A.; Balarac, G.; Corre, C.

    2017-09-01

    New procedures are explored for the development of models in the context of large eddy simulation (LES) of a passive scalar. They rely on the combination of the optimal estimator theory with machine-learning algorithms. The concept of optimal estimator allows to identify the most accurate set of parameters to be used when deriving a model. The model itself can then be defined by training an artificial neural network (ANN) on a database derived from the filtering of direct numerical simulation (DNS) results. This procedure leads to a subgrid scale model displaying good structural performance, which allows to perform LESs very close to the filtered DNS results. However, this first procedure does not control the functional performance so that the model can fail when the flow configuration differs from the training database. Another procedure is then proposed, where the model functional form is imposed and the ANN used only to define the model coefficients. The training step is a bi-objective optimisation in order to control both structural and functional performances. The model derived from this second procedure proves to be more robust. It also provides stable LESs for a turbulent plane jet flow configuration very far from the training database but over-estimates the mixing process in that case.

  6. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning.

    Science.gov (United States)

    Dozmorov, Mikhail G

    2017-10-15

    One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. mikhail.dozmorov@vcuhealth.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Machine-learning-based Brokers for Real-time Classification of the LSST Alert Stream

    Science.gov (United States)

    Narayan, Gautham; Zaidi, Tayeb; Soraisam, Monika D.; Wang, Zhe; Lochner, Michelle; Matheson, Thomas; Saha, Abhijit; Yang, Shuo; Zhao, Zhenge; Kececioglu, John; Scheidegger, Carlos; Snodgrass, Richard T.; Axelrod, Tim; Jenness, Tim; Maier, Robert S.; Ridgway, Stephen T.; Seaman, Robert L.; Evans, Eric Michael; Singh, Navdeep; Taylor, Clark; Toeniskoetter, Jackson; Welch, Eric; Zhu, Songzhe; The ANTARES Collaboration

    2018-05-01

    The unprecedented volume and rate of transient events that will be discovered by the Large Synoptic Survey Telescope (LSST) demand that the astronomical community update its follow-up paradigm. Alert-brokers—automated software system to sift through, characterize, annotate, and prioritize events for follow-up—will be critical tools for managing alert streams in the LSST era. The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is one such broker. In this work, we develop a machine learning pipeline to characterize and classify variable and transient sources only using the available multiband optical photometry. We describe three illustrative stages of the pipeline, serving the three goals of early, intermediate, and retrospective classification of alerts. The first takes the form of variable versus transient categorization, the second a multiclass typing of the combined variable and transient data set, and the third a purity-driven subtyping of a transient class. Although several similar algorithms have proven themselves in simulations, we validate their performance on real observations for the first time. We quantitatively evaluate our pipeline on sparse, unevenly sampled, heteroskedastic data from various existing observational campaigns, and demonstrate very competitive classification performance. We describe our progress toward adapting the pipeline developed in this work into a real-time broker working on live alert streams from time-domain surveys.

  8. Color Face Recognition Based on Steerable Pyramid Transform and Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Ayşegül Uçar

    2014-01-01

    Full Text Available This paper presents a novel color face recognition algorithm by means of fusing color and local information. The proposed algorithm fuses the multiple features derived from different color spaces. Multiorientation and multiscale information relating to the color face features are extracted by applying Steerable Pyramid Transform (SPT to the local face regions. In this paper, the new three hybrid color spaces, YSCr, ZnSCr, and BnSCr, are firstly constructed using the Cb and Cr component images of the YCbCr color space, the S color component of the HSV color spaces, and the Zn and Bn color components of the normalized XYZ color space. Secondly, the color component face images are partitioned into the local patches. Thirdly, SPT is applied to local face regions and some statistical features are extracted. Fourthly, all features are fused according to decision fusion frame and the combinations of Extreme Learning Machines classifiers are applied to achieve color face recognition with fast and high correctness. The experiments show that the proposed Local Color Steerable Pyramid Transform (LCSPT face recognition algorithm improves seriously face recognition performance by using the new color spaces compared to the conventional and some hybrid ones. Furthermore, it achieves faster recognition compared with state-of-the-art studies.

  9. Machine Learning-based Transient Brokers for Real-time Classification of the LSST Alert Stream

    Science.gov (United States)

    Narayan, Gautham; Zaidi, Tayeb; Soraisam, Monika; ANTARES Collaboration

    2018-01-01

    The number of transient events discovered by wide-field time-domain surveys already far outstrips the combined followup resources of the astronomical community. This number will only increase as we progress towards the commissioning of the Large Synoptic Survey Telescope (LSST), breaking the community's current followup paradigm. Transient brokers - software to sift through, characterize, annotate and prioritize events for followup - will be a critical tool for managing alert streams in the LSST era. Developing the algorithms that underlie the brokers, and obtaining simulated LSST-like datasets prior to LSST commissioning, to train and test these algorithms are formidable, though not insurmountable challenges. The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is a joint project of the National Optical Astronomy Observatory and the Department of Computer Science at the University of Arizona. We have been developing completely automated methods to characterize and classify variable and transient events from their multiband optical photometry. We describe the hierarchical ensemble machine learning algorithm we are developing, and test its performance on sparse, unevenly sampled, heteroskedastic data from various existing observational campaigns, as well as our progress towards incorporating these into a real-time event broker working on live alert streams from time-domain surveys.

  10. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights

  11. Deep machine learning provides state-of-the-art performance in image-based plant phenotyping.

    Science.gov (United States)

    Pound, Michael P; Atkinson, Jonathan A; Townsend, Alexandra J; Wilson, Michael H; Griffiths, Marcus; Jackson, Aaron S; Bulat, Adrian; Tzimiropoulos, Georgios; Wells, Darren M; Murchie, Erik H; Pridmore, Tony P; French, Andrew P

    2017-10-01

    In plant phenotyping, it has become important to be able to measure many features on large image sets in order to aid genetic discovery. The size of the datasets, now often captured robotically, often precludes manual inspection, hence the motivation for finding a fully automated approach. Deep learning is an emerging field that promises unparalleled results on many data analysis problems. Building on artificial neural networks, deep approaches have many more hidden layers in the network, and hence have greater discriminative and predictive power. We demonstrate the use of such approaches as part of a plant phenotyping pipeline. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping and demonstrate state-of-the-art results (>97% accuracy) for root and shoot feature identification and localization. We use fully automated trait identification using deep learning to identify quantitative trait loci in root architecture datasets. The majority (12 out of 14) of manually identified quantitative trait loci were also discovered using our automated approach based on deep learning detection to locate plant features. We have shown deep learning-based phenotyping to have very good detection and localization accuracy in validation and testing image sets. We have shown that such features can be used to derive meaningful biological traits, which in turn can be used in quantitative trait loci discovery pipelines. This process can be completely automated. We predict a paradigm shift in image-based phenotyping bought about by such deep learning approaches, given sufficient training sets. © The Authors 2017. Published by Oxford University Press.

  12. Intelligent fault diagnosis of photovoltaic arrays based on optimized kernel extreme learning machine and I-V characteristics

    International Nuclear Information System (INIS)

    Chen, Zhicong; Wu, Lijun; Cheng, Shuying; Lin, Peijie; Wu, Yue; Lin, Wencheng

    2017-01-01

    Highlights: •An improved Simulink based modeling method is proposed for PV modules and arrays. •Key points of I-V curves and PV model parameters are used as the feature variables. •Kernel extreme learning machine (KELM) is explored for PV arrays fault diagnosis. •The parameters of KELM algorithm are optimized by the Nelder-Mead simplex method. •The optimized KELM fault diagnosis model achieves high accuracy and reliability. -- Abstract: Fault diagnosis of photovoltaic (PV) arrays is important for improving the reliability, efficiency and safety of PV power stations, because the PV arrays usually operate in harsh outdoor environment and tend to suffer various faults. Due to the nonlinear output characteristics and varying operating environment of PV arrays, many machine learning based fault diagnosis methods have been proposed. However, there still exist some issues: fault diagnosis performance is still limited due to insufficient monitored information; fault diagnosis models are not efficient to be trained and updated; labeled fault data samples are hard to obtain by field experiments. To address these issues, this paper makes contribution in the following three aspects: (1) based on the key points and model parameters extracted from monitored I-V characteristic curves and environment condition, an effective and efficient feature vector of seven dimensions is proposed as the input of the fault diagnosis model; (2) the emerging kernel based extreme learning machine (KELM), which features extremely fast learning speed and good generalization performance, is utilized to automatically establish the fault diagnosis model. Moreover, the Nelder-Mead Simplex (NMS) optimization method is employed to optimize the KELM parameters which affect the classification performance; (3) an improved accurate Simulink based PV modeling approach is proposed for a laboratory PV array to facilitate the fault simulation and data sample acquisition. Intensive fault experiments are

  13. A Machine Learning Concept for DTN Routing

    Science.gov (United States)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.

  14. Machine learning-, rule- and pharmacophore-based classification on the inhibition of P-glycoprotein and NorA.

    Science.gov (United States)

    Ngo, T-D; Tran, T-D; Le, M-T; Thai, K-M

    2016-09-01

    The efflux pumps P-glycoprotein (P-gp) in humans and NorA in Staphylococcus aureus are of great interest for medicinal chemists because of their important roles in multidrug resistance (MDR). The high polyspecificity as well as the unavailability of high-resolution X-ray crystal structures of these transmembrane proteins lead us to combining ligand-based approaches, which in the case of this study were machine learning, perceptual mapping and pharmacophore modelling. For P-gp inhibitory activity, individual models were developed using different machine learning algorithms and subsequently combined into an ensemble model which showed a good discrimination between inhibitors and noninhibitors (acctrain-diverse = 84%; accinternal-test = 92% and accexternal-test = 100%). For ligand promiscuity between P-gp and NorA, perceptual maps and pharmacophore models were generated for the detection of rules and features. Based on these in silico tools, hit compounds for reversing MDR were discovered from the in-house and DrugBank databases through virtual screening in an attempt to restore drug sensitivity in cancer cells and bacteria.

  15. Metabolic Profiling and Classification of Propolis Samples from Southern Brazil: An NMR-Based Platform Coupled with Machine Learning.

    Science.gov (United States)

    Maraschin, Marcelo; Somensi-Zeggio, Amélia; Oliveira, Simone K; Kuhnen, Shirley; Tomazzoli, Maíra M; Raguzzoni, Josiane C; Zeri, Ana C M; Carreira, Rafael; Correia, Sara; Costa, Christopher; Rocha, Miguel

    2016-01-22

    The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching ∼90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.

  16. Detection of needle to nerve contact based on electric bioimpedance and machine learning methods.

    Science.gov (United States)

    Kalvoy, Havard; Tronstad, Christian; Ullensvang, Kyrre; Steinfeldt, Thorsten; Sauter, Axel R

    2017-07-01

    In an ongoing project for electrical impedance-based needle guidance we have previously showed in an animal model that intraneural needle positions can be detected with bioimpedance measurement. To enhance the power of this method we in this study have investigated whether an early detection of the needle only touching the nerve also is feasible. Measurement of complex impedance during needle to nerve contact was compared with needle positions in surrounding tissues in a volunteer study on 32 subjects. Classification analysis using Support-Vector Machines demonstrated that discrimination is possible, but that the sensitivity and specificity for the nerve touch algorithm not is at the same level of performance as for intra-neuralintraneural detection.

  17. Machine-Learning Based Channel Quality and Stability Estimation for Stream-Based Multichannel Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Waqas Rehan

    2016-09-01

    Full Text Available Wireless sensor networks (WSNs have become more and more diversified and are today able to also support high data rate applications, such as multimedia. In this case, per-packet channel handshaking/switching may result in inducing additional overheads, such as energy consumption, delays and, therefore, data loss. One of the solutions is to perform stream-based channel allocation where channel handshaking is performed once before transmitting the whole data stream. Deciding stream-based channel allocation is more critical in case of multichannel WSNs where channels of different quality/stability are available and the wish for high performance requires sensor nodes to switch to the best among the available channels. In this work, we will focus on devising mechanisms that perform channel quality/stability estimation in order to improve the accommodation of stream-based communication in multichannel wireless sensor networks. For performing channel quality assessment, we have formulated a composite metric, which we call channel rank measurement (CRM, that can demarcate channels into good, intermediate and bad quality on the basis of the standard deviation of the received signal strength indicator (RSSI and the average of the link quality indicator (LQI of the received packets. CRM is then used to generate a data set for training a supervised machine learning-based algorithm (which we call Normal Equation based Channel quality prediction (NEC algorithm in such a way that it may perform instantaneous channel rank estimation of any channel. Subsequently, two robust extensions of the NEC algorithm are proposed (which we call Normal Equation based Weighted Moving Average Channel quality prediction (NEWMAC algorithm and Normal Equation based Aggregate Maturity Criteria with Beta Tracking based Channel weight prediction (NEAMCBTC algorithm, that can perform channel quality estimation on the basis of both current and past values of channel rank estimation

  18. Machine-Learning Based Channel Quality and Stability Estimation for Stream-Based Multichannel Wireless Sensor Networks.

    Science.gov (United States)

    Rehan, Waqas; Fischer, Stefan; Rehan, Maaz

    2016-09-12

    Wireless sensor networks (WSNs) have become more and more diversified and are today able to also support high data rate applications, such as multimedia. In this case, per-packet channel handshaking/switching may result in inducing additional overheads, such as energy consumption, delays and, therefore, data loss. One of the solutions is to perform stream-based channel allocation where channel handshaking is performed once before transmitting the whole data stream. Deciding stream-based channel allocation is more critical in case of multichannel WSNs where channels of different quality/stability are available and the wish for high performance requires sensor nodes to switch to the best among the available channels. In this work, we will focus on devising mechanisms that perform channel quality/stability estimation in order to improve the accommodation of stream-based communication in multichannel wireless sensor networks. For performing channel quality assessment, we have formulated a composite metric, which we call channel rank measurement (CRM), that can demarcate channels into good, intermediate and bad quality on the basis of the standard deviation of the received signal strength indicator (RSSI) and the average of the link quality indicator (LQI) of the received packets. CRM is then used to generate a data set for training a supervised machine learning-based algorithm (which we call Normal Equation based Channel quality prediction (NEC) algorithm) in such a way that it may perform instantaneous channel rank estimation of any channel. Subsequently, two robust extensions of the NEC algorithm are proposed (which we call Normal Equation based Weighted Moving Average Channel quality prediction (NEWMAC) algorithm and Normal Equation based Aggregate Maturity Criteria with Beta Tracking based Channel weight prediction (NEAMCBTC) algorithm), that can perform channel quality estimation on the basis of both current and past values of channel rank estimation. In the end

  19. Machine learning in jet physics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    High energy collider experiments produce several petabytes of data every year. Given the magnitude and complexity of the raw data, machine learning algorithms provide the best available platform to transform and analyse these data to obtain valuable insights to understand Standard Model and Beyond Standard Model theories. These collider experiments produce both quark and gluon initiated hadronic jets as the core components. Deep learning techniques enable us to classify quark/gluon jets through image recognition and help us to differentiate signals and backgrounds in Beyond Standard Model searches at LHC. We are currently working on quark/gluon jet classification and progressing in our studies to find the bias between event generators using domain adversarial neural networks (DANN). We also plan to investigate top tagging, weak supervision on mixed samples in high energy physics, utilizing transfer learning from simulated data to real experimental data.

  20. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    Science.gov (United States)

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313

  1. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Full Text Available Learning incorporates a broad range of complex procedures. Machine learning (ML is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc. that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  2. BELM: Bayesian extreme learning machine.

    Science.gov (United States)

    Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J

    2011-03-01

    The theory of extreme learning machine (ELM) has become very popular on the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need of applying methods that are computationally intensive, e.g., bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM in several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. Achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of probability of model overfitting, and use of a priori knowledge.

  3. Machine learning analysis of binaural rowing sounds

    DEFF Research Database (Denmark)

    Johard, Leonard; Ruffaldi, Emanuele; Hoffmann, Pablo F.

    2011-01-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition metho...... methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks perform equally well as deep architectures, while being much faster to train.......Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition...

  4. An Extreme Learning Machine-Based Neuromorphic Tactile Sensing System for Texture Recognition.

    Science.gov (United States)

    Rasouli, Mahdi; Chen, Yi; Basu, Arindam; Kukreja, Sunil L; Thakor, Nitish V

    2018-04-01

    Despite significant advances in computational algorithms and development of tactile sensors, artificial tactile sensing is strikingly less efficient and capable than the human tactile perception. Inspired by efficiency of biological systems, we aim to develop a neuromorphic system for tactile pattern recognition. We particularly target texture recognition as it is one of the most necessary and challenging tasks for artificial sensory systems. Our system consists of a piezoresistive fabric material as the sensor to emulate skin, an interface that produces spike patterns to mimic neural signals from mechanoreceptors, and an extreme learning machine (ELM) chip to analyze spiking activity. Benefiting from intrinsic advantages of biologically inspired event-driven systems and massively parallel and energy-efficient processing capabilities of the ELM chip, the proposed architecture offers a fast and energy-efficient alternative for processing tactile information. Moreover, it provides the opportunity for the development of low-cost tactile modules for large-area applications by integration of sensors and processing circuits. We demonstrate the recognition capability of our system in a texture discrimination task, where it achieves a classification accuracy of 92% for categorization of ten graded textures. Our results confirm that there exists a tradeoff between response time and classification accuracy (and information transfer rate). A faster decision can be achieved at early time steps or by using a shorter time window. This, however, results in deterioration of the classification accuracy and information transfer rate. We further observe that there exists a tradeoff between the classification accuracy and the input spike rate (and thus energy consumption). Our work substantiates the importance of development of efficient sparse codes for encoding sensory data to improve the energy efficiency. These results have a significance for a wide range of wearable, robotic

  5. Machine-learning-based real-bogus system for the HSC-SSP moving object detection pipeline

    Science.gov (United States)

    Lin, Hsing-Wen; Chen, Ying-Tung; Wang, Jen-Hung; Wang, Shiang-Yu; Yoshida, Fumi; Ip, Wing-Huen; Miyazaki, Satoshi; Terai, Tsuyoshi

    2018-01-01

    Machine-learning techniques are widely applied in many modern optical sky surveys, e.g., Pan-STARRS1, PTF/iPTF, and the Subaru/Hyper Suprime-Cam survey, to reduce human intervention in data verification. In this study, we have established a machine-learning-based real-bogus system to reject false detections in the Subaru/Hyper-Suprime-Cam Strategic Survey Program (HSC-SSP) source catalog. Therefore, the HSC-SSP moving object detection pipeline can operate more effectively due to the reduction of false positives. To train the real-bogus system, we use stationary sources as the real training set and "flagged" data as the bogus set. The training set contains 47 features, most of which are photometric measurements and shape moments generated from the HSC image reduction pipeline (hscPipe). Our system can reach a true positive rate (tpr) ˜96% with a false positive rate (fpr) ˜1% or tpr ˜99% at fpr ˜5%. Therefore, we conclude that stationary sources are decent real training samples, and using photometry measurements and shape moments can reject false positives effectively.

  6. Archetypal Analysis for Machine Learning

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai

    2010-01-01

    Archetypal analysis (AA) proposed by Cutler and Breiman in [1] estimates the principal convex hull of a data set. As such AA favors features that constitute representative ’corners’ of the data, i.e. distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering - ...... for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensional reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering....

  7. Classification of EEG signals using a genetic-based machine learning classifier.

    Science.gov (United States)

    Skinner, B T; Nguyen, H T; Liu, D K

    2007-01-01

    This paper investigates the efficacy of the genetic-based learning classifier system XCS, for the classification of noisy, artefact-inclusive human electroencephalogram (EEG) signals represented using large condition strings (108bits). EEG signals from three participants were recorded while they performed four mental tasks designed to elicit hemispheric responses. Autoregressive (AR) models and Fast Fourier Transform (FFT) methods were used to form feature vectors with which mental tasks can be discriminated. XCS achieved a maximum classification accuracy of 99.3% and a best average of 88.9%. The relative classification performance of XCS was then compared against four non-evolutionary classifier systems originating from different learning techniques. The experimental results will be used as part of our larger research effort investigating the feasibility of using EEG signals as an interface to allow paralysed persons to control a powered wheelchair or other devices.

  8. Impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines

    Science.gov (United States)

    Marulcu, Ismail; Barnett, Michael

    2016-01-01

    Background: Elementary Science Education is struggling with multiple challenges. National and State test results confirm the need for deeper understanding in elementary science education. Moreover, national policy statements and researchers call for increased exposure to engineering and technology in elementary science education. The basic motivation of this study is to suggest a solution to both improving elementary science education and increasing exposure to engineering and technology in it. Purpose/Hypothesis: This mixed-method study examined the impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines. We hypothesize that the LEGO-engineering design unit is as successful as the inquiry-based unit in terms of students' science content learning of simple machines. Design/Method: We used a mixed-methods approach to investigate our research questions; we compared the control and the experimental groups' scores from the tests and interviews by using Analysis of Covariance (ANCOVA) and compared each group's pre- and post-scores by using paired t-tests. Results: Our findings from the paired t-tests show that both the experimental and comparison groups significantly improved their scores from the pre-test to post-test on the multiple-choice, open-ended, and interview items. Moreover, ANCOVA results show that students in the experimental group, who learned simple machines with the design-based unit, performed significantly better on the interview questions. Conclusions: Our analyses revealed that the design-based Design a people mover: Simple machines unit was, if not better, as successful as the inquiry-based FOSS Levers and pulleys unit in terms of students' science content learning.

  9. Machine learning-based assessment tool for imbalance and vestibular dysfunction with virtual reality rehabilitation system.

    Science.gov (United States)

    Yeh, Shih-Ching; Huang, Ming-Chun; Wang, Pa-Chun; Fang, Te-Yung; Su, Mu-Chun; Tsai, Po-Yi; Rizzo, Albert

    2014-10-01

    Dizziness is a major consequence of imbalance and vestibular dysfunction. Compared to surgery and drug treatments, balance training is non-invasive and more desired. However, training exercises are usually tedious and the assessment tool is insufficient to diagnose patient's severity rapidly. An interactive virtual reality (VR) game-based rehabilitation program that adopted Cawthorne-Cooksey exercises, and a sensor-based measuring system were introduced. To verify the therapeutic effect, a clinical experiment with 48 patients and 36 normal subjects was conducted. Quantified balance indices were measured and analyzed by statistical tools and a Support Vector Machine (SVM) classifier. In terms of balance indices, patients who completed the training process are progressed and the difference between normal subjects and patients is obvious. Further analysis by SVM classifier show that the accuracy of recognizing the differences between patients and normal subject is feasible, and these results can be used to evaluate patients' severity and make rapid assessment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  10. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  11. Using Machine Learning to Predict Student Performance

    OpenAIRE

    Pojon, Murat

    2017-01-01

    This thesis examines the application of machine learning algorithms to predict whether a student will be successful or not. The specific focus of the thesis is the comparison of machine learning methods and feature engineering techniques in terms of how much they improve the prediction performance. Three different machine learning methods were used in this thesis. They are linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modification ...

  12. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  13. Structural Damage Detection using Frequency Response Function Index and Surrogate Model Based on Optimized Extreme Learning Machine Algorithm

    Directory of Open Access Journals (Sweden)

    R. Ghiasi

    2017-09-01

    Full Text Available Utilizing surrogate models based on artificial intelligence methods for detecting structural damages has attracted the attention of many researchers in recent decades. In this study, a new kernel based on Littlewood-Paley Wavelet (LPW is proposed for Extreme Learning Machine (ELM algorithm to improve the accuracy of detecting multiple damages in structural systems.  ELM is used as metamodel (surrogate model of exact finite element analysis of structures in order to efficiently reduce the computational cost through updating process. In the proposed two-step method, first a damage index, based on Frequency Response Function (FRF of the structure, is used to identify the location of damages. In the second step, the severity of damages in identified elements is detected using ELM. In order to evaluate the efficacy of ELM, the results obtained from the proposed kernel were compared with other kernels proposed for ELM as well as Least Square Support Vector Machine algorithm. The solved numerical problems indicated that ELM algorithm accuracy in detecting structural damages is increased drastically in case of using LPW kernel.

  14. Trends in Machine Learning for Signal Processing

    DEFF Research Database (Denmark)

    Adali, Tulay; Miller, David J.; Diamantaras, Konstantinos I.

    2011-01-01

    By putting the accent on learning from the data and the environment, the Machine Learning for SP (MLSP) Technical Committee (TC) provides the essential bridge between the machine learning and SP communities. While the emphasis in MLSP is on learning and data-driven approaches, SP defines the main...... applications of interest, and thus the constraints and requirements on solutions, which include computational efficiency, online adaptation, and learning with limited supervision/reference data....

  15. Two-Dimensional Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Bo Jia

    2015-01-01

    (BP networks. However, like many other methods, ELM is originally proposed to handle vector pattern while nonvector patterns in real applications need to be explored, such as image data. We propose the two-dimensional extreme learning machine (2DELM based on the very natural idea to deal with matrix data directly. Unlike original ELM which handles vectors, 2DELM take the matrices as input features without vectorization. Empirical studies on several real image datasets show the efficiency and effectiveness of the algorithm.

  16. Network anomaly detection a machine learning perspective

    CERN Document Server

    Bhattacharyya, Dhruba Kumar

    2013-01-01

    With the rapid rise in the ubiquity and sophistication of Internet technology and the accompanying growth in the number of network attacks, network intrusion detection has become increasingly important. Anomaly-based network intrusion detection refers to finding exceptional or nonconforming patterns in network traffic data compared to normal behavior. Finding these anomalies has extensive applications in areas such as cyber security, credit card and insurance fraud detection, and military surveillance for enemy activities. Network Anomaly Detection: A Machine Learning Perspective presents mach

  17. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze them. Also, traditional methods of data analysis have difficulty to identify outliers and patterns in big data and data with multiple exposure / outcome variables and analysis-rules for surveys and questionnaires, currently common methods of data collection, are, essentially, missing. Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning and the current 100 page cookbook should be helpful to that aim. It covers in a condensed form the subjects reviewed in the 750 page three volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (ed. by Springer, Heidelberg, Germany, 2013) and was written as a hand-hold presentation and must-read publication. It was written not only to investigators and students in the fields, but also to jaded clinicians new to the methods and lacking time to read the entire textbooks. General purposes ...

  18. The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

    Directory of Open Access Journals (Sweden)

    Reza Samizade

    2018-06-01

    Full Text Available Classification of the cyber texts and comments into two categories of positive and negative sentiment among social media users is of high importance in the research are related to text mining. In this research, we applied supervised classification methods to classify Persian texts based on sentiment in cyber space. The result of this research is in a form of a system that can decide whether a comment which is published in cyber space such as social networks is considered positive or negative. The comments that are published in Persian movie and movie review websites from 1392 to 1395 are considered as the data set for this research. A part of these data are considered as training and others are considered as testing data. Prior to implementing the algorithms, pre-processing activities such as tokenizing, removing stop words, and n-germs process were applied on the texts. Naïve Bayes, Neural Networks and support vector machine were used for text classification in this study. Out of sample tests showed that there is no evidence indicating that the accuracy of SVM approach is statistically higher than Naïve Bayes or that the accuracy of Naïve Bayes is not statistically higher than NN approach. However, the researchers can conclude that the accuracy of the classification using SVM approach is statistically higher than the accuracy of NN approach in 5% confidence level.

  19. Multipolar electrostatics based on the Kriging machine learning method: an application to serine.

    Science.gov (United States)

    Yuan, Yongna; Mills, Matthew J L; Popelier, Paul L A

    2014-04-01

    A multipolar, polarizable electrostatic method for future use in a novel force field is described. Quantum Chemical Topology (QCT) is used to partition the electron density of a chemical system into atoms, then the machine learning method Kriging is used to build models that relate the multipole moments of the atoms to the positions of their surrounding nuclei. The pilot system serine is used to study both the influence of the level of theory and the set of data generator methods used. The latter consists of: (i) sampling of protein structures deposited in the Protein Data Bank (PDB), or (ii) normal mode distortion along either (a) Cartesian coordinates, or (b) redundant internal coordinates. Wavefunctions for the sampled geometries were obtained at the HF/6-31G(d,p), B3LYP/apc-1, and MP2/cc-pVDZ levels of theory, prior to calculation of the atomic multipole moments by volume integration. The average absolute error (over an independent test set of conformations) in the total atom-atom electrostatic interaction energy of serine, using Kriging models built with the three data generator methods is 11.3 kJ mol⁻¹ (PDB), 8.2 kJ mol⁻¹ (Cartesian distortion), and 10.1 kJ mol⁻¹ (redundant internal distortion) at the HF/6-31G(d,p) level. At the B3LYP/apc-1 level, the respective errors are 7.7 kJ mol⁻¹, 6.7 kJ mol⁻¹, and 4.9 kJ mol⁻¹, while at the MP2/cc-pVDZ level they are 6.5 kJ mol⁻¹, 5.3 kJ mol⁻¹, and 4.0 kJ mol⁻¹. The ranges of geometries generated by the redundant internal coordinate distortion and by extraction from the PDB are much wider than the range generated by Cartesian distortion. The atomic multipole moment and electrostatic interaction energy predictions for the B3LYP/apc-1 and MP2/cc-pVDZ levels are similar, and both are better than the corresponding predictions at the HF/6-31G(d,p) level.

  20. Deep machine learning based Image classification in hard disk drive manufacturing (Conference Presentation)

    Science.gov (United States)

    Rana, Narender; Chien, Chester

    2018-03-01

    A key sensor element in a Hard Disk Drive (HDD) is the read-write head device. The device is complex 3D shape and its fabrication requires over thousand process steps with many of them being various types of image inspection and critical dimension (CD) metrology steps. In order to have high yield of devices across a wafer, very tight inspection and metrology specifications are implemented. Many images are collected on a wafer and inspected for various types of defects and in CD metrology the quality of image impacts the CD measurements. Metrology noise need to be minimized in CD metrology to get better estimate of the process related variations for implementing robust process controls. Though there are specialized tools available for defect inspection and review allowing classification and statistics. However, due to unavailability of such advanced tools or other reasons, many times images need to be manually inspected. SEM Image inspection and CD-SEM metrology tools are different tools differing in software as well. SEM Image inspection and CD-SEM metrology tools are separate tools differing in software and purpose. There have been cases where a significant numbers of CD-SEM images are blurred or have some artefact and there is a need for image inspection along with the CD measurement. Tool may not report a practical metric highlighting the quality of image. Not filtering CD from these blurred images will add metrology noise to the CD measurement. An image classifier can be helpful here for filtering such data. This paper presents the use of artificial intelligence in classifying the SEM images. Deep machine learning is used to train a neural network which is then used to classify the new images as blurred and not blurred. Figure 1 shows the image blur artefact and contingency table of classification results from the trained deep neural network. Prediction accuracy of 94.9 % was achieved in the first model. Paper covers other such applications of the deep neural

  1. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology.

  2. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    International Nuclear Information System (INIS)

    Kanemoto, Shigeru; Watanabe, Masaya; Yusa, Noritaka

    2014-01-01

    The present paper tries to evaluate the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machine, deep leaning neural network, etc. The inner ring defect and misalignment anomaly sound data measured by a rotating machine mockup test facility are used to verify the above various kinds of algorithms. Although we cannot find remarkable difference of anomaly discrimination performance, some methods give us the very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future more sensitive and robust anomaly monitoring technology

  3. A Comparative Experimental Study on the Use of Machine Learning Approaches for Automated Valve Monitoring Based on Acoustic Emission Parameters

    Science.gov (United States)

    Ali, Salah M.; Hui, K. H.; Hee, L. M.; Salman Leong, M.; Al-Obaidi, M. A.; Ali, Y. H.; Abdelrhman, Ahmed M.

    2018-03-01

    Acoustic emission (AE) analysis has become a vital tool for initiating the maintenance tasks in many industries. However, the analysis process and interpretation has been found to be highly dependent on the experts. Therefore, an automated monitoring method would be required to reduce the cost and time consumed in the interpretation of AE signal. This paper investigates the application of two of the most common machine learning approaches namely artificial neural network (ANN) and support vector machine (SVM) to automate the diagnosis of valve faults in reciprocating compressor based on AE signal parameters. Since the accuracy is an essential factor in any automated diagnostic system, this paper also provides a comparative study based on predictive performance of ANN and SVM. AE parameters data was acquired from single stage reciprocating air compressor with different operational and valve conditions. ANN and SVM diagnosis models were subsequently devised by combining AE parameters of different conditions. Results demonstrate that ANN and SVM models have the same results in term of prediction accuracy. However, SVM model is recommended to automate diagnose the valve condition in due to the ability of handling a high number of input features with low sampling data sets.

  4. Monitoring mangrove biomass change in Vietnam using SPOT images and an object-based approach combined with machine learning algorithms

    Science.gov (United States)

    Pham, Lien T. H.; Brabyn, Lars

    2017-06-01

    Mangrove forests are well-known for their provision of ecosystem services and capacity to reduce carbon dioxide concentrations in the atmosphere. Mapping and quantifying mangrove biomass is useful for the effective management of these forests and maximizing their ecosystem service performance. The objectives of this research were to model, map, and analyse the biomass change between 2000 and 2011 of mangrove forests in the Cangio region in Vietnam. SPOT 4 and 5 images were used in conjunction with object-based image analysis and machine learning algorithms. The study area included natural and planted mangroves of diverse species. After image preparation, three different mangrove associations were identified using two levels of image segmentation followed by a Support Vector Machine classifier and a range of spectral, texture and GIS information for classification. The overall classification accuracy for the 2000 and 2011 images were 77.1% and 82.9%, respectively. Random Forest regression algorithms were then used for modelling and mapping biomass. The model that integrated spectral, vegetation association type, texture, and vegetation indices obtained the highest accuracy (R2adj = 0.73). Among the different variables, vegetation association type was the most important variable identified by the Random Forest model. Based on the biomass maps generated from the Random Forest, total biomass in the Cangio mangrove forest increased by 820,136 tons over this period, although this change varied between the three different mangrove associations.

  5. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research. Specially in Particle Physics when there is a classification problem like discrimination of Signals from Backgrounds originating from the collisions of particles. On the other hand, Monte Carlo simulations can be used in order to generate a known data set of Signals and Backgrounds based on theoretical physics. The aim of Machine Learning is to train some algorithms on known data set and then apply these trained algorithms to the unknown data sets. However, the most common framework for data analysis in Particle Physics is ROOT. In order to use Machine Learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, specially Cross-Validation and BDT.

  6. Machine learning of molecular properties: Locality and active learning

    Science.gov (United States)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, the machine learning techniques have shown great potent1ial in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on another hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach the chemical accuracy and also show large errors for the so-called outliers—the out-of-sample molecules, not well-represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  7. Error modeling for surrogates of dynamical systems using machine learning: Machine-learning-based error model for surrogates of dynamical systems

    International Nuclear Information System (INIS)

    Trehan, Sumeet; Carlberg, Kevin T.; Durlofsky, Louis J.

    2017-01-01

    A machine learning–based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (eg, random forests, and LASSO) to map a large set of inexpensively computed “error indicators” (ie, features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering) and subsequently constructs a “local” regression model to predict the time-instantaneous error within each identified region of feature space. We consider 2 uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (eg, time-integrated errors). We then apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations, with time-varying well-control (bottom-hole pressure) parameters. The reduced-order models used in this work entail application of trajectory piecewise linearization in conjunction with proper orthogonal decomposition. Moreover, when the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases. When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well

  8. Machine Learning and Inverse Problem in Geodynamics

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R.

    2017-12-01

    During the past few decades numerical modeling and traditional HPC have been widely deployed in many diverse fields for problem solutions. However, in recent years the rapid emergence of machine learning (ML), a subfield of the artificial intelligence (AI), in many fields of sciences, engineering, and finance seems to mark a turning point in the replacement of traditional modeling procedures with artificial intelligence-based techniques. The study of the circulation in the interior of Earth relies on the study of high pressure mineral physics, geochemistry, and petrology where the number of the mantle parameters is large and the thermoelastic parameters are highly pressure- and temperature-dependent. More complexity arises from the fact that many of these parameters that are incorporated in the numerical models as input parameters are not yet well established. In such complex systems the application of machine learning algorithms can play a valuable role. Our focus in this study is the application of supervised machine learning (SML) algorithms in predicting mantle properties with the emphasis on SML techniques in solving the inverse problem. As a sample problem we focus on the spin transition in ferropericlase and perovskite that may cause slab and plume stagnation at mid-mantle depths. The degree of the stagnation depends on the degree of negative density anomaly at the spin transition zone. The training and testing samples for the machine learning models are produced by the numerical convection models with known magnitudes of density anomaly (as the class labels of the samples). The volume fractions of the stagnated slabs and plumes which can be considered as measures for the degree of stagnation are assigned as sample features. The machine learning models can determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine (SVM) algorithms we show that SML techniques

  9. Investigating the impact of a LEGO(TM)-based, engineering-oriented curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines

    Science.gov (United States)

    Marulcu, Ismail

    This mixed method study examined the impact of a LEGO-based, engineering-oriented curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines. This study takes a social constructivist theoretical stance that science learning involves learning scientific concepts and their relations to each other. From this perspective, students are active participants, and they construct their conceptual understanding through the guidance of their teacher. With the goal of better understanding the use of engineering education materials in classrooms the National Academy of Engineering and National Research Council in the book "Engineering in K-12 Education" conducted an in-depth review of the potential benefits of including engineering in K--12 schools as (a) improved learning and achievement in science and mathematics, (b) increased awareness of engineering and the work of engineers, (c) understanding of and the ability to engage in engineering design, (d) interest in pursuing engineering as a career, and (e) increased technological literacy (Katehi, Pearson, & Feder, 2009). However, they also noted a lack of reliable data and rigorous research to support these assertions. Data sources included identical written tests and interviews, classroom observations and videos, teacher interviews, and classroom artifacts. To investigate the impact of the design-based simple machines curriculum compared to the scientific inquiry-based simple machines curriculum on student learning outcomes, I compared the control and the experimental groups' scores on the tests and interviews by using ANCOVA. To analyze and characterize the classroom observation videotapes, I used Jordan and Henderson's (1995) method and divide them into episodes. My analyses revealed that the design-based Design a People Mover: Simple Machines unit was, if not better, as successful as the inquiry-based FOSS Levers and Pulleys unit in terms of students' content learning. I also

  10. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence, and has attracted considerable attentions. An extreme learning machine (ELM) is a single layer feed-forward neural network alike learning system, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of extreme learning machine (ELM) are its fast learning speed, trivial human intervene and high scalability.   This book contains some selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing China, October 15-17, 2013. This conference aims to bring together the researchers and practitioners of extreme learning machine from a variety of fields including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...

  11. Introduction to Machine Learning: Class Notes 67577

    OpenAIRE

    Shashua, Amnon

    2009-01-01

    Introduction to Machine learning covering Statistical Inference (Bayes, EM, ML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering), and PAC learning (the Formal model, VC dimension, Double Sampling theorem).

  12. Building machine learning systems with Python

    CERN Document Server

    Coelho, Luis Pedro

    2015-01-01

    This book primarily targets Python developers who want to learn and use Python's machine learning capabilities and gain valuable insights from data to develop effective solutions for business problems.

  13. Comparison of Different Machine Learning Approaches for Monthly Satellite-Based Soil Moisture Downscaling over Northeast China

    Directory of Open Access Journals (Sweden)

    Yangxiaoyue Liu

    2017-12-01

    Full Text Available Although numerous satellite-based soil moisture (SM products can provide spatiotemporally continuous worldwide datasets, they can hardly be employed in characterizing fine-grained regional land surface processes, owing to their coarse spatial resolution. In this study, we proposed a machine-learning-based method to enhance SM spatial accuracy and improve the availability of SM data. Four machine learning algorithms, including classification and regression trees (CART, K-nearest neighbors (KNN, Bayesian (BAYE, and random forests (RF, were implemented to downscale the monthly European Space Agency Climate Change Initiative (ESA CCI SM product from 25-km to 1-km spatial resolution. During the regression, the land surface temperature (including daytime temperature, nighttime temperature, and diurnal fluctuation temperature, normalized difference vegetation index, surface reflections (red band, blue band, NIR band and MIR band, and digital elevation model were taken as explanatory variables to produce fine spatial resolution SM. We chose Northeast China as the study area and acquired corresponding SM data from 2003 to 2012 in unfrozen seasons. The reconstructed SM datasets were validated against in-situ measurements. The results showed that the RF-downscaled results had superior matching performance to both ESA CCI SM and in-situ measurements, and can positively respond to precipitation variation. Additionally, the RF was less affected by parameters, which revealed its robustness. Both CART and KNN ranked second. Compared to KNN, CART had a relatively close correlation with the validation data, but KNN showed preferable precision. Moreover, BAYE ranked last with significantly abnormal regression values.

  14. Machine learning based analytics of micro-MRI trabecular bone microarchitecture and texture in type 1 Gaucher disease.

    Science.gov (United States)

    Sharma, Gulshan B; Robertson, Douglas D; Laney, Dawn A; Gambello, Michael J; Terk, Michael

    2016-06-14

    Type 1 Gaucher disease (GD) is an autosomal recessive lysosomal storage disease, affecting bone metabolism, structure and strength. Current bone assessment methods are not ideal. Semi-quantitative MRI scoring is unreliable, not standardized, and only evaluates bone marrow. DXA BMD is also used but is a limited predictor of bone fragility/fracture risk. Our purpose was to measure trabecular bone microarchitecture, as a biomarker of bone disease severity, in type 1 GD individuals with different GD genotypes and to apply machine learning based analytics to discriminate between GD patients and healthy individuals. Micro-MR imaging of the distal radius was performed on 20 type 1 GD patients and 10 healthy controls (HC). Fifteen stereological and textural measures (STM) were calculated from the MR images. General linear models demonstrated significant differences between GD and HC, and GD genotypes. Stereological measures, main contributors to the first two principal components (PCs), explained ~50% of data variation and were significantly different between males and females. Subsequent PCs textural measures were significantly different between GD patients and HC individuals. Textural measures also significantly differed between GD genotypes, and distinguished between GD patients with normal and pathologic DXA scores. PCA and SVM predictive analyses discriminated between GD and HC with maximum accuracy of 73% and area under ROC curve of 0.79. Trabecular STM differences can be quantified between GD patients and HC, and GD sub-types using micro-MRI and machine learning based analytics. Work is underway to expand this approach to evaluate GD disease burden and treatment efficacy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Identification of Forested Landslides Using LiDar Data, Object-based Image Analysis, and Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Xianju Li

    2015-07-01

    Full Text Available For identification of forested landslides, most studies focus on knowledge-based and pixel-based analysis (PBA of LiDar data, while few studies have examined (semi- automated methods and object-based image analysis (OBIA. Moreover, most of them are focused on soil-covered areas with gentle hillslopes. In bedrock-covered mountains with steep and rugged terrain, it is so difficult to identify landslides that there is currently no research on whether combining semi-automated methods and OBIA with only LiDar derivatives could be more effective. In this study, a semi-automatic object-based landslide identification approach was developed and implemented in a forested area, the Three Gorges of China. Comparisons of OBIA and PBA, two different machine learning algorithms and their respective sensitivity to feature selection (FS, were first investigated. Based on the classification result, the landslide inventory was finally obtained according to (1 inclusion of holes encircled by the landslide body; (2 removal of isolated segments, and (3 delineation of closed envelope curves for landslide objects by manual digitizing operation. The proposed method achieved the following: (1 the filter features of surface roughness were first applied for calculating object features, and proved useful; (2 FS improved classification accuracy and reduced features; (3 the random forest algorithm achieved higher accuracy and was less sensitive to FS than a support vector machine; (4 compared to PBA, OBIA was more sensitive to FS, remarkably reduced computing time, and depicted more contiguous terrain segments; (5 based on the classification result with an overall accuracy of 89.11% ± 0.03%, the obtained inventory map was consistent with the referenced landslide inventory map, with a position mismatch value of 9%. The outlined approach would be helpful for forested landslide identification in steep and rugged terrain.

  16. Identification of human flap endonuclease 1 (FEN1) inhibitors using a machine learning based consensus virtual screening.

    Science.gov (United States)

    Deshmukh, Amit Laxmikant; Chandra, Sharat; Singh, Deependra Kumar; Siddiqi, Mohammad Imran; Banerjee, Dibyendu

    2017-07-25

    Human Flap endonuclease1 (FEN1) is an enzyme that is indispensable for DNA replication and repair processes and inhibition of its Flap cleavage activity results in increased cellular sensitivity to DNA damaging agents (cisplatin, temozolomide, MMS, etc.), with the potential to improve cancer prognosis. Reports of the high expression levels of FEN1 in several cancer cells support the idea that FEN1 inhibitors may target cancer cells with minimum side effects to normal cells. In this study, we used large publicly available, high-throughput screening data of small molecule compounds targeted against FEN1. Two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), were utilized to generate four classification models from huge PubChem bioassay data containing probable FEN1 inhibitors and non-inhibitors. We also investigated the influence of randomly selected Zinc-database compounds as negative data on the outcome of classification modelling. The results show that the SVM model with inactive compounds was superior to RF with Matthews's correlation coefficient (MCC) of 0.67 for the test set. A Maybridge database containing approximately 53 000 compounds was screened and top ranking 5 compounds were selected for enzyme and cell-based in vitro screening. The compound JFD00950 was identified as a novel FEN1 inhibitor with in vitro inhibition of flap cleavage activity as well as cytotoxic activity against a colon cancer cell line, DLD-1.

  17. Machine Learning Classification of Cirrhotic Patients with and without Minimal Hepatic Encephalopathy Based on Regional Homogeneity of Intrinsic Brain Activity.

    Science.gov (United States)

    Chen, Qiu-Feng; Chen, Hua-Jun; Liu, Jun; Sun, Tao; Shen, Qun-Tai

    2016-01-01

    Machine learning-based approaches play an important role in examining functional magnetic resonance imaging (fMRI) data in a multivariate manner and extracting features predictive of group membership. This study was performed to assess the potential for measuring brain intrinsic activity to identify minimal hepatic encephalopathy (MHE) in cirrhotic patients, using the support vector machine (SVM) method. Resting-state fMRI data were acquired in 16 cirrhotic patients with MHE and 19 cirrhotic patients without MHE. The regional homogeneity (ReHo) method was used to investigate the local synchrony of intrinsic brain activity. Psychometric Hepatic Encephalopathy Score (PHES) was used to define MHE condition. SVM-classifier was then applied using leave-one-out cross-validation, to determine the discriminative ReHo-map for MHE. The discrimination map highlights a set of regions, including the prefrontal cortex, anterior cingulate cortex, anterior insular cortex, inferior parietal lobule, precentral and postcentral gyri, superior and medial temporal cortices, and middle and inferior occipital gyri. The optimized discriminative model showed total accuracy of 82.9% and sensitivity of 81.3%. Our results suggested that a combination of the SVM approach and brain intrinsic activity measurement could be helpful for detection of MHE in cirrhotic patients.

  18. Supervised machine learning-based classification scheme to segment the brainstem on MRI in multicenter brain tumor treatment context.

    Science.gov (United States)

    Dolz, Jose; Laprie, Anne; Ken, Soléakhéna; Leroy, Henri-Arthur; Reyns, Nicolas; Massoptier, Laurent; Vermandel, Maximilien

    2016-01-01

    To constrain the risk of severe toxicity in radiotherapy and radiosurgery, precise volume delineation of organs at risk is required. This task is still manually performed, which is time-consuming and prone to observer variability. To address these issues, and as alternative to atlas-based segmentation methods, machine learning techniques, such as support vector machines (SVM), have been recently presented to segment subcortical structures on magnetic resonance images (MRI). SVM is proposed to segment the brainstem on MRI in multicenter brain cancer context. A dataset composed by 14 adult brain MRI scans is used to evaluate its performance. In addition to spatial and probabilistic information, five different image intensity values (IIVs) configurations are evaluated as features to train the SVM classifier. Segmentation accuracy is evaluated by computing the Dice similarity coefficient (DSC), absolute volumes difference (AVD) and percentage volume difference between automatic and manual contours. Mean DSC for all proposed IIVs configurations ranged from 0.89 to 0.90. Mean AVD values were below 1.5 cm(3), where the value for best performing IIVs configuration was 0.85 cm(3), representing an absolute mean difference of 3.99% with respect to the manual segmented volumes. Results suggest consistent volume estimation and high spatial similarity with respect to expert delineations. The proposed approach outperformed presented methods to segment the brainstem, not only in volume similarity metrics, but also in segmentation time. Preliminary results showed that the approach might be promising for adoption in clinical use.

  19. Learning as a Machine: Crossovers between Humans and Machines

    Science.gov (United States)

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  20. What is the machine learning.

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. In this talk, I explore a procedure for identifying combinations of variables -- aided by physical intuition -- that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. I will demonstrate these features in both an easy to understand toy model and an idealized LHC resonance scenario.

  1. Teaching machine learning to design students

    NARCIS (Netherlands)

    Vlist, van der B.J.J.; van de Westelaken, H.F.M.; Bartneck, C.; Hu, J.; Ahn, R.M.C.; Barakova, E.I.; Delbressine, F.L.M.; Feijs, L.M.G.; Pan, Z.; Zhang, X.; El Rhalibi, A.

    2008-01-01

    Machine learning is a key technology to design and create intelligent systems, products, and related services. Like many other design departments, we are faced with the challenge to teach machine learning to design students, who often do not have an inherent affinity towards technology. We

  2. Machine learning-based quantitative texture analysis of CT images of small renal masses: Differentiation of angiomyolipoma without visible fat from renal cell carcinoma.

    Science.gov (United States)

    Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei

    2018-04-01

    To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.

  3. Relationship between neuronal network architecture and naming performance in temporal lobe epilepsy: A connectome based approach using machine learning.

    Science.gov (United States)

    Munsell, B C; Wu, G; Fridriksson, J; Thayer, K; Mofrad, N; Desisto, N; Shen, D; Bonilha, L

    2017-09-09

    Impaired confrontation naming is a common symptom of temporal lobe epilepsy (TLE). The neurobiological mechanisms underlying this impairment are poorly understood but may indicate a structural disorganization of broadly distributed neuronal networks that support naming ability. Importantly, naming is frequently impaired in other neurological disorders and by contrasting the neuronal structures supporting naming in TLE with other diseases, it will become possible to elucidate the common systems supporting naming. We aimed to evaluate the neuronal networks that support naming in TLE by using a machine learning algorithm intended to predict naming performance in subjects with medication refractory TLE using only the structural brain connectome reconstructed from diffusion tensor imaging. A connectome-based prediction framework was developed using network properties from anatomically defined brain regions across the entire brain, which were used in a multi-task machine learning algorithm followed by support vector regression. Nodal eigenvector centrality, a measure of regional network integration, predicted approximately 60% of the variance in naming. The nodes with the highest regression weight were bilaterally distributed among perilimbic sub-networks involving mainly the medial and lateral temporal lobe regions. In the context of emerging evidence regarding the role of large structural networks that support language processing, our results suggest intact naming relies on the integration of sub-networks, as opposed to being dependent on isolated brain areas. In the case of TLE, these sub-networks may be disproportionately indicative naming processes that are dependent semantic integration from memory and lexical retrieval, as opposed to multi-modal perception or motor speech production. Copyright © 2017. Published by Elsevier Inc.

  4. An Improved Pathological Brain Detection System Based on Two-Dimensional PCA and Evolutionary Extreme Learning Machine.

    Science.gov (United States)

    Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar

    2017-12-07

    Pathological brain detection has made notable stride in the past years, as a consequence many pathological brain detection systems (PBDSs) have been proposed. But, the accuracy of these systems still needs significant improvement in order to meet the necessity of real world diagnostic situations. In this paper, an efficient PBDS based on MR images is proposed that markedly improves the recent results. The proposed system makes use of contrast limited adaptive histogram equalization (CLAHE) to enhance the quality of the input MR images. Thereafter, two-dimensional PCA (2DPCA) strategy is employed to extract the features and subsequently, a PCA+LDA approach is used to generate a compact and discriminative feature set. Finally, a new learning algorithm called MDE-ELM is suggested that combines modified differential evolution (MDE) and extreme learning machine (ELM) for segregation of MR images as pathological or healthy. The MDE is utilized to optimize the input weights and hidden biases of single-hidden-layer feed-forward neural networks (SLFN), whereas an analytical method is used for determining the output weights. The proposed algorithm performs optimization based on both the root mean squared error (RMSE) and norm of the output weights of SLFNs. The suggested scheme is benchmarked on three standard datasets and the results are compared against other competent schemes. The experimental outcomes show that the proposed scheme offers superior results compared to its counterparts. Further, it has been noticed that the proposed MDE-ELM classifier obtains better accuracy with compact network architecture than conventional algorithms.

  5. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal and a majority of non-Higgs boson “background” events. No knowledge of particle physics was required for the challenge but skills in machine learning - the training of computers to recognise patterns in data – were essential. The Challenge, hosted by Ka...

  6. An Internet of Things System for Underground Mine Air Quality Pollutant Prediction Based on Azure Machine Learning.

    Science.gov (United States)

    Jo, ByungWan; Khan, Rana Muhammad Asad

    2018-03-21

    The implementation of wireless sensor networks (WSNs) for monitoring the complex, dynamic, and harsh environment of underground coal mines (UCMs) is sought around the world to enhance safety. However, previously developed smart systems are limited to monitoring or, in a few cases, can report events. Therefore, this study introduces a reliable, efficient, and cost-effective internet of things (IoT) system for air quality monitoring with newly added features of assessment and pollutant prediction. This system is comprised of sensor modules, communication protocols, and a base station, running Azure Machine Learning (AML) Studio over it. Arduino-based sensor modules with eight different parameters were installed at separate locations of an operational UCM. Based on the sensed data, the proposed system assesses mine air quality in terms of the mine environment index (MEI). Principal component analysis (PCA) identified CH₄, CO, SO₂, and H₂S as the most influencing gases significantly affecting mine air quality. The results of PCA were fed into the ANN model in AML studio, which enabled the prediction of MEI. An optimum number of neurons were determined for both actual input and PCA-based input parameters. The results showed a better performance of the PCA-based ANN for MEI prediction, with R ² and RMSE values of 0.6654 and 0.2104, respectively. Therefore, the proposed Arduino and AML-based system enhances mine environmental safety by quickly assessing and predicting mine air quality.

  7. An Internet of Things System for Underground Mine Air Quality Pollutant Prediction Based on Azure Machine Learning

    Directory of Open Access Journals (Sweden)

    ByungWan Jo

    2018-03-01

    Full Text Available The implementation of wireless sensor networks (WSNs for monitoring the complex, dynamic, and harsh environment of underground coal mines (UCMs is sought around the world to enhance safety. However, previously developed smart systems are limited to monitoring or, in a few cases, can report events. Therefore, this study introduces a reliable, efficient, and cost-effective internet of things (IoT system for air quality monitoring with newly added features of assessment and pollutant prediction. This system is comprised of sensor modules, communication protocols, and a base station, running Azure Machine Learning (AML Studio over it. Arduino-based sensor modules with eight different parameters were installed at separate locations of an operational UCM. Based on the sensed data, the proposed system assesses mine air quality in terms of the mine environment index (MEI. Principal component analysis (PCA identified CH4, CO, SO2, and H2S as the most influencing gases significantly affecting mine air quality. The results of PCA were fed into the ANN model in AML studio, which enabled the prediction of MEI. An optimum number of neurons were determined for both actual input and PCA-based input parameters. The results showed a better performance of the PCA-based ANN for MEI prediction, with R2 and RMSE values of 0.6654 and 0.2104, respectively. Therefore, the proposed Arduino and AML-based system enhances mine environmental safety by quickly assessing and predicting mine air quality.

  8. Machine learning in cardiovascular medicine: are we there yet?

    Science.gov (United States)

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Machine vision systems using machine learning for industrial product inspection

    Science.gov (United States)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF), and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we will present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Board (PCB) and Vacuum Florescent Displaying (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its displaying patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies and the PCB inspection system is the process of being deployed in a manufacturing plant.

  10. Adaptive Learning Systems: Beyond Teaching Machines

    Science.gov (United States)

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn, what instructor should do to help students during their learning process. We have adaptive learning technologies that can create much more student oriented learning environments. The purpose of this article is to present these changes and its…

  11. Machine learning approach to detect intruders in database based on hexplet data structure

    Directory of Open Access Journals (Sweden)

    Saad M. Darwish

    2016-09-01

    Full Text Available Most of valuable information resources for any organization are stored in the database; it is a serious subject to protect this information against intruders. However, conventional security mechanisms are not designed to detect anomalous actions of database users. An intrusion detection system (IDS, delivers an extra layer of security that cannot be guaranteed by built-in security tools, is the ideal solution to defend databases from intruders. This paper suggests an anomaly detection approach that summarizes the raw transactional SQL queries into a compact data structure called hexplet, which can model normal database access behavior (abstract the user's profile and recognize impostors specifically tailored for role-based access control (RBAC database system. This hexplet lets us to preserve the correlation among SQL statements in the same transaction by exploiting the information in the transaction-log entry with the aim to improve detection accuracy specially those inside the organization and behave strange behavior. The model utilizes naive Bayes classifier (NBC as the simplest supervised learning technique for creating profiles and evaluating the legitimacy of a transaction. Experimental results show the performance of the proposed model in the term of detection rate.

  12. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  13. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  14. Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment.

    Directory of Open Access Journals (Sweden)

    Yang Li

    Full Text Available In order to overcome the problems of poor understandability of the pattern recognition-based transient stability assessment (PRTSA methods, a new rule extraction method based on extreme learning machine (ELM and an improved Ant-miner (IAM algorithm is presented in this paper. First, the basic principles of ELM and Ant-miner algorithm are respectively introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. And finally, a set of classification rules are obtained by IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted from an example sample set generated by the trained ELM-based transient stability assessment model by using IAM algorithm. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.

  15. Forecasting of flowrate under rolling motion flow instability condition based on on-line sequential extreme learning machine

    International Nuclear Information System (INIS)

    Chen Hanying; Gao Puzhen; Tan Sichao; Tang Jiguo; Hou Xiaofan; Xu Huiqiang; Wu Xiangcheng

    2015-01-01

    The coupling of multiple thermal-hydraulic parameters can result in complex flow instability in natural circulation system under rolling motion. A real-time thermal-hydraulic condition prediction is helpful to the operation of systems in such condition. A single hidden layer feedforward neural networks algorithm named extreme learning machine (ELM) is considered as suitable method for this application because of its extremely fast training time, good accuracy and simplicity. However, traditional ELM assumes that all the training data are ready before the training process, while the training data is received sequentially in practical forecasting of flowrate. Therefore, this paper proposes a forecasting method for flowrate under rolling motion based on on-line sequential ELM (OS-ELM), which can learn the data one by one or chunk-by-chunk. The experiment results show that the OS-ELM method can achieve a better forecasting performance than basic ELM method and still keep the advantage of fast training and simplicity. (author)

  16. Artificial Mangrove Species Mapping Using Pléiades-1: An Evaluation of Pixel-Based and Object-Based Classifications with Selected Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Dezhi Wang

    2018-02-01

    Full Text Available In the dwindling natural mangrove today, mangrove reforestation projects are conducted worldwide to prevent further losses. Due to monoculture and the low survival rate of artificial mangroves, it is necessary to pay attention to mapping and monitoring them dynamically. Remote sensing techniques have been widely used to map mangrove forests due to their capacity for large-scale, accurate, efficient, and repetitive monitoring. This study evaluated the capability of a 0.5-m Pléiades-1 in classifying artificial mangrove species using both pixel-based and object-based classification schemes. For comparison, three machine learning algorithms—decision tree (DT, support vector machine (SVM, and random forest (RF—were used as the classifiers in the pixel-based and object-based classification procedure. The results showed that both the pixel-based and object-based approaches could recognize the major discriminations between the four major artificial mangrove species. However, the object-based method had a better overall accuracy than the pixel-based method on average. For pixel-based image analysis, SVM produced the highest overall accuracy (79.63%; for object-based image analysis, RF could achieve the highest overall accuracy (82.40%, and it was also the best machine learning algorithm for classifying artificial mangroves. The patches produced by object-based image analysis approaches presented a more generalized appearance and could contiguously depict mangrove species communities. When the same machine learning algorithms were compared by McNemar’s test, a statistically significant difference in overall classification accuracy between the pixel-based and object-based classifications only existed in the RF algorithm. Regarding species, monoculture and dominant mangrove species Sonneratia apetala group 1 (SA1 as well as partly mixed and regular shape mangrove species Hibiscus tiliaceus (HT could well be identified. However, for complex and easily

  17. Probabilistic models and machine learning in structural bioinformatics

    DEFF Research Database (Denmark)

    Hamelryck, Thomas

    2009-01-01

    . Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis...

  18. Short-Term Distribution System State Forecast Based on Optimal Synchrophasor Sensor Placement and Extreme Learning Machine

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Huaiguang; Zhang, Yingchen

    2016-11-14

    This paper proposes an approach for distribution system state forecasting, which aims to provide an accurate and high speed state forecasting with an optimal synchrophasor sensor placement (OSSP) based state estimator and an extreme learning machine (ELM) based forecaster. Specifically, considering the sensor installation cost and measurement error, an OSSP algorithm is proposed to reduce the number of synchrophasor sensor and keep the whole distribution system numerically and topologically observable. Then, the weighted least square (WLS) based system state estimator is used to produce the training data for the proposed forecaster. Traditionally, the artificial neural network (ANN) and support vector regression (SVR) are widely used in forecasting due to their nonlinear modeling capabilities. However, the ANN contains heavy computation load and the best parameters for SVR are difficult to obtain. In this paper, the ELM, which overcomes these drawbacks, is used to forecast the future system states with the historical system states. The proposed approach is effective and accurate based on the testing results.

  19. Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

    Science.gov (United States)

    Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

    2011-01-01

    Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433

  20. International Conference on Extreme Learning Machines 2014

    CERN Document Server

    Mao, Kezhi; Cambria, Erik; Man, Zhihong; Toh, Kar-Ann

    2015-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2014, which was held in Singapore, December 8-10, 2014. This conference brought together the researchers and practitioners of Extreme Learning Machine (ELM) from a variety of fields to promote research and development of “learning without iterative tuning”.  The book covers theories, algorithms and applications of ELM. It gives the readers a glance of the most recent advances of ELM.  

  1. An introduction to quantum machine learning

    OpenAIRE

    Schuld, M.; Sinayskiy, I.; Petruccione, F.

    2014-01-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum compute...

  2. International Conference on Extreme Learning Machine 2015

    CERN Document Server

    Mao, Kezhi; Wu, Jonathan; Lendasse, Amaury; ELM 2015; Theory, Algorithms and Applications (I); Theory, Algorithms and Applications (II)

    2016-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2015, which was held in Hangzhou, China, December 15-17, 2015. This conference brought together researchers and engineers to share and exchange R&D experience on both theoretical studies and practical applications of the Extreme Learning Machine (ELM) technique and brain learning. This book covers theories, algorithms ad applications of ELM. It gives readers a glance of the most recent advances of ELM. .

  3. Detection of Convective Initiation Using Meteorological Imager Onboard Communication, Ocean, and Meteorological Satellite Based on Machine Learning Approaches

    Directory of Open Access Journals (Sweden)

    Hyangsun Han

    2015-07-01

    Full Text Available As convective clouds in Northeast Asia are accompanied by various hazards related with heavy rainfall and thunderstorms, it is very important to detect convective initiation (CI in the region in order to mitigate damage by such hazards. In this study, a novel approach for CI detection using images from Meteorological Imager (MI, a payload of the Communication, Ocean, and Meteorological Satellite (COMS, was developed by improving the criteria of the interest fields of Rapidly Developing Cumulus Areas (RDCA derivation algorithm, an official CI detection algorithm for Multi-functional Transport SATellite-2 (MTSAT-2, based on three machine learning approaches—decision trees (DT, random forest (RF, and support vector machines (SVM. CI was defined as clouds within a 16 × 16 km window with the first detection of lightning occurrence at the center. A total of nine interest fields derived from visible, water vapor, and two thermal infrared images of MI obtained 15–75 min before the lightning occurrence were used as input variables for CI detection. RF produced slightly higher performance (probability of detection (POD of 75.5% and false alarm rate (FAR of 46.2% than DT (POD of 70.7% and FAR of 46.6% for detection of CI caused by migrating frontal cyclones and unstable atmosphere. SVM resulted in relatively poor performance with very high FAR ~83.3%. The averaged lead times of CI detection based on the DT and RF models were 36.8 and 37.7 min, respectively. This implies that CI over Northeast Asia can be forecasted ~30–45 min in advance using COMS MI data.

  4. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem being discussed about for a decade that has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learning capability of the particle swarm optimization (PSO algorithm with the extreme learning machine (ELM classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN, proved to be an excellent classifier with large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is experimented on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  5. Finding New Perovskite Halides via Machine learning

    Directory of Open Access Journals (Sweden)

    Ghanshyam ePilania

    2016-04-01

    Full Text Available Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning via building a support vector machine (SVM based classifier that uses elemental features (or descriptors to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  6. Finding New Perovskite Halides via Machine learning

    Science.gov (United States)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  7. Machine learning spatial geometry from entanglement features

    Science.gov (United States)

    You, Yi-Zhuang; Yang, Zhao; Qi, Xiao-Liang

    2018-02-01

    Motivated by the close relations of the renormalization group with both the holography duality and the deep learning, we propose that the holographic geometry can emerge from deep learning the entanglement feature of a quantum many-body state. We develop a concrete algorithm, call the entanglement feature learning (EFL), based on the random tensor network (RTN) model for the tensor network holography. We show that each RTN can be mapped to a Boltzmann machine, trained by the entanglement entropies over all subregions of a given quantum many-body state. The goal is to construct the optimal RTN that best reproduce the entanglement feature. The RTN geometry can then be interpreted as the emergent holographic geometry. We demonstrate the EFL algorithm on a 1D free fermion system and observe the emergence of the hyperbolic geometry (AdS3 spatial geometry) as we tune the fermion system towards the gapless critical point (CFT2 point).

  8. N-grams Based Supervised Machine Learning Model for Mobile Agent Platform Protection against Unknown Malicious Mobile Agents

    Directory of Open Access Journals (Sweden)

    Pallavi Bagga

    2017-12-01

    Full Text Available From many past years, the detection of unknown malicious mobile agents before they invade the Mobile Agent Platform has been the subject of much challenging activity. The ever-growing threat of malicious agents calls for techniques for automated malicious agent detection. In this context, the machine learning (ML methods are acknowledged more effective than the Signature-based and Behavior-based detection methods. Therefore, in this paper, the prime contribution has been made to detect the unknown malicious mobile agents based on n-gram features and supervised ML approach, which has not been done so far in the sphere of the Mobile Agents System (MAS security. To carry out the study, the n-grams ranging from 3 to 9 are extracted from a dataset containing 40 malicious and 40 non-malicious mobile agents. Subsequently, the classification is performed using different classifiers. A nested 5-fold cross validation scheme is employed in order to avoid the biasing in the selection of optimal parameters of classifier. The observations of extensive experiments demonstrate that the work done in this paper is suitable for the task of unknown malicious mobile agent detection in a Mobile Agent Environment, and also adds the ML in the interest list of researchers dealing with MAS security.

  9. A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

    Science.gov (United States)

    van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y

    2018-04-17

    Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants that still do require orthogonal confirmation exist. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to

  10. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  11. An introduction to machine learning with Scikit-Learn

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.

  12. Improving Simulations of Extreme Flows by Coupling a Physically-based Hydrologic Model with a Machine Learning Model

    Science.gov (United States)

    Mohammed, K.; Islam, A. S.; Khan, M. J. U.; Das, M. K.

    2017-12-01

    With the large number of hydrologic models presently available along with the global weather and geographic datasets, streamflows of almost any river in the world can be easily modeled. And if a reasonable amount of observed data from that river is available, then simulations of high accuracy can sometimes be performed after calibrating the model parameters against those observed data through inverse modeling. Although such calibrated models can succeed in simulating the general trend or mean of the observed flows very well, more often than not they fail to adequately simulate the extreme flows. This causes difficulty in tasks such as generating reliable projections of future changes in extreme flows due to climate change, which is obviously an important task due to floods and droughts being closely connected to people's lives and livelihoods. We propose an approach where the outputs of a physically-based hydrologic model are used as an input to a machine learning model to try and better simulate the extreme flows. To demonstrate this offline-coupling approach, the Soil and Water Assessment Tool (SWAT) was selected as the physically-based hydrologic model, the Artificial Neural Network (ANN) as the machine learning model and the Ganges-Brahmaputra-Meghna (GBM) river system as the study area. The GBM river system, located in South Asia, is the third largest in the world in terms of freshwater generated and forms the largest delta in the world. The flows of the GBM rivers were simulated separately in order to test the performance of this proposed approach in accurately simulating the extreme flows generated by different basins that vary in size, climate, hydrology and anthropogenic intervention on stream networks. Results show that by post-processing the simulated flows of the SWAT models with ANN models, simulations of extreme flows can be significantly improved. The mean absolute errors in simulating annual maximum/minimum daily flows were minimized from 4967

  13. Time series analytics using sliding window metaheuristic optimization-based machine learning system for identifying building energy consumption patterns

    International Nuclear Information System (INIS)

    Chou, Jui-Sheng; Ngo, Ngoc-Tri

    2016-01-01

    Highlights: • This study develops a novel time-series sliding window forecast system. • The system integrates metaheuristics, machine learning and time-series models. • Site experiment of smart grid infrastructure is installed to retrieve real-time data. • The proposed system accurately predicts energy consumption in residential buildings. • The forecasting system can help users minimize their electricity usage. - Abstract: Smart grids are a promising solution to the rapidly growing power demand because they can considerably increase building energy efficiency. This study developed a novel time-series sliding window metaheuristic optimization-based machine learning system for predicting real-time building energy consumption data collected by a smart grid. The proposed system integrates a seasonal autoregressive integrated moving average (SARIMA) model and metaheuristic firefly algorithm-based least squares support vector regression (MetaFA-LSSVR) model. Specifically, the proposed system fits the SARIMA model to linear data components in the first stage, and the MetaFA-LSSVR model captures nonlinear data components in the second stage. Real-time data retrieved from an experimental smart grid installed in a building were used to evaluate the efficacy and effectiveness of the proposed system. A k-week sliding window approach is proposed for employing historical data as input for the novel time-series forecasting system. The prediction system yielded high and reliable accuracy rates in 1-day-ahead predictions of building energy consumption, with a total error rate of 1.181% and mean absolute error of 0.026 kW h. Notably, the system demonstrates an improved accuracy rate in the range of 36.8–113.2% relative to those of the linear forecasting model (i.e., SARIMA) and nonlinear forecasting models (i.e., LSSVR and MetaFA-LSSVR). Therefore, end users can further apply the forecasted information to enhance efficiency of energy usage in their buildings, especially

  14. Unmanned Aerial System (UAS)-based phenotyping of soybean using multi-sensor data fusion and extreme learning machine

    Science.gov (United States)

    Maimaitijiang, Maitiniyazi; Ghulam, Abduwasit; Sidike, Paheding; Hartling, Sean; Maimaitiyiming, Matthew; Peterson, Kyle; Shavers, Ethan; Fishman, Jack; Peterson, Jim; Kadam, Suhas; Burken, Joel; Fritschi, Felix

    2017-12-01

    Estimating crop biophysical and biochemical parameters with high accuracy at low-cost is imperative for high-throughput phenotyping in precision agriculture. Although fusion of data from multiple sensors is a common application in remote sensing, less is known on the contribution of low-cost RGB, multispectral and thermal sensors to rapid crop phenotyping. This is due to the fact that (1) simultaneous collection of multi-sensor data using satellites are rare and (2) multi-sensor data collected during a single flight have not been accessible until recent developments in Unmanned Aerial Systems (UASs) and UAS-friendly sensors that allow efficient information fusion. The objective of this study was to evaluate the power of high spatial resolution RGB, multispectral and thermal data fusion to estimate soybean (Glycine max) biochemical parameters including chlorophyll content and nitrogen concentration, and biophysical parameters including Leaf Area Index (LAI), above ground fresh and dry biomass. Multiple low-cost sensors integrated on UASs were used to collect RGB, multispectral, and thermal images throughout the growing season at a site established near Columbia, Missouri, USA. From these images, vegetation indices were extracted, a Crop Surface Model (CSM) was advanced, and a model to extract the vegetation fraction was developed. Then, spectral indices/features were combined to model and predict crop biophysical and biochemical parameters using Partial Least Squares Regression (PLSR), Support Vector Regression (SVR), and Extreme Learning Machine based Regression (ELR) techniques. Results showed that: (1) For biochemical variable estimation, multispectral and thermal data fusion provided the best estimate for nitrogen concentration and chlorophyll (Chl) a content (RMSE of 9.9% and 17.1%, respectively) and RGB color information based indices and multispectral data fusion exhibited the largest RMSE 22.6%; the highest accuracy for Chl a + b content estimation was

  15. Status Checking System of Home Appliances using machine learning

    Directory of Open Access Journals (Sweden)

    Yoon Chi-Yurl

    2017-01-01

    Full Text Available This paper describes status checking system of home appliances based on machine learning, which can be applied to existing household appliances without networking function. Designed status checking system consists of sensor modules, a wireless communication module, cloud server, android application and a machine learning algorithm. The developed system applied to washing machine analyses and judges the four-kinds of appliance’s status such as staying, washing, rinsing and spin-drying. The measurements of sensor and transmission of sensing data are operated on an Arduino board and the data are transmitted to cloud server in real time. The collected data are parsed by an Android application and injected into the machine learning algorithm for learning the status of the appliances. The machine learning algorithm compares the stored learning data with collected real-time data from the appliances. Our results are expected to contribute as a base technology to design an automatic control system based on machine learning technology for household appliances in real-time.

  16. A fast hybrid methodology based on machine learning, quantum methods, and experimental measurements for evaluating material properties

    Science.gov (United States)

    Kong, Chang Sun; Haverty, Michael; Simka, Harsono; Shankar, Sadasivan; Rajan, Krishna

    2017-09-01

    We present a hybrid approach based on both machine learning and targeted ab-initio calculations to determine adhesion energies between dissimilar materials. The goals of this approach are to complement experimental and/or all ab-initio computational efforts, to identify promising materials rapidly and identify in a quantitative manner the relative contributions of the different material attributes affecting adhesion. Applications of the methodology to predict bulk modulus, yield strength, adhesion and wetting properties of copper (Cu) with other materials including metals, nitrides and oxides is discussed in this paper. In the machine learning component of this methodology, the parameters that were chosen can be roughly divided into four types: atomic and crystalline parameters (which are related to specific elements such as electronegativities, electron densities in Wigner-Seitz cells); bulk material properties (e.g. melting point), mechanical properties (e.g. modulus) and those representing atomic characteristics in ab-initio formalisms (e.g. pseudopotentials). The atomic parameters are defined over one dataset to determine property correlation with published experimental data. We then develop a semi-empirical model across multiple datasets to predict adhesion in material interfaces outside the original datasets. Since adhesion is between two materials, we appropriately use parameters which indicate differences between the elements that comprise the materials. These semi-empirical predictions agree reasonably with the trend in chemical work of adhesion predicted using ab-initio techniques and are used for fast materials screening. For the screened candidates, the ab-initio modeling component provides fundamental understanding of the chemical interactions at the interface, and explains the wetting thermodynamics of thin Cu layers on various substrates. Comparison against ultra-high vacuum (UHV) experiments for well-characterized Cu/Ta and Cu/α-Al2O3 interfaces is

  17. Discrimination of Brazilian propolis according to the seasoning using chemometrics and machine learning based on UV-Vis scanning data.

    Science.gov (United States)

    Tomazzoli, Maíra M; Pai Neto, Remi D; Moresco, Rodolfo; Westphal, Larissa; Zeggio, Amelia R S; Specht, Leandro; Costa, Christopher; Rocha, Miguel; Maraschin, Marcelo

    2015-12-01

    Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins added of salivary enzymes, beeswax, and pollen. The biological activities described for propolis were also identified for donor plant's resin, but a big challenge for the standardization of the chemical composition and biological effects of propolis remains on a better understanding of the influence of seasonality on the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora which is strongly influenced by (a)biotic factors over the seasons, to unravel the harvest season effect on the propolis chemical profile is an issue of recognized importance. For that, fast, cheap, and robust analytical techniques seem to be the best choice for large scale quality control processes in the most demanding markets, e.g., human health applications. For that, UV-Visible (UV-Vis) scanning spectrophotometry of hydroalcoholic extracts (HE) of seventy-three propolis samples, collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil was adopted. Further machine learning and chemometrics techniques were applied to the UV-Vis dataset aiming to gain insights as to the seasonality effect on the claimed chemical heterogeneity of propolis samples determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e. principal component analysis (PCA) and hierarchical clustering analysis (HCA) supported by scripts written in the R language. The UV-Vis profiles associated with chemometric analysis allowed identifying a typical pattern in propolis samples collected in the summer. Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds ( λ= 280-400 ηm), suggesting that besides the biological activities of those

  18. Machine Learning Methods to Predict Diabetes Complications.

    Science.gov (United States)

    Dagliati, Arianna; Marini, Simone; Sacchi, Lucia; Cogni, Giulia; Teliti, Marsida; Tibollo, Valentina; De Cata, Pasquale; Chiovato, Luca; Bellazzi, Riccardo

    2018-03-01

    One of the areas where Artificial Intelligence is having more impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies, to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. Such pipeline comprises clinical center profiling, predictive model targeting, predictive model construction and model validation. After having dealt with missing data by means of random forest (RF) and having applied suitable strategies to handle class imbalance, we have used Logistic Regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy, at different time scenarios, at 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). Considered variables are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. Final models, tailored in accordance with the complications, provided an accuracy up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models easy to translate to the clinical practice.

  19. Using Machine Learning in Adversarial Environments.

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Warren Leon [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-02-01

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approaches only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers reaction, with the goal of computing optimal moving target defense. One important challenge is to construct a realistic model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.

  20. A new intelligent classifier for breast cancer diagnosis based on a rough set and extreme learning machine: RS + ELM

    OpenAIRE

    KAYA, Yılmaz

    2014-01-01

    Breast cancer is one of the leading causes of death among women all around the world. Therefore, true and early diagnosis of breast cancer is an important problem. The rough set (RS) and extreme learning machine (ELM) methods were used collectively in this study for the diagnosis of breast cancer. The unnecessary attributes were discarded from the dataset by means of the RS approach. The classification process by means of ELM was performed using the remaining attributes. The Wisconsin B...

  1. A Novel Hybrid Model Based on Extreme Learning Machine, k-Nearest Neighbor Regression and Wavelet Denoising Applied to Short-Term Electric Load Forecasting

    Directory of Open Access Journals (Sweden)

    Weide Li

    2017-05-01

    Full Text Available Electric load forecasting plays an important role in electricity markets and power systems. Because electric load time series are complicated and nonlinear, it is very difficult to achieve a satisfactory forecasting accuracy. In this paper, a hybrid model, Wavelet Denoising-Extreme Learning Machine optimized by k-Nearest Neighbor Regression (EWKM, which combines k-Nearest Neighbor (KNN and Extreme Learning Machine (ELM based on a wavelet denoising technique is proposed for short-term load forecasting. The proposed hybrid model decomposes the time series into a low frequency-associated main signal and some detailed signals associated with high frequencies at first, then uses KNN to determine the independent and dependent variables from the low-frequency signal. Finally, the ELM is used to get the non-linear relationship between these variables to get the final prediction result for the electric load. Compared with three other models, Extreme Learning Machine optimized by k-Nearest Neighbor Regression (EKM, Wavelet Denoising-Extreme Learning Machine (WKM and Wavelet Denoising-Back Propagation Neural Network optimized by k-Nearest Neighbor Regression (WNNM, the model proposed in this paper can improve the accuracy efficiently. New South Wales is the economic powerhouse of Australia, so we use the proposed model to predict electric demand for that region. The accurate prediction has a significant meaning.

  2. Machine learning application in the life time of materials

    OpenAIRE

    Yu, Xiaojiao

    2017-01-01

    Materials design and development typically takes several decades from the initial discovery to commercialization with the traditional trial and error development approach. With the accumulation of data from both experimental and computational results, data based machine learning becomes an emerging field in materials discovery, design and property prediction. This manuscript reviews the history of materials science as a disciplinary the most common machine learning method used in materials sc...

  3. Machine learning in laboratory medicine: waiting for the flood?

    Science.gov (United States)

    Cabitza, Federico; Banfi, Giuseppe

    2018-03-28

    This review focuses on machine learning and on how methods and models combining data analytics and artificial intelligence have been applied to laboratory medicine so far. Although still in its infancy, the potential for applying machine learning to laboratory data for both diagnostic and prognostic purposes deserves more attention by the readership of this journal, as well as by physician-scientists who will want to take advantage of this new computer-based support in pathology and laboratory medicine.

  4. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2016-01-01

    Machine learning techniques relevant for nonlinearity mitigation, carrier recovery, and nanoscale device characterization are reviewed and employed. Markov Chain Monte Carlo in combination with Bayesian filtering is employed within the nonlinear state-space framework and demonstrated for parameter...

  5. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2015-01-01

    Techniques from the machine learning community are reviewed and employed for laser characterization, signal detection in the presence of nonlinear phase noise, and nonlinearity mitigation. Bayesian filtering and expectation maximization are employed within nonlinear state-space framework...

  6. Machine learning applications in genetics and genomics.

    Science.gov (United States)

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  7. Computer vision and machine learning for archaeology

    NARCIS (Netherlands)

    van der Maaten, L.J.P.; Boon, P.; Lange, G.; Paijmans, J.J.; Postma, E.

    2006-01-01

    Until now, computer vision and machine learning techniques barely contributed to the archaeological domain. The use of these techniques can support archaeologists in their assessment and classification of archaeological finds. The paper illustrates the use of computer vision techniques for

  8. Using Machine Learning for Land Suitability Classification

    African Journals Online (AJOL)

    User

    West African Journal of Applied Ecology, vol. ... evidence for the utility of machine learning methods in land suitability classification especially MCS methods. ... Artificial intelligence tools. ..... Numerical values of index for the various classes.

  9. Detection of Stress Levels from Biosignals Measured in Virtual Reality Environments Using a Kernel-Based Extreme Learning Machine.

    Science.gov (United States)

    Cho, Dongrae; Ham, Jinsil; Oh, Jooyoung; Park, Jeanho; Kim, Sayup; Lee, Nak-Kyu; Lee, Boreom

    2017-10-24

    Virtual reality (VR) is a computer technique that creates an artificial environment composed of realistic images, sounds, and other sensations. Many researchers have used VR devices to generate various stimuli, and have utilized them to perform experiments or to provide treatment. In this study, the participants performed mental tasks using a VR device while physiological signals were measured: a photoplethysmogram (PPG), electrodermal activity (EDA), and skin temperature (SKT). In general, stress is an important factor that can influence the autonomic nervous system (ANS). Heart-rate variability (HRV) is known to be related to ANS activity, so we used an HRV derived from the PPG peak interval. In addition, the peak characteristics of the skin conductance (SC) from EDA and SKT variation can also reflect ANS activity; we utilized them as well. Then, we applied a kernel-based extreme-learning machine (K-ELM) to correctly classify the stress levels induced by the VR task to reflect five different levels of stress situations: baseline, mild stress, moderate stress, severe stress, and recovery. Twelve healthy subjects voluntarily participated in the study. Three physiological signals were measured in stress environment generated by VR device. As a result, the average classification accuracy was over 95% using K-ELM and the integrated feature (IT = HRV + SC + SKT). In addition, the proposed algorithm can embed a microcontroller chip since K-ELM algorithm have very short computation time. Therefore, a compact wearable device classifying stress levels using physiological signals can be developed.

  10. Automated cell analysis tool for a genome-wide RNAi screen with support vector machine based supervised learning

    Science.gov (United States)

    Remmele, Steffen; Ritzerfeld, Julia; Nickel, Walter; Hesser, Jürgen

    2011-03-01

    RNAi-based high-throughput microscopy screens have become an important tool in biological sciences in order to decrypt mostly unknown biological functions of human genes. However, manual analysis is impossible for such screens since the amount of image data sets can often be in the hundred thousands. Reliable automated tools are thus required to analyse the fluorescence microscopy image data sets usually containing two or more reaction channels. The herein presented image analysis tool is designed to analyse an RNAi screen investigating the intracellular trafficking and targeting of acylated Src kinases. In this specific screen, a data set consists of three reaction channels and the investigated cells can appear in different phenotypes. The main issue of the image processing task is an automatic cell segmentation which has to be robust and accurate for all different phenotypes and a successive phenotype classification. The cell segmentation is done in two steps by segmenting the cell nuclei first and then using a classifier-enhanced region growing on basis of the cell nuclei to segment the cells. The classification of the cells is realized by a support vector machine which has to be trained manually using supervised learning. Furthermore, the tool is brightness invariant allowing different staining quality and it provides a quality control that copes with typical defects during preparation and acquisition. A first version of the tool has already been successfully applied for an RNAi-screen containing three hundred thousand image data sets and the SVM extended version is designed for additional screens.

  11. [Remote intelligent Brunnstrom assessment system for upper limb rehabilitation for post-stroke based on extreme learning machine].

    Science.gov (United States)

    Wang, Yue; Yu, Lei; Fu, Jianming; Fang, Qiang

    2014-04-01

    In order to realize an individualized and specialized rehabilitation assessment of remoteness and intelligence, we set up a remote intelligent assessment system of upper limb movement function of post-stroke patients during rehabilitation. By using the remote rehabilitation training sensors and client data sampling software, we collected and uploaded the gesture data from a patient's forearm and upper arm during rehabilitation training to database of the server. Then a remote intelligent assessment system, which had been developed based on the extreme learning machine (ELM) algorithm and Brunnstrom stage assessment standard, was used to evaluate the gesture data. To evaluate the reliability of the proposed method, a group of 23 stroke patients, whose upper limb movement functions were in different recovery stages, and 4 healthy people, whose upper limb movement functions were normal, were recruited to finish the same training task. The results showed that, compared to that of the experienced rehabilitation expert who used the Brunnstrom stage standard table, the accuracy of the proposed remote Brunnstrom intelligent assessment system can reach a higher level, as 92.1%. The practical effects of surgery have proved that the proposed system could realize the intelligent assessment of upper limb movement function of post-stroke patients remotely, and it could also make the rehabilitation of the post-stroke patients at home or in a community care center possible.

  12. Evaluation of different time domain peak models using extreme learning machine-based peak detection for EEG signal.

    Science.gov (United States)

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Cumming, Paul; Mubin, Marizan

    2016-01-01

    Various peak models have been introduced to detect and analyze peaks in the time domain analysis of electroencephalogram (EEG) signals. In general, peak model in the time domain analysis consists of a set of signal parameters, such as amplitude, width, and slope. Models including those proposed by Dumpala, Acir, Liu, and Dingle are routinely used to detect peaks in EEG signals acquired in clinical studies of epilepsy or eye blink. The optimal peak model is the most reliable peak detection performance in a particular application. A fair measure of performance of different models requires a common and unbiased platform. In this study, we evaluate the performance of the four different peak models using the extreme learning machine (ELM)-based peak detection algorithm. We found that the Dingle model gave the best performance, with 72 % accuracy in the analysis of real EEG data. Statistical analysis conferred that the Dingle model afforded significantly better mean testing accuracy than did the Acir and Liu models, which were in the range 37-52 %. Meanwhile, the Dingle model has no significant difference compared to Dumpala model.

  13. Model-Agnostic Interpretability of Machine Learning

    OpenAIRE

    Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos

    2016-01-01

    Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. In some applications, such models are as accurate as non-interpretable ones, and thus are preferred f...

  14. Implementing Machine Learning in the PCWG Tool

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    2016-12-13

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  15. PCA-based polling strategy in machine learning framework for coronary artery disease risk assessment in intravascular ultrasound: A link between carotid and coronary grayscale plaque morphology.

    Science.gov (United States)

    Araki, Tadashi; Ikeda, Nobutaka; Shukla, Devarshi; Jain, Pankaj K; Londhe, Narendra D; Shrivastava, Vimal K; Banchhor, Sumit K; Saba, Luca; Nicolaides, Andrew; Shafique, Shoaib; Laird, John R; Suri, Jasjit S

    2016-05-01

    Percutaneous coronary interventional procedures need advance planning prior to stenting or an endarterectomy. Cardiologists use intravascular ultrasound (IVUS) for screening, risk assessment and stratification of coronary artery disease (CAD). We hypothesize that plaque components are vulnerable to rupture due to plaque progression. Currently, there are no standard grayscale IVUS tools for risk assessment of plaque rupture. This paper presents a novel strategy for risk stratification based on plaque morphology embedded with principal component analysis (PCA) for plaque feature dimensionality reduction and dominant feature selection technique. The risk assessment utilizes 56 grayscale coronary features in a machine learning framework while linking information from carotid and coronary plaque burdens due to their common genetic makeup. This system consists of a machine learning paradigm which uses a support vector machine (SVM) combined with PCA for optimal and dominant coronary artery morphological feature extraction. Carotid artery proven intima-media thickness (cIMT) biomarker is adapted as a gold standard during the training phase of the machine learning system. For the performance evaluation, K-fold cross validation protocol is adapted with 20 trials per fold. For choosing the dominant features out of the 56 grayscale features, a polling strategy of PCA is adapted where the original value of the features is unaltered. Different protocols are designed for establishing the stability and reliability criteria of the coronary risk assessment system (cRAS). Using the PCA-based machine learning paradigm and cross-validation protocol, a classification accuracy of 98.43% (AUC 0.98) with K=10 folds using an SVM radial basis function (RBF) kernel was achieved. A reliability index of 97.32% and machine learning stability criteria of 5% were met for the cRAS. This is the first Computer aided design (CADx) system of its kind that is able to demonstrate the ability of coronary

  16. On the Use of Machine Learning for Identifying Botnet Network Traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    contemporary approaches use machine learning techniques for identifying malicious traffic. This paper presents a survey of contemporary botnet detection methods that rely on machine learning for identifying botnet network traffic. The paper provides a comprehensive overview on the existing scientific work thus...... contributing to the better understanding of capabilities, limitations and opportunities of using machine learning for identifying botnet traffic. Furthermore, the paper outlines possibilities for the future development of machine learning-based botnet detection systems....

  17. Machine learning techniques for optical communication system optimization

    DEFF Research Database (Denmark)

    Zibar, Darko; Wass, Jesper; Thrane, Jakob

    In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction.......In this paper, machine learning techniques relevant to optical communication are presented and discussed. The focus is on applying machine learning tools to optical performance monitoring and performance prediction....

  18. Robust Matching Pursuit Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Zejian Yuan

    2018-01-01

    Full Text Available Extreme learning machine (ELM is a popular learning algorithm for single hidden layer feedforward networks (SLFNs. It was originally proposed with the inspiration from biological learning and has attracted massive attentions due to its adaptability to various tasks with a fast learning ability and efficient computation cost. As an effective sparse representation method, orthogonal matching pursuit (OMP method can be embedded into ELM to overcome the singularity problem and improve the stability. Usually OMP recovers a sparse vector by minimizing a least squares (LS loss, which is efficient for Gaussian distributed data, but may suffer performance deterioration in presence of non-Gaussian data. To address this problem, a robust matching pursuit method based on a novel kernel risk-sensitive loss (in short KRSLMP is first proposed in this paper. The KRSLMP is then applied to ELM to solve the sparse output weight vector, and the new method named the KRSLMP-ELM is developed for SLFN learning. Experimental results on synthetic and real-world data sets confirm the effectiveness and superiority of the proposed method.

  19. Towards Machine Learning of Motor Skills

    Science.gov (United States)

    Peters, Jan; Schaal, Stefan; Schölkopf, Bernhard

    Autonomous robots that can adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

  20. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data.

    Science.gov (United States)

    Pesesky, Mitchell W; Hussain, Tahir; Wallace, Meghan; Patel, Sanket; Andleeb, Saadia; Burnham, Carey-Ann D; Dantas, Gautam

    2016-01-01

    The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitates initial use of empiric (frequently broad-spectrum) antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0 and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance factors and

  1. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data

    Directory of Open Access Journals (Sweden)

    Mitchell Pesesky

    2016-11-01

    Full Text Available The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitate initial use of empiric (frequently broad-spectrum antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0% and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance

  2. An introduction to quantum machine learning

    Science.gov (United States)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  3. A confidence metric for using neurobiological feedback in actor-critic reinforcement learning based brain-machine interfaces

    Directory of Open Access Journals (Sweden)

    Noeline Wilhelmina Prins

    2014-05-01

    Full Text Available Brain-Machine Interfaces (BMIs can be used to restore function in people living with paralysis. Current BMIs require extensive calibration that increase the set-up times and external inputs for decoder training that may be difficult to produce in paralyzed individuals. Both these factors have presented challenges in transitioning the technology from research environments to activities of daily living (ADL. For BMIs to be seamlessly used in ADL, these issues should be handled with minimal external input thus reducing the need for a technician/caregiver to calibrate the system. Reinforcement Learning (RL based BMIs are a good tool to be used when there is no external training signal and can provide an adaptive modality to train BMI decoders. However, RL based BMIs are sensitive to the feedback provided to adapt the BMI. In actor-critic BMIs, this feedback is provided by the critic and the overall system performance is limited by the critic accuracy. In this work, we developed an adaptive BMI that could handle inaccuracies in the critic feedback in an effort to produce more accurate RL based BMIs. We developed a confidence measure, which indicated how appropriate the feedback is for updating the decoding parameters of the actor. The results show that with the new update formulation, the critic accuracy is no longer a limiting factor for the overall performance. We tested and validated the system on three different data sets: synthetic data generated by an Izhikevich neural spiking model, synthetic data with a Gaussian noise distribution, and data collected from a non-human primate engaged in a reaching task. All results indicated that the system with the critic confidence built in always outperformed the system without the critic confidence. Results of this study suggest the potential application of the technique in developing an autonomous BMI that does not need an external signal for training or extensive calibration.

  4. Virtual screening approach to identifying influenza virus neuraminidase inhibitors using molecular docking combined with machine-learning-based scoring function.

    Science.gov (United States)

    Zhang, Li; Ai, Hai-Xin; Li, Shi-Meng; Qi, Meng-Yuan; Zhao, Jian; Zhao, Qi; Liu, Hong-Sheng

    2017-10-10

    In recent years, an epidemic of the highly pathogenic avian influenza H7N9 virus has persisted in China, with a high mortality rate. To develop novel anti-influenza therapies, we have constructed a machine-learning-based scoring function (RF-NA-Score) for the effective virtual screening of lead compounds targeting the viral neuraminidase (NA) protein. RF-NA-Score is more accurate than RF-Score, with a root-mean-square error of 1.46, Pearson's correlation coefficient of 0.707, and Spearman's rank correlation coefficient of 0.707 in a 5-fold cross-validation study. The performance of RF-NA-Score in a docking-based virtual screening of NA inhibitors was evaluated with a dataset containing 281 NA inhibitors and 322 noninhibitors. Compared with other docking-rescoring virtual screening strategies, rescoring with RF-NA-Score significantly improved the efficiency of virtual screening, and a strategy that averaged the scores given by RF-NA-Score, based on the binding conformations predicted with AutoDock, AutoDock Vina, and LeDock, was shown to be the best strategy. This strategy was then applied to the virtual screening of NA inhibitors in the SPECS database. The 100 selected compounds were tested in an in vitro H7N9 NA inhibition assay, and two compounds with novel scaffolds showed moderate inhibitory activities. These results indicate that RF-NA-Score improves the efficiency of virtual screening for NA inhibitors, and can be used successfully to identify new NA inhibitor scaffolds. Scoring functions specific for other drug targets could also be established with the same method.

  5. Pathogenesis-based treatments in primary Sjogren's syndrome using artificial intelligence and advanced machine learning techniques: a systematic literature review.

    Science.gov (United States)

    Foulquier, Nathan; Redou, Pascal; Le Gal, Christophe; Rouvière, Bénédicte; Pers, Jacques-Olivier; Saraux, Alain

    2018-05-17

    Big data analysis has become a common way to extract information from complex and large datasets among most scientific domains. This approach is now used to study large cohorts of patients in medicine. This work is a review of publications that have used artificial intelligence and advanced machine learning techniques to study physio pathogenesis-based treatments in pSS. A systematic literature review retrieved all articles reporting on the use of advanced statistical analysis applied to the study of systemic autoimmune diseases (SADs) over the last decade. An automatic bibliography screening method has been developed to perform this task. The program called BIBOT was designed to fetch and analyze articles from the pubmed database using a list of keywords and Natural Language Processing approaches. The evolution of trends in statistical approaches, sizes of cohorts and number of publications over this period were also computed in the process. In all, 44077 abstracts were screened and 1017 publications were analyzed. The mean number of selected articles was 101.0 (S.D. 19.16) by year, but increased significantly over the time (from 74 articles in 2008 to 138 in 2017). Among them only 12 focused on pSS but none of them emphasized on the aspect of pathogenesis-based treatments. To conclude, medicine progressively enters the era of big data analysis and artificial intelligence, but these approaches are not yet used to describe pSS-specific pathogenesis-based treatment. Nevertheless, large multicentre studies are investigating this aspect with advanced algorithmic tools on large cohorts of SADs patients.

  6. Multiple-Machine Scheduling with Learning Effects and Cooperative Games

    Directory of Open Access Journals (Sweden)

    Yiyuan Zhou

    2015-01-01

    Full Text Available Multiple-machine scheduling problems with position-based learning effects are studied in this paper. There is an initial schedule in this scheduling problem. The optimal schedule minimizes the sum of the weighted completion times; the difference between the initial total weighted completion time and the minimal total weighted completion time is the cost savings. A multiple-machine sequencing game is introduced to allocate the cost savings. The game is balanced if the normal processing times of jobs that are on the same machine are equal and an equal number of jobs are scheduled on each machine initially.

  7. MODIS-Based Estimation of Terrestrial Latent Heat Flux over North America Using Three Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Xuanyu Wang

    2017-12-01

    Full Text Available Terrestrial latent heat flux (LE is a key component of the global terrestrial water, energy, and carbon exchanges. Accurate estimation of LE from moderate resolution imaging spectroradiometer (MODIS data remains a major challenge. In this study, we estimated the daily LE for different plant functional types (PFTs across North America using three machine learning algorithms: artificial neural network (ANN; support vector machines (SVM; and, multivariate adaptive regression spline (MARS driven by MODIS and Modern Era Retrospective Analysis for Research and Applications (MERRA meteorology data. These three predictive algorithms, which were trained and validated using observed LE over the period 2000–2007, all proved to be accurate. However, ANN outperformed the other two algorithms for the majority of the tested configurations for most PFTs and was the only method that arrived at 80% precision for LE estimation. We also applied three machine learning algorithms for MODIS data and MERRA meteorology to map the average annual terrestrial LE of North America during 2002–2004 using a spatial resolution of 0.05°, which proved to be useful for estimating the long-term LE over North America.

  8. Model-Based Collaborative Filtering Analysis of Student Response Data: Machine-Learning Item Response Theory

    Science.gov (United States)

    Bergner, Yoav; Droschler, Stefan; Kortemeyer, Gerd; Rayyan, Saif; Seaton, Daniel; Pritchard, David E.

    2012-01-01

    We apply collaborative filtering (CF) to dichotomously scored student response data (right, wrong, or no interaction), finding optimal parameters for each student and item based on cross-validated prediction accuracy. The approach is naturally suited to comparing different models, both unidimensional and multidimensional in ability, including a…

  9. Machine-Learning Identification of Airborne UAV-UEs Based on LTE Radio Measurements

    DEFF Research Database (Denmark)

    Amorim, Rafhael Medeiros de; Wigard, Jeroen; Nguyen, Huan Cong

    2017-01-01

    , which use standard LTE measurements from the UE as input, for detecting the presence of airborne users in the network. The algorithms are evaluated based on measurements done with mobile phones attached under a flying drone and on a car. Results are discussed showing the advantages and drawbacks...

  10. MACHINE LEARNING TECHNIQUES USED IN BIG DATA

    Directory of Open Access Journals (Sweden)

    STEFANIA LOREDANA NITA

    2016-07-01

    Full Text Available The classical tools used in data analysis are not enough in order to benefit of all advantages of big data. The amount of information is too large for a complete investigation, and the possible connections and relations between data could be missed, because it is difficult or even impossible to verify all assumption over the information. Machine learning is a great solution in order to find concealed correlations or relationships between data, because it runs at scale machine and works very well with large data sets. The more data we have, the more the machine learning algorithm is useful, because it “learns” from the existing data and applies the found rules on new entries. In this paper, we present some machine learning algorithms and techniques used in big data.

  11. Distributed Extreme Learning Machine for Nonlinear Learning over Network

    Directory of Open Access Journals (Sweden)

    Songyan Huang

    2015-02-01

    Full Text Available Distributed data collection and analysis over a network are ubiquitous, especially over a wireless sensor network (WSN. To our knowledge, the data model used in most of the distributed algorithms is linear. However, in real applications, the linearity of systems is not always guaranteed. In nonlinear cases, the single hidden layer feedforward neural network (SLFN with radial basis function (RBF hidden neurons has the ability to approximate any continuous functions and, thus, may be used as the nonlinear learning system. However, confined by the communication cost, using the distributed version of the conventional algorithms to train the neural network directly is usually prohibited. Fortunately, based on the theorems provided in the extreme learning machine (ELM literature, we only need to compute the output weights of the SLFN. Computing the output weights itself is a linear learning problem, although the input-output mapping of the overall SLFN is still nonlinear. Using the distributed algorithmto cooperatively compute the output weights of the SLFN, we obtain a distributed extreme learning machine (dELM for nonlinear learning in this paper. This dELM is applied to the regression problem and classification problem to demonstrate its effectiveness and advantages.

  12. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  13. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    Science.gov (United States)

    Tangaro, Sabina; Amoroso, Nicola; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Paolo, Inglese; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and therefore can be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, it requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods are usually the voxel based approach; for each voxel a number of local features were calculated. In this paper, we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) filter method based on the Kolmogorov-Smirnov test; two wrapper methods, respectively, (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) embedded method based on the Random Forest Classifier on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 feature for each voxel (sequential backward elimination) we obtained comparable state-of-the-art performances with respect to the standard tool FreeSurfer.

  14. Machine learning assisted first-principles calculation of multicomponent solid solutions: estimation of interface energy in Ni-based superalloys

    Science.gov (United States)

    Chandran, Mahesh; Lee, S. C.; Shim, Jae-Hyeok

    2018-02-01

    A disordered configuration of atoms in a multicomponent solid solution presents a computational challenge for first-principles calculations using density functional theory (DFT). The challenge is in identifying the few probable (low energy) configurations from a large configurational space before DFT calculation can be performed. The search for these probable configurations is possible if the configurational energy E({\\boldsymbol{σ }}) can be calculated accurately and rapidly (with a negligibly small computational cost). In this paper, we demonstrate such a possibility by constructing a machine learning (ML) model for E({\\boldsymbol{σ }}) trained with DFT-calculated energies. The feature vector for the ML model is formed by concatenating histograms of pair and triplet (only equilateral triangle) correlation functions, {g}(2)(r) and {g}(3)(r,r,r), respectively. These functions are a quantitative ‘fingerprint’ of the spatial arrangement of atoms, familiar in the field of amorphous materials and liquids. The ML model is used to generate an accurate distribution P(E({\\boldsymbol{σ }})) by rapidly spanning a large number of configurations. The P(E) contains full configurational information of the solid solution and can be selectively sampled to choose a few configurations for targeted DFT calculations. This new framework is employed to estimate (100) interface energy ({σ }{{IE}}) between γ and γ \\prime at 700 °C in Alloy 617, a Ni-based superalloy, with composition reduced to five components. The estimated {σ }{{IE}} ≈ 25.95 mJ m-2 is in good agreement with the value inferred by the precipitation model fit to experimental data. The proposed new ML-based ab initio framework can be applied to calculate the parameters and properties of alloys with any number of components, thus widening the reach of first-principles calculation to realistic compositions of industrially relevant materials and alloys.

  15. Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning

    Science.gov (United States)

    Bereau, Tristan; DiStasio, Robert A.; Tkatchenko, Alexandre; von Lilienfeld, O. Anatole

    2018-06-01

    Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (significant error reduction compared to previously reported), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions—electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: All local atomic properties are predicted from ML, leaving only eight global parameters—optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, where we obtain mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically relevant molecules. We further focus on hydrogen-bonded complexes—essential but challenging due to their directional nature—where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, and as a first look, we consider IPML for denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal.

  16. Machine learning-based Landsat-MODIS data fusion approach for 8-day 30m evapotranspiration monitoring

    Science.gov (United States)

    Im, J.; Ke, Y.; Park, S.

    2016-12-01

    Continuous monitoring of evapotranspiration (ET) is important for understanding of hydrological cycles and energy flux dynamics. At regional and local scales, routine ET estimation is a critical for efficient water management, drought impact assessment and ecosystem health monitoring, etc. Remote sensing has long been recognized to be able to provide ET monitoring over large areas. However, no single satellite could provide temporally continuous ET at relatively high spatial resolution due to the trade-off between the spatial and temporal resolution of current satellite sensors. Landsat-series satellites provide optical and thermal imagery at 30-100m resolution, whereas the 16-day revisit cycle hinders the observation of ET dynamics; MODIS provides sources of ET estimation at daily basis, but the 500-1000m ground sampling distance is too coarse for field level applications. In this study, we present a machine learning and STARFM based method for Landsat/MODIS ET fusion. The approach first downscales MODIS 8-day 1km ET (MOD16A2) to 30m based on eleven Landsat-derived indicators such as NDVI, EVI, NDWI etc on the cloud-free Landsat-available days using Random Forest approach. For the days when Landsat data are not available, downscaled ET is synthesized by MODIS and Landsat data fusion with STARFM and STI-FM approaches. The models are evaluated using in situ flux tower measurements at US-ARM and US-Twt AmeriFlux sites the United States. Results show that the downscaled 30m ET have good agreement with MODIS ET (RMSE=0.42-3.4mm/8days, rRMSE=3.2%-26%) and the downscaled ET have higher accuracy than MODIS ET when compared to in-situ measurements.

  17. Rainfall Prediction of Indian Peninsula: Comparison of Time Series Based Approach and Predictor Based Approach using Machine Learning Techniques

    Science.gov (United States)

    Dash, Y.; Mishra, S. K.; Panigrahi, B. K.

    2017-12-01

    Prediction of northeast/post monsoon rainfall which occur during October, November and December (OND) over Indian peninsula is a challenging task due to the dynamic nature of uncertain chaotic climate. It is imperative to elucidate this issue by examining performance of different machine leaning (ML) approaches. The prime objective of this research is to compare between a) statistical prediction using historical rainfall observations and global atmosphere-ocean predictors like Sea Surface Temperature (SST) and Sea Level Pressure (SLP) and b) empirical prediction based on a time series analysis of past rainfall data without using any other predictors. Initially, ML techniques have been applied on SST and SLP data (1948-2014) obtained from NCEP/NCAR reanalysis monthly mean provided by the NOAA ESRL PSD. Later, this study investigated the applicability of ML methods using OND rainfall time series for 1948-2014 and forecasted up to 2018. The predicted values of aforementioned methods were verified using observed time series data collected from Indian Institute of Tropical Meteorology and the result revealed good performance of ML algorithms with minimal error scores. Thus, it is found that both statistical and empirical methods are useful for long range climatic projections.

  18. IRB Process Improvements: A Machine Learning Analysis.

    Science.gov (United States)

    Shoenbill, Kimberly; Song, Yiqiang; Cobb, Nichelle L; Drezner, Marc K; Mendonca, Eneida A

    2017-06-01

    Clinical research involving humans is critically important, but it is a lengthy and expensive process. Most studies require institutional review board (IRB) approval. Our objective is to identify predictors of delays or accelerations in the IRB review process and apply this knowledge to inform process change in an effort to improve IRB efficiency, transparency, consistency and communication. We analyzed timelines of protocol submissions to determine protocol or IRB characteristics associated with different processing times. Our evaluation included single variable analysis to identify significant predictors of IRB processing time and machine learning methods to predict processing times through the IRB review system. Based on initial identified predictors, changes to IRB workflow and staffing procedures were instituted and we repeated our analysis. Our analysis identified several predictors of delays in the IRB review process including type of IRB review to be conducted, whether a protocol falls under Veteran's Administration purview and specific staff in charge of a protocol's review. We have identified several predictors of delays in IRB protocol review processing times using statistical and machine learning methods. Application of this knowledge to process improvement efforts in two IRBs has led to increased efficiency in protocol review. The workflow and system enhancements that are being made support our four-part goal of improving IRB efficiency, consistency, transparency, and communication.

  19. Machine Learning Consensus Scoring Improves Performance Across Targets in Structure-Based Virtual Screening.

    Science.gov (United States)

    Ericksen, Spencer S; Wu, Haozhen; Zhang, Huikun; Michael, Lauren A; Newton, Michael A; Hoffmann, F Michael; Wildman, Scott A

    2017-07-24

    In structure-based virtual screening, compound ranking through a consensus of scores from a variety of docking programs or scoring functions, rather than ranking by scores from a single program, provides better predictive performance and reduces target performance variability. Here we compare traditional consensus scoring methods with a novel, unsupervised gradient boosting approach. We also observed increased score variation among active ligands and developed a statistical mixture model consensus score based on combining score means and variances. To evaluate performance, we used the common performance metrics ROCAUC and EF1 on 21 benchmark targets from DUD-E. Traditional consensus methods, such as taking the mean of quantile normalized docking scores, outperformed individual docking methods and are more robust to target variation. The mixture model and gradient boosting provided further improvements over the traditional consensus methods. These methods are readily applicable to new targets in academic research and overcome the potentially poor performance of using a single docking method on a new target.

  20. Securing Metering Infrastructure of Smart Grid: A Machine Learning and Localization Based Key Management Approach

    Directory of Open Access Journals (Sweden)

    Imtiaz Parvez

    2016-08-01

    Full Text Available In smart cities, advanced metering infrastructure (AMI of the smart grid facilitates automated metering, control and monitoring of power distribution by employing a wireless network. Due to this wireless nature of communication, there exist potential threats to the data privacy in AMI. Decoding the energy consumption reading, injecting false data/command signals and jamming the networks are some hazardous measures against this technology. Since a smart meter possesses limited memory and computational capability, AMI demands a light, but robust security scheme. In this paper, we propose a localization-based key management system for meter data encryption. Data are encrypted by the key associated with the coordinate of the meter and a random key index. The encryption keys are managed and distributed by a trusted third party (TTP. Localization of the meter is proposed by a method based on received signal strength (RSS using the maximum likelihood estimator (MLE. The received packets are decrypted at the control center with the key mapped with the key index and the meter’s coordinates. Additionally, we propose the k-nearest neighbors (kNN algorithm for node/meter authentication, capitalizing further on data transmission security. Finally, we evaluate the security strength of a data packet numerically for our method.

  1. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development.

    Science.gov (United States)

    Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

    2015-01-01

    Virtual screening is an important step in early-phase of drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purpose. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, first, performances of twenty-three different machine learning algorithms are compared by ten different measures, then, ten best performing algorithms have been selected based on principal component and hierarchical cluster analysis results. Besides classification, this application has also ability to create heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.

  2. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

    Directory of Open Access Journals (Sweden)

    Yang Liu

    2015-01-01

    Full Text Available Artificial neural networks (ANNs have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.

  3. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.

    Science.gov (United States)

    Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

    2015-01-01

    Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.

  4. Machine-based mapping of innovation portfolios

    NARCIS (Netherlands)

    de Visser, Matthias; Miao, Shengfa; Englebienne, Gwenn; Sools, Anna Maria; Visscher, Klaasjan

    2017-01-01

    Machine learning techniques show a great promise for improving innovation portfolio management. In this paper we experiment with different methods to classify innovation projects of a high-tech firm as either explorative or exploitative, and compare the results with a manual, theory-based mapping of

  5. A distributed algorithm for machine learning

    Science.gov (United States)

    Chen, Shihong

    2018-04-01

    This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.

  6. Interactive Algorithms for Unsupervised Machine Learning

    Science.gov (United States)

    2015-06-01

    in Neural Information Processing Systems, 2013. 14 [3] Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi. On combinato- rial...Myung Jin Choi, Vincent Y F Tan , Animashree Anandkumar, and Alan S Willsky. Learn- ing Latent Tree Graphical Models. Journal of Machine Learning

  7. Efficient tuning in supervised machine learning

    NARCIS (Netherlands)

    Koch, Patrick

    2013-01-01

    The tuning of learning algorithm parameters has become more and more important during the last years. With the fast growth of computational power and available memory databases have grown dramatically. This is very challenging for the tuning of parameters arising in machine learning, since the

  8. Machine learning-based screening of complex molecules for polymer solar cells

    Science.gov (United States)

    Jørgensen, Peter Bjørn; Mesta, Murat; Shil, Suranjan; García Lastra, Juan Maria; Jacobsen, Karsten Wedel; Thygesen, Kristian Sommer; Schmidt, Mikkel N.

    2018-06-01

    Polymer solar cells admit numerous potential advantages including low energy payback time and scalable high-speed manufacturing, but the power conversion efficiency is currently lower than for their inorganic counterparts. In a Phenyl-C_61-Butyric-Acid-Methyl-Ester (PCBM)-based blended polymer solar cell, the optical gap of the polymer and the energetic alignment of the lowest unoccupied molecular orbital (LUMO) of the polymer and the PCBM are crucial for the device efficiency. Searching for new and better materials for polymer solar cells is a computationally costly affair using density functional theory (DFT) calculations. In this work, we propose a screening procedure using a simple string representation for a promising class of donor-acceptor polymers in conjunction with a grammar variational autoencoder. The model is trained on a dataset of 3989 monomers obtained from DFT calculations and is able to predict LUMO and the lowest optical transition energy for unseen molecules with mean absolute errors of 43 and 74 meV, respectively, without knowledge of the atomic positions. We demonstrate the merit of the model for generating new molecules with the desired LUMO and optical gap energies which increases the chance of finding suitable polymers by more than a factor of five in comparison to the randomised search used in gathering the training set.

  9. Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Models for Mycobacterium tuberculosis Drug Discovery.

    Science.gov (United States)

    Ekins, Sean; Madrid, Peter B; Sarker, Malabika; Li, Shao-Gang; Mittal, Nisha; Kumar, Pradeep; Wang, Xin; Stratton, Thomas P; Zimmerman, Matthew; Talcott, Carolyn; Bourbon, Pauline; Travers, Mike; Yadav, Maneesh; Freundlich, Joel S

    2015-01-01

    Integrated computational approaches for Mycobacterium tuberculosis (Mtb) are useful to identify new molecules that could lead to future tuberculosis (TB) drugs. Our approach uses information derived from the TBCyc pathway and genome database, the Collaborative Drug Discovery TB database combined with 3D pharmacophores and dual event Bayesian models of whole-cell activity and lack of cytotoxicity. We have prioritized a large number of molecules that may act as mimics of substrates and metabolites in the TB metabolome. We computationally searched over 200,000 commercial molecules using 66 pharmacophores based on substrates and metabolites from Mtb and further filtering with Bayesian models. We ultimately tested 110 compounds in vitro that resulted in two compounds of interest, BAS 04912643 and BAS 00623753 (MIC of 2.5 and 5 μg/mL, respectively). These molecules were used as a starting point for hit-to-lead optimization. The most promising class proved to be the quinoxaline di-N-oxides, evidenced by transcriptional profiling to induce mRNA level perturbations most closely resembling known protonophores. One of these, SRI58 exhibited an MIC = 1.25 μg/mL versus Mtb and a CC50 in Vero cells of >40 μg/mL, while featuring fair Caco-2 A-B permeability (2.3 x 10-6 cm/s), kinetic solubility (125 μM at pH 7.4 in PBS) and mouse metabolic stability (63.6% remaining after 1 h incubation with mouse liver microsomes). Despite demonstration of how a combined bioinformatics/cheminformatics approach afforded a small molecule with promising in vitro profiles, we found that SRI58 did not exhibit quantifiable blood levels in mice.

  10. Machine Learning Phases of Strongly Correlated Fermions

    Directory of Open Access Journals (Sweden)

    Kelvin Ch’ng

    2017-08-01

    Full Text Available Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling. We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  11. PACE: Probabilistic Assessment for Contributor Estimation- A machine learning-based assessment of the number of contributors in DNA mixtures.

    Science.gov (United States)

    Marciano, Michael A; Adelman, Jonathan D

    2017-03-01

    The deconvolution of DNA mixtures remains one of the most critical challenges in the field of forensic DNA analysis. In addition, of all the data features required to perform such deconvolution, the number of contributors in the sample is widely considered the most important, and, if incorrectly chosen, the most likely to negatively influence the mixture interpretation of a DNA profile. Unfortunately, most current approaches to mixture deconvolution require the assumption that the number of contributors is known by the analyst, an assumption that can prove to be especially faulty when faced with increasingly complex mixtures of 3 or more contributors. In this study, we propose a probabilistic approach for estimating the number of contributors in a DNA mixture that leverages the strengths of machine learning. To assess this approach, we compare classification performances of six machine learning algorithms and evaluate the model from the top-performing algorithm against the current state of the art in the field of contributor number classification. Overall results show over 98% accuracy in identifying the number of contributors in a DNA mixture of up to 4 contributors. Comparative results showed 3-person mixtures had a classification accuracy improvement of over 6% compared to the current best-in-field methodology, and that 4-person mixtures had a classification accuracy improvement of over 20%. The Probabilistic Assessment for Contributor Estimation (PACE) also accomplishes classification of mixtures of up to 4 contributors in less than 1s using a standard laptop or desktop computer. Considering the high classification accuracy rates, as well as the significant time commitment required by the current state of the art model versus seconds required by a machine learning-derived model, the approach described herein provides a promising means of estimating the number of contributors and, subsequently, will lead to improved DNA mixture interpretation. Copyright © 2016

  12. In silico machine learning methods in drug development.

    Science.gov (United States)

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures, "noisy" and high-dimensional data to complement QSAR methods, and in cases of unavailable receptor 3D structure to complement structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also regard a number of applications of the machine learning algorithms based on common classes of diseases.

  13. Short-Term Electricity-Load Forecasting Using a TSK-Based Extreme Learning Machine with Knowledge Representation

    Directory of Open Access Journals (Sweden)

    Chan-Uk Yeom

    2017-10-01

    Full Text Available This paper discusses short-term electricity-load forecasting using an extreme learning machine (ELM with automatic knowledge representation from a given input-output data set. For this purpose, we use a Takagi-Sugeno-Kang (TSK-based ELM to develop a systematic approach to generating if-then rules, while the conventional ELM operates without knowledge information. The TSK-ELM design includes a two-phase development. First, we generate an initial random-partition matrix and estimate cluster centers for random clustering. The obtained cluster centers are used to determine the premise parameters of fuzzy if-then rules. Next, the linear weights of the TSK fuzzy type are estimated using the least squares estimate (LSE method. These linear weights are used as the consequent parameters in the TSK-ELM design. The experiments were performed on short-term electricity-load data for forecasting. The electricity-load data were used to forecast hourly day-ahead loads given temperature forecasts; holiday information; and historical loads from the New England ISO. In order to quantify the performance of the forecaster, we use metrics and statistical characteristics such as root mean squared error (RMSE as well as mean absolute error (MAE, mean absolute percent error (MAPE, and R-squared, respectively. The experimental results revealed that the proposed method showed good performance when compared with a conventional ELM with four activation functions such sigmoid, sine, radial basis function, and rectified linear unit (ReLU. It possessed superior prediction performance and knowledge information and a small number of rules.

  14. Predicting the dissolution kinetics of silicate glasses using machine learning

    Science.gov (United States)

    Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu

    2018-05-01

    Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.

  15. Prostate Cancer Probability Prediction By Machine Learning Technique.

    Science.gov (United States)

    Jović, Srđan; Miljković, Milica; Ivanović, Miljan; Šaranović, Milena; Arsić, Milena

    2017-11-26

    The main goal of the study was to explore possibility of prostate cancer prediction by machine learning techniques. In order to improve the survival probability of the prostate cancer patients it is essential to make suitable prediction models of the prostate cancer. If one make relevant prediction of the prostate cancer it is easy to create suitable treatment based on the prediction results. Machine learning techniques are the most common techniques for the creation of the predictive models. Therefore in this study several machine techniques were applied and compared. The obtained results were analyzed and discussed. It was concluded that the machine learning techniques could be used for the relevant prediction of prostate cancer.

  16. Inverse analysis of turbidites by machine learning

    Science.gov (United States)

    Naruse, H.; Nakao, K.

    2017-12-01

    This study aims to propose a method to estimate paleo-hydraulic conditions of turbidity currents from ancient turbidites by using machine-learning technique. In this method, numerical simulation was repeated under various initial conditions, which produces a data set of characteristic features of turbidites. Then, this data set of turbidites is used for supervised training of a deep-learning neural network (NN). Quantities of characteristic features of turbidites in the training data set are given to input nodes of NN, and output nodes are expected to provide the estimates of initial condition of the turbidity current. The optimization of weight coefficients of NN is then conducted to reduce root-mean-square of the difference between the true conditions and the output values of NN. The empirical relationship with numerical results and the initial conditions is explored in this method, and the discovered relationship is used for inversion of turbidity currents. This machine learning can potentially produce NN that estimates paleo-hydraulic conditions from data of ancient turbidites. We produced a preliminary implementation of this methodology. A forward model based on 1D shallow-water equations with a correction of density-stratification effect was employed. This model calculates a behavior of a surge-like turbidity current transporting mixed-size sediment, and outputs spatial distribution of volume per unit area of each grain-size class on the uniform slope. Grain-size distribution was discretized 3 classes. Numerical simulation was repeated 1000 times, and thus 1000 beds of turbidites were used as the training data for NN that has 21000 input nodes and 5 output nodes with two hidden-layers. After the machine learning finished, independent simulations were conducted 200 times in order to evaluate the performance of NN. As a result of this test, the initial conditions of validation data were successfully reconstructed by NN. The estimated values show very small

  17. Machine learning molecular dynamics for the simulation of infrared spectra.

    Science.gov (United States)

    Gastegger, Michael; Behler, Jörg; Marquetand, Philipp

    2017-10-01

    Machine learning has emerged as an invaluable tool in many research areas. In the present work, we harness this power to predict highly accurate molecular infrared spectra with unprecedented computational efficiency. To account for vibrational anharmonic and dynamical effects - typically neglected by conventional quantum chemistry approaches - we base our machine learning strategy on ab initio molecular dynamics simulations. While these simulations are usually extremely time consuming even for small molecules, we overcome these limitations by leveraging the power of a variety of machine learning techniques, not only accelerating simulations by several orders of magnitude, but also greatly extending the size of systems that can be treated. To this end, we develop a molecular dipole moment model based on environment dependent neural network charges and combine it with the neural network potential approach of Behler and Parrinello. Contrary to the prevalent big data philosophy, we are able to obtain very accurate machine learning models for the prediction of infrared spectra based on only a few hundreds of electronic structure reference points. This is made possible through the use of molecular forces during neural network potential training and the introduction of a fully automated sampling scheme. We demonstrate the power of our machine learning approach by applying it to model the infrared spectra of a methanol molecule, n -alkanes containing up to 200 atoms and the protonated alanine tripeptide, which at the same time represents the first application of machine learning techniques to simulate the dynamics of a peptide. In all of these case studies we find an excellent agreement between the infrared spectra predicted via machine learning models and the respective theoretical and experimental spectra.

  18. The influence of the negative-positive ratio and screening database size on the performance of machine learning-based virtual screening.

    Science.gov (United States)

    Kurczab, Rafał; Bojarski, Andrzej J

    2017-01-01

    The machine learning-based virtual screening of molecular databases is a commonly used approach to identify hits. However, many aspects associated with training predictive models can influence the final performance and, consequently, the number of hits found. Thus, we performed a systematic study of the simultaneous influence of the proportion of negatives to positives in the testing set, the size of screening databases and the type of molecular representations on the effectiveness of classification. The results obtained for eight protein targets, five machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest), two types of molecular fingerprints (MACCS and CDK FP) and eight screening databases with different numbers of molecules confirmed our previous findings that increases in the ratio of negative to positive training instances greatly influenced most of the investigated parameters of the ML methods in simulated virtual screening experiments. However, the performance of screening was shown to also be highly dependent on the molecular library dimension. Generally, with the increasing size of the screened database, the optimal training ratio also increased, and this ratio can be rationalized using the proposed cost-effectiveness threshold approach. To increase the performance of machine learning-based virtual screening, the training set should be constructed in a way that considers the size of the screening database.

  19. Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

    Science.gov (United States)

    Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

    2017-05-01

    Four accurate, sensitive and reliable stability indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM) whether in pure form or in pharmaceutical formulations. Two supervised learning machines' methods; linear artificial neural networks (PC-linANN) preceded by principle component analysis and linear support vector regression (linSVR), were compared with two principle component based methods; principle component regression (PCR) as well as partial least squares (PLS) for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits behind using linear learning machines' methods and the inherent merits of their algorithms in handling overlapped noisy spectral data especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). Relative mean squared error of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, SVR and PC-linANN; respectively. The results showed the superiority of supervised learning machines' methods over principle component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.

  20. Less is more: regularization perspectives on large scale machine learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Deep learning based techniques provide a possible solution at the expanse of theoretical guidance and, especially, of computational requirements. It is then a key challenge for large scale machine learning to devise approaches guaranteed to be accurate and yet computationally efficient. In this talk, we will consider a regularization perspectives on machine learning appealing to classical ideas in linear algebra and inverse problems to scale-up dramatically nonparametric methods such as kernel methods, often dismissed because of prohibitive costs. Our analysis derives optimal theoretical guarantees while providing experimental results at par or out-performing state of the art approaches.

  1. Machine Learning Based Dimensionality Reduction Facilitates Ligand Diffusion Paths Assessment: A Case of Cytochrome P450cam.

    Science.gov (United States)

    Rydzewski, J; Nowak, W

    2016-04-12

    In this work we propose an application of a nonlinear dimensionality reduction method to represent the high-dimensional configuration space of the ligand-protein dissociation process in a manner facilitating interpretation. Rugged ligand expulsion paths are mapped into 2-dimensional space. The mapping retains the main structural changes occurring during the dissociation. The topological similarity of the reduced paths may be easily studied using the Fréchet distances, and we show that this measure facilitates machine learning classification of the diffusion pathways. Further, low-dimensional configuration space allows for identification of residues active in transport during the ligand diffusion from a protein. The utility of this approach is illustrated by examination of the configuration space of cytochrome P450cam involved in expulsing camphor by means of enhanced all-atom molecular dynamics simulations. The expulsion trajectories are sampled and constructed on-the-fly during molecular dynamics simulations using the recently developed memetic algorithms [ Rydzewski, J.; Nowak, W. J. Chem. Phys. 2015 , 143 ( 12 ), 124101 ]. We show that the memetic algorithms are effective for enforcing the ligand diffusion and cavity exploration in the P450cam-camphor complex. Furthermore, we demonstrate that machine learning techniques are helpful in inspecting ligand diffusion landscapes and provide useful tools to examine structural changes accompanying rare events.

  2. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  3. Introduction to machine learning for brain imaging.

    Science.gov (United States)

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a working horse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally machine learning techniques should be usable for any non-expert, however, unfortunately they are typically not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.

  4. Machine learning in the string landscape

    Science.gov (United States)

    Carifio, Jonathan; Halverson, James; Krioukov, Dmitri; Nelson, Brent D.

    2017-09-01

    We utilize machine learning to study the string landscape. Deep data dives and conjecture generation are proposed as useful frameworks for utilizing machine learning in the landscape, and examples of each are presented. A decision tree accurately predicts the number of weak Fano toric threefolds arising from reflexive polytopes, each of which determines a smooth F-theory compactification, and linear regression generates a previously proven conjecture for the gauge group rank in an ensemble of 4/3× 2.96× {10}^{755} F-theory compactifications. Logistic regression generates a new conjecture for when E 6 arises in the large ensemble of F-theory compactifications, which is then rigorously proven. This result may be relevant for the appearance of visible sectors in the ensemble. Through conjecture generation, machine learning is useful not only for numerics, but also for rigorous results.

  5. Virtual Things for Machine Learning Applications

    OpenAIRE

    Bovet , Gérôme; Ridi , Antonio; Hennebert , Jean

    2014-01-01

    International audience; Internet-of-Things (IoT) devices, especially sensors are pro-ducing large quantities of data that can be used for gather-ing knowledge. In this field, machine learning technologies are increasingly used to build versatile data-driven models. In this paper, we present a novel architecture able to ex-ecute machine learning algorithms within the sensor net-work, presenting advantages in terms of privacy and data transfer efficiency. We first argument that some classes of ...

  6. Application of machine learning methods in bioinformatics

    Science.gov (United States)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  7. Recent Advances in Predictive (Machine) Learning

    Energy Technology Data Exchange (ETDEWEB)

    Friedman, J

    2004-01-24

    Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originated many years ago at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field. These are support vector machines and boosted decision trees. This paper provides an introduction to these two new methods tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.

  8. Evaluating Machine Learning-Based Automated Personalized Daily Step Goals Delivered Through a Mobile Phone App: Randomized Controlled Trial.

    Science.gov (United States)

    Zhou, Mo; Fukuoka, Yoshimi; Mintz, Yonatan; Goldberg, Ken; Kaminsky, Philip; Flowers, Elena; Aswani, Anil

    2018-01-25

    Growing evidence shows that fixed, nonpersonalized daily step goals can discourage individuals, resulting in unchanged or even reduced physical activity. The aim of this randomized controlled trial (RCT) was to evaluate the efficacy of an automated mobile phone-based personalized and adaptive goal-setting intervention using machine learning as compared with an active control with steady daily step goals of 10,000. In this 10-week RCT, 64 participants were recruited via email announcements and were required to attend an initial in-person session. The participants were randomized into either the intervention or active control group with a one-to-one ratio after a run-in period for data collection. A study-developed mobile phone app (which delivers daily step goals using push notifications and allows real-time physical activity monitoring) was installed on each participant's mobile phone, and participants were asked to keep their phone in a pocket throughout the entire day. Through the app, the intervention group received fully automated adaptively personalized daily step goals, and the control group received constant step goals of 10,000 steps per day. Daily step count was objectively measured by the study-developed mobile phone app. The mean (SD) age of participants was 41.1 (11.3) years, and 83% (53/64) of participants were female. The baseline demographics between the 2 groups were similar (P>.05). Participants in the intervention group (n=34) had a decrease in mean (SD) daily step count of 390 (490) steps between run-in and 10 weeks, compared with a decrease of 1350 (420) steps among control participants (n=30; P=.03). The net difference in daily steps between the groups was 960 steps (95% CI 90-1830 steps). Both groups had a decrease in daily step count between run-in and 10 weeks because interventions were also provided during run-in and no natural baseline was collected. The results showed the short-term efficacy of this intervention, which should be formally

  9. Annual sums of carbon dioxide exchange over a heterogeneous urban landscape through machine learning based gap-filling

    Science.gov (United States)

    Menzer, Olaf; Meiring, Wendy; Kyriakidis, Phaedon C.; McFadden, Joseph P.

    2015-01-01

    A small, but growing, number of flux towers in urban environments measure surface-atmospheric exchanges of carbon dioxide by the eddy covariance method. As in all eddy covariance studies, obtaining annual sums of urban CO2 exchange requires imputation of data gaps due to low turbulence and non-stationary conditions, adverse weather, and instrument failures. Gap-filling approaches that are widely used for measurements from towers in natural vegetation are based on light and temperature response models. However, they do not account for key features of the urban environment including tower footprint heterogeneity and localized CO2 sources. Here, we present a novel gap-filling modeling framework that uses machine learning to select explanatory variables, such as continuous traffic counts and temporal variables, and then constrains models separately for spatially classified subsets of the data. We applied the modeling framework to a three year time series of measurements from a tall broadcast tower in a suburban neighborhood of Minneapolis-Saint Paul, Minnesota, USA. The gap-filling performance was similar to that reported for natural measurement sites, explaining 64% to 88% of the variability in the fluxes. Simulated carbon budgets were in good agreement with an ecophysiological bottom-up study at the same site. Total annual carbon dioxide flux sums for the tower site ranged from 1064 to 1382 g C m-2 yr-1, across different years and different gap-filling methods. Bias errors of annual sums resulting from gap-filling did not exceed 18 g C m-2 yr-1 and random uncertainties did not exceed ±44 g C m-2 yr-1 (or ±3.8% of the annual flux). Regardless of the gap-filling method used, the year-to-year differences in carbon exchange at this site were small. In contrast, the modeled annual sums of CO2 exchange differed by a factor of two depending on wind direction. This indicated that the modeled time series captured the spatial variability in both the biogenic and

  10. Tracking by Machine Learning Methods

    CERN Document Server

    Jofrehei, Arash

    2015-01-01

    Current track reconstructing methods start with two points and then for each layer loop through all possible hits to find proper hits to add to that track. Another idea would be to use this large number of already reconstructed events and/or simulated data and train a machine on this data to find tracks given hit pixels. Training time could be long but real time tracking is really fast Simulation might not be as realistic as real data but tacking has been done for that with 100 percent efficiency while by using real data we would probably be limited to current efficiency.

  11. Machine learning with quantum relative entropy

    Energy Technology Data Exchange (ETDEWEB)

    Tsuda, Koji [Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tuebingen, 72076 (Germany)], E-mail: koji.tsuda@tuebingen.mpg.de

    2009-12-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  12. Machine learning with quantum relative entropy

    International Nuclear Information System (INIS)

    Tsuda, Koji

    2009-01-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  13. ERD-based online brain-machine interfaces (BMI) in the context of neurorehabilitation: optimizing BMI learning and performance.

    Science.gov (United States)

    Soekadar, Surjo R; Witkowski, Matthias; Mellinger, Jürgen; Ramos, Ander; Birbaumer, Niels; Cohen, Leonardo G

    2011-10-01

    Event-related desynchronization (ERD) of sensori-motor rhythms (SMR) can be used for online brain-machine interface (BMI) control, but yields challenges related to the stability of ERD and feedback strategy to optimize BMI learning.Here, we compared two approaches to this challenge in 20 right-handed healthy subjects (HS, five sessions each, S1-S5) and four stroke patients (SP, 15 sessions each, S1-S15). ERD was recorded from a 275-sensor MEG system. During daily training,motor imagery-induced ERD led to visual and proprioceptive feedback delivered through an orthotic device attached to the subjects' hand and fingers. Group A trained with a heterogeneous reference value (RV) for ERD detection with binary feedback and Group B with a homogenous RV and graded feedback (10 HS and 2 SP in each group). HS in Group B showed better BMI performance than Group A (p learning was significantly better (p learning relative to use of a heterogeneous RV and binary feedback.

  14. Quantum learning and universal quantum matching machine

    International Nuclear Information System (INIS)

    Sasaki, Masahide; Carlini, Alberto

    2002-01-01

    Suppose that three kinds of quantum systems are given in some unknown states vertical bar f> xN , vertical bar g 1 > xK , and vertical bar g 2 > xK , and we want to decide which template state vertical bar g 1 > or vertical bar g 2 >, each representing the feature of the pattern class C 1 or C 2 , respectively, is closest to the input feature state vertical bar f>. This is an extension of the pattern matching problem into the quantum domain. Assuming that these states are known a priori to belong to a certain parametric family of pure qubit systems, we derive two kinds of matching strategies. The first one is a semiclassical strategy that is obtained by the natural extension of conventional matching strategies and consists of a two-stage procedure: identification (estimation) of the unknown template states to design the classifier (learning process to train the classifier) and classification of the input system into the appropriate pattern class based on the estimated results. The other is a fully quantum strategy without any intermediate measurement, which we might call as the universal quantum matching machine. We present the Bayes optimal solutions for both strategies in the case of K=1, showing that there certainly exists a fully quantum matching procedure that is strictly superior to the straightforward semiclassical extension of the conventional matching strategy based on the learning process

  15. Machine-learning-based classification of real-time tissue elastography for hepatic fibrosis in patients with chronic hepatitis B.

    Science.gov (United States)

    Chen, Yang; Luo, Yan; Huang, Wei; Hu, Die; Zheng, Rong-Qin; Cong, Shu-Zhen; Meng, Fan-Kun; Yang, Hong; Lin, Hong-Jun; Sun, Yan; Wang, Xiu-Yan; Wu, Tao; Ren, Jie; Pei, Shu-Fang; Zheng, Ying; He, Yun; Hu, Yu; Yang, Na; Yan, Hongmei

    2017-10-01

    Hepatic fibrosis is a common middle stage of the pathological processes of chronic liver diseases. Clinical intervention during the early stages of hepatic fibrosis can slow the development of liver cirrhosis and reduce the risk of developing liver cancer. Performing a liver biopsy, the gold standard for viral liver disease management, has drawbacks such as invasiveness and a relatively high sampling error rate. Real-time tissue elastography (RTE), one of the most recently developed technologies, might be promising imaging technology because it is both noninvasive and provides accurate assessments of hepatic fibrosis. However, determining the stage of liver fibrosis from RTE images in a clinic is a challenging task. In this study, in contrast to the previous liver fibrosis index (LFI) method, which predicts the stage of diagnosis using RTE images and multiple regression analysis, we employed four classical classifiers (i.e., Support Vector Machine, Naïve Bayes, Random Forest and K-Nearest Neighbor) to build a decision-support system to improve the hepatitis B stage diagnosis performance. Eleven RTE image features were obtained from 513 subjects who underwent liver biopsies in this multicenter collaborative research. The experimental results showed that the adopted classifiers significantly outperformed the LFI method and that the Random Forest(RF) classifier provided the highest average accuracy among the four machine algorithms. This result suggests that sophisticated machine-learning methods can be powerful tools for evaluating the stage of hepatic fibrosis and show promise for clinical applications. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Multiple machine learning based descriptive and predictive workflow for the identification of potential PTP1B inhibitors.

    Science.gov (United States)

    Chandra, Sharat; Pandey, Jyotsana; Tamrakar, Akhilesh Kumar; Siddiqi, Mohammad Imran

    2017-01-01

    In insulin and leptin signaling pathway, Protein-Tyrosine Phosphatase 1B (PTP1B) plays a crucial controlling role as a negative regulator, which makes it an attractive therapeutic target for both Type-2 Diabetes (T2D) and obesity. In this work, we have generated classification models by using the inhibition data set of known PTP1B inhibitors to identify new inhibitors of PTP1B utilizing multiple machine learning techniques like naïve Bayesian, random forest, support vector machine and k-nearest neighbors, along with structural fingerprints and selected molecular descriptors. Several models from each algorithm have been constructed and optimized, with the different combination of molecular descriptors and structural fingerprints. For the training and test sets, most of the predictive models showed more than 90% of overall prediction accuracies. The best model was obtained with support vector machine approach and has Matthews Correlation Coefficient of 0.82 for the external test set, which was further employed for the virtual screening of Maybridge small compound database. Five compounds were subsequently selected for experimental assay. Out of these two compounds were found to inhibit PTP1B with significant inhibitory activity in in-vitro inhibition assay. The structural fragments which are important for PTP1B inhibition were identified by naïve Bayesian method and can be further exploited to design new molecules around the identified scaffolds. The descriptive and predictive modeling strategy applied in this study is capable of identifying PTP1B inhibitors from the large compound libraries. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. A strategy for quantum algorithm design assisted by machine learning

    International Nuclear Information System (INIS)

    Bang, Jeongho; Lee, Jinhyoung; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin

    2014-01-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum–classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch–Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method. (paper)

  18. A strategy for quantum algorithm design assisted by machine learning

    Science.gov (United States)

    Bang, Jeongho; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin; Lee, Jinhyoung

    2014-07-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum-classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch-Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method.

  19. Machine learning applied to crime prediction

    OpenAIRE

    Vaquero Barnadas, Miquel

    2016-01-01

    Machine Learning is a cornerstone when it comes to artificial intelligence and big data analysis. It provides powerful algorithms that are capable of recognizing patterns, classifying data, and, basically, learn by themselves to perform a specific task. This field has incredibly grown in popularity these days, however, it still remains unknown for the majority of people, and even for most professionals. This project intends to provide an understandable explanation of what is it, what types ar...

  20. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    Science.gov (United States)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.