WorldWideScience

Sample records for machine learning frameworks

  1. A Machine Learning Based Framework for Adaptive Mobile Learning

    Science.gov (United States)

    Al-Hmouz, Ahmed; Shen, Jun; Yan, Jun

    Advances in wireless technology and handheld devices have created significant interest in mobile learning (m-learning) in recent years. Students nowadays are able to learn anywhere and at any time. Mobile learning environments must also cater for different user preferences and various devices with limited capability, where not all of the information is relevant and critical to each learning environment. To address this issue, this paper presents a framework that depicts the process of adapting learning content to satisfy individual learner characteristics by taking into consideration his/her learning style. We use a machine learning based algorithm for acquiring, representing, storing, reasoning about and updating each learner's acquired profile.

  2. A Machine Learning Framework for Plan Payment Risk Adjustment.

    Science.gov (United States)

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R². Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.
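    The comparison metric named in this abstract, cross-validated R², can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy data, the two stand-in models (a one-variable least-squares line and a mean-only baseline), and the function names are hypothetical and are not the study's formulas or its super learner.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_mean(xs, ys):
    """Baseline that ignores the predictor and always returns the training mean."""
    m = mean(ys)
    return lambda x: m


def cross_validated_r2(xs, ys, fit, k=5):
    """1 - (out-of-fold squared error) / (total sum of squares)."""
    n = len(xs)
    sse = 0.0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Each observation is scored by a model that never saw it.
        sse += sum((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    grand = mean(ys)
    sst = sum((y - grand) ** 2 for y in ys)
    return 1.0 - sse / sst


# Hypothetical toy data: a linear trend plus a small alternating disturbance.
xs = list(range(20))
ys = [2.0 * x + 1.0 + 0.5 * (-1) ** x for x in xs]

r2_line = cross_validated_r2(xs, ys, fit_line)
r2_mean = cross_validated_r2(xs, ys, fit_mean)
print(f"line: {r2_line:.3f}  mean-only: {r2_mean:.3f}")
```

    Because every prediction is made out of fold, a model that merely memorizes its training data gains nothing, which is why the metric is suited to comparing formulas of different size.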

  3. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    Science.gov (United States)

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  5. Machine Learning

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  6. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    CERN Document Server

    Hassanzadeh, Hamed; 10.5121/ijwest.2011.2203

    2011-01-01

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The aim of the Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine-understandable form. Therefore, semantic-level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles to Semantic Annotation, such as multilinguality, scalability, and issues related to diversity and inconsistency in the content of different web pages. Due to the wide range of domains and the dynamic environments in which Semantic Annotation systems must operate, the problem of automating the annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like semi-supervised learning and active learn...

  7. Scalable Machine Learning Framework for Behavior-Based Access Control

    Science.gov (United States)

    2013-08-01

    Mahout [10] is an open-source project for scalable machine learning. It provides ready implementations for K-Means clustering following a MapReduce ...paradigm, but does not provide MapReduce implementations for SVMs, which are the most expensive models to train in BBAC. Massive Online Analysis

  8. ASAP: a machine learning framework for local protein properties

    Science.gov (United States)

    Brandes, Nadav; Ofer, Dan; Linial, Michal

    2016-01-01

    Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine Learning (ML) methods are becoming fundamental in annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences, and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder or PSSM profiles. Features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. cleavage sites by convertase, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model to predict protein precursor cleavage sites, with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins sharing minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive to protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP functions as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API. Database URL: ASAP and CleavePred source code, webtool and tutorials are available at: https://github.com/ddofer/asap; http://protonet.cs.huji.ac.il/cleavepred. PMID:27694209

  9. New Applications of Learning Machines

    DEFF Research Database (Denmark)

    Larsen, Jan

    * Machine learning framework for sound search * Genre classification * Music separation * MIMO channel estimation and symbol detection

  11. A machine learning-based framework to identify type 2 diabetes through electronic health records.

    Science.gov (United States)

    Zheng, Tao; Xie, Wei; Xu, Liling; He, Xiaoying; Zhang, Ya; You, Mingrong; Yang, Gong; Chen, You

    2017-01-01

    To discover diverse genotype-phenotype associations affiliated with Type 2 Diabetes Mellitus (T2DM) via genome-wide association study (GWAS) and phenome-wide association study (PheWAS), more cases (T2DM subjects) and controls (subjects without T2DM) need to be identified (e.g., via Electronic Health Records (EHR)). However, existing expert-based identification algorithms often suffer from a low recall rate and could miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop a semi-automated framework based on machine learning as a pilot study to liberalize filtering criteria and improve the recall rate while keeping a low false positive rate. We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely-used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. Our framework was evaluated on 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 patients retrieved from a regional distributed EHR repository ranging from 2012 to 2014. We apply top-performing machine learning algorithms on the engineered features. We benchmark and contrast the accuracy, precision, AUC, sensitivity and specificity of classification models against the state-of-the-art expert algorithm for identification of T2DM subjects. Our results indicate that the framework achieved high identification performance (∼0.98 in average AUC), which is much higher than that of the state-of-the-art algorithm (0.71 in AUC). Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high missing rates due to conservative selection criteria. Our framework leverages machine learning and feature

  12. Controlling misses and false alarms in a machine learning framework for predicting uniformity of printed pages

    Science.gov (United States)

    Nguyen, Minh Q.; Allebach, Jan P.

    2015-01-01

    In our previous work [1], we presented a block-based technique to analyze printed page uniformity both visually and metrically. The features learned from the models were then employed in a Support Vector Machine (SVM) framework to classify the pages into one of the two categories of acceptable and unacceptable quality. In this paper, we introduce a set of tools for machine learning in the assessment of printed page uniformity. This work is primarily targeted to the printing industry, specifically the ubiquitous laser, electrophotographic printer. We use features that are well-correlated with the rankings of expert observers to develop a novel machine learning framework that allows one to achieve the minimum "false alarm" rate, subject to a chosen "miss" rate. Surprisingly, most of the research that has been conducted on machine learning does not consider this framework. During the process of developing a new product, test engineers will print hundreds of test pages, which can be scanned and then analyzed by an autonomous algorithm. Among these pages, most may be of acceptable quality. The objective is to find the ones that are not. These will provide critically important information to systems designers, regarding issues that need to be addressed in improving the printer design. A "miss" is defined to be a page that is not of acceptable quality to an expert observer that the prediction algorithm declares to be a "pass". Misses are a serious problem, since they represent problems that will not be seen by the systems designers. On the other hand, "false alarms" correspond to pages that an expert observer would declare to be of acceptable quality, but which are flagged by the prediction algorithm as "fails". In a typical printer testing and development scenario, such pages would be examined by an expert, and found to be of acceptable quality after all. "False alarm" pages result in extra pages to be examined by expert observers, which increases labor cost. But "false

  13. Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach.

    Science.gov (United States)

    Kagawa, Rina; Kawazoe, Yoshimasa; Ida, Yusuke; Shinohara, Emiko; Tanaka, Katsuya; Imai, Takeshi; Ohe, Kazuhiko

    2017-07-01

    Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users' objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.

  14. Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies.

    Science.gov (United States)

    Zheng, Shuai; Lu, James J; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A; Wang, Fusheng

    2017-05-09

    Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable.
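    The workflow this abstract describes (one document at a time, user feedback updating the model, then batch processing once accuracy is acceptable) can be sketched as follows. This is a toy illustration under stated assumptions, not IDEAL-X: the `OnlineLabeler` class, the `process` function, the token-voting model, the sliding-window accuracy check and the sample reports are all hypothetical.

```python
from collections import Counter, deque


class OnlineLabeler:
    """Toy online model: remembers which label each token has appeared with."""

    def __init__(self):
        self.counts = {}

    def predict(self, text):
        votes = Counter()
        for tok in text.lower().split():
            votes.update(self.counts.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else None

    def update(self, text, label):
        for tok in text.lower().split():
            self.counts.setdefault(tok, Counter())[label] += 1


def process(docs, ask_user, threshold=1.0, window=3):
    """Review documents interactively until recent accuracy reaches the threshold."""
    model = OnlineLabeler()
    recent = deque(maxlen=window)  # outcomes of the most recent predictions
    results = []
    interactive = True
    for text in docs:
        guess = model.predict(text)
        if interactive:
            truth = ask_user(text, guess)  # user confirms or corrects the guess
            model.update(text, truth)      # feedback updates the model immediately
            recent.append(guess == truth)
            results.append((text, truth, "reviewed"))
            if len(recent) == window and sum(recent) / window >= threshold:
                interactive = False        # remaining documents go to batch mode
        else:
            results.append((text, guess, "batch"))
    return results


# Hypothetical report snippets and a stand-in for the human reviewer.
docs = [
    "lad stenosis severe",
    "normal coronary arteries",
    "lad stenosis mild",
    "normal coronary flow",
    "severe stenosis rca",
    "coronary arteries normal",
    "stenosis of lad",
]


def oracle(text, guess):
    return "abnormal" if "stenosis" in text else "normal"


results = process(docs, oracle)
for row in results:
    print(row)
```

    The sliding window is one possible reading of "prediction accuracy reaches a user-acceptable threshold"; a cumulative accuracy or an explicit user decision would serve equally well in this sketch.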

  15. A generic trust framework for large-scale open systems using machine learning

    CERN Document Server

    Liu, Xin; Datta, Anwitaman

    2011-01-01

    In many large scale distributed systems and on the web, agents need to interact with other unknown agents to carry out some tasks or transactions. The ability to reason about and assess the potential risks in carrying out such transactions is essential for providing a safe and reliable environment. A traditional approach to reason about the trustworthiness of a transaction is to determine the trustworthiness of the specific agent involved, derived from the history of its behavior. As a departure from such traditional trust models, we propose a generic, machine-learning-based trust framework where an agent uses its own previous transactions (with other agents) to build a knowledge base, and utilizes this to assess the trustworthiness of a transaction based on associated features, which are capable of distinguishing successful transactions from unsuccessful ones. These features are harnessed using appropriate machine learning algorithms to extract relationships between the potential transaction and prev...

  16. SMARTbot: A Behavioral Analysis Framework Augmented with Machine Learning to Identify Mobile Botnet Applications.

    Directory of Open Access Journals (Sweden)

    Ahmad Karim

    The botnet phenomenon in smartphones is evolving with the proliferation of mobile phone technologies, after leaving a marked impact on personal computers. It refers to a network of computers, laptops, mobile devices or tablets that is remotely controlled by cybercriminals to initiate various distributed coordinated attacks, including spam emails, ad-click fraud, Bitcoin mining, Distributed Denial of Service (DDoS), dissemination of other malware and much more. Like traditional PC-based botnets, mobile botnets have the same operational impact, except that the target audience is specific to smartphone users. Therefore, it is important to uncover this security issue prior to its widespread adaptation. We propose SMARTbot, a novel dynamic analysis framework augmented with machine learning techniques to automatically detect botnet binaries from a malicious corpus. SMARTbot is a component-based off-device behavioral analysis framework which can generate a mobile botnet learning model by inducing Artificial Neural Networks' back-propagation method. Moreover, this framework can detect mobile botnet binaries with remarkable accuracy even in the case of obfuscated program code. The results conclude that a classifier model based on simple logistic regression outperforms other machine learning classifiers for botnet app detection, i.e., 99.49% accuracy is achieved. Further, from manual inspection of the botnet dataset we have extracted interesting trends in those applications. As an outcome of this research, a mobile botnet dataset is devised which will become the benchmark for future studies.

  17. SMARTbot: A Behavioral Analysis Framework Augmented with Machine Learning to Identify Mobile Botnet Applications

    Science.gov (United States)

    Karim, Ahmad; Salleh, Rosli; Khan, Muhammad Khurram

    2016-01-01

    The botnet phenomenon in smartphones is evolving with the proliferation of mobile phone technologies, after leaving a marked impact on personal computers. It refers to a network of computers, laptops, mobile devices or tablets that is remotely controlled by cybercriminals to initiate various distributed coordinated attacks, including spam emails, ad-click fraud, Bitcoin mining, Distributed Denial of Service (DDoS), dissemination of other malware and much more. Like traditional PC-based botnets, mobile botnets have the same operational impact, except that the target audience is specific to smartphone users. Therefore, it is important to uncover this security issue prior to its widespread adaptation. We propose SMARTbot, a novel dynamic analysis framework augmented with machine learning techniques to automatically detect botnet binaries from a malicious corpus. SMARTbot is a component-based off-device behavioral analysis framework which can generate a mobile botnet learning model by inducing Artificial Neural Networks’ back-propagation method. Moreover, this framework can detect mobile botnet binaries with remarkable accuracy even in the case of obfuscated program code. The results conclude that a classifier model based on simple logistic regression outperforms other machine learning classifiers for botnet app detection, i.e., 99.49% accuracy is achieved. Further, from manual inspection of the botnet dataset we have extracted interesting trends in those applications. As an outcome of this research, a mobile botnet dataset is devised which will become the benchmark for future studies. PMID:26978523

  19. A Physics-Informed Machine Learning Framework for RANS-based Predictive Turbulence Modeling

    Science.gov (United States)

    Xiao, Heng; Wu, Jinlong; Wang, Jianxun; Ling, Julia

    2016-11-01

    Numerical models based on the Reynolds-averaged Navier-Stokes (RANS) equations are widely used in turbulent flow simulations in support of engineering design and optimization. In these models, turbulence modeling introduces significant uncertainties in the predictions. In light of the decades-long stagnation encountered by the traditional approach of turbulence model development, data-driven methods have been proposed as a promising alternative. We will present a data-driven, physics-informed machine-learning framework for predictive turbulence modeling based on RANS models. The framework consists of three components: (1) prediction of discrepancies in RANS modeled Reynolds stresses based on machine learning algorithms, (2) propagation of improved Reynolds stresses to quantities of interest with a modified RANS solver, and (3) quantitative, a priori assessment of predictive confidence based on distance metrics in the mean flow feature space. Merits of the proposed framework are demonstrated in a class of flows featuring massive separations. Significant improvements over the baseline RANS predictions are observed. The favorable results suggest that the proposed framework is a promising path toward RANS-based predictive turbulence modeling in the era of big data. (SAND2016-7435 A).

  20. Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

    Science.gov (United States)

    Zheng, Shuai; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A

    2017-01-01

    Background: Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. Objective: Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. Methods: A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Results: Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. Conclusions: IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable. PMID:28487265

  1. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as examples of a widely used data-driven classification/modeling strategy.

  2. Figure of merit for macrouniformity based on image quality ruler evaluation and machine learning framework

    Science.gov (United States)

    Wang, Weibao; Overall, Gary; Riggs, Travis; Silveston-Keith, Rebecca; Whitney, Julie; Chiu, George; Allebach, Jan P.

    2013-01-01

    Assessment of macro-uniformity is a capability that is important for the development and manufacture of printer products. Our goal is to develop a metric that will predict macro-uniformity, as judged by human subjects, by scanning and analyzing printed pages. We consider two different machine learning frameworks for the metric: linear regression and the support vector machine. We have implemented the image quality ruler, based on the recommendations of the INCITS W1.1 macro-uniformity team. Using 12 subjects at Purdue University and 20 subjects at Lexmark, evenly balanced with respect to gender, we conducted subjective evaluations with a set of 35 uniform b/w prints from seven different printers with five levels of tint coverage. Our results suggest that the image quality ruler method provides a reliable means to assess macro-uniformity. We then defined and implemented separate features to measure graininess, mottle, large area variation, jitter, and large-scale non-uniformity. The algorithms that we used are largely based on ISO image quality standards. Finally, we used these features computed for a set of test pages and the subjects' image quality ruler assessments of these pages to train the two different predictors - one based on linear regression and the other based on the support vector machine (SVM). Using five-fold cross-validation, we confirmed the efficacy of our predictor.

  3. Statistical and Machine-Learning Classifier Framework to Improve Pulse Shape Discrimination System Design

    Energy Technology Data Exchange (ETDEWEB)

    Wurtz, R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Kaplan, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-10-28

    Pulse shape discrimination (PSD) is a variety of statistical classifier. Fully-realized statistical classifiers rely on a comprehensive set of tools for designing, building, and implementing. PSD advances rely on improvements to the implemented algorithm. PSD advances can be improved by using conventional statistical classifier or machine learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully-designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier’s receiver operating characteristic (ROC) curve and its behavior at a gamma rejection rate (GRR) relevant for realistic applications.
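    The reporting recommendation in this abstract (a ROC curve plus its behavior at a chosen gamma rejection rate) can be sketched as below. The scores and labels are invented for illustration, the function names are hypothetical, and the sketch assumes that label 1 marks the particle to keep, label 0 marks a gamma, and that GRR equals one minus the false positive rate.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def tpr_at_rejection(points, rejection_rate):
    """Best true positive rate among thresholds rejecting at least this share of negatives."""
    max_fpr = 1.0 - rejection_rate
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)


# Hypothetical PSD scores; label 1 = particle to keep, label 0 = gamma.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

points = roc_points(scores, labels)
strict = tpr_at_rejection(points, 1.0)   # every gamma must be rejected
relaxed = tpr_at_rejection(points, 0.8)  # 80% of gammas must be rejected
print(points)
print(strict, relaxed)
```

    Reading the curve at a fixed rejection rate, rather than quoting a single accuracy figure, makes the trade-off between keeping signal and rejecting gammas explicit.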

  4. A real-time phenotyping framework using machine learning for plant stress severity rating in soybean.

    Science.gov (United States)

    Naik, Hsiang Sing; Zhang, Jiaoping; Lofquist, Alec; Assefa, Teshale; Sarkar, Soumik; Ackerman, David; Singh, Arti; Singh, Asheesh K; Ganapathysubramanian, Baskar

    2017-01-01

    Phenotyping is a critical component of plant research. Accurate and precise trait collection, when integrated with genetic tools, can greatly accelerate the rate of genetic gain in crop improvement. However, efficient and automatic phenotyping of traits across large populations is a challenge, which is further exacerbated by the necessity of sampling multiple environments and growing replicated trials. A promising approach is to leverage current advances in imaging technology, data analytics and machine learning to enable automated and fast phenotyping and subsequent decision support. In this context, the workflow for phenotyping (image capture → data storage and curation → trait extraction → machine learning/classification → models/apps for decision support) has to be carefully designed and efficiently executed to minimize resource usage and maximize utility. We illustrate such an end-to-end phenotyping workflow for the case of plant stress severity phenotyping in soybean, with a specific focus on the rapid and automatic assessment of iron deficiency chlorosis (IDC) severity on thousands of field plots. We showcase this analytics framework by extracting IDC features from a set of ~4500 unique canopies representing a diverse germplasm base that have different levels of IDC, and subsequently training a variety of classification models to predict plant stress severity. The best classifier is then deployed as a smartphone app for rapid and real time severity rating in the field. We investigated 10 different classification approaches, with the best classifier being a hierarchical classifier with a mean per-class accuracy of ~96%. We construct a phenotypically meaningful 'population canopy graph', connecting the automatically extracted canopy trait features with plant stress severity rating. We incorporated this image capture → image processing → classification workflow into a smartphone app that enables automated real-time evaluation of IDC

  5. A Conceptual Framework over Contextual Analysis of Concept Learning within Human-Machine Interplays

    DEFF Research Database (Denmark)

    Badie, Farshad

    2016-01-01

    This research provides a contextual description concerning existential and structural analysis of ‘Relations’ between human beings and machines. Subsequently, it will focus on conceptual and epistemological analysis of (i) my own semantics-based framework [for human meaning construction] and of (...

  6. Machine Learning Markets

    CERN Document Server

    Storkey, Amos

    2011-01-01

    Prediction markets show considerable promise for developing flexible mechanisms for machine learning. Here, machine learning markets for multivariate systems are defined, and a utility-based framework is established for their analysis. This differs from the usual approach of defining static betting functions. It is shown that such markets can implement model combination methods used in machine learning, such as product-of-experts and mixture-of-experts approaches, as equilibrium pricing models, by varying agent utility functions. They can also implement models composed of local potentials, and message passing methods. Prediction markets also allow for more flexible combinations, by combining multiple different utility functions. Conversely, the market mechanisms implement inference in the relevant probabilistic models. This means that market mechanisms can be utilized for implementing parallelized model building and inference for probabilistic modelling.
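
    The two model combination rules named in this abstract can be illustrated on discrete distributions. The following is a minimal sketch of the combination rules only, not of the market mechanism or utility functions described in the paper; the function names and the example numbers are hypothetical.

```python
import math

def mixture_of_experts(dists, weights):
    """Weighted arithmetic average of the experts' probabilities."""
    n_outcomes = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists))
            for k in range(n_outcomes)]

def product_of_experts(dists):
    """Elementwise product of the experts' probabilities, renormalised."""
    n_outcomes = len(dists[0])
    unnormalised = [math.prod(d[k] for d in dists) for k in range(n_outcomes)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Two hypothetical experts giving beliefs over a binary outcome.
experts = [[0.5, 0.5], [0.8, 0.2]]
mixture = mixture_of_experts(experts, weights=[0.5, 0.5])
product = product_of_experts(experts)
```

    A mixture averages beliefs, so an uninformative expert dilutes a confident one; a product multiplies them, so an expert that assigns a uniform distribution leaves the other expert's belief unchanged.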

  7. Understanding world soils: Machine Learning as a framework for analyzing global soil-landscape relationships

    Science.gov (United States)

    Hengl, Tomislav; Mendes de Jesus, Jorge

    2016-04-01

    Soil information is an increasingly important input to global geochemical modelling, hydrological modelling, spatial planning and agricultural extension. Soil remains one of the least developed environmental layers globally with data available only at coarse resolutions and with limited accuracy. In 2013/2014 ISRIC - World Soil Information released a Global Soil Information system (SoilGrids1km) and an app to serve 3D soil information globally in near real time (DOI: 10.1371/journal.pone.0105992). At the time, this system was a proof of concept demonstrating that global compilations of soil profiles can be used in an automated framework to produce complete and consistent spatial predictions of soil properties and classes. It was primarily based on linear statistical modelling, which resulted in a limited fitting success. Global models fit to large, noisy data can often result in significant oversmoothing of the measured variation. In 2015, the focus of the SoilGrids project shifted towards improving data quality, primarily in terms of spatial detail and attribute accuracy. Initial testing using African soil data (DOI: 10.1371/journal.pone.0125814) has shown that the key to improving accuracy might lie in using machine learning techniques such as random forests, neural networks and similar that are able to better represent complex, often non-linear soil-landscape relationships. In 2015 we fitted machine learning models using larger global compilations of soil profiles (about 150,000 points) and covariates at 250 m spatial resolution (about 150 covariates; mainly MODIS seasonal land products, SRTM DEM derivatives, climatic images, lithological and land cover and landform maps) and extracted more significant global soil-landscape relationships (R-square ranging from 0.42 to 0.83). Our results show that the key predictors for mapping soil classes are most commonly hydrological DEM parameters and climatic data; for soil texture fractions lithology and

  8. A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

    Science.gov (United States)

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.

  9. The Deflector Selector: A Machine Learning Framework for Prioritizing Deflection Technology Development

    Science.gov (United States)

    Van Heerden, Elmarie; Erasmus, Nicolas; Greenberg, Adam; Nesvold, Erika; Galache, Jose Luis; Dahlstrom, Eric; Marchis, Franck

    2016-10-01

    On 15 February, 2013, a ~15 m diameter asteroid entered the Earth's atmosphere over Russia. The resulting shockwave injured nearly 1500 people, and incurred ~33 million (USD) in infrastructure damages. The Chelyabinsk meteor served as a forceful demonstration of the threat posed to Earth by the hundreds of potentially hazardous objects (PHOs) that pass near the Earth every year. Although no objects have yet been discovered on an impact course for Earth, an impact is virtually statistically guaranteed at some point in the future. While many impactor deflection technologies have been proposed, humanity has yet to demonstrate the ability to divert an impactor when one is found. Developing and testing any single proposed technology will require significant research time and funding. This leaves open an obvious question - towards which technologies should funding and research be directed, in order to maximize our preparedness for when an impactor is eventually found? To help answer this question, we have created a detailed framework for analyzing various deflection technologies and their effectiveness. Using an n-body integrator (REBOUND), we have simulated the attempted deflections of a population of Earth-impacting objects with a variety of velocity perturbations (∂Vs), and measured the effects that these perturbations had on impact probability. We then mapped the ∂Vs applied in the orbital simulations to the technologies capable of achieving those perturbations, and analyzed which set of technologies would be most effective at preventing a PHO from impacting the Earth. As a final step, we used the results of these simulations to train a machine learning algorithm. This algorithm, combined with a simulated PHO population, can predict which technologies are most likely to be needed. The algorithm can also reveal which impactor observables (mass, spin, orbit, etc.) have the greatest effect on the choice of deflection technology. These results can be used as a tool to

  10. Mood Inference Machine: Framework to Infer Affective Phenomena in ROODA Virtual Learning Environment

    Directory of Open Access Journals (Sweden)

    Magalí Teresinha Longhi

    2012-02-01

    This article presents a mechanism to infer mood states, aiming to provide virtual learning environments (VLEs) with a tool able to recognize the student’s motivation. The inference model has as its parameters personality traits, motivational factors obtained through behavioral standards and the affective subjectivity identified in texts made available in the communication functionalities of the VLE. In the inference machine, such variables are treated under probability reasoning, more precisely by Bayesian networks.

  11. Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework.

    Science.gov (United States)

    Liu, Wenbo; Li, Ming; Yi, Li

    2016-08-01

    The atypical face scanning patterns in individuals with Autism Spectrum Disorder (ASD) have been repeatedly discovered by previous research. The present study examined whether their face scanning patterns could be potentially useful to identify children with ASD by adopting a machine learning algorithm for the classification purpose. In particular, we applied the machine learning method to analyze an eye movement dataset from a face recognition task [Yi et al., 2016], to classify children with and without ASD. We evaluated the performance of our model in terms of its accuracy, sensitivity, and specificity of classifying ASD. Results indicated promising evidence for applying the machine learning algorithm based on the face scanning patterns to identify children with ASD, with a maximum classification accuracy of 88.51%. Nevertheless, our study is still preliminary with some constraints that may apply in the clinical practice. Future research should shed light on further valuation of our method and contribute to the development of a multitask and multimodel approach to aid the process of early detection and diagnosis of ASD. Autism Res 2016, 9: 888-898. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.

  12. Human Machine Learning Symbiosis

    Science.gov (United States)

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  13. A hybrid stock trading framework integrating technical analysis with machine learning techniques

    Directory of Open Access Journals (Sweden)

    Rajashree Dash

    2016-03-01

    In this paper, a novel decision support system using a computationally efficient functional link artificial neural network (CEFLANN) and a set of rules is proposed to generate trading decisions more effectively. Here the problem of stock trading decision prediction is articulated as a classification problem with three class values representing the buy, hold and sell signals. The CEFLANN network used in the decision support system produces a set of continuous trading signals within the range 0–1 by analyzing the nonlinear relationship that exists between a few popular technical indicators. Further, the output trading signals are used to track the trend and to produce the trading decision based on that trend using some trading rules. The novelty of the approach is to engender profitable stock trading decision points through integration of the learning ability of the CEFLANN neural network with the technical analysis rules. For assessing the potential use of the proposed method, the model performance is also compared with some other machine learning techniques such as the Support Vector Machine (SVM), Naive Bayesian model, K nearest neighbor (KNN) model and Decision Tree (DT) model.

  14. Quantum machine learning.

    Science.gov (United States)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  15. A Framework for Final Drive Simultaneous Failure Diagnosis Based on Fuzzy Entropy and Sparse Bayesian Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Qing Ye

    2015-01-01

    This research proposes a novel framework for final drive simultaneous failure diagnosis comprising feature extraction, training of paired diagnostic models, generation of a decision threshold, and recognition of simultaneous failure modes. In the feature extraction module, wavelet packet transform and fuzzy entropy are adopted to reduce noise interference and extract representative features of each failure mode. Single-failure samples are used to construct probabilistic classifiers based on a paired sparse Bayesian extreme learning machine, which is trained only on single failure modes and has the high generalization and sparsity of the sparse Bayesian learning approach. To generate an optimal decision threshold that converts the probability output of the classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes together with a grid search method, which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machines and probabilistic neural networks, experimental results based on the F1-measure verify that the diagnostic accuracy and efficiency of the proposed framework, which are crucial for simultaneous failure diagnosis, are superior to those of the existing approaches.

  16. GraphLab: A Distributed Framework for Machine Learning in the Cloud

    CERN Document Server

    Low, Yucheng; Kyrola, Aapo; Bickson, Danny; Guestrin, Carlos

    2011-01-01

    Machine Learning (ML) techniques are indispensable in a wide range of fields. Unfortunately, the exponential increase of dataset sizes is rapidly extending the runtime of sequential algorithms and threatening to slow future progress in ML. With the promise of affordable large-scale parallel computing, Cloud systems offer a viable platform to resolve the computational challenges in ML. However, designing and implementing efficient, provably correct distributed ML algorithms is often prohibitively challenging. To enable ML researchers to easily and efficiently use parallel systems, we introduced the GraphLab abstraction which is designed to represent the computational patterns in ML algorithms while permitting efficient parallel and distributed implementations. In this paper we provide a formal description of the GraphLab parallel abstraction and present an efficient distributed implementation. We conduct a comprehensive evaluation of GraphLab on three state-of-the-art ML algorithms using real large-scale data...

  17. Unsupervised learning framework for large-scale flight data analysis of cockpit human machine interaction issues

    Science.gov (United States)

    Vaidya, Abhishek B.

    As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in large-scale Flight Operational Quality Assurance (FOQA) dataset. The proposed tool is developed using a multi-level clustering framework, where a set of basic clustering techniques are combined with a consensus-based approach to group HMI events and create a data-driven model from the FOQA data. The proposed framework is able to effectively compress a large dataset into a small set of representative clusters within a data-driven model, enabling subject matter experts to effectively investigate identified potential HMI issues.
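
    One common way to combine several base clusterings by consensus is a co-association matrix: count how often each pair of items lands in the same cluster, then group items that co-occur in a majority of the clusterings. The sketch below illustrates that general idea under stated assumptions; it is not the multi-level procedure of the thesis, and the function names, threshold and example labelings are hypothetical.

```python
def co_association(labelings):
    """Fraction of base clusterings in which items i and j share a cluster."""
    n_items = len(labelings[0])
    n_runs = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / n_runs
             for j in range(n_items)]
            for i in range(n_items)]

def consensus_clusters(labelings, threshold=0.5):
    """Group items linked by co-association above the threshold.

    Items are joined transitively (connected components of the thresholded
    co-association graph), which is an assumption of this sketch.
    """
    co = co_association(labelings)
    n_items = len(co)
    labels = [-1] * n_items
    current = 0
    for start in range(n_items):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n_items):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three hypothetical base clusterings of five events (label ids are arbitrary).
base = [[0, 0, 1, 1, 2],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1]]
consensus = consensus_clusters(base)
```

    Because only pairwise agreement is counted, the base clusterings may use different label ids and different numbers of clusters.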

  18. Extreme Learning Machine Framework for Risk Stratification of Fatty Liver Disease Using Ultrasound Tissue Characterization.

    Science.gov (United States)

    Kuppili, Venkatanareshbabu; Biswas, Mainak; Sreekumar, Aswini; Suri, Harman S; Saba, Luca; Edla, Damodar Reddy; Marinhoe, Rui Tato; Sanches, J Miguel; Suri, Jasjit S

    2017-08-23

    Fatty Liver Disease (FLD) is caused by the deposition of fat in liver cells and leads to deadly diseases such as liver cancer. Several FLD detection and characterization systems using machine learning (ML) based on Support Vector Machines (SVM) have been applied. These ML systems utilize a large number of ultrasonic grayscale features, a pooling strategy for selecting the best features and several combinations of training/testing. As a result, they are computationally intensive, slow and do not guarantee high performance due to mismatch between grayscale features and classifier type. This study proposes a reliable and fast Extreme Learning Machine (ELM)-based tissue characterization system (a class of Symtosis) for risk stratification of ultrasound liver images. ELM is used to train a single-layer feed-forward neural network (SLFFNN). The input-to-hidden layer weights are randomly generated, reducing computational cost. The only weights to be trained are the hidden-to-output layer weights, which are computed in a single pass (without any iteration), making ELM faster than conventional ML methods. Adapting four types of K-fold cross-validation (K = 2, 3, 5 and 10) protocols on three kinds of data sizes: S0-original, S4-four splits, S8-sixty four splits (a total of 12 cases) and 46 types of grayscale features, we stratify the FLD US images using ELM and benchmark against SVM. Using the US liver database of 63 patients (27 normal/36 abnormal), our results demonstrate superior performance of ELM compared to SVM, for all cross-validation protocols (K2, K3, K5 and K10) and all types of US data sets (S0, S4, and S8) in terms of sensitivity, specificity, accuracy and area under the curve (AUC). Using the K10 cross-validation protocol on the S8 data set, ELM showed an accuracy of 96.75% compared to 89.01% for SVM, and correspondingly, the AUC: 0.97 and 0.91, respectively. Further experiments also showed the mean reliability of 99% for ELM classifier, along with the mean speed improvement of 40% using
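
    The ELM training scheme described in this abstract (random, fixed input-to-hidden weights; hidden-to-output weights solved in one non-iterative pass) can be sketched in a few lines. This is a minimal illustration under assumptions that the abstract does not specify: a tanh activation, a small ridge term for numerical stability, and a pure-Python linear solver. All function names are hypothetical, and this is not the authors' system.

```python
import math
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def hidden_layer(x, weights, biases):
    """Hidden activations for one input vector (tanh is an assumption)."""
    return [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]

def train_elm(xs, ys, n_hidden=10, ridge=1e-3, seed=0):
    """Draw random hidden weights, then solve for output weights in one pass."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_layer(x, weights, biases) for x in xs]
    # Regularised normal equations: (H^T H + ridge * I) beta = H^T y
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve_linear(gram, rhs)
    return weights, biases, beta

def predict_elm(model, x):
    """Real-valued output; its sign gives the class for targets of +1/-1."""
    weights, biases, beta = model
    return sum(b * a for b, a in zip(beta, hidden_layer(x, weights, biases)))
```

    Only `beta` is fitted; the hidden weights are never updated, which is why training reduces to a single linear solve rather than iterative gradient descent.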

  19. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide covers all of the need-to-know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks. Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or

  20. Learning with Support Vector Machines

    CERN Document Server

    Campbell, Colin

    2010-01-01

    Support Vectors Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such a

  1. Microsoft Azure machine learning

    CERN Document Server

    Mund, Sumit

    2015-01-01

    The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about Machine Learning, but have never used ML Studio in Azure; or perhaps you are an absolute newbie. In either case, this book will get you up-and-running quickly.

  2. Pattern recognition & machine learning

    CERN Document Server

    Anzai, Y

    1992-01-01

    This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. Basic for various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.

  3. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression

  4. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression

  5. A novel machine learning-enabled framework for instantaneous heart rate monitoring from motion-artifact-corrupted electrocardiogram signals.

    Science.gov (United States)

    Zhang, Qingxue; Zhou, Dian; Zeng, Xuan

    2016-11-01

    This paper proposes a novel machine learning-enabled framework to robustly monitor the instantaneous heart rate (IHR) from wrist-electrocardiography (ECG) signals continuously and heavily corrupted by random motion artifacts in wearable applications. The framework includes two stages, i.e. heartbeat identification and refinement, respectively. In the first stage, an adaptive threshold-based auto-segmentation approach is proposed to select out heartbeat candidates, including the real heartbeats and large amounts of motion-artifact-induced interferential spikes. Then twenty-six features are extracted for each candidate in time, spatial, frequency and statistical domains, and evaluated by a sparse support vector machine (SVM) to select out ten critical features which can effectively reveal residual heartbeat information. Afterwards, an SVM model, created on the training data using the selected feature set, is applied to find high confident heartbeats from a large number of candidates in the testing data. In the second stage, the SVM classification results are further refined by two steps: (1) a rule-based classifier with two attributes named 'continuity check' and 'locality check' for outlier (false positives) removal, and (2) a heartbeat interpolation strategy for missing-heartbeat (false negatives) recovery. The framework is evaluated on a wrist-ECG dataset acquired by a semi-customized platform and also a public dataset. When the signal-to-noise ratio is as low as -7 dB, the mean absolute error of the estimated IHR is 1.4 beats per minute (BPM) and the root mean square error is 6.5 BPM. The proposed framework greatly outperforms well-established approaches, demonstrating that it can effectively identify the heartbeats from ECG signals continuously corrupted by intense motion artifacts and robustly estimate the IHR. This study is expected to contribute to robust long-term wearable IHR monitoring for pervasive heart health and fitness management.
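
    The two error measures reported in this abstract, mean absolute error and root mean square error, have standard definitions that can be written out directly. The sketch below shows only those definitions; the function names and the sample heart-rate values are hypothetical and are not data from the study.

```python
import math

def mean_absolute_error(estimated, reference):
    """Average absolute difference between estimated and reference values."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)

def root_mean_square_error(estimated, reference):
    """Square root of the average squared difference."""
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))

# Hypothetical heart-rate estimates and reference values, in beats per minute.
estimated_bpm = [60.0, 62.0, 58.0]
reference_bpm = [60.0, 60.0, 60.0]
mae = mean_absolute_error(estimated_bpm, reference_bpm)
rmse = root_mean_square_error(estimated_bpm, reference_bpm)
```

    RMSE is never smaller than MAE and grows faster with occasional large errors, so a gap between the two (as in the figures reported above) suggests that a few estimates are far off while most are close.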

  6. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  7. On-the-Fly Learning in a Perpetual Learning Machine

    OpenAIRE

    2015-01-01

    Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine; a new type of DNN that is capable of brain-like dynamic 'on the fly' learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we provide the means to unify learning and memory within a machine learning framework. We also explore ...

  8. Simultaneous Multi-vehicle Detection and Tracking Framework with Pavement Constraints Based on Machine Learning and Particle Filter Algorithm

    Institute of Scientific and Technical Information of China (English)

    WANG Ke; HUANG Zhi; ZHONG Zhihua

    2014-01-01

    Due to the large variations of environment with ever-changing background and vehicles with different shapes, colors and appearances, implementing a real-time on-board vehicle recognition system with high adaptability, efficiency and robustness in complicated environments remains challenging. This paper introduces a simultaneous detection and tracking framework for robust on-board vehicle recognition based on monocular vision technology. The framework utilizes a novel layered machine learning method and a particle filter to build a multi-vehicle detection and tracking system. In the vehicle detection stage, a layered machine learning method is presented, which combines coarse-search and fine-search to obtain the target using the AdaBoost-based training algorithm. The pavement segmentation method based on characteristic similarity is proposed to estimate the most likely pavement area. Efficiency and accuracy are enhanced by restricting vehicle detection within the downsized area of pavement. In the vehicle tracking stage, a multi-objective tracking algorithm based on target state management and particle filter is proposed. The proposed system is evaluated by roadway video captured in a variety of traffic, illumination, and weather conditions. The evaluation results show that, under conditions of proper illumination and clear vehicle appearance, the proposed system achieves 91.2% detection rate and 2.6% false detection rate. Experiments compared to typical algorithms show that the presented algorithm reduces the false detection rate nearly by half at the cost of decreasing 2.7%–8.6% detection rate. This paper proposes a multi-vehicle detection and tracking system, which is promising for implementation in an on-board vehicle recognition system with high precision, strong robustness and low computational cost.
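
    The particle filter used in the tracking stage follows a general predict, weight, resample cycle. The sketch below shows a basic bootstrap particle filter for a single one-dimensional position. It is a simplified illustration, not the multi-vehicle algorithm of the paper; the motion model, the noise levels and all names are assumptions.

```python
import math
import random

def particle_filter_step(particles, observation, rng,
                         process_std=0.1, obs_std=0.5):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: move each particle with a random-walk motion model (assumed).
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # All particles are far from the observation; fall back to uniform.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

def track(observations, n_particles=500, seed=0):
    """Return the posterior-mean position estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates

# Hypothetical noisy measurements of a target sitting near position 5.0.
measurements = [5.0 + 0.1 * (-1) ** k for k in range(20)]
estimates = track(measurements)
```

    Resampling at every step keeps the particle set concentrated where the likelihood is high, at the cost of some loss of diversity; practical trackers often resample only when the effective sample size drops.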

  9. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2015-01-01

    Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic programming concepts, but no prior experience is required.

  10. Attention: A Machine Learning Perspective

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2012-01-01

    We review a statistical machine learning model of top-down task driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features — a gist of coarse grained global features and a larger set of low...

  11. Probabilistic machine learning and artificial intelligence.

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  12. Probabilistic machine learning and artificial intelligence

    Science.gov (United States)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  13. Machine Learning for Hackers

    CERN Document Server

    Conway, Drew

    2012-01-01

    If you're an experienced programmer interested in crunching data, this book will get you started with machine learning, a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyz

  14. Development and Validation of a Learning Analytics Framework: Two Case Studies Using Support Vector Machines

    Science.gov (United States)

    Ifenthaler, Dirk; Widanapathirana, Chathuranga

    2014-01-01

    Interest in collecting and mining large sets of educational data on student background and performance to conduct research on learning and instruction has developed as an area generally referred to as learning analytics. Higher education leaders are recognizing the value of learning analytics for improving not only learning and teaching but also…

  15. Machine Learning and Radiology

    Science.gov (United States)

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  16. A Framework for Intelligent Instructional Systems: An Artificial Intelligence Machine Learning Approach.

    Science.gov (United States)

    Becker, Lee A.

    1987-01-01

    Presents and develops a general model of the nature of a learning system and a classification for learning systems. Highlights include the relationship between artificial intelligence and cognitive psychology; computer-based instructional systems; intelligent instructional systems; and the role of the learner's knowledge base in an intelligent…

  17. mlpy: Machine Learning Python

    CERN Document Server

    Albanese, Davide; Merler, Stefano; Riccadonna, Samantha; Jurman, Giuseppe; Furlanello, Cesare

    2012-01-01

    mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, works with Python 2 and 3, and is distributed under GPL3 at the website http://mlpy.fbk.eu.

  18. mlpy: Machine Learning Python

    OpenAIRE

    Albanese, Davide; Visintainer, Roberto; Merler, Stefano; Riccadonna, Samantha; Jurman, Giuseppe; Furlanello, Cesare

    2012-01-01

    mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, works with Python 2 and 3, and is distributed under GPL3 at the website http://mlpy.fbk.eu.

  19. Game-powered machine learning.

    Science.gov (United States)

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  20. Computer-assisted framework for machine-learning-based delineation of GTV regions on datasets of planning CT and PET/CT images.

    Science.gov (United States)

    Ikushima, Koujiro; Arimura, Hidetaka; Jin, Ze; Yabu-Uchi, Hidetake; Kuwazuru, Jumpei; Shioyama, Yoshiyuki; Sasaki, Tomonari; Honda, Hiroshi; Sasaki, Masayuki

    2017-01-01

    We have proposed a computer-assisted framework for machine-learning-based delineation of gross tumor volumes (GTVs) following an optimum contour selection (OCS) method. The key idea of the proposed framework was to feed image features around GTV contours (determined based on the knowledge of radiation oncologists) into a machine-learning classifier during the training step, after which the classifier produces the 'degree of GTV' for each voxel in the testing step. Initial GTV regions were extracted using a support vector machine (SVM) that learned the image features inside and outside each tumor region (determined by radiation oncologists). The leave-one-out-by-patient test was employed for training and testing the steps of the proposed framework. The final GTV regions were determined using the OCS method that can be used to select a global optimum object contour based on multiple active delineations with a LSM around the GTV. The efficacy of the proposed framework was evaluated in 14 lung cancer cases [solid: 6, ground-glass opacity (GGO): 4, mixed GGO: 4] using the 3D Dice similarity coefficient (DSC), which denotes the degree of region similarity between the GTVs contoured by radiation oncologists and those determined using the proposed framework. The proposed framework achieved an average DSC of 0.777 for 14 cases, whereas the OCS-based framework produced an average DSC of 0.507. The average DSCs for GGO and mixed GGO were 0.763 and 0.701, respectively, obtained by the proposed framework. The proposed framework can be employed as a tool to assist radiation oncologists in delineating various GTV regions. © The Author 2016. Published by Oxford University Press on behalf of The Japan Radiation Research Society and Japanese Society for Radiation Oncology.
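
    The Dice similarity coefficient used for evaluation above can be illustrated with a minimal sketch. It assumes each region is given as a set of voxel coordinates; the function name and the toy regions are hypothetical and are not taken from the paper.

```python
def dice_coefficient(region_a, region_b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty regions count as identical
    # Twice the overlap divided by the total number of voxels in both regions.
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy 3D regions (hypothetical data): three of four voxels overlap.
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
predicted = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
dsc = dice_coefficient(reference, predicted)
```

    A value of 1 means the two regions coincide exactly, and 0 means they share no voxel.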

  1. Malware Propagation on Social Time Varying Networks: A Comparative Study of Machine Learning Frameworks

    Directory of Open Access Journals (Sweden)

    A.A. Ojugo

    2014-08-01

    Significant research into the logarithmic analysis of complex networks yields solutions that help minimize virus spread and propagation over networks. Virus propagation has been a recurring subject, and the design of complex models yields modeling solutions used in a number of settings, including propagation, dataflow, network immunization, resource management, service distribution, and adoption of viral marketing. Stochastic models are successfully used to predict virus propagation processes and their effects on networks. The study employs SI models for independent cascade and dynamic models with the Enron dataset (of e-mail addresses) and presents comparative results using varied machine models. The study samples 25,000 emails from the Enron dataset, with entropy and information gain computed to address issues of blocking targeting and extent of virus spread on graphs. The study addresses the problems of expected spread immunization and expected epidemic spread minimization, but not the epidemic threshold (owing to space constraints).

  2. Machine learning framework for analysis of transport through complex networks in porous, granular media: A focus on permeability

    Science.gov (United States)

    van der Linden, Joost H.; Narsilio, Guillermo A.; Tordesillas, Antoinette

    2016-08-01

    We present a data-driven framework to study the relationship between fluid flow at the macroscale and the internal pore structure, across the micro- and mesoscales, in porous, granular media. Sphere packings with varying particle size distribution and confining pressure are generated using the discrete element method. For each sample, a finite element analysis of the fluid flow is performed to compute the permeability. We construct a pore network and a particle contact network to quantify the connectivity of the pores and particles across the mesoscopic spatial scales. Machine learning techniques for feature selection are employed to identify sets of microstructural properties and multiscale complex network features that optimally characterize permeability. We find a linear correlation (in log-log scale) between permeability and the average closeness centrality of the weighted pore network. With the pore network links weighted by the local conductance, the average closeness centrality represents a multiscale measure of efficiency of flow through the pore network in terms of the mean geodesic distance (or shortest path) between all pore bodies in the pore network. Specifically, this study objectively quantifies a hypothesized link between high permeability and efficient shortest paths that thread through relatively large pore bodies connected to each other by high conductance pore throats, embodying connectivity and pore structure.
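
    The average closeness centrality described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the length of a link is assumed to be the reciprocal of its conductance, closeness of a node is taken as the number of reachable nodes divided by the sum of shortest-path distances to them, and all names and the toy network are hypothetical.

```python
import heapq


def graph_from_conductances(edges):
    """Build an undirected adjacency map; link length = 1 / conductance (assumption)."""
    graph = {}
    for u, v, conductance in edges:
        length = 1.0 / conductance
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_closeness(graph):
    """Mean over nodes of (reachable nodes) / (sum of shortest distances to them)."""
    total = 0.0
    for node in graph:
        dist = shortest_path_lengths(graph, node)
        distance_sum = sum(d for other, d in dist.items() if other != node)
        total += (len(dist) - 1) / distance_sum if distance_sum > 0 else 0.0
    return total / len(graph)


# Toy pore network (hypothetical): three pores in a chain, conductance 2 per throat.
pores = graph_from_conductances([("a", "b", 2.0), ("b", "c", 2.0)])
avg = average_closeness(pores)
```

    Higher conductances shorten the effective path lengths and therefore raise the average closeness, which matches the abstract's reading of the measure as flow efficiency.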

  3. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2015-01-01

    Techniques from the machine learning community are reviewed and employed for laser characterization, signal detection in the presence of nonlinear phase noise, and nonlinearity mitigation. Bayesian filtering and expectation maximization are employed within nonlinear state-space framework...

  4. Machine learning techniques in optical communication

    DEFF Research Database (Denmark)

    Zibar, Darko; Piels, Molly; Jones, Rasmus Thomas

    2016-01-01

    Machine learning techniques relevant for nonlinearity mitigation, carrier recovery, and nanoscale device characterization are reviewed and employed. Markov Chain Monte Carlo in combination with Bayesian filtering is employed within the nonlinear state-space framework and demonstrated for parameter...

  5. Tree based machine learning framework for predicting ground state energies of molecules

    Science.gov (United States)

    Himmetoglu, Burak

    2016-10-01

    We present an application of the boosted regression tree algorithm for predicting ground state energies of molecules made up of C, H, N, O, P, and S (CHNOPS). The PubChem chemical compound database has been incorporated to construct a dataset of 16 242 molecules, whose electronic ground state energies have been computed using density functional theory. This dataset is used to train the boosted regression tree algorithm, which allows a computationally efficient and accurate prediction of molecular ground state energies. Predictions from boosted regression trees are compared with neural network regression, a widely used method in the literature, and shown to be more accurate with significantly reduced computational cost. The performance of the regression model trained using the CHNOPS set is also tested on a set of distinct molecules that contain additional Cl and Si atoms. It is shown that the learning algorithms lead to a rich and diverse possibility of applications in molecular discovery and materials informatics.

  6. Tree based machine learning framework for predicting ground state energies of molecules

    CERN Document Server

    Himmetoglu, Burak

    2016-01-01

    We present an application of the boosted regression tree algorithm for predicting ground state energies of molecules made up of C, H, N, O, P, and S (CHNOPS). The PubChem chemical compound database has been incorporated to construct a dataset of 16,242 molecules, whose electronic ground state energies have been computed using density functional theory. This dataset is used to train the boosted regression tree algorithm, which allows a computationally efficient and accurate prediction of molecular ground state energies. Predictions from boosted regression trees are compared with neural network regression, a widely used method in the literature, and shown to be more accurate with significantly reduced computational cost. The performance of the regression model trained using the CHNOPS set is also tested on a set of distinct molecules that contain additional Cl and Si atoms. It is shown that the learning algorithms lead to a rich and diverse possibility of applications in molecular discovery and materials inform...

  7. Machine Learning with Distances

    Science.gov (United States)

    2015-02-16

    and demonstrated their usefulness in experiments. 1 Introduction The goal of machine learning is to find useful knowledge behind data. Many machine ... 212, 172]. However, direct divergence approximators still suffer from the curse of dimensionality. A possible cure for this problem is to combine them ... obtain the global optimal solution or even a good local solution without any prior knowledge. For this reason, we decided to introduce the unit-norm

  8. Machine Learning in Medicine.

    Science.gov (United States)

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome.

  9. Clojure for machine learning

    CERN Document Server

    Wali, Akhil

    2014-01-01

    A book that brings out the strengths of Clojure programming that facilitate machine learning. Each topic is described in substantial detail, and examples and libraries in Clojure are also demonstrated. This book is intended for Clojure developers who want to explore the area of machine learning. A basic understanding of the Clojure programming language is required, but thorough acquaintance with the standard Clojure library or any other libraries is not required. Familiarity with the theoretical concepts and notation of mathematics and statistics would be an added advantage.

  10. Mastering machine learning with scikit-learn

    CERN Document Server

    Hackeling, Gavin

    2014-01-01

    If you are a software developer who wants to learn how machine learning models work and how to apply them effectively, this book is for you. Familiarity with machine learning fundamentals and Python will be helpful, but is not essential.

  11. Machine Learning for Security

    CERN Document Server

    CERN. Geneva

    2015-01-01

    Applied statistics, aka ‘Machine Learning’, offers a wealth of techniques for answering security questions. It’s a much hyped topic in the big data world, with many companies now providing machine learning as a service. This talk will demystify these techniques, explain the math, and demonstrate their application to security problems. The presentation will include how-to’s on classifying malware, looking into encrypted tunnels, and finding botnets in DNS data. About the speaker: Josiah is a security researcher with HP TippingPoint DVLabs Research Group. He has over 15 years of professional software development experience. Josiah used to do AI, with work focused on graph theory, search, and deductive inference on large knowledge bases. As rules only get you so far, he moved from AI to using machine learning techniques to identify failure modes in email traffic. There followed digressions into clustered data storage and later integrated control systems. Current ...

  12. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focused on building models. We process nearly all the information we perceive into a model. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called Machine Learning. Much research is devoted to develop comp

  13. Machine learning in image steganalysis

    CERN Document Server

    Schaathun, Hans Georg

    2012-01-01

    "The only book to look at steganalysis from the perspective of machine learning theory, and to apply the common technique of machine learning to the particular field of steganalysis; ideal for people working in both disciplines"--

  14. Discovering charge density functionals and structure-property relationships with PROPhet: A general framework for coupling machine learning and first-principles methods.

    Science.gov (United States)

    Kolb, Brian; Lentz, Levi C; Kolpak, Alexie M

    2017-04-26

    Modern ab initio methods have rapidly increased our understanding of solid state materials properties, chemical reactions, and the quantum interactions between atoms. However, poor scaling often renders direct ab initio calculations intractable for large or complex systems. There are two obvious avenues through which to remedy this problem: (i) develop new, less expensive methods to calculate system properties, or (ii) make existing methods faster. This paper describes an open source framework designed to pursue both of these avenues. PROPhet (short for PROPerty Prophet) utilizes machine learning techniques to find complex, non-linear mappings between sets of material or system properties. The result is a single code capable of learning analytical potentials, non-linear density functionals, and other structure-property or property-property relationships. These capabilities enable highly accurate mesoscopic simulations, facilitate computation of expensive properties, and enable the development of predictive models for systematic materials design and optimization. This work explores the coupling of machine learning to ab initio methods through means both familiar (e.g., the creation of various potentials and energy functionals) and less familiar (e.g., the creation of density functionals for arbitrary properties), serving both to demonstrate PROPhet's ability to create exciting post-processing analysis tools and to open the door to improving ab initio methods themselves with these powerful machine learning techniques.

  15. PCA-based polling strategy in machine learning framework for coronary artery disease risk assessment in intravascular ultrasound: A link between carotid and coronary grayscale plaque morphology.

    Science.gov (United States)

    Araki, Tadashi; Ikeda, Nobutaka; Shukla, Devarshi; Jain, Pankaj K; Londhe, Narendra D; Shrivastava, Vimal K; Banchhor, Sumit K; Saba, Luca; Nicolaides, Andrew; Shafique, Shoaib; Laird, John R; Suri, Jasjit S

    2016-05-01

    Percutaneous coronary interventional procedures need advance planning prior to stenting or an endarterectomy. Cardiologists use intravascular ultrasound (IVUS) for screening, risk assessment and stratification of coronary artery disease (CAD). We hypothesize that plaque components are vulnerable to rupture due to plaque progression. Currently, there are no standard grayscale IVUS tools for risk assessment of plaque rupture. This paper presents a novel strategy for risk stratification based on plaque morphology embedded with principal component analysis (PCA) for plaque feature dimensionality reduction and dominant feature selection technique. The risk assessment utilizes 56 grayscale coronary features in a machine learning framework while linking information from carotid and coronary plaque burdens due to their common genetic makeup. This system consists of a machine learning paradigm which uses a support vector machine (SVM) combined with PCA for optimal and dominant coronary artery morphological feature extraction. Carotid artery proven intima-media thickness (cIMT) biomarker is adopted as a gold standard during the training phase of the machine learning system. For the performance evaluation, a K-fold cross-validation protocol is adopted with 20 trials per fold. For choosing the dominant features out of the 56 grayscale features, a polling strategy of PCA is adopted where the original value of the features is unaltered. Different protocols are designed for establishing the stability and reliability criteria of the coronary risk assessment system (cRAS). Using the PCA-based machine learning paradigm and cross-validation protocol, a classification accuracy of 98.43% (AUC 0.98) with K=10 folds using an SVM radial basis function (RBF) kernel was achieved. A reliability index of 97.32% and machine learning stability criteria of 5% were met for the cRAS. This is the first computer-aided diagnosis (CADx) system of its kind that is able to demonstrate the ability of coronary
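
    The K-fold cross-validation protocol mentioned in this record can be sketched in a few lines. This is a generic illustration, not the authors' protocol: the function name is hypothetical, folds are contiguous and unshuffled, and the repeated trials described in the abstract would additionally require reshuffling the samples before each split.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 10 samples split into K=5 folds of 2 test samples each.
folds = list(k_fold_indices(10, 5))
```

    Each sample appears in exactly one test fold, so averaging a score over the folds uses every sample for testing once.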

  16. Development of Machine Learning Tools in ROOT

    Science.gov (United States)

    Gleyzer, S. V.; Moneta, L.; Zapata, Omar A.

    2016-10-01

    ROOT is a framework for large-scale data analysis that provides basic and advanced statistical methods used by the LHC experiments. These include machine learning algorithms from the ROOT-integrated Toolkit for Multivariate Analysis (TMVA). We present several recent developments in TMVA, including a new modular design, new algorithms for variable importance and cross-validation, interfaces to other machine-learning software packages and integration of TMVA with Jupyter, making it accessible with a browser.

  17. Machine Learning at Scale

    OpenAIRE

    Izrailev, Sergei; Stanley, Jeremy M.

    2014-01-01

    It takes skill to build a meaningful predictive model even with the abundance of implementations of modern machine learning algorithms and readily available computing resources. Building a model becomes challenging if hundreds of terabytes of data need to be processed to produce the training data set. In a digital advertising technology setting, we are faced with the need to build thousands of such models that predict user behavior and power advertising campaigns in a 24/7 chaotic real-time p...

  18. Machine learning with R cookbook

    CERN Document Server

    Chiu, Yu-Wei

    2015-01-01

    If you want to learn how to use R for machine learning and gain insights from your data, then this book is ideal for you. Regardless of your level of experience, this book covers the basics of applying R to machine learning through to advanced techniques. While it is helpful if you are familiar with basic programming or machine learning concepts, you do not require prior experience to benefit from this book.

  19. Soft computing in machine learning

    CERN Document Server

    Park, Jooyoung; Inoue, Atsushi

    2014-01-01

    As users or consumers are now demanding smarter devices, intelligent systems are being revolutionized by machine learning. Machine learning as part of intelligent systems is already one of the most critical components in everyday tools ranging from search engines and credit card fraud detection to stock market analysis. You can train machines to perform some things, so that they can automatically detect, diagnose, and solve a variety of problems. Intelligent systems have made rapid progress in developing the state of the art in machine learning based on smart and deep perception. Using machine learning, intelligent systems are widely applied in automated speech recognition, natural language processing, medical diagnosis, bioinformatics, and robot locomotion. This book aims to introduce how to treat a substantial amount of data, teach machines, and improve decision-making models. It specializes in the development of advanced intelligent systems through machine learning. It...

  20. Quantum-Enhanced Machine Learning.

    Science.gov (United States)

    Dunjko, Vedran; Taylor, Jacob M; Briegel, Hans J

    2016-09-23

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  1. Quantum-Enhanced Machine Learning

    Science.gov (United States)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  2. Quantum adiabatic machine learning

    CERN Document Server

    Pudenz, Kristen L

    2011-01-01

    We develop an approach to machine learning and anomaly detection via quantum adiabatic evolution. In the training phase we identify an optimal set of weak classifiers, to form a single strong classifier. In the testing phase we adiabatically evolve one or more strong classifiers on a superposition of inputs in order to find certain anomalous elements in the classification space. Both the training and testing phases are executed via quantum adiabatic evolution. We apply and illustrate this approach in detail to the problem of software verification and validation.

  3. Machine learning in healthcare informatics

    CERN Document Server

    Acharya, U; Dua, Prerna

    2014-01-01

    The book is a unique effort to represent a variety of techniques designed to represent, enhance, and empower multi-disciplinary and multi-institutional machine learning research in healthcare informatics. The book provides a unique compendium of current and emerging machine learning paradigms for healthcare informatics and reflects the diversity, complexity and the depth and breadth of this multi-disciplinary area. The integrated, panoramic view of data and machine learning techniques can provide an opportunity for novel clinical insights and discoveries.

  4. Stacked Extreme Learning Machines.

    Science.gov (United States)

    Zhou, Hongming; Huang, Guang-Bin; Lin, Zhiping; Wang, Han; Soh, Yeng Chai

    2015-09-01

    Extreme learning machine (ELM) has recently attracted many researchers' interest due to its very fast learning speed, good generalization ability, and ease of implementation. It provides a unified solution that can be used directly to solve regression, binary, and multiclass classification problems. In this paper, we propose a stacked ELMs (S-ELMs) that is specially designed for solving large and complex data problems. The S-ELMs divides a single large ELM network into multiple stacked small ELMs which are serially connected. The S-ELMs can approximate a very large ELM network with small memory requirement. To further improve the testing accuracy on big data problems, the ELM autoencoder can be implemented during each iteration of the S-ELMs algorithm. The simulation results show that the S-ELMs even with random hidden nodes can achieve similar testing accuracy to support vector machine (SVM) while having low memory requirements. With the help of ELM autoencoder, the S-ELMs can achieve much better testing accuracy than SVM and slightly better accuracy than deep belief network (DBN) with much faster training speed.

  5. Machine Learning Exciton Dynamics

    CERN Document Server

    Häse, Florian; Pyzer-Knapp, Edward; Aspuru-Guzik, Alán

    2015-01-01

    Obtaining the exciton dynamics of large photosynthetic complexes by using mixed quantum mechanics/molecular mechanics (QM/MM) is computationally demanding. We propose a machine learning technique, multi-layer perceptrons, as a tool to reduce the time required to compute excited state energies. With this approach we predict time-dependent density functional theory (TDDFT) excited state energies of bacteriochlorophylls in the Fenna-Matthews-Olson (FMO) complex. Additionally we compute spectral densities and exciton populations from the predictions. Different methods to determine multi-layer perceptron training sets are introduced, leading to several initial data selections. Once multi-layer perceptrons are trained, predicting excited state energies was found to be significantly faster than the corresponding QM/MM calculations. We showed that multi-layer perceptrons can successfully reproduce the energies of QM/MM calculations to a high degree o...

  6. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of poten

  7. Parallelization of TMVA Machine Learning Algorithms

    CERN Document Server

    Hajili, Mammad

    2017-01-01

    This report reflects my work on the parallelization of TMVA machine learning algorithms integrated into the ROOT data analysis framework during a summer internship at CERN. The report consists of 4 important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the results showing how execution time changes with the number of workers.

  8. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of

  9. Machine learning approximation techniques using dual trees

    OpenAIRE

    Ergashbaev, Denis

    2015-01-01

    This master's thesis explores a dual-tree framework as applied to a particular class of machine learning problems that are collectively referred to as generalized n-body problems. It builds a new algorithm on top of the framework and improves the existing Boosted OGE classifier.

  10. Machine Learning for Medical Imaging.

    Science.gov (United States)

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. (©)RSNA, 2017.

  11. Learning from Distributions via Support Measure Machines

    CERN Document Server

    Muandet, Krikamol; Fukumizu, Kenji; Dinuzzo, Francesco

    2012-01-01

    This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in a straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provide several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.
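
    The mean-embedding representation mentioned above leads to a kernel between distributions: the inner product of two empirical mean embeddings is the average of a base kernel over all pairs of samples. The sketch below assumes a Gaussian RBF base kernel and finite samples standing in for the distributions; the function names and the `gamma` parameter are illustrative choices, not taken from the paper.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian RBF base kernel between two points (assumed base kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def mean_embedding_kernel(sample_p, sample_q, gamma=1.0):
    """Inner product of the empirical mean embeddings of two samples.

    Equals the average of rbf(x, y) over all pairs x in sample_p, y in sample_q.
    """
    total = sum(rbf(x, y, gamma) for x in sample_p for y in sample_q)
    return total / (len(sample_p) * len(sample_q))


# Hypothetical samples representing two distributions on the real line.
p = [[0.0], [1.0]]
q = [[0.5], [1.5], [2.5]]
k_pq = mean_embedding_kernel(p, q)
```

    A Gram matrix built from such values could then be passed to any kernel method that accepts precomputed kernels, which is what makes the standard SVM machinery reusable on distributions.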

  12. A machine learning framework for auto classification of imaging system exams in hospital setting for utilization optimization.

    Science.gov (United States)

    Patil, Meru A; Patil, Ravindra B; Krishnamoorthy, P; John, Jacob

    2016-08-01

    In a clinical environment, an Interventional X-Ray (IXR) system is used on various anatomies and for various types of procedures. It is important to classify each exam of the IXR system correctly into its respective procedure and/or assign it to the correct anatomy. This classification enhances productivity of the system through better scheduling of the Cath lab, provides a means for hospital management to forecast device usage and revenue, and supports targeted treatment planning for a disease/anatomy. Classifying each exam into its respective procedure/anatomy may appear to be a simple task. However, in real-life hospital settings, it is well known that the same system settings are used to perform different types of procedures, and such usage leads to under-utilization of the system. In this work, a method is developed to classify exams into their respective anatomical types by applying machine learning techniques (SVM, KNN and decision trees) to log information from the systems. The classification result is promising, with an accuracy of greater than 90%.

  13. Statistical and machine learning approaches for network analysis

    CERN Document Server

    Dehmer, Matthias

    2012-01-01

    Explore the multidisciplinary nature of complex networks through machine learning techniques. Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by internation

  14. Machine learning in virtual screening.

    Science.gov (United States)

    Melville, James L; Burke, Edmund K; Hirst, Jonathan D

    2009-05-01

    In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

  15. Learning thermodynamics with Boltzmann machines

    Science.gov (United States)

    Torlai, Giacomo; Melko, Roger G.

    2016-10-01

    A Boltzmann machine is a stochastic neural network that has been extensively used in the layers of deep architectures for modern machine learning applications. In this paper, we develop a Boltzmann machine that is capable of modeling thermodynamic observables for physical systems in thermal equilibrium. Through unsupervised learning, we train the Boltzmann machine on data sets constructed with spin configurations importance sampled from the partition function of an Ising Hamiltonian at different temperatures using Monte Carlo (MC) methods. The trained Boltzmann machine is then used to generate spin states, for which we compare thermodynamic observables to those computed by direct MC sampling. We demonstrate that the Boltzmann machine can faithfully reproduce the observables of the physical system. Further, we observe that the number of neurons required to obtain accurate results increases as the system is brought close to criticality.
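
    The training data described above are spin configurations drawn by Monte Carlo at a given temperature. As a rough sketch of how such configurations can be generated, the code below runs single-spin-flip Metropolis updates on a small periodic 2D Ising lattice. The lattice geometry, the coupling J = 1, the all-up initial state, and all function names are assumptions made for illustration; the paper's exact sampling setup is not specified in this abstract.

```python
import math
import random


def metropolis_sweep(spins, size, temperature, rng):
    """One Metropolis sweep over a size x size periodic Ising lattice (J = 1)."""
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                      + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        delta_e = 2.0 * spins[i][j] * neighbours  # energy change if this spin flips
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]


def sample_configurations(size, temperature, n_samples, seed=0):
    """Return n_samples lattice snapshots, one after each sweep, starting all-up."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    samples = []
    for _ in range(n_samples):
        metropolis_sweep(spins, size, temperature, rng)
        samples.append([row[:] for row in spins])
    return samples


def magnetization(spins):
    """Absolute magnetization per spin, a simple thermodynamic observable."""
    n = sum(len(row) for row in spins)
    return abs(sum(sum(row) for row in spins)) / n
```

    Snapshots produced this way at several temperatures would form the kind of data set on which a generative model is trained, and `magnetization` is one observable that could be compared between generated and directly sampled states. A practical run would also discard early sweeps as equilibration, which this sketch omits.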

  16. Emerging Paradigms in Machine Learning

    CERN Document Server

    Jain, Lakhmi; Howlett, Robert

    2013-01-01

    This book presents fundamental topics and algorithms that form the core of machine learning (ML) research, as well as emerging paradigms in intelligent system design. The multidisciplinary nature of machine learning makes it a very fascinating and popular area for research. The book is aimed at students, practitioners and researchers and captures the diversity and richness of the field of machine learning and intelligent systems. Several chapters are devoted to computational learning models such as granular computing, rough sets and fuzzy sets. An account of applications of well-known learning methods in biometrics, computational stylistics, multi-agent systems, and spam classification, including an extremely well-written survey on Bayesian networks, sheds light on the strengths and weaknesses of the methods. Practical studies yielding insight into challenging problems such as learning from incomplete and imbalanced data, pattern recognition of stochastic episodic events and on-line mining of non-stationary ...

  17. Machine Learning examples on Invenio

    CERN Document Server

    CERN. Geneva

    2017-01-01

    This talk will present the different Machine Learning tools that INSPIRE is developing and integrating in order to automate, as much as possible, content selection and curation in a subject-based repository.

  18. Machine learning for healthcare technologies

    CERN Document Server

    Clifton, David A

    2016-01-01

    This book brings together chapters on the state-of-the-art in machine learning (ML) as it applies to the development of patient-centred technologies, with a special emphasis on 'big data' and mobile data.

  19. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  20. Machine learning methods for planning

    CERN Document Server

    Minton, Steven

    1993-01-01

    Machine Learning Methods for Planning provides information pertinent to learning methods for planning and scheduling. This book covers a wide variety of learning methods and learning architectures, including analogical, case-based, decision-tree, explanation-based, and reinforcement learning. Organized into 15 chapters, this book begins with an overview of planning and scheduling and describes some representative learning systems that have been developed for these tasks. This text then describes a learning apprentice for calendar management. Other chapters consider the problem of temporal credi

  1. Machine learning for evolution strategies

    CERN Document Server

    Kramer, Oliver

    2016-01-01

    This book introduces numerous algorithmic hybridizations between both worlds that show how machine learning can improve and support evolution strategies. The set of methods comprises covariance matrix estimation, meta-modeling of fitness and constraint functions, dimensionality reduction for search and visualization of high-dimensional optimization processes, and clustering-based niching. After giving an introduction to evolution strategies and machine learning, the book builds the bridge between both worlds with an algorithmic and experimental perspective. Experiments mostly employ a (1+1)-ES and are implemented in Python using the machine learning library scikit-learn. The examples are conducted on typical benchmark problems illustrating algorithmic concepts and their experimental behavior. The book closes with a discussion of related lines of research.

  2. A GEOBIA framework to estimate forest parameters from lidar transects, Quickbird imagery and machine learning: A case study in Quebec, Canada

    Science.gov (United States)

    Chen, Gang; Hay, Geoffrey J.; St-Onge, Benoît

    2012-04-01

    The GEOgraphic Object-Based Image Analysis (GEOBIA) paradigm continues to prove its efficacy in remote sensing image analysis by providing tools which emulate human perception and combine the analyst's experience with meaningful image-objects. However, challenges remain in the evolution of this new paradigm as sophisticated methods attempt to deliver on the goal of automated geo-intelligence (i.e., geospatial content within context) from geospatial sources. In order to generate geo-intelligence from a forest scene, this article introduces a GEOBIA framework to estimate canopy height, above-ground biomass (AGB) and volume by combining lidar (light detection and ranging) transects, Quickbird imagery and machine learning algorithms. This framework comprises three main components: (i) image-object extraction, (ii) lidar transect selection, and (iii) forest parameter generalization. The rationale for integrating these methods is to provide a semi-automatic GEOBIA approach from which detailed forest information is obtained at the individual tree crown or small tree cluster level (i.e., mean object size of 0.04 ha), while also dramatically reducing airborne lidar data acquisition costs. Analysis is performed over a 16,330 ha forested study site in Quebec, Canada. Forest parameter estimation results derived from our GEOBIA framework demonstrate a strong relationship with those using the full lidar cover, where the highest estimates for canopy height (R = 0.85; RMSE = 3.37 m), AGB (R = 0.85; RMSE = 39.48 Mg/ha) and volume (R = 0.85; RMSE = 52.59 m³/ha) were achieved using a lidar transect sample representing only 7.6% of the total study area.
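
    The agreement figures quoted above are a correlation coefficient R and a root-mean-square error (RMSE). The sketch below shows how these two statistics are commonly computed from paired estimates, assuming R denotes Pearson's correlation coefficient (the abstract does not say which coefficient was used). The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical canopy heights in metres: model estimates vs. reference values.
estimated = [12.0, 18.5, 25.0, 9.5]
reference = [14.0, 17.0, 27.5, 8.0]
height_rmse = rmse(estimated, reference)
height_r = pearson_r(estimated, reference)
```

    RMSE carries the units of the estimated quantity (metres here), which is why the abstract reports it separately for height, biomass and volume, while R is unitless.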

  3. Quantum machine learning: what quantum computing means to data mining

    CERN Document Server

    Wittek, Peter

    2014-01-01

    Quantum Machine Learning bridges the gap between abstract developments in quantum computing and the applied research on machine learning. Paring down the complexity of the disciplines involved, it focuses on providing a synthesis that explains the most important machine learning algorithms in a quantum framework. Theoretical advances in quantum computing are hard to follow for computer scientists, and sometimes even for researchers involved in the field. The lack of a step-by-step guide hampers the broader understanding of this emergent interdisciplinary body of research. Quantum Machine L

  4. Higgs Machine Learning Challenge 2014

    CERN Multimedia

    Olivier, A-P; Bourdarios, C ; LAL / Orsay; Goldfarb, S ; University of Michigan

    2014-01-01

    High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees (paper) and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics. We are now exploring new ways to improve the cross-fertilization of the two fields by setting up a data challenge, following the footsteps of, among others, the astrophysics community (dark matter and galaxy zoo challenges) and neurobiology (connectomics and decoding the human brain). The organization committee consists of ATLAS physicists and machine learning researchers. The Challenge will run from Monday 12th to September 2014.

  5. Machine learning methods in chemoinformatics

    Science.gov (United States)

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160

  6. Machine learning phases of matter

    Science.gov (United States)

    Carrasquilla, Juan; Melko, Roger G.

    2017-02-01

    Condensed-matter physics is the study of the collective behaviour of infinitely complex assemblies of electrons, nuclei, magnetic moments, atoms or qubits. This complexity is reflected in the size of the state space, which grows exponentially with the number of particles, reminiscent of the 'curse of dimensionality' commonly encountered in machine learning. Despite this curse, the machine learning community has developed techniques with remarkable abilities to recognize, classify, and characterize complex sets of data. Here, we show that modern machine learning architectures, such as fully connected and convolutional neural networks, can identify phases and phase transitions in a variety of condensed-matter Hamiltonians. Readily programmable through modern software libraries, neural networks can be trained to detect multiple types of order parameter, as well as highly non-trivial states with no conventional order, directly from raw state configurations sampled with Monte Carlo.

  7. Modern machine learning techniques and their applications in cartoon animation research

    CERN Document Server

    Yu, Jun

    2013-01-01

    The integration of machine learning techniques and cartoon animation research is fast becoming a hot topic. This book helps readers learn the latest machine learning techniques, including patch alignment framework; spectral clustering, graph cuts, and convex relaxation; ensemble manifold learning; multiple kernel learning; multiview subspace learning; and multiview distance metric learning. It then presents the applications of these modern machine learning techniques in cartoon animation research. With these techniques, users can efficiently utilize the cartoon materials to generate animations

  8. Machine Learning in Medicine

    National Research Council Canada - National Science Library

    Deo, Rahul C

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success...

  9. Climatic response variability and machine learning: development of a modular technology framework for predicting bio-climatic change in pacific northwest ecosystems

    Science.gov (United States)

    Seamon, E.; Gessler, P. E.; Flathers, E.

    2015-12-01

    The creation and use of large amounts of data in scientific investigations has become common practice. Data collection and analysis for large scientific computing efforts are not only increasing in volume and number; the methods and analysis procedures are also evolving toward greater complexity (Bell, 2009, Clarke, 2009, Maimon, 2010). In addition, the growth of diverse data-intensive scientific computing efforts (Soni, 2011, Turner, 2014, Wu, 2008) has demonstrated the value of supporting scientific data integration. Efforts to bridge the gap between these perspectives have been attempted, in varying degrees, with modular scientific computing analysis regimes implemented with a modest amount of success (Perez, 2009). This constellation of effects - 1) an increasing growth in the volume and amount of data, 2) a growing data-intensive science base that has challenging needs, and 3) disparate data organization and integration efforts - has created a critical gap. Namely, systems of scientific data organization and management typically do not effectively enable integrated data collaboration or data-intensive science-based communications. Our research efforts attempt to address this gap by developing a modular technology framework for data science integration efforts, with climate variation as the focus. The intention is that this model, if successful, could be generalized to other application areas. Our research aim focused on the design and implementation of a modular, deployable technology architecture for data integration. Developed using aspects of R, interactive Python, SciDB, THREDDS, JavaScript, and varied data mining and machine learning techniques, the Modular Data Response Framework (MDRF) was implemented to explore case scenarios for bio-climatic variation as they relate to Pacific Northwest ecosystem regions. Our preliminary results, using historical NetCDF climate data for calibration purposes across the inland Pacific Northwest region

  10. Machine Learning in Parliament Elections

    Directory of Open Access Journals (Sweden)

    Ahmad Esfandiari

    2012-09-01

    Parliament is considered one of the most important pillars of a country's governance. Parliamentary elections and their prediction have long been studied by scholars from various fields, such as political science. Some important features are used to model the results of consultative parliament elections: reputation and popularity, political orientation, tradesmen's support, clergymen's support, support from political wings, and the type of supportive wing. Two parameters, reputation and popularity and the support of clergymen and religious scholars, which have more impact in reducing the prediction error of election results, have been used as input parameters in the implementation. In this study, the Iranian parliamentary elections are modeled and predicted using two learnable machines, a neural network and a neuro-fuzzy system. The neuro-fuzzy machine combines the knowledge representation ability of fuzzy sets with the learning power of neural networks. In predicting social and political behavior, the neural network is first trained by two learning algorithms using the training data set, and this machine then predicts the results on the test data. Next, the neuro-fuzzy inference machine is trained. Finally, the results of the two machines are compared.

  11. Machine Learning for Education: Learning to Teach

    Science.gov (United States)

    2016-12-01

    1 Machine Learning for Education: Learning to Teach Matthew C. Gombolay, Reed Jensen, Sung-Hyun Son Massachusetts Institute of Technology Lincoln...training tools and develop military strategies within their training environment. Second, we develop methods for improving warfighter education: learning to...and do not necessarily reflect the views of the Department of the Navy. RAMS # 1001485 Fig. 1. SGD enables development of automated teaching tools for

  12. Machine Learning applications in CMS

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Machine Learning is used in many aspects of CMS data taking, monitoring, processing and analysis. We review a few of these use cases and the most recent developments, with an outlook to future applications in the LHC Run III and for the High-Luminosity phase.

  13. Learning scikit-learn machine learning in Python

    CERN Document Server

    Garreta, Raúl

    2013-01-01

    The book adopts a tutorial-based approach to introduce the user to Scikit-learn. If you are a programmer who wants to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the book for you. No previous experience with machine-learning algorithms is required.

  14. Machine Learning for ATLAS DDM Network Metrics

    CERN Document Server

    Lassnig, Mario; The ATLAS collaboration; Vamosi, Ralf

    2016-01-01

    The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from our ongoing automation efforts. First, we describe our framework for distributed data management and network metrics, automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  15. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research, especially in particle physics, where classification problems arise such as discriminating signals from backgrounds originating from particle collisions. Monte Carlo simulations can be used to generate a known data set of signals and backgrounds based on theoretical physics. The aim of machine learning is to train algorithms on a known data set and then apply the trained algorithms to unknown data sets. The most common framework for data analysis in particle physics is ROOT. In order to use machine learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, especially Cross-Validation and BDT.

  16. Machine learning: an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  17. Machine learning: a probabilistic perspective

    CERN Document Server

    Murphy, Kevin P

    2012-01-01

    Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic method...

  18. Learning Extended Finite State Machines

    Science.gov (United States)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  19. Learning Machine Learning: A Case Study

    Science.gov (United States)

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  1. Conformal prediction for reliable machine learning: theory, adaptations and applications

    CERN Document Server

    Balasubramanian, Vineeth; Vovk, Vladimir

    2014-01-01

    The conformal predictions framework is a recent development in machine learning that can associate a reliable measure of confidence with a prediction in any real-world pattern recognition application, including risk-sensitive applications such as medical diagnosis, face recognition, and financial risk prediction. Conformal Predictions for Reliable Machine Learning: Theory, Adaptations and Applications captures the basic theory of the framework, demonstrates how to apply it to real-world problems, and presents several adaptations, including active learning, change detection, and anomaly detecti
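
    One simple way to attach a confidence measure to a prediction is split (inductive) conformal prediction for regression, sketched below. This particular variant, the use of absolute residuals as nonconformity scores, and all function names are assumptions chosen for illustration; the book covers the framework far more generally. Under the usual exchangeability assumption, the resulting interval covers the true value with probability at least 1 - alpha.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest nonconformity score.

    If that rank exceeds n, no finite bound is supported and inf is returned.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(point_prediction, calibration_residuals, alpha=0.1):
    """Symmetric interval around a point prediction from held-out residuals."""
    scores = [abs(r) for r in calibration_residuals]
    q = conformal_quantile(scores, alpha)
    return point_prediction - q, point_prediction + q


# Hypothetical residuals (true minus predicted) on a held-out calibration set.
residuals = [0.5, -1.0, 2.0, -0.2, 1.5, 0.1, -0.7, 0.3, 1.1]
low, high = prediction_interval(10.0, residuals, alpha=0.5)
```

    The calibration residuals must come from data not used to fit the underlying model; any point predictor can be wrapped this way, which is what makes the approach attractive for risk-sensitive applications.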

  2. Attractor Control Using Machine Learning

    CERN Document Server

    Duriez, Thomas; Noack, Bernd R; Cordier, Laurent; Segond, Marc; Abel, Markus

    2013-01-01

    We propose a general strategy for feedback control design of complex dynamical systems exploiting the nonlinear mechanisms in a systematic unsupervised manner. These dynamical systems can have a state space of arbitrary dimension with a finite number of actuators (multiple inputs) and sensors (multiple outputs). The control law maps outputs into inputs and is optimized with respect to a cost function, containing physics via the dynamical or statistical properties of the attractor to be controlled. Thus, we are capable of exploiting nonlinear mechanisms, e.g. chaos or frequency cross-talk, serving the control objective. This optimization is based on genetic programming, a branch of machine learning. This machine learning control is successfully applied to the stabilization of nonlinearly coupled oscillators and the maximization of the Lyapunov exponent of a forced Lorenz system. We foresee potential applications to most nonlinear multiple inputs/multiple outputs control problems, particularly in experiments.

  3. Machine learning phases of matter

    OpenAIRE

    Carrasquilla, Juan; Melko, Roger G.

    2016-01-01

    Neural networks can be used to identify phases and phase transitions in condensed matter systems via supervised machine learning. Readily programmable through modern software libraries, we show that a standard feed-forward neural network can be trained to detect multiple types of order parameter directly from raw state configurations sampled with Monte Carlo. In addition, they can detect highly non-trivial states such as Coulomb phases, and if modified to a convolutional neural network, topol...

  4. Galaxy Classification using Machine Learning

    Science.gov (United States)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; Widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  5. Using Machine Learning in Adversarial Environments.

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Warren Leon [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-02-01

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approaches only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers’ reaction, with the goal of computing optimal moving target defense. One important challenge is to construct a model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.

  6. Machine learning for medical images analysis.

    Science.gov (United States)

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) we show how a special type of learning model can be thought of as an automatically optimized, hierarchically structured, rule-based algorithm, and (ii) we discuss how the issue of collecting large labelled datasets applies to conventional algorithms as well as to machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods.

  7. Continual Learning through Evolvable Neural Turing Machines

    DEFF Research Database (Denmark)

    Lüders, Benno; Schläger, Mikkel; Risi, Sebastian

    2016-01-01

    Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM......) approach is able to perform one-shot learning in a reinforcement learning task without catastrophic forgetting of previously stored associations....

  8. Reverse hypothesis machine learning a practitioner's perspective

    CERN Document Server

    Kulkarni, Parag

    2017-01-01

    This book introduces a paradigm of reverse hypothesis machines (RHM), focusing on knowledge innovation and machine learning. Knowledge-acquisition-based learning is constrained by large volumes of data and is time consuming. Hence, knowledge-innovation-based learning is the need of the time. Since under-learning results in cognitive inabilities and over-learning compromises freedom, there is a need for optimal machine learning. All existing learning techniques rely on mapping input and output and establishing mathematical relationships between them. Though methods change, the paradigm remains the same: the forward hypothesis machine paradigm, which tries to minimize uncertainty. The RHM, on the other hand, makes use of uncertainty for creative learning. The approach uses limited data to help identify new and surprising solutions. It focuses on improving learnability, unlike traditional approaches, which focus on accuracy. The book is useful as a reference book for machine learning researchers and professionals as ...

  9. Sparse extreme learning machine for classification.

    Science.gov (United States)

    Bai, Zuo; Huang, Guang-Bin; Wang, Danwei; Wang, Han; Westover, M Brandon

    2014-10-01

    Extreme learning machine (ELM) was initially proposed for single-hidden-layer feedforward neural networks (SLFNs). In the hidden layer (feature mapping), nodes are randomly generated independently of training data. Furthermore, a unified ELM was proposed, providing a single framework to simplify and unify different learning methods, such as SLFNs, least square support vector machines, proximal support vector machines, and so on. However, the solution of unified ELM is dense, and thus, usually plenty of storage space and testing time are required for large-scale applications. In this paper, a sparse ELM is proposed as an alternative solution for classification, reducing storage space and testing time. In addition, unified ELM obtains the solution by matrix inversion, whose computational complexity is between quadratic and cubic with respect to the training size. It still requires plenty of training time for large-scale problems, even though it is much faster than many other traditional methods. In this paper, an efficient training algorithm is specifically developed for sparse ELM. The quadratic programming problem involved in sparse ELM is divided into a series of smallest possible sub-problems, each of which is solved analytically. Compared with SVM, sparse ELM obtains better generalization performance with much faster training speed. Compared with unified ELM, sparse ELM achieves similar generalization performance for binary classification applications, and when dealing with large-scale binary classification problems, sparse ELM realizes even faster training speed than unified ELM.

  10. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols......, learning from weak labels, and interpretation and evaluation of results....

  11. A Machine Learning Framework for Gait Classification Using Inertial Sensors: Application to Elderly, Post-Stroke and Huntington’s Disease Patients

    Directory of Open Access Journals (Sweden)

    Andrea Mannini

    2016-01-01

    Machine learning methods have been widely used for gait assessment through the estimation of spatio-temporal parameters. As a further step, the objective of this work is to propose and validate a general probabilistic modeling approach for the classification of different pathological gaits. Specifically, the presented methodology was tested on gait data recorded on two pathological populations (Huntington’s disease and post-stroke subjects) and healthy elderly controls using data from inertial measurement units placed at shank and waist. By extracting features from group-specific Hidden Markov Models (HMMs) and signal information in time and frequency domain, a Support Vector Machines (SVM) classifier was designed and validated. After leave-one-subject-out cross validation and majority voting, 90.5% of subjects were assigned to the right group. The long-term goal is gait assessment in everyday life for early detection of gait alterations.
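
    The evaluation protocol named in this abstract, leave-one-subject-out cross validation followed by majority voting, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the authors' features or code, a nearest-centroid rule stands in for the SVM, and the subject IDs, labels, and two-dimensional feature vectors in `TOY_DATA` are made up.

```python
from collections import Counter


def nearest_centroid_fit(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    sums, counts = {}, Counter()
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=sq_dist)


def leave_one_subject_out(data):
    """data: {subject_id: (label, [feature_vector, ...])}. Returns subject-level accuracy."""
    correct = 0
    for held_out, (true_label, segments) in data.items():
        # Train on every segment of every other subject; none from the held-out subject.
        train = [(x, label)
                 for subject, (label, segs) in data.items() if subject != held_out
                 for x in segs]
        centroids = nearest_centroid_fit(train)
        # Classify each held-out segment, then take the majority vote for the subject.
        votes = Counter(nearest_centroid_predict(centroids, x) for x in segments)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == true_label
    return correct / len(data)


# Made-up toy data: four hypothetical subjects, three segments each.
TOY_DATA = {
    "s1": ("control", [[1.0, 0.1], [1.1, 0.2], [0.9, 0.1]]),
    "s2": ("control", [[1.0, 0.2], [1.2, 0.1], [0.2, 0.9]]),  # one atypical segment
    "s3": ("patient", [[0.2, 1.0], [0.1, 0.9], [0.3, 1.1]]),
    "s4": ("patient", [[0.1, 1.1], [0.2, 0.8], [1.0, 0.2]]),  # one atypical segment
}
accuracy = leave_one_subject_out(TOY_DATA)
print(accuracy)
```

    Holding out all segments of a subject at once keeps that subject's data out of training, and the vote lets a single atypical segment be outvoted by the others.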

  12. Archetypal Analysis for Machine Learning

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai

    2010-01-01

    Archetypal analysis (AA) proposed by Cutler and Breiman in [1] estimates the principal convex hull of a data set. As such AA favors features that constitute representative ’corners’ of the data, i.e. distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering - ...... for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensional reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering....

  13. Extreme Learning Machine for land cover classification

    OpenAIRE

    Pal, Mahesh

    2008-01-01

    This paper explores the potential of an extreme learning machine based supervised classification algorithm for land cover classification. In comparison to a backpropagation neural network, which requires setting of several user-defined parameters and may produce local minima, the extreme learning machine requires setting of one parameter and produces a unique solution. An ETM+ multispectral data set (England) was used to judge the suitability of extreme learning machine for remote sensing classifications...

  14. Machine learning in genetics and genomics

    Science.gov (United States)

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  15. Introducing Machine Learning Concepts with WEKA.

    Science.gov (United States)

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  16. Trends in Machine Learning for Signal Processing

    DEFF Research Database (Denmark)

    Adali, Tulay; Miller, David J.; Diamantaras, Konstantinos I.

    2011-01-01

    By putting the accent on learning from the data and the environment, the Machine Learning for SP (MLSP) Technical Committee (TC) provides the essential bridge between the machine learning and SP communities. While the emphasis in MLSP is on learning and data-driven approaches, SP defines the main...... applications of interest, and thus the constraints and requirements on solutions, which include computational efficiency, online adaptation, and learning with limited supervision/reference data....

  17. Advanced Machine learning Algorithm Application for Rotating Machine Health Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Kanemoto, Shigeru; Watanabe, Masaya [The University of Aizu, Aizuwakamatsu (Japan); Yusa, Noritaka [Tohoku University, Sendai (Japan)

    2014-08-15

    The present paper evaluates the applicability of conventional sound analysis techniques and modern machine learning algorithms to rotating machine health monitoring. These techniques include support vector machines, deep learning neural networks, etc. Sound data for inner ring defect and misalignment anomalies, measured in a rotating machine mockup test facility, are used to verify these algorithms. Although we find no remarkable difference in anomaly discrimination performance, some methods yield very interesting eigen patterns corresponding to normal and abnormal states. These results will be useful for future, more sensitive and robust anomaly monitoring technology.

  18. Incremental Support Vector Machine Framework for Visual Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yuichi Motai

    2007-01-01

    Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of the least square SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and an online learning phase during which the cluster head performs an ensemble of model aggregations based on the sensor node inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single camera sensing especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows an adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system, which makes it even more attractive for distributed sensor networks communication.

  19. Machine learning in medicine cookbook

    CERN Document Server

    Cleophas, Ton J

    2014-01-01

    The amount of data in medical databases doubles every 20 months, and physicians are at a loss to analyze it. Also, traditional methods of data analysis have difficulty identifying outliers and patterns in big data and in data with multiple exposure/outcome variables, and analysis rules for surveys and questionnaires, currently common methods of data collection, are essentially missing. Obviously, it is time that medical and health professionals overcame their reluctance to use machine learning, and the current 100-page cookbook should be helpful to that aim. It covers in condensed form the subjects reviewed in the 750-page, three-volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (Springer, Heidelberg, Germany, 2013), and was written as a hand-hold presentation and must-read publication. It was written not only for investigators and students in the field, but also for jaded clinicians new to the methods and lacking time to read the entire textbooks. General purposes ...

  20. A Concrete Framework for Environment Machines

    DEFF Research Database (Denmark)

    Biernacka, Malgorzata; Danvy, Olivier

    2007-01-01

    calculus with explicit substitutions), we extend it minimally so that it can also express one-step reduction strategies, and we methodically derive a series of environment machines from the specification of two one-step reduction strategies for the lambda-calculus: normal order and applicative order....... The derivation extends Danvy and Nielsen’s refocusing-based construction of abstract machines with two new steps: one for coalescing two successive transitions into one, and the other for unfolding a closure into a term and an environment in the resulting abstract machine. The resulting environment machines...... include both the Krivine machine and the original version of Krivine’s machine, Felleisen et al.’s CEK machine, and Leroy’s Zinc abstract machine....

  1. Amplifying human ability through autonomics and machine learning in IMPACT

    Science.gov (United States)

    Dzieciuch, Iryna; Reeder, John; Gutzwiller, Robert; Gustafson, Eric; Coronado, Braulio; Martinez, Luis; Croft, Bryan; Lange, Douglas S.

    2017-05-01

    Amplifying human ability for controlling complex environments featuring autonomous units can be aided by learned models of human and system performance. In developing a command and control system that allows a small number of people to control a large number of autonomous teams, we employ an autonomics framework to manage the networks that represent mission plans and the networks that are composed of human controllers and their autonomous assistants. Machine learning allows us to build models of human and system performance useful for monitoring plans and managing human attention and task loads. Machine learning also aids in the development of tactics that human supervisors can successfully monitor through the command and control system.

  2. Building machine learning systems with Python

    CERN Document Server

    Coelho, Luis Pedro

    2015-01-01

    This book primarily targets Python developers who want to learn and use Python's machine learning capabilities and gain valuable insights from data to develop effective solutions for business problems.

  3. Extreme learning machines 2013 algorithms and applications

    CERN Document Server

    Toh, Kar-Ann; Romay, Manuel; Mao, Kezhi

    2014-01-01

    In recent years, ELM has emerged as a revolutionary technique of computational intelligence, and has attracted considerable attention. An extreme learning machine (ELM) is a learning system resembling a single-layer feed-forward neural network, whose connections from the input layer to the hidden layer are randomly generated, while the connections from the hidden layer to the output layer are learned through linear learning methods. The outstanding merits of ELM are its fast learning speed, minimal human intervention and high scalability. This book contains some selected papers from the International Conference on Extreme Learning Machine 2013, which was held in Beijing, China, October 15-17, 2013. This conference aims to bring together the researchers and practitioners of extreme learning machine from a variety of fields including artificial intelligence, biomedical engineering and bioinformatics, system modelling and control, and signal and image processing, to promote research and discu...
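
    The construction described in this abstract (random input-to-hidden connections that are never trained, and hidden-to-output connections fitted by a linear method) can be sketched in a few lines. This is a minimal illustration, not code from the book: the `tanh` activation, the ridge term, the hidden-layer size, and the toy target `y = x**2` are all assumptions chosen for the example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hidden_features(x, weights, biases):
    """A constant feature plus tanh of fixed random projections of x."""
    return [1.0] + [math.tanh(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                    for w, b in zip(weights, biases)]


def train_elm(xs, ys, n_hidden=10, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    dim = len(xs[0])
    # Input-to-hidden weights are drawn once at random and never updated.
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_features(x, weights, biases) for x in xs]
    k = n_hidden + 1
    # Output weights from ridge-regularised least squares:
    # (H^T H + ridge * I) beta = H^T y
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
    beta = solve_linear(gram, rhs)
    return weights, biases, beta


def predict_elm(model, x):
    weights, biases, beta = model
    return sum(b * f for b, f in zip(beta, hidden_features(x, weights, biases)))


# Toy regression problem (assumed for illustration): y = x^2 on [-1, 1].
xs = [[i / 10] for i in range(-10, 11)]
ys = [x[0] ** 2 for x in xs]
model = train_elm(xs, ys)
mse_elm = sum((predict_elm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mean_y = sum(ys) / len(ys)
mse_mean = sum((mean_y - y) ** 2 for y in ys) / len(ys)
print(mse_elm, mse_mean)
```

    Because only the output weights are fitted, training reduces to one linear solve, which is the source of the fast learning speed the abstract mentions.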

  4. Machine learning of network metrics in ATLAS Distributed Data Management

    CERN Document Server

    Lassnig, Mario; The ATLAS collaboration

    2017-01-01

    The increasing volume of physics data poses a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from one of our ongoing automation efforts that focuses on network metrics. First, we describe our machine learning framework built atop the ATLAS Analytics Platform. This framework can automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for network-aware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our m...

  5. Machine Learning Interface for Medical Image Analysis.

    Science.gov (United States)

    Zhang, Yi C; Kagen, Alexander C

    2016-10-11

    TensorFlow is a second-generation open-source machine learning software library with a built-in framework for implementing neural networks in a wide variety of perceptual tasks. Although TensorFlow usage is well established with computer vision datasets, the TensorFlow interface with DICOM formats for medical imaging remains to be established. Our goal is to extend the TensorFlow API to accept raw DICOM images as input; 1513 DaTscan DICOM images were obtained from the Parkinson's Progression Markers Initiative (PPMI) database. DICOM pixel intensities were extracted and shaped into tensors, or n-dimensional arrays, to populate the training, validation, and test input datasets for machine learning. A simple neural network was constructed in TensorFlow to classify images into normal or Parkinson's disease groups. Training was executed over 1000 iterations for each cross-validation set. The gradient descent optimization and Adagrad optimization algorithms were used to minimize cross-entropy between the predicted and ground-truth labels. Cross-validation was performed ten times to produce a mean accuracy of 0.938 ± 0.047 (95 % CI 0.908-0.967). The mean sensitivity was 0.974 ± 0.043 (95 % CI 0.947-1.00) and mean specificity was 0.822 ± 0.207 (95 % CI 0.694-0.950). We extended the TensorFlow API to enable DICOM compatibility in the context of DaTscan image analysis. We implemented a neural network classifier that produces diagnostic accuracies on par with excellent results from previous machine learning models. These results indicate the potential role of TensorFlow as a useful adjunct diagnostic tool in the clinical setting.
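
    The abstract does not say how its 95 % confidence intervals were computed. As a worked check, the reported figures appear consistent with a normal-approximation interval, mean ± 1.96 · sd / √n, if the "±" values are read as standard deviations over n = 10 cross-validation runs. Both readings are assumptions, and the helper name `normal_ci` is hypothetical.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean of n values."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# (mean, sd) pairs as reported in the abstract; n = 10 runs is assumed.
reported = {
    "accuracy": (0.938, 0.047),
    "sensitivity": (0.974, 0.043),
    "specificity": (0.822, 0.207),
}
intervals = {name: normal_ci(mean, sd, n=10) for name, (mean, sd) in reported.items()}
for name, (low, high) in intervals.items():
    # Proportions cannot exceed 1, so the upper bound is capped for display.
    print(f"{name}: {low:.3f} to {min(high, 1.0):.3f}")
```

    Under these assumptions the computed bounds agree with the reported ones to within rounding. A Student-t interval with 9 degrees of freedom would be somewhat wider.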

  6. Learning as a Machine: Crossovers between Humans and Machines

    Science.gov (United States)

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  7. Machine Learning wins the Higgs Challenge

    CERN Multimedia

    Abha Eli Phoboo

    2014-01-01

    The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. The challenge explored the potential of advanced machine learning methods to improve the significance of the Higgs discovery.   Winners of the Higgs Machine Learning Challenge: Gábor Melis and Tim Salimans (top row), Tianqi Chen and Tong He (bottom row). Participants in the Higgs Machine Learning Challenge were tasked with developing an algorithm to improve the detection of Higgs boson signal events decaying into two tau particles in a sample of simulated ATLAS data* that contains few signal and a majority of non-Higgs boson “background” events. No knowledge of particle physics was required for the challenge but skills in machine learning - the training of computers to recognise patterns in data – were essential. The Challenge, hosted by Ka...

  8. Machine learning in sedimentation modelling.

    Science.gov (United States)

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    The paper presents machine learning (ML) models that predict sedimentation in the harbour basin of the Port of Rotterdam. The important factors affecting the sedimentation process such as waves, wind, tides, surge, river discharge, etc. are studied, the corresponding time series data is analysed, missing values are estimated and the most important variables behind the process are chosen as the inputs. Two ML methods are used: MLP ANN and M5 model tree. The latter is a collection of piece-wise linear regression models, each being an expert for a particular region of the input space. The models are trained on the data collected during 1992-1998 and tested by the data of 1999-2000. The predictive accuracy of the models is found to be adequate for the potential use in the operational decision making.
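
    The description of a model tree as a collection of piece-wise linear regression models, each responsible for one region of the input space, can be illustrated with a one-split, one-input sketch. This is not the M5 algorithm itself and does not reproduce its splitting criterion, pruning, or smoothing. The split rule (minimum total squared error), the function names, and the toy data are assumptions.

```python
def fit_line(points):
    """Ordinary least squares line y = a + b*x through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def squared_error(points, line):
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in points)


def fit_one_split_model_tree(points, min_leaf=2):
    """Pick the threshold whose two per-region line fits have the least total error."""
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        left, right = pts[:i], pts[i:]
        if left[-1][0] == right[0][0]:
            continue  # cannot place a threshold between equal x values
        left_line, right_line = fit_line(left), fit_line(right)
        err = squared_error(left, left_line) + squared_error(right, right_line)
        if best is None or err < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (err, threshold, left_line, right_line)
    return best[1:]


def predict(tree, x):
    """Route x to its region, then apply that region's linear model."""
    threshold, left_line, right_line = tree
    a, b = left_line if x <= threshold else right_line
    return a + b * x


# Toy data (assumed): y = 2x for x in 0..4 and y = 20 - x for x in 5..9.
points = [(float(x), 2.0 * x) for x in range(5)] + [(float(x), 20.0 - x) for x in range(5, 10)]
tree = fit_one_split_model_tree(points)
print(tree[0], predict(tree, 2.5), predict(tree, 7.5))
```

    A full model tree applies the same idea recursively over several input variables, so each leaf holds a linear model for its own region.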

  9. Machine learning in motion control

    Science.gov (United States)

    Su, Renjeng; Kermiche, Noureddine

    1989-01-01

    The existing methodologies for robot programming originate primarily from robotic applications to manufacturing, where uncertainties of the robots and their task environment may be minimized by repeated off-line modeling and identification. In space applications of robots, however, a higher degree of automation is required for robot programming because of the desire to minimize human intervention. We discuss a new paradigm of robotic programming which is based on the concept of machine learning. The goal is to let robots practice tasks by themselves, with the operational data used to automatically improve their motion performance. The underlying mathematical problem is to solve the dynamical inverse problem by iterative methods. One of the key questions is how to ensure the convergence of the iterative process. A few small steps have been taken toward this important approach to robot programming. We give a representative result on the convergence problem.
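
    One common way to read "solving the dynamical inverse by iterative methods" is iterative learning control: repeat the task, measure the tracking error, and correct the input before the next trial. The abstract does not give the authors' update law, so the sketch below is only an illustration. The first-order plant, its parameters `a` and `b`, the learning `gain`, and the reference trajectory are all hypothetical.

```python
def simulate(u, a=0.5, b=1.0):
    """Hypothetical plant x[t+1] = a*x[t] + b*u[t]; the output is the new state."""
    x, outputs = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        outputs.append(x)
    return outputs


def iterative_inverse(reference, gain=0.5, trials=60):
    """Find an input that reproduces `reference`, using only trial errors."""
    u = [0.0] * len(reference)
    history = []  # worst-case tracking error of each trial
    for _ in range(trials):
        y = simulate(u)
        error = [r - y_t for r, y_t in zip(reference, y)]
        history.append(max(abs(e) for e in error))
        # Correct each input sample in proportion to the error it produced.
        u = [u_t + gain * e for u_t, e in zip(u, error)]
    return u, history


reference = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0, 0.0]
u_learned, history = iterative_inverse(reference)
final_error = max(abs(r - y) for r, y in zip(reference, simulate(u_learned)))
print(history[0], final_error)
```

    For this particular plant and finite horizon, the trial-to-trial error map has all its eigenvalues equal to `1 - gain * b`, so the iteration converges when `|1 - gain * b| < 1`. A condition of this kind is an example of the convergence question the abstract raises.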

  11. Induction and physical theory formation by Machine Learning

    CERN Document Server

    Svozil, Alexander

    2016-01-01

    Machine learning presents a general, systematic framework for the generation of formal theoretical models for physical description and prediction. Standard linear modeling techniques are tentatively reviewed, followed by a brief discussion of generalizations to deep forward networks for approximating nonlinear phenomena.

  12. Building machine learning systems with Python

    CERN Document Server

    Richert, Willi

    2013-01-01

    This is a tutorial-driven and practical, but well-grounded book showcasing good Machine Learning practices. There will be an emphasis on using existing technologies instead of showing how to write your own implementations of algorithms. This book is a scenario-based, example-driven tutorial. By the end of the book you will have learnt critical aspects of Machine Learning Python projects and experienced the power of ML-based systems by actually working on them. This book primarily targets Python developers who want to learn about and build Machine Learning into their projects, or who want to pro

  13. Applied genetic programming and machine learning

    CERN Document Server

    Iba, Hitoshi; Paul, Topon Kumar

    2009-01-01

    What do financial data prediction, day-trading rule development, and bio-marker selection have in common? They are just a few of the tasks that could potentially be resolved with genetic programming and machine learning techniques. Written by leaders in this field, Applied Genetic Programming and Machine Learning delineates the extension of Genetic Programming (GP) for practical applications. Reflecting rapidly developing concepts and emerging paradigms, this book outlines how to use machine learning techniques, make learning operators that efficiently sample a search space, navigate the searc

  14. Adaptive Learning Systems: Beyond Teaching Machines

    Science.gov (United States)

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since the 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn and what instructors should do to help students during their learning process. We have adaptive learning technologies that can create much more student-oriented learning environments. The purpose of this article is to present these changes and its…

  15. International Conference on Extreme Learning Machines 2014

    CERN Document Server

    Mao, Kezhi; Cambria, Erik; Man, Zhihong; Toh, Kar-Ann

    2015-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2014, which was held in Singapore, December 8-10, 2014. This conference brought together the researchers and practitioners of Extreme Learning Machine (ELM) from a variety of fields to promote research and development of “learning without iterative tuning”.  The book covers theories, algorithms and applications of ELM. It gives the readers a glance of the most recent advances of ELM.  

  16. International Conference on Extreme Learning Machine 2015

    CERN Document Server

    Mao, Kezhi; Wu, Jonathan; Lendasse, Amaury; ELM 2015; Theory, Algorithms and Applications (I); Theory, Algorithms and Applications (II)

    2016-01-01

    This book contains some selected papers from the International Conference on Extreme Learning Machine 2015, which was held in Hangzhou, China, December 15-17, 2015. This conference brought together researchers and engineers to share and exchange R&D experience on both theoretical studies and practical applications of the Extreme Learning Machine (ELM) technique and brain learning. This book covers theories, algorithms and applications of ELM. It gives readers a glance of the most recent advances of ELM.

  17. Python for probability, statistics, and machine learning

    CERN Document Server

    Unpingco, José

    2016-01-01

    This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. The entire text, including all the figures and numerical results, is reproducible using the Python codes and their associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowl...

  18. An introduction to machine learning with Scikit-Learn

    CERN Document Server

    CERN. Geneva

    2015-01-01

    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.

  19. Machine Learning Techniques in Clinical Vision Sciences.

    Science.gov (United States)

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at their earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing the objectivity of the clinical decision-making process. This manuscript presents a review in multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be presented. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allow creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionality (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  20. Machine Learning and Cosmological Simulations

    Science.gov (United States)

    Kamdar, Harshil; Turk, Matthew; Brunner, Robert

    2016-01-01

    We explore the application of machine learning (ML) to the problem of galaxy formation and evolution in a hierarchical universe. Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively evaluating the extent of the influence of dark matter halo properties on small-scale structure formation. For our analyses, we use both semi-analytical models (Millennium simulation) and N-body + hydrodynamical simulations (Illustris simulation). The ML algorithms are trained on important dark matter halo properties (inputs) and galaxy properties (outputs). The trained models are able to robustly predict the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity. Moreover, the ML simulated galaxies obey fundamental observational constraints implying that the population of ML predicted galaxies is physically and statistically robust. Next, ML algorithms are trained on an N-body + hydrodynamical simulation and applied to an N-body only simulation (Dark Sky simulation, Illustris Dark), populating this new simulation with galaxies. We can examine how structure formation changes with different cosmological parameters and are able to mimic a full-blown hydrodynamical simulation in a computation time that is orders of magnitude smaller. We find that the set of ML simulated galaxies in Dark Sky obey the same observational constraints, further solidifying ML's place as an intriguing and promising technique in future galaxy formation studies and rapid mock galaxy catalog creation.

  1. Security Frameworks for Machine-to-Machine Devices and Networks

    Science.gov (United States)

    Demblewski, Michael

    Attacks against mobile systems have escalated over the past decade. There have been increases in fraud, platform attacks, and malware. The Internet of Things (IoT) offers a new attack vector for cybercriminals. M2M contributes to the growing number of devices that use wireless systems for Internet connection. As new applications and platforms are created, old vulnerabilities are transferred to next-generation systems. There is a research gap between the current approaches for security framework development and the understanding of how these new technologies are different and how they are similar. This gap exists because system designers, security architects, and users are not fully aware of security risks and how next-generation devices can jeopardize safety and personal privacy. Current techniques for developing security requirements do not adequately consider the use of new technologies, and this weakens countermeasure implementations. These techniques rely on security frameworks for requirements development. These frameworks lack a method for identifying next-generation security concerns and processes for comparing, contrasting and evaluating non-human device security protections. This research presents a solution for this problem by offering a novel security framework that is focused on the study of the "functions and capabilities" of M2M devices and improves the systems development life cycle for the overall IoT ecosystem.

  2. Lane Detection Based on Machine Learning Algorithm

    National Research Council Canada - National Science Library

    Chao Fan; Jingbo Xu; Shuai Di

    2013-01-01

    In order to improve accuracy and robustness of the lane detection in complex conditions, such as the shadows and illumination changing, a novel detection algorithm was proposed based on machine learning...

  3. Implementing Machine Learning in the PCWG Tool

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    2016-12-13

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  4. Addressing uncertainty in atomistic machine learning

    DEFF Research Database (Denmark)

    Peterson, Andrew A.; Christensen, Rune; Khorshidi, Alireza

    2017-01-01

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate...

  5. Addressing uncertainty in atomistic machine learning.

    Science.gov (United States)

    Peterson, Andrew A; Christensen, Rune; Khorshidi, Alireza

    2017-05-10

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate of the uncertainty when the width is comparable to that in the training data. Intriguingly, we also show that the uncertainty can be localized to specific atoms in the simulation, which may offer hints for the generation of training data to strategically improve the machine-learned representation.
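
    The bootstrap-ensemble idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: a least-squares line stands in for the neural network-based calculators, the data are synthetic, and the helper names (`fit_line`, `bootstrap_ensemble`, `predict_with_spread`) are hypothetical.

```python
import random
import statistics

def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_ensemble(points, n_models, rng):
    """Fit one model per bootstrap resample (sampling with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_line(sample))
    return models

def predict_with_spread(models, x):
    """Ensemble mean and standard deviation of the predictions at x."""
    preds = [a + b * x for a, b in models]
    return statistics.mean(preds), statistics.stdev(preds)

# Synthetic training data on [0, 1] (assumed): y = 1 + 3x plus noise.
rng = random.Random(42)
data = []
for _ in range(40):
    x = rng.random()
    data.append((x, 1.0 + 3.0 * x + rng.gauss(0, 0.2)))

models = bootstrap_ensemble(data, n_models=200, rng=rng)
mean_near, spread_near = predict_with_spread(models, 0.5)  # inside the training range
mean_far, spread_far = predict_with_spread(models, 5.0)    # far outside it
```

    The ensemble members agree closely where training data exist and diverge under extrapolation, so the spread serves as a rough signal of where predictions are less trustworthy.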

  6. MACHINE LEARNING TECHNIQUES USED IN BIG DATA

    Directory of Open Access Journals (Sweden)

    STEFANIA LOREDANA NITA

    2016-07-01

    The classical tools used in data analysis are not sufficient to benefit from all the advantages of big data. The amount of information is too large for a complete investigation, and possible connections and relations between data could be missed, because it is difficult or even impossible to verify all assumptions about the information. Machine learning is a good solution for finding concealed correlations or relationships between data, because it runs at machine scale and works very well with large data sets. The more data we have, the more useful the machine learning algorithm is, because it “learns” from the existing data and applies the rules it finds to new entries. In this paper, we present some machine learning algorithms and techniques used in big data.

  7. Transforming Clinical Data into Actionable Prognosis Models: Machine-Learning Framework and Field-Deployable App to Predict Outcome of Ebola Patients.

    Directory of Open Access Journals (Sweden)

    Andres Colubri

    2016-03-01

    Assessment of the response to the 2014-15 Ebola outbreak indicates the need for innovations in data collection, sharing, and use to improve case detection and treatment. Here we introduce a Machine Learning pipeline for Ebola Virus Disease (EVD) prognosis prediction, which packages the best models into a mobile app to be available in clinical care settings. The pipeline was trained on a public EVD clinical dataset, from 106 patients in Sierra Leone. We used a new tool for exploratory analysis, Mirador, to identify the most informative clinical factors that correlate with EVD outcome. The small sample size and high prevalence of missing records were significant challenges. We applied multiple imputation and bootstrap sampling to address missing data and quantify overfitting. We trained several predictors over all combinations of covariates, which resulted in an ensemble of predictors, with and without viral load information, with an area under the receiver operator characteristic curve of 0.8 or more, after correcting for optimistic bias. We ranked the predictors by their F1-score, and those above a set threshold were compiled into a mobile app, Ebola CARE (Computational Assignment of Risk Estimates). This method demonstrates how to address small sample sizes and missing data, while creating predictive models that can be readily deployed to assist treatment in future outbreaks of EVD and other infectious diseases. By generating an ensemble of predictors instead of relying on a single model, we are able to handle situations where patient data is partially available. The prognosis app can be updated as new data become available, and we made all the computational protocols fully documented and open-sourced to encourage timely data sharing, independent validation, and development of better prediction models in outbreak response.
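
    The ranking step described in this abstract (score each predictor by F1, keep those above a threshold) can be sketched as follows. The predictor names, confusion counts, and the 0.7 cut-off are made up for illustration and do not come from the study.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*precision*recall / (precision + recall) = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical predictors with made-up confusion counts (tp, fp, fn).
counts = {
    "model_a": (40, 10, 10),
    "model_b": (30, 5, 25),
    "model_c": (45, 30, 5),
}
threshold = 0.7  # assumed cut-off, not the value used in the paper

scores = {name: f1_score(*c) for name, c in counts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
selected = [name for name in ranked if scores[name] >= threshold]
```

    Keeping every predictor above the threshold, rather than only the top one, matches the abstract's point that an ensemble can still produce a prognosis when some patient data are missing.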

  8. Outsmarting neural networks: an alternative paradigm for machine learning

    Energy Technology Data Exchange (ETDEWEB)

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably and Approximately Correct (PAC) framework. We show that, under certain conditions, one can reduce the three problems above to the regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria, but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs to solve the regression estimation. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  9. Teraflop-scale Incremental Machine Learning

    CERN Document Server

    Özkural, Eray

    2011-01-01

    We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-using previous solutions, learning programming idioms and discovery of frequent subprograms. Experiments with two training sequences demonstrate that our approach to incremental learning is effective.

  10. Machine learning a Bayesian and optimization perspective

    CERN Document Server

    Theodoridis, Sergios

    2015-01-01

    This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...

  11. Machine Learning Phases of Strongly Correlated Fermions

    Directory of Open Access Journals (Sweden)

    Kelvin Ch’ng

    2017-08-01

    Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling). We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  12. Machine Learning Phases of Strongly Correlated Fermions

    Science.gov (United States)

    Ch'ng, Kelvin; Carrasquilla, Juan; Melko, Roger G.; Khatami, Ehsan

    2017-07-01

    Machine learning offers an unprecedented perspective for the problem of classifying phases in condensed matter physics. We employ neural-network machine learning techniques to distinguish finite-temperature phases of the strongly correlated fermions on cubic lattices. We show that a three-dimensional convolutional network trained on auxiliary field configurations produced by quantum Monte Carlo simulations of the Hubbard model can correctly predict the magnetic phase diagram of the model at the average density of one (half filling). We then use the network, trained at half filling, to explore the trend in the transition temperature as the system is doped away from half filling. This transfer learning approach predicts that the instability to the magnetic phase extends to at least 5% doping in this region. Our results pave the way for other machine learning applications in correlated quantum many-body systems.

  13. Machine learning: Trends, perspectives, and prospects.

    Science.gov (United States)

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.

  14. Machine Learning and Cosmological Simulations II: Hydrodynamical Simulations

    CERN Document Server

    Kamdar, Harshil M; Brunner, Robert J

    2015-01-01

    We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris Simulation to train and test various sophisticated machine learning algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The promising reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydro...

  15. Machine Learning and Cosmological Simulations I: Semi-Analytical Models

    OpenAIRE

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2015-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various so...

  16. Geometry Algorisms of Dynkin Diagrams in Lie Group Machine Learning

    Institute of Scientific and Technical Information of China (English)

    Huan Xu; Fanzhang Li

    2006-01-01

    This paper uses the geometric method to describe Lie group machine learning (LML) based on the theoretical framework of LML, which gives the geometric algorithms of Dynkin diagrams in LML. It includes the basic conceptions of Dynkin diagrams in LML, the classification theorems of Dynkin diagrams in LML, the classification algorithm of Dynkin diagrams in LML and the verification of the classification algorithm with experimental results.

  17. Deep Extreme Learning Machine and Its Application in EEG Classification

    OpenAIRE

    Shifei Ding; Nan Zhang; Xinzheng Xu; Lili Guo; Jian Zhang

    2015-01-01

    Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating the complicated function and alleviating the optimization difficulty associated with deep models. Multilayer extreme learning machine (MLELM) is a learning algorithm of an artificial neural network which takes advantage of deep learning and extreme learning machine. Not only does MLELM appr...

  18. Deep Extreme Learning Machine and Its Application in EEG Classification

    OpenAIRE

    2015-01-01

    Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating the complicated function and alleviating the optimization difficulty associated with deep models. Multilayer extreme learning machine (MLELM) is a learning algorithm of an artificial neural network which takes advantage of deep learning and extreme learning machine. Not only does MLELM appr...

  19. Machine Learning for Biological Trajectory Classification Applications

    Science.gov (United States)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.

  20. Heterogeneous versus Homogeneous Machine Learning Ensembles

    Directory of Open Access Journals (Sweden)

    Petrakova Aleksandra

    2015-12-01

    The research demonstrates the efficiency of applying a heterogeneous model ensemble to a cancer diagnostic procedure. The machine learning methods used for training the ensemble models are neural networks, random forest, support vector machine and offspring selection genetic algorithm. Training of the models and the ensemble design are performed by means of the HeuristicLab software. The data used in the research have been provided by the General Hospital of Linz, Austria.

  1. Machine learning for identifying botnet network traffic

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2013-01-01

    Due to the promise of non-invasive and resilient detection, botnet detection based on network traffic analysis has drawn special attention from the research community. Furthermore, many authors have turned their attention to the use of machine learning algorithms as the means of inferring botnet-related knowledge from the monitored traffic. This paper presents a review of contemporary botnet detection methods that use machine learning as a tool for identifying botnet-related traffic. The main goal of the paper is to provide a comprehensive overview of the field by summarizing current scientific efforts. The contribution of the paper is three-fold. First, the paper provides a detailed insight into the existing detection methods by investigating which bot-related heuristics were assumed by the detection systems and how different machine learning techniques were adapted in order to capture botnet-related knowledge...

  2. Machine learning paradigms applications in recommender systems

    CERN Document Server

    Lampropoulos, Aristomenis S

    2015-01-01

    This timely book presents applications in recommender systems that make recommendations using machine learning algorithms trained via examples of content the user likes or dislikes. Recommender systems built on the assumption of availability of both positive and negative examples do not perform well when negative examples are rare. It is exactly this problem that the authors address in the monograph at hand. Specifically, the book's approach is based on one-class classification methodologies that have been appearing in recent machine learning research. The blending of recommender systems and one-class classification provides a new and very fertile field for research, innovation and development with potential applications in “big data” as well as “sparse data” problems. The book will be useful to researchers, practitioners and graduate students dealing with problems of extensive and complex data. It is intended for both the expert/researcher in the fields of Pattern Recognition, Machine Learning and ...

  3. Application of Machine Learning to Rotorcraft Health Monitoring

    Science.gov (United States)

    Cody, Tyler; Dempsey, Paula J.

    2017-01-01

    Machine learning is a powerful tool for data exploration and model building with large data sets. This project aimed to use machine learning techniques to explore the inherent structure of data from rotorcraft gear tests, relationships between features and damage states, and to build a system for predicting gear health for future rotorcraft transmission applications. Classical machine learning techniques are difficult, if not irresponsible, to apply to time series data because many make the assumption of independence between samples. To overcome this, Hidden Markov Models were used to create a binary classifier for identifying scuffing transitions and Recurrent Neural Networks were used to leverage long distance relationships in predicting discrete damage states. When combined in a workflow, where the binary classifier acted as a filter for the fatigue monitor, the system was able to demonstrate accuracy in damage state prediction and scuffing identification. The time dependent nature of the data restricted data exploration to collecting and analyzing data from the model selection process. The limited amount of available data was unable to give useful information, and the division of training and testing sets tended to heavily influence the scores of the models across combinations of features and hyper-parameters. This work built a framework for tracking scuffing and fatigue on streaming data and demonstrates that machine learning has much to offer rotorcraft health monitoring by using Bayesian learning and deep learning methods to capture the time dependent nature of the data. Suggested future work is to implement the framework developed in this project using a larger variety of data sets to test the generalization capabilities of the models and allow for data exploration.

  4. Machine Learning Optimization of Evolvable Artificial Cells

    DEFF Research Database (Denmark)

    Caschera, F.; Rasmussen, S.; Hanczyc, M.

    2011-01-01

    can be explored. A machine learning approach (Evo-DoE) could be applied to explore this experimental space and define optimal interactions according to a specific fitness function. Herein an implementation of an evolutionary design of experiments to optimize chemical and biochemical systems based on a machine learning process is presented. The optimization proceeds over generations of experiments in an iterative loop until optimal compositions are discovered. The fitness function is experimentally measured every time the loop is closed. Two examples of complex systems, namely a liposomal drug formulation...

  5. Paradigms for Realizing Machine Learning Algorithms.

    Science.gov (United States)

    Agneeswaran, Vijay Srinivas; Tonpay, Pranay; Tiwary, Jayati

    2013-12-01

    The article explains the three generations of machine learning algorithms, all of which try to operate on big data. The first generation tools are SAS, SPSS, etc., while second generation realizations include Mahout and RapidMiner (that work over Hadoop), and the third generation paradigms include Spark and GraphLab, among others. The essence of the article is that for a number of machine learning algorithms, it is important to look beyond Hadoop's Map-Reduce paradigm in order to make them work on big data. A number of promising contenders have emerged in the third generation that can be exploited to realize deep analytics on big data.

  6. A Machine Learning Approach to Automated Negotiation

    Institute of Scientific and Technical Information of China (English)

    Zhang Huaxiang(张化祥); Zhang Liang; Huang Shangteng; Ma Fanyuan

    2004-01-01

    Automated negotiation between two competitive agents is analyzed, and a multi-issue negotiation model based on machine learning, time belief, offer belief and state-action pair expected Q value is developed. Unlike widely used approaches such as the game theory approach, the heuristic approach and the argumentation approach, this paper uses a machine learning method to compute agents' average Q values in each negotiation stage. The delayed reward is used to generate agents' offers and counteroffers for every issue. The effect of time and discount rate on the negotiation outcome is analyzed. Theoretical analysis and experimental data show that this negotiation model is practical.

  7. Machine learning methods for nanolaser characterization

    CERN Document Server

    Zibar, Darko; Winther, Ole; Moerk, Jesper; Schaeffer, Christian

    2016-01-01

    Nanocavity lasers, which are an integral part of an on-chip integrated photonic network, are setting stringent requirements on the sensitivity of the techniques used to characterize the laser performance. Current characterization tools cannot provide detailed knowledge about nanolaser noise and dynamics. In this progress article, we will present tools and concepts from Bayesian machine learning and digital coherent detection that offer novel approaches for highly-sensitive laser noise characterization and inference of laser dynamics. The goal of the paper is to trigger new research directions that combine the fields of machine learning and nanophotonics for characterizing nanolasers and eventually integrated photonic networks.

  8. Testing and Validating Machine Learning Classifiers by Metamorphic Testing

    Science.gov (United States)

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine Learning algorithms have provided core functionality to many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique “metamorphic testing”, which has been shown to be effective in alleviating the oracle problem. We also present a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective in killing mutants, and that observing the expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969
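
    The idea of metamorphic testing can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' test suite: `knn_predict` is a hypothetical hand-written k-nearest-neighbour classifier, and the two relations checked (permuting the training samples, and shifting every feature by the same constant) are standard examples of properties that should leave k-NN predictions unchanged even though no oracle says what the "correct" labels are.

```python
import random
from collections import Counter

def knn_predict(train, labels, x, k=3):
    """Majority vote among the k training points nearest to x (hypothetical helper)."""
    def key(i):
        dist = sum((a - b) ** 2 for a, b in zip(train[i], x))
        # Break distance ties by content, not by list position, so that
        # reordering the training set cannot change the outcome.
        return (dist, train[i], labels[i])
    nearest = sorted(range(len(train)), key=key)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def predictions(train, labels, tests, k=3):
    return [knn_predict(train, labels, x, k) for x in tests]

# Integer coordinates keep all distance arithmetic exact.
rng = random.Random(0)
train = [(rng.randint(0, 50), rng.randint(0, 50)) for _ in range(40)]
labels = [int(x + y > 50) for x, y in train]
tests = [(rng.randint(0, 50), rng.randint(0, 50)) for _ in range(20)]
baseline = predictions(train, labels, tests)

# Relation 1: permuting the training samples must not change any prediction.
perm = list(range(len(train)))
rng.shuffle(perm)
permuted = predictions([train[i] for i in perm], [labels[i] for i in perm], tests)

# Relation 2: adding the same constant to every feature preserves all
# pairwise distances, so predictions must again be unchanged.
shift = 7
shifted = predictions([(x + shift, y + shift) for x, y in train], labels,
                      [(x + shift, y + shift) for x, y in tests])

mr1_holds = permuted == baseline
mr2_holds = shifted == baseline
print("permutation relation holds:", mr1_holds)
print("shift relation holds:", mr2_holds)
```

    A violated relation would signal a fault in the implementation without anyone having to know the true label of any test point.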

  9. Machine Learning Methods for Attack Detection in the Smart Grid.

    Science.gov (United States)

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  10. Application of machine learning in SNP discovery

    Directory of Open Access Journals (Sweden)

    Cregan Perry B

    2006-01-01

    Background: Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. Polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes to improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems in predicting genes, promoters, transcription factor binding sites and protein structures. Results: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars and PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training the ML classifier, it agreed with the experts on 97.3% of test data compared with 7.8% agreement between PolyBayes and experts. The PolyBayes positive predictive values (PPV) (i.e., fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to high false SNP predictions by PolyBayes and hence results may differ for other genomes. Conclusion: A machine learning (ML) method was developed as a supplementary feature to the polymorphism detection software for improving prediction accuracies. The results from this study
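
    The PPV figures quoted above can be cross-checked with a little arithmetic. The sketch below is a back-of-envelope calculation, not something reported in the record: it assumes the quoted PPVs (7.8% and 84.8%) and false-positive counts (16,955 and 249) refer to the same test set, and solves PPV = TP / (TP + FP) for the number of true positives. Because the percentages are rounded, the results are approximate.

```python
def true_positives_from_ppv(ppv, false_positives):
    """Solve PPV = TP / (TP + FP) for TP."""
    return ppv * false_positives / (1.0 - ppv)

# Figures quoted in the abstract (rounded percentages).
tp_polybayes = true_positives_from_ppv(0.078, 16955)
tp_ml = true_positives_from_ppv(0.848, 249)

print(f"PolyBayes true positives (approx.): {tp_polybayes:.0f}")
print(f"ML true positives (approx.):        {tp_ml:.0f}")
print(f"PolyBayes TP + FP (approx.):        {tp_polybayes + 16955:.0f}")
```

    Under these assumptions both methods come out at roughly 1,400 true positives, in line with the statement that they produced a similar number, and the implied PolyBayes total of about 18,390 predictions matches the stated size of the test set.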

  11. Machine learning with quantum relative entropy

    Energy Technology Data Exchange (ETDEWEB)

    Tsuda, Koji [Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tuebingen, 72076 (Germany)], E-mail: koji.tsuda@tuebingen.mpg.de

    2009-12-01

    Density matrices are a central tool in quantum physics, but they are also used in machine learning. A positive definite matrix called a kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in a Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upper bound on the generalization error of online learning.
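
    A minimal sketch of such an update, assuming the commonly stated form W_new = exp(log W - eta * sym(grad)) / trace(exp(...)), is given below. The toy loss, the target matrix `W_true`, the step size `eta` and the helper names are all assumptions made for illustration; the point is only that working in the matrix-log domain keeps the iterate symmetric positive definite with unit trace.

```python
import numpy as np

def sym_apply(matrix, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    return (vecs * fn(vals)) @ vecs.T

def meg_update(W, grad, eta=0.05):
    """One matrix exponentiated gradient step, renormalised to unit trace."""
    sym_grad = 0.5 * (grad + grad.T)
    M = sym_apply(W, np.log) - eta * sym_grad
    exp_M = sym_apply(0.5 * (M + M.T), np.exp)  # symmetrise against rounding
    return exp_M / np.trace(exp_M)

# Toy online problem (assumed): observe y = tr(W_true X), squared loss.
rng = np.random.default_rng(0)
n = 3
W_true = np.diag([0.5, 0.3, 0.2])   # hypothetical target, unit trace
W = np.eye(n) / n                    # start from the maximally mixed matrix
for _ in range(30):
    x = rng.standard_normal(n)
    X = np.outer(x, x)
    y = np.trace(W_true @ X)
    grad = 2.0 * (np.trace(W @ X) - y) * X   # gradient of (tr(W X) - y)^2
    W = meg_update(W, grad)

print("trace:", np.trace(W))
print("eigenvalues:", np.linalg.eigvalsh(W))
```

    The matrix exponential of a symmetric matrix is always positive definite, so no projection step is needed to stay in the feasible set.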

  12. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights

  13. Photometric Supernova Classification With Machine Learning

    CERN Document Server

    Lochner, Michelle; Peiris, Hiranya V; Lahav, Ofer; Winter, Max K

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), given that spectroscopic confirmation of type for all supernovae discovered with these surveys will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques fitting parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks and boosted decision trees. We test the pipeline on simulated multi-ba...

  14. Data Mining and Machine Learning in Astronomy

    CERN Document Server

    Ball, Nicholas M

    2009-01-01

    We review the current state of data mining and machine learning in Astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. On the one hand, it is a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, which promises almost limitless scientific advances. On the other, it can be the application of black-box computing algorithms that at best give little physical insight, and at worst provide questionable results. Here, we give an overview of the entire data mining process, from data collection through the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of Astronomy, with an emphasis on those where data mining resulted in improved physical insights, and important current and future directions, including the construction of full probability density functions, parallel algorithm...

  15. Rule based systems for big data a machine learning approach

    CERN Document Server

    Liu, Han; Cocea, Mihaela

    2016-01-01

    The ideas introduced in this book explore the relationships among rule based systems, machine learning and big data. Rule based systems are seen as a special type of expert systems, which can be built by using expert knowledge or learning from real data. The book focuses on the development and evaluation of rule based systems in terms of accuracy, efficiency and interpretability. In particular, a unified framework for building rule based systems, which consists of the operations of rule generation, rule simplification and rule representation, is presented. Each of these operations is detailed using specific methods or techniques. In addition, this book also presents some ensemble learning frameworks for building ensemble rule based systems.

  16. Tracking by Machine Learning Methods

    CERN Document Server

    Jofrehei, Arash

    2015-01-01

    Current track reconstruction methods start with two points and then, for each layer, loop through all possible hits to find proper hits to add to that track. Another idea would be to use the large number of already reconstructed events and/or simulated data and train a machine on this data to find tracks given hit pixels. Training time could be long, but real-time tracking is really fast. Simulation might not be as realistic as real data, but tracking has been done for that with 100 percent efficiency, while by using real data we would probably be limited to current efficiency.

  17. Supporting visual quality assessment with machine learning

    NARCIS (Netherlands)

    Gastaldo, P.; Zunino, R.; Redi, J.

    2013-01-01

    Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms allow to tackle the quality

  18. The ATLAS Higgs Machine Learning Challenge

    CERN Document Server

    Cowan, Glen; The ATLAS collaboration; Bourdarios, Claire

    2015-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s with Artificial Neural Net and more recently with Boosted Decision Trees, Random Forest etc. Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, data scientists are developing new Machine Learning algorithms to extract meaning from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, and at the same time data scientists have advanced algorithms: the goal of the HiggsML project was to bring the two together by a “challenge”: participants from all over the world and any scientific background could compete online to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Instead of HEP physicists browsing through machine learning papers and trying to infer which new algorithms might be useful for HEP, then c...

  19. Fast, Continuous Audiogram Estimation Using Machine Learning.

    Science.gov (United States)

    Song, Xinyu D; Wallace, Brittany M; Gardner, Jacob R; Ledbetter, Noah M; Weinberger, Kilian Q; Barbour, Dennis L

    2015-01-01

    Pure-tone audiometry has been a staple of hearing assessments for decades. Many different procedures have been proposed for measuring thresholds with pure tones by systematically manipulating intensity one frequency at a time until a discrete threshold function is determined. The authors have developed a novel nonparametric approach for estimating a continuous threshold audiogram using Bayesian estimation and machine learning classification. The objective of this study was to assess the accuracy and reliability of this new method relative to a commonly used threshold measurement technique. The authors performed air conduction pure-tone audiometry on 21 participants between the ages of 18 and 90 years with varying degrees of hearing ability. Two repetitions of automated machine learning audiogram estimation and one repetition of conventional modified Hughson-Westlake ascending-descending audiogram estimation were acquired by an audiologist. The estimated hearing thresholds of these two techniques were compared at standard audiogram frequencies (i.e., 0.25, 0.5, 1, 2, 4, 8 kHz). The two threshold estimate methods delivered very similar estimates at standard audiogram frequencies. Specifically, the mean absolute difference between estimates was 4.16 ± 3.76 dB HL. The mean absolute difference between repeated measurements of the new machine learning procedure was 4.51 ± 4.45 dB HL. These values compare favorably with those of other threshold audiogram estimation procedures. Furthermore, the machine learning method generated threshold estimates from significantly fewer samples than the modified Hughson-Westlake procedure while returning a continuous threshold estimate as a function of frequency. The new machine learning audiogram estimation technique produces continuous threshold audiogram estimates accurately, reliably, and efficiently, making it a strong candidate for widespread application in clinical and research audiometry.

  20. Machine learning based Intelligent cognitive network using fog computing

    Science.gov (United States)

    Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

    2017-05-01

    In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send the periodic signal summary which is much smaller than the original signal to the cloud so that the overall system spectrum source allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens the system security by reducing the communication burden of the communications network.

  1. Stochastic Local Interaction (SLI) model: Bridging machine learning and geostatistics

    Science.gov (United States)

    Hristopulos, Dionissios T.

    2015-12-01

    Machine learning and geostatistics are powerful mathematical frameworks for modeling spatial data. Both approaches, however, suffer from poor scaling of the required computational resources for large data applications. We present the Stochastic Local Interaction (SLI) model, which employs a local representation to improve computational efficiency. SLI combines geostatistics and machine learning with ideas from statistical physics and computational geometry. It is based on a joint probability density function defined by an energy functional which involves local interactions implemented by means of kernel functions with adaptive local kernel bandwidths. SLI is expressed in terms of an explicit, typically sparse, precision (inverse covariance) matrix. This representation leads to a semi-analytical expression for interpolation (prediction), which is valid in any number of dimensions and avoids the computationally costly covariance matrix inversion.
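
    Why a precision-matrix formulation avoids covariance inversion can be seen from a generic Gaussian identity, sketched below. This is not the SLI predictor itself: it assumes a Gaussian field with constant mean and a known sparse precision matrix J, for which the conditional mean at site p is mean - (1 / J[p][p]) * sum over q != p of J[p][q] * (x[q] - mean). The function name and the toy tridiagonal row are hypothetical.

```python
def conditional_mean(precision_row, p, values, mean=0.0):
    """Conditional mean at site p of a Gaussian field, given the other sites.

    precision_row maps column index -> J[p][col] (diagonal included) and
    holds only the nonzero entries, so the cost scales with the number of
    neighbours rather than with the total number of sites.
    """
    j_pp = precision_row[p]
    coupling = sum(j * (values[col] - mean)
                   for col, j in precision_row.items() if col != p)
    return mean - coupling / j_pp

# Toy chain of five sites; site 2 is unobserved and only couples to sites 1 and 3.
row_for_site_2 = {1: -1.0, 2: 2.0, 3: -1.0}
observed = {0: 0.5, 1: 1.0, 3: 3.0, 4: 2.5}
estimate = conditional_mean(row_for_site_2, 2, observed)
print("interpolated value at site 2:", estimate)
```

    With this row the estimate is simply the average of the two neighbouring values; no matrix is inverted at any point.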

  2. Evaluating the Security of Machine Learning Algorithms

    Science.gov (United States)

    2008-05-20

    description of this setting and several results appear in Cesa-Bianchi and Lugosi [2006]. 2.5 Summary In this chapter we have presented a framework for...Learning Research (JMLR), 3:993–1022, 2003. ISSN 1533-7928. Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University

  3. Machine learning a theoretical approach

    CERN Document Server

    Natarajan, Balas K

    2014-01-01

    This is the first comprehensive introduction to computational learning theory. The author's uniform presentation of fundamental results and their applications offers AI researchers a theoretical perspective on the problems they study. The book presents tools for the analysis of probabilistic models of learning, tools that crisply classify what is and is not efficiently learnable. After a general introduction to Valiant's PAC paradigm and the important notion of the Vapnik-Chervonenkis dimension, the author explores specific topics such as finite automata and neural networks. The presentation

  4. Interface Metaphors for Interactive Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Jasper, Robert J.; Blaha, Leslie M.

    2017-07-14

    To promote more interactive and dynamic machine learning, we revisit the notion of user-interface metaphors. User-interface metaphors provide intuitive constructs for supporting user needs through interface design elements. A user-interface metaphor provides a visual or action pattern that leverages a user’s knowledge of another domain. Metaphors suggest both the visual representations that should be used in a display as well as the interactions that should be afforded to the user. We argue that user-interface metaphors can also offer a method of extracting interaction-based user feedback for use in machine learning. Metaphors offer indirect, context-based information that can be used in addition to explicit user inputs, such as user-provided labels. Implicit information from user interactions with metaphors can augment explicit user input for active learning paradigms. Or it might be leveraged in systems where explicit user inputs are more challenging to obtain. Each interaction with the metaphor provides an opportunity to gather data and learn. We argue this approach is especially important in streaming applications, where we desire machine learning systems that can adapt to dynamic, changing data.

  5. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre Abraham

    2014-02-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  6. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
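
    A decoding analysis of the kind described can be sketched with standard scikit-learn building blocks. The example below is only a sketch on synthetic data, not code from the paper: the "voxel" matrix, the ten informative columns and the effect size are invented for illustration, and real neuroimaging data would first need masking and preprocessing that is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for brain images: 80 samples x 500 "voxels".
rng = np.random.default_rng(0)
n_samples, n_voxels = 80, 500
y = np.repeat([0, 1], n_samples // 2)      # two hypothetical conditions
X = rng.standard_normal((n_samples, n_voxels))
X[y == 1, :10] += 1.5                      # assumed informative voxels

# Supervised "decoding": predict the condition from the image features.
decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(decoder, X, y, cv=5)
print("cross-validated accuracy per fold:", scores)
```

    Wrapping the scaler and the classifier in one pipeline means the scaling is refitted on each training fold, so the test folds do not leak into the model.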

  7. Financial signal processing and machine learning

    CERN Document Server

    Kulkarni, Sanjeev R; Malioutov, Dmitry M

    2016-01-01

    The modern financial industry has been required to deal with large and diverse portfolios in a variety of asset classes often with limited market data available. Financial Signal Processing and Machine Learning unifies a number of recent advances made in signal processing and machine learning for the design and management of investment portfolios and financial engineering. This book bridges the gap between these disciplines, offering the latest information on key topics including characterizing statistical dependence and correlation in high dimensions, constructing effective and robust risk measures, and their use in portfolio optimization and rebalancing. The book focuses on signal processing approaches to model return, momentum, and mean reversion, addressing theoretical and implementation aspects. It highlights the connections between portfolio theory, sparse learning and compressed sensing, sparse eigen-portfolios, robust optimization, non-Gaussian data-driven risk measures, graphical models, causal analy...

  8. Teaching an Old Log New Tricks with Machine Learning.

    Science.gov (United States)

    Schnell, Krista; Puri, Colin; Mahler, Paul; Dukatz, Carl

    2014-03-01

    To most people, the log file would not be considered an exciting area in technology today. However, these relatively benign, slowly growing data sources can drive large business transformations when combined with modern-day analytics. Accenture Technology Labs has built a new framework that helps to expand existing vendor solutions to create new methods of gaining insights from these benevolent information springs. This framework provides a systematic and effective machine-learning mechanism to understand, analyze, and visualize heterogeneous log files. These techniques enable an automated approach to analyzing log content in real time, learning relevant behaviors, and creating actionable insights applicable in traditionally reactive situations. Using this approach, companies can now tap into a wealth of knowledge residing in log file data that is currently being collected but underutilized because of its overwhelming variety and volume. By using log files as an important data input into the larger enterprise data supply chain, businesses have the opportunity to enhance their current operational log management solution and generate entirely new business insights, no longer limited to the realm of reactive IT management, but extending from proactive product improvement to defense from attacks. As we will discuss, this solution has immediate relevance in the telecommunications and security industries. However, the most forward-looking companies can take it even further. How? By thinking beyond the log file and applying the same machine-learning framework to other log file use cases (including logistics, social media, and consumer behavior) and any other transactional data source.

  9. Machine learning analysis of binaural rowing sounds

    DEFF Research Database (Denmark)

    Johard, Leonard; Ruffaldi, Emanuele; Hoffmann, Pablo F.

    2011-01-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks performs as well as deep architectures, while being much faster to train.

  10. Machine Learning Analysis of Binaural Rowing Sounds

    Directory of Open Access Journals (Sweden)

    Filippeschi Alessandro

    2011-12-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks performs as well as deep architectures, while being much faster to train.

  11. Machine Learning Assessments of Soil Drying

    Science.gov (United States)

    Coopersmith, E. J.; Minsker, B. S.; Wenzel, C.; Gilmore, B. J.

    2011-12-01

    Agricultural activities require the use of heavy equipment and vehicles on unpaved farmlands. When soil conditions are wet, equipment can cause substantial damage, leaving deep ruts. In extreme cases, implements can sink and become mired, causing considerable delays and expense to extricate the equipment. Farm managers, who are often located remotely, cannot assess sites before allocating equipment, making it difficult to assess the conditions of countless sites reliably and frequently. For example, farmers often trace serpentine paths of over one hundred miles each day to assess the overall status of various tracts of land spanning thirty, forty, or fifty miles in each direction. One means of assessing the moisture content of a field lies in the strategic positioning of remotely-monitored in situ sensors. Unfortunately, land owners are often reluctant to place sensors across their properties due to the significant monetary cost and complexity. This work aspires to overcome these limitations by modeling the process of wetting and drying statistically, remotely assessing field readiness using only information that is publicly accessible. Such data includes Nexrad radar and state climate network sensors, as well as Twitter-based reports of field conditions for validation. Three algorithms (classification trees, k-nearest-neighbors, and boosted perceptrons) are deployed to deliver statistical field readiness assessments of an agricultural site located in Urbana, IL. Two of the three algorithms performed with 92-94% accuracy, with the majority of misclassifications falling within the calculated margins of error. This demonstrates the feasibility of using a machine learning framework with only public data, knowledge of system memory from previous conditions, and statistical tools to assess "readiness" without the need for real-time, on-site physical observation. Future efforts will produce a workflow assimilating Nexrad, climate network

  12. Machine Learning for Computer Vision

    CERN Document Server

    Battiato, Sebastiano; Farinella, Giovanni

    2013-01-01

    Computer vision is the science and technology of making machines that see. It is concerned with the theory, design and implementation of algorithms that can automatically process visual data to recognize objects, track and recover their shape and spatial layout. The International Computer Vision Summer School - ICVSS was established in 2007 to provide both an objective and clear overview and an in-depth analysis of the state-of-the-art research in Computer Vision. The courses are delivered by world renowned experts in the field, from both academia and industry, and cover both theoretical and practical aspects of real Computer Vision problems. The school is organized every year by University of Cambridge (Computer Vision and Robotics Group) and University of Catania (Image Processing Lab). Different topics are covered each year. A summary of the past Computer Vision Summer Schools can be found at: http://www.dmi.unict.it/icvss This edited volume contains a selection of articles covering some of the talks and t...

  13. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

    Science.gov (United States)

    Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao

    2017-09-04

    Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Machine learning in geosciences and remote sensing

    Institute of Scientific and Technical Information of China (English)

    David J. Lary; Amir H. Alavi; Amir H. Gandomi; Annette L. Walker

    2016-01-01

    Learning incorporates a broad range of complex procedures. Machine learning (ML) is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc.) that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  15. Machine learning in geosciences and remote sensing

    Directory of Open Access Journals (Sweden)

    David J. Lary

    2016-01-01

    Learning incorporates a broad range of complex procedures. Machine learning (ML) is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc.) that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems.

  16. Machine learning analysis of binaural rowing sounds

    DEFF Research Database (Denmark)

    Johard, Leonard; Ruffaldi, Emanuele; Hoffmann, Pablo F.

    2011-01-01

    Techniques for machine hearing are increasing their potentiality due to new application domains. In this work we are addressing the analysis of rowing sounds in natural context for the purpose of supporting a training system based on virtual environments. This paper presents the acquisition methodology and the evaluation of different machine learning techniques for classifying rowing-sound data. We see that a combination of principal component analysis and shallow networks perform equally well as deep architectures, while being much faster to train.

  17. An e-Learning Theoretical Framework

    Science.gov (United States)

    Aparicio, Manuela; Bacao, Fernando; Oliveira, Tiago

    2016-01-01

    E-learning systems have witnessed a usage and research increase in the past decade. This article presents the e-learning concepts ecosystem. It summarizes the various scopes on e-learning studies. Here we propose an e-learning theoretical framework. This theory framework is based upon three principal dimensions: users, technology, and services…

  18. An e-Learning Theoretical Framework

    Science.gov (United States)

    Aparicio, Manuela; Bacao, Fernando; Oliveira, Tiago

    2016-01-01

    E-learning systems have witnessed a usage and research increase in the past decade. This article presents the e-learning concepts ecosystem. It summarizes the various scopes on e-learning studies. Here we propose an e-learning theoretical framework. This theory framework is based upon three principal dimensions: users, technology, and services…

  19. Machine learning for vessel trajectories using compression, alignments and domain knowledge

    NARCIS (Netherlands)

    de Vries, G.K.D.; van Someren, M.

    2012-01-01

    In this paper we present a machine learning framework to analyze moving object trajectories from maritime vessels. Within this framework we perform the tasks of clustering, classification and outlier detection with vessel trajectory data. First, we apply a piecewise linear segmentation method to the

  20. Machine Learning with Operational Costs

    CERN Document Server

    Tulabandhula, Theja

    2011-01-01

    This work concerns the way that statistical models are used to make decisions. In particular, we aim to merge the way estimation algorithms are designed with how they are used for a subsequent task. Our methodology considers the operational cost of carrying out a policy, based on a predictive model. The operational cost becomes a regularization term in the learning algorithm's objective function, allowing either an optimistic or pessimistic view of possible costs. Limiting the operational cost reduces the hypothesis space for the predictive model, and can thus improve generalization. We show that different types of operational problems can lead to the same type of restriction on the hypothesis space, namely the restriction to an intersection of an $\ell_q$ ball with a halfspace. We bound the complexity of such hypothesis spaces by proposing a technique that involves counting integer points in polyhedrons.

  1. Discharge estimation based on machine learning

    Institute of Scientific and Technical Information of China (English)

    Zhu JIANG; Hui-yan WANG; Wen-wu SONG

    2013-01-01

    To overcome the limitations of the traditional stage-discharge models in describing the dynamic characteristics of a river, a machine learning method of non-parametric regression, the locally weighted regression method, was used to estimate discharge. With the purpose of improving the precision and efficiency of river discharge estimation, a novel machine learning method is proposed: the clustering-tree weighted regression method. First, the training instances are clustered. Second, the k-nearest neighbor method is used to cluster new stage samples into the best-fit cluster. Finally, the daily discharge is estimated. In the estimation process, the interference of irrelevant information can be avoided, so that the precision and efficiency of daily discharge estimation are improved. Observed data from the Luding Hydrological Station were used for testing. The simulation results demonstrate that the precision of this method is high. This provides a new effective method for discharge estimation.
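
    The three steps named in this abstract (cluster the training instances, assign a new stage sample to a cluster by k-nearest neighbors, then estimate discharge) can be illustrated with a minimal sketch. It is not the authors' clustering-tree implementation: the cluster labels are assumed to be given already, the estimate inside a cluster uses a plain Gaussian-weighted linear fit, and the names `knn_cluster`, `locally_weighted_fit`, `estimate_discharge`, and the sample values are all hypothetical.

```python
import math
from collections import Counter


def knn_cluster(stage, samples, k=3):
    """Assign a stage value to a cluster by majority vote of its k nearest samples.

    `samples` is a list of (stage, discharge, cluster_label) tuples; the labels
    are assumed to come from an earlier clustering step that is not shown here.
    """
    nearest = sorted(samples, key=lambda s: abs(s[0] - stage))[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]


def locally_weighted_fit(stage, points, bandwidth=1.0):
    """Evaluate a Gaussian-weighted least-squares line through `points` at `stage`."""
    weights = [math.exp(-((x - stage) ** 2) / (2 * bandwidth ** 2)) for x, _ in points]
    total = sum(weights)
    mean_x = sum(w * x for w, (x, _) in zip(weights, points)) / total
    mean_y = sum(w * y for w, (_, y) in zip(weights, points)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, points))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, (x, y) in zip(weights, points))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the weighted mean
    return mean_y + slope * (stage - mean_x)


def estimate_discharge(stage, samples, k=3, bandwidth=1.0):
    """Pick the best-fit cluster with k-NN, then regress using only that cluster."""
    label = knn_cluster(stage, samples, k)
    points = [(x, y) for x, y, lab in samples if lab == label]
    return locally_weighted_fit(stage, points, bandwidth)


# Made-up (stage, discharge, cluster) records: a low-flow and a high-flow regime.
demo_samples = [
    (1.0, 3.0, 0), (2.0, 5.0, 0), (3.0, 7.0, 0),
    (8.0, 30.0, 1), (9.0, 35.0, 1), (10.0, 40.0, 1),
]
print(estimate_discharge(2.5, demo_samples))
```

    Restricting the fit to one cluster is what keeps samples from an unrelated flow regime out of the estimate, which matches the abstract's point about avoiding irrelevant information.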

  2. Network anomaly detection a machine learning perspective

    CERN Document Server

    Bhattacharyya, Dhruba Kumar

    2013-01-01

    With the rapid rise in the ubiquity and sophistication of Internet technology and the accompanying growth in the number of network attacks, network intrusion detection has become increasingly important. Anomaly-based network intrusion detection refers to finding exceptional or nonconforming patterns in network traffic data compared to normal behavior. Finding these anomalies has extensive applications in areas such as cyber security, credit card and insurance fraud detection, and military surveillance for enemy activities. Network Anomaly Detection: A Machine Learning Perspective presents mach

  3. Ozone ensemble forecast with machine learning algorithms

    OpenAIRE

    Mallet, Vivien; Stoltz, Gilles; Mauricette, Boris

    2009-01-01

    We apply machine learning algorithms to perform sequential aggregation of ozone forecasts. The latter rely on a multimodel ensemble built for ozone forecasting with the modeling system Polyphemus. The ensemble simulations are obtained by changes in the physical parameterizations, the numerical schemes, and the input data to the models. The simulations are carried out for summer 2001 over western Europe in order to forecast ozone daily peaks and ozone hourly concentrati...

  4. The ATLAS Higgs machine learning challenge

    CERN Document Server

    Davey, W; The ATLAS collaboration; Rousseau, D; Cowan, G; Kegl, B; Germain-Renaud, C; Guyon, I

    2014-01-01

    High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 90s, with Artificial Neural Nets for example, and more recently with Boosted Decision Trees, Random Forests, etc. Meanwhile, Machine Learning has become a full-blown field of computer science. With the emergence of Big Data, Data Scientists are developing new Machine Learning algorithms to extract sense from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal; data scientists have advanced algorithms: the goal of the HiggsML project is to bring the two together by a “challenge”: participants from all over the world and any scientific background can compete online (https://www.kaggle.com/c/higgs-boson) to obtain the best Higgs to tau tau signal significance on a set of ATLAS full simulated Monte Carlo signal and background. Winners with the best scores will receive money prizes; authors of the best method (most usable) will be invited t...

  5. Quantum Loop Topography for Machine Learning

    Science.gov (United States)

    Zhang, Yi; Kim, Eun-Ah

    2017-05-01

    Despite rapidly growing interest in harnessing machine learning in the study of quantum many-body systems, training neural networks to identify quantum phases is a nontrivial challenge. The key challenge is in efficiently extracting essential information from the many-body Hamiltonian or wave function and turning the information into an image that can be fed into a neural network. When targeting topological phases, this task becomes particularly challenging as topological phases are defined in terms of nonlocal properties. Here, we introduce quantum loop topography (QLT): a procedure of constructing a multidimensional image from the "sample" Hamiltonian or wave function by evaluating two-point operators that form loops at independent Monte Carlo steps. The loop configuration is guided by the characteristic response for defining the phase, which is Hall conductivity for the cases at hand. Feeding QLT to a fully connected neural network with a single hidden layer, we demonstrate that the architecture can be effectively trained to distinguish the Chern insulator and the fractional Chern insulator from trivial insulators with high fidelity. In addition to establishing the first case of obtaining a phase diagram with a topological quantum phase transition with machine learning, the perspective of bridging traditional condensed matter theory with machine learning will be broadly valuable.

  6. Machine Learning for Flood Prediction in Google Earth Engine

    Science.gov (United States)

    Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

    2015-12-01

    With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.

  7. Ensemble Machine Learning Methods and Applications

    CERN Document Server

    Ma, Yunqian

    2012-01-01

    It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed “ensemble learning” by researchers in computational intelligence and machine learning, it is known to improve a decision system’s robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as “boosting” and “random forest” facilitate solutions to key computational issues such as face detection and are now being applied in areas as diverse as object tracking and bioinformatics. Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs. At once a solid theoretical study and a practical guide, the volume is a windfall for r...

  8. WEB MINING BASED FRAMEWORK FOR ONTOLOGY LEARNING

    Directory of Open Access Journals (Sweden)

    C.Ramesh

    2015-07-01

    Today, the notion of the Semantic Web has emerged as a prominent solution to the problem of organizing the immense information provided by the World Wide Web, and its focus on supporting better co-operation between humans and machines is noteworthy. Ontology forms the major component of the Semantic Web in its realization. However, the manual method of ontology construction is time-consuming, costly, error-prone and inflexible to change, and in addition it requires the complete participation of a knowledge engineer or domain expert. To address this issue, researchers hoped that a semi-automatic or automatic process would result in faster and better ontology construction and enrichment. Ontology learning has recently become a major area of research, whose goal is to facilitate the construction of ontologies, which reduces the effort in developing an ontology for a new domain. However, there are few research studies that attempt to construct ontologies from semi-structured Web pages. In this paper, we present a complete framework for ontology learning that facilitates the semi-automation of constructing and enriching a web site ontology from semi-structured Web pages. The proposed framework employs Web Content Mining and Web Usage Mining in extracting conceptual relationships from the Web. The main idea behind this concept was to incorporate the web author's ideas as well as web users’ intentions in the ontology development and its evolution.

  9. The Couzens Machine. A Computerized Learning Exchange. Final Report, 1973-74.

    Science.gov (United States)

    Davis, Ken, Comp.; Libengood, Richard, Comp.

    The Couzens Machine is a computerized learning exchange and information service developed for the residents of Couzens Hall, a dormitory at the University of Michigan. Organized as a collective within the framework of a course and supported by an instructional development grant from the Center for Research on Learning and Teaching, the Couzens…

  10. Deep Extreme Learning Machine and Its Application in EEG Classification

    Directory of Open Access Journals (Sweden)

    Shifei Ding

    2015-01-01

    Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating the complicated function and alleviating the optimization difficulty associated with deep models. Multilayer extreme learning machine (MLELM) is a learning algorithm of an artificial neural network which takes advantages of deep learning and extreme learning machine. Not only does MLELM approximate the complicated function but it also does not need to iterate during the training process. Combining MLELM with the extreme learning machine with kernel (KELM), we put forward the deep extreme learning machine (DELM) and apply it to EEG classification in this paper. This paper focuses on the application of DELM in the classification of the visual feedback experiment, using MATLAB and the second brain-computer interface (BCI) competition datasets. By simulating and analyzing the results of the experiments, the effectiveness of the application of DELM in EEG classification is confirmed.

  11. Scalable Machine Learning for Massive Astronomical Datasets

    Science.gov (United States)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  12. A Unified Active Learning Framework for Biomedical Relation Extraction

    Institute of Scientific and Technical Information of China (English)

    Hong-Tao Zhang; Min-Lie Huang; Xiao-Yan Zhu

    2012-01-01

    Supervised machine learning methods have been employed with great success in the task of biomedical relation extraction. However, existing methods are not practical enough, since manual construction of large training data is very expensive. Therefore, active learning is urgently needed for designing practical relation extraction methods with little human effort. In this paper, we describe a unified active learning framework. Particularly, our framework systematically addresses some practical issues during active learning process, including a strategy for selecting informative data, a data diversity selection algorithm, an active feature acquisition method, and an informative feature selection algorithm, in order to meet the challenges due to the immense amount of complex and diverse biomedical text. The framework is evaluated on protein-protein interaction (PPI) extraction and is shown to achieve promising results with a significant reduction in editorial effort and labeling time.

  13. Machine-learning-assisted materials discovery using failed experiments

    Science.gov (United States)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted

  14. Machine-learning-assisted materials discovery using failed experiments.

    Science.gov (United States)

    Raccuglia, Paul; Elbert, Katherine C; Adler, Philip D F; Falk, Casey; Wenny, Malia B; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A; Schrier, Joshua; Norquist, Alexander J

    2016-05-05

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions

  15. Using financial risk measures for analyzing generalization performance of machine learning models.

    Science.gov (United States)

    Takeda, Akiko; Kanamori, Takafumi

    2014-09-01

    We propose a unified machine learning model (UMLM) for two-class classification, regression and outlier (or novelty) detection via a robust optimization approach. The model embraces various machine learning models such as support vector machine-based and minimax probability machine-based classification and regression models. The unified framework makes it possible to compare and contrast existing learning models and to explain their differences and similarities. In this paper, after relating existing learning models to UMLM, we show some theoretical properties for UMLM. Concretely, we show an interpretation of UMLM as minimizing a well-known financial risk measure (worst-case value-at-risk (VaR) or conditional VaR), derive generalization bounds for UMLM using such a risk measure, and prove that solving problems of UMLM leads to estimators with the minimized generalization bounds. Those theoretical properties are applicable to related existing learning models.
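
    For readers unfamiliar with the two risk measures named in this abstract, the following sketch computes an empirical value-at-risk and conditional value-at-risk from a finite sample of losses. It only illustrates the measures themselves, not the UMLM formulation or its worst-case variant; the function names and the simple tail-mean definition of CVaR are assumptions made for this example, and other discrete definitions exist.

```python
import math


def value_at_risk(losses, beta=0.8):
    """Smallest loss v such that at least a fraction `beta` of the losses are <= v."""
    ordered = sorted(losses)
    index = math.ceil(beta * len(ordered)) - 1
    return ordered[max(index, 0)]


def conditional_value_at_risk(losses, beta=0.8):
    """Mean of the worst (1 - beta) fraction of the losses (simple tail average)."""
    ordered = sorted(losses)
    n_tail = max(1, round((1 - beta) * len(ordered)))
    tail = ordered[-n_tail:]
    return sum(tail) / len(tail)


# Hypothetical per-sample losses of some classifier (values invented for the demo).
demo_losses = [4, 1, 10, 7, 2, 9, 3, 8, 5, 6]
print(value_at_risk(demo_losses), conditional_value_at_risk(demo_losses))
```

    VaR looks only at a quantile of the loss distribution, whereas CVaR averages over the tail beyond it, so CVaR is never smaller than VaR at the same level.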

  16. Finding New Perovskite Halides via Machine learning

    Directory of Open Access Journals (Sweden)

    Ghanshyam Pilania

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  17. Finding New Perovskite Halides via Machine learning

    Science.gov (United States)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
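
    The two geometric descriptors singled out in this abstract have standard closed forms: the Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) (r_B + r_X)) and the octahedral factor mu = r_B / r_X, where r_A, r_B and r_X are ionic radii. The sketch below computes both for one ABX3 composition. The function name and the radii are placeholders chosen for illustration, not values from the paper, and no formability threshold is implied, since the paper's decision boundary is learned by the SVM.

```python
import math


def perovskite_descriptors(r_a, r_b, r_x):
    """Return (tolerance_factor, octahedral_factor) for an ABX3 composition.

    All radii must be in the same unit (for example angstroms).
    """
    tolerance_factor = (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))
    octahedral_factor = r_b / r_x
    return tolerance_factor, octahedral_factor


# Placeholder radii in angstroms for a hypothetical ABX3 halide.
t, mu = perovskite_descriptors(r_a=1.88, r_b=1.19, r_x=2.20)
print(f"tolerance factor = {t:.3f}, octahedral factor = {mu:.3f}")
```

    Together with the raw radii, these two ratios would form the feature vector handed to a classifier.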

  18. BENCHMARKING MACHINE LEARNING TECHNIQUES FOR SOFTWARE DEFECT DETECTION

    Directory of Open Access Journals (Sweden)

    Saiqa Aleem

    2015-06-01

    Machine learning approaches are good at solving problems for which limited information is available. In most cases, software domain problems can be characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify software modules as defective or non-defective. Machine learning techniques help developers to retrieve useful information after the classification and enable them to analyse data from different perspectives. Machine learning techniques have proven useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.

  19. Entanglement-based machine learning on a quantum computer.

    Science.gov (United States)

    Cai, X-D; Wu, D; Su, Z-E; Chen, M-C; Wang, X-L; Li, Li; Liu, N-L; Lu, C-Y; Pan, J-W

    2015-03-20

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

  20. Evaluation of Learning Materials: A Holistic Framework

    Science.gov (United States)

    Bundsgaard, Jeppe; Hansen, Thomas Illum

    2011-01-01

    This paper presents a holistic framework for evaluating learning materials and designs for learning. A holistic evaluation comprises investigations of the potential learning potential, the actualised learning potential, and the actual learning. Each aspect is explained and exemplified through theoretical models and definitions. (Contains 3 figures…

  1. Learning frameworks as an alternative to repositories

    DEFF Research Database (Denmark)

    Dalsgaard, Christian

    2005-01-01

    a learning object repository contains all kinds of materials, a learning framework consists of an organisation of materials related to a common theme. Further, a repository consists of single, self-contained objects, whereas a learning framework is an open-ended environment which presents a number of different possibilities and potentials for student activities....

  2. Machine Learning of Protein Interactions in Fungal Secretory Pathways

    Science.gov (United States)

    Kludas, Jana; Arvas, Mikko; Castillo, Sandra; Pakula, Tiina; Oja, Merja; Brouard, Céline; Jäntti, Jussi; Penttilä, Merja

    2016-01-01

    In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state of the art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways of fungi, with S. cerevisiae (baker’s yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities. PMID:27441920
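
    As a rough illustration of the centered kernel alignment score mentioned in this abstract, the sketch below centers two kernel matrices and computes their normalized Frobenius inner product. It is a minimal sketch, not the authors' implementation: the toy feature sets, the linear kernel, the label-based target kernel and all function names are assumptions made for the example.

```python
import math

def center(K):
    """Center a kernel matrix: K_c = H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product <A, B>_F of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def centered_alignment(K1, K2):
    """Centered alignment: <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    A, B = center(K1), center(K2)
    return frobenius(A, B) / math.sqrt(frobenius(A, A) * frobenius(B, B))

def linear_kernel(X):
    """Gram matrix of inner products between feature vectors."""
    return [[sum(a * b for a, b in zip(x, z)) for z in X] for x in X]

# Hypothetical toy data: four items with labels +1/-1 (assumption).
y = [[1.0], [1.0], [-1.0], [-1.0]]
features_a = [[1.0], [0.9], [-1.0], [-1.1]]   # tracks the labels
features_b = [[1.0], [-1.0], [1.0], [-1.0]]   # unrelated to the labels

K_y = linear_kernel(y)            # target kernel y y^T
K_a = linear_kernel(features_a)
K_b = linear_kernel(features_b)

print("alignment of feature set A:", centered_alignment(K_a, K_y))
print("alignment of feature set B:", centered_alignment(K_b, K_y))
```

    A higher score indicates a kernel whose similarity structure agrees more closely with the target; how such scores are turned into kernel weights is a separate design choice that this sketch does not cover.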

  3. Machine Learning of Protein Interactions in Fungal Secretory Pathways.

    Directory of Open Access Journals (Sweden)

    Jana Kludas

    In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state of the art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways of fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities.

  4. Using machine learning to emulate human hearing for predictive maintenance of equipment

    Science.gov (United States)

    Verma, Dinesh; Bent, Graham

    2017-05-01

    At the current time, interfaces between humans and machines use only a limited subset of senses that humans are capable of. The interaction among humans and computers can become much more intuitive and effective if we are able to use more senses, and create other modes of communicating between them. New machine learning technologies can make this type of interaction become a reality. In this paper, we present a framework for a holistic communication between humans and machines that uses all of the senses, and discuss how a subset of this capability can allow machines to talk to humans to indicate their health for various tasks such as predictive maintenance.

  5. Applying Machine Learning to Star Cluster Classification

    Science.gov (United States)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time consuming, thus it creates a bottleneck, and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) through applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data is HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.

  6. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    Science.gov (United States)

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-09-12

    Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records makes applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality. Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.

  7. Machine Learning and Cosmological Simulations I: Semi-Analytical Models

    CERN Document Server

    Kamdar, Harshil M; Brunner, Robert J

    2016-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analyzing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated machine learning algorithms (k-Nearest Neighbors, decision trees, random forests and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of $M>10^{12} M_{\odot}$ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore...

  8. Machine Learning Algorithms in Web Page Classification

    Directory of Open Access Journals (Sweden)

    W.A.AWAD

    2012-11-01

    In this paper we use machine learning algorithms such as SVM, KNN and GIS to perform a behavior comparison on the web page classification problem. From the experiment we see that the SVM with a small number of negative documents to build the centroids has the smallest storage requirement and the least online test computation cost. But almost all GIS variants with different numbers of nearest neighbors have an even higher storage requirement and online test computation cost than KNN. This suggests that future work should try to reduce the storage requirement and online test cost of GIS.

  9. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    OpenAIRE

    C. V. Subbulakshmi; Deepa, S. N.

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learni...

  10. Predicting Networked Strategic Behavior via Machine Learning and Game Theory

    Science.gov (United States)

    2015-01-13

    Report: Predicting Networked Strategic Behavior via Machine Learning and Game Theory. The views, opinions and/or findings contained in this report... machine learning, game theory, microeconomics, behavioral data... The funding for this project was used to develop basic models, methodology

  11. Performance of machine learning methods for classification tasks

    OpenAIRE

    B. Krithika; Dr. V. Ramalingam; Rajan, K

    2013-01-01

    In this paper, the performance of various machine learning methods on pattern classification and recognition tasks are proposed. The proposed method for evaluating performance will be based on the feature representation, feature selection and setting model parameters. The nature of the data, the methods of feature extraction and feature representation are discussed. The results of the Machine Learning algorithms on the classification task are analysed. The performance of Machine Learning meth...

  12. New Theoretical Frameworks for Machine Learning

    Science.gov (United States)

    2008-09-15

    distribution. In particular, the error rate (also called “0-1 loss”) of a given hypothesis $f$ is defined as $\mathrm{err}(f) = \mathrm{err}_D(f) = \Pr_{x \sim D}[f(x) \neq c^*(x)]$. For any... two hypotheses $f_1, f_2$, the distance with respect to $D$ between $f_1$ and $f_2$ is defined as $d(f_1, f_2) = d_D(f_1, f_2) = \Pr_{x \sim D}[f_1(x) \neq f_2(x)]$. We will use $\widehat{\mathrm{err}}$... assumption to reduce the need for labeled data. In this case one could define compatibility of $\langle f_1, f_2 \rangle$ with $D$ as $\Pr_{x \sim D}[f_1(x) = f_2(x)]$, or the similar notions
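
    The error rate and the distance between hypotheses defined in this excerpt are probabilities over the distribution D, so on a finite sample they can be estimated by simple counting. The sketch below shows one way to do that. It is illustrative only: the threshold classifiers, the target function and the integer sample are hypothetical, and the function names are not taken from the report.

```python
def empirical_error(f, target, sample):
    """Estimate err(f): fraction of sample points where f(x) != c*(x)."""
    return sum(f(x) != target(x) for x in sample) / len(sample)

def empirical_distance(f1, f2, sample):
    """Estimate d(f1, f2): fraction of sample points where f1(x) != f2(x)."""
    return sum(f1(x) != f2(x) for x in sample) / len(sample)

def empirical_compatibility(f1, f2, sample):
    """Estimate Pr[f1(x) = f2(x)], the agreement rate of the pair (f1, f2)."""
    return 1.0 - empirical_distance(f1, f2, sample)

# Hypothetical threshold classifiers on the integers 0..9 (assumption).
def target(x):
    return int(x >= 5)   # plays the role of c*

def f1(x):
    return int(x >= 3)

def f2(x):
    return int(x >= 7)

sample = list(range(10))  # stand-in for draws from D

print("err(f1)   =", empirical_error(f1, target, sample))
print("err(f2)   =", empirical_error(f2, target, sample))
print("d(f1, f2) =", empirical_distance(f1, f2, sample))
```

    Note that the distance needs no labels at all: it compares the two hypotheses only with each other, which is why quantities of this kind can be estimated from unlabeled data.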

  13. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism, called the self-regulated learning capability, of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  14. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier.

    Science.gov (United States)

    Subbulakshmi, C V; Deepa, S N

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism, called the self-regulated learning capability, of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.

  15. Machine Learning: developing an image recognition program : with Python, Scikit Learn and OpenCV

    OpenAIRE

    Nguyen, Minh

    2016-01-01

    Machine Learning is one of the most debated topics in the computer world these days, especially after a computer Go program first beat a human Go world champion. Among the endless applications of Machine Learning is image recognition, a problem that involves processing an enormous amount of data from dynamic input. This thesis will present the basic concepts of Machine Learning, Machine Learning algorithms, the Python programming language and Scikit Learn – a simple and efficient tool for data analysis in P...

  16. Machine learning techniques and drug design.

    Science.gov (United States)

    Gertrudes, J C; Maltarollo, V G; Silva, R A; Oliveira, P R; Honório, K M; da Silva, A B F

    2012-01-01

    The interest in the application of machine learning techniques (MLT) as drug design tools has been growing in recent decades. The reason for this is that drug design is very complex and requires the use of hybrid techniques. This paper gives a brief review of some MLT, such as self-organizing maps, multilayer perceptron, Bayesian neural networks, counter-propagation neural networks and support vector machines. A comparison between the performance of the described methods and some classical statistical methods (such as partial least squares and multiple linear regression) shows that MLT have significant advantages. Nowadays, the number of studies in medicinal chemistry that employ these techniques has considerably increased, in particular the use of support vector machines. The state of the art and the future trends of MLT applications encompass the use of these techniques to construct more reliable QSAR models. The models obtained from MLT can be used in virtual screening studies as well as filters to develop/discover new chemicals. An important challenge in the drug design field is the prediction of pharmacokinetic and toxicity properties, which can avoid failures in the clinical phases. Therefore, this review provides a critical point of view on the main MLT and shows their potential ability as a valuable tool in drug design.

  17. Accelerating the BSM interpretation of LHC data with machine learning

    CERN Document Server

    Bertone, Gianfranco; Kim, Jong Soo; Liem, Sebastian; de Austri, Roberto Ruiz; Welling, Max

    2016-01-01

    The interpretation of Large Hadron Collider (LHC) data in the framework of Beyond the Standard Model (BSM) theories is hampered by the need to run computationally expensive event generators and detector simulators. Performing statistically convergent scans of high-dimensional BSM theories is consequently challenging, and in practice unfeasible for very high-dimensional BSM theories. We present here a new machine learning method that accelerates the interpretation of LHC data, by learning the relationship between BSM theory parameters and data. As a proof-of-concept, we demonstrate that this technique accurately predicts natural SUSY signal events in two signal regions at the High Luminosity LHC, up to four orders of magnitude faster than standard techniques. The new approach makes it possible to rapidly and accurately reconstruct the theory parameters of complex BSM theories, should an excess in the data be discovered at the LHC.

  18. Distribution Learning in Evolutionary Strategies and Restricted Boltzmann Machines

    DEFF Research Database (Denmark)

    Krause, Oswin

    The thesis is concerned with learning distributions in the two settings of Evolutionary Strategies (ESs) and Restricted Boltzmann Machines (RBMs). In both cases, the distributions are learned from samples, albeit with different goals. Evolutionary Strategies are concerned with finding an optimum... of the thesis is concerned with RBMs that are fitted to a dataset using maximum log-likelihood. As the computation of the distribution's normalization constant is intractable, Markov Chain Monte Carlo methods are required to estimate and follow the log-likelihood gradient. The thesis investigates... the approximation properties of stacked RBMs used to model the distribution of real valued data. Further, estimation algorithms of the normalization constant of an RBM are compared and a theoretical framework is introduced from which a number of well known algorithms can be derived. Lastly, a method based...

  19. A Framework for Distributed Deep Learning Layer Design in Python

    OpenAIRE

    McLeod, Clay

    2015-01-01

    In this paper, a framework for testing Deep Neural Network (DNN) design in Python is presented. First, big data, machine learning (ML), and Artificial Neural Networks (ANNs) are discussed to familiarize the reader with the importance of such a system. Next, the benefits and detriments of implementing such a system in Python are presented. Lastly, the specifics of the system are explained, and some experimental results are presented to prove the effectiveness of the system.

  20. Visual quality assessment by machine learning

    CERN Document Server

    Xu, Long; Kuo, C -C Jay

    2015-01-01

    The book covers state-of-the-art visual quality assessment (VQA) and learning based visual quality assessment (LB-VQA), providing a comprehensive overview of the existing relevant methods. It gives readers the basic knowledge, a systematic overview and new developments of VQA. It also covers the preliminary knowledge of Machine Learning (ML) needed for VQA tasks and newly developed ML techniques for the purpose. Hence, firstly, it is particularly helpful for beginner readers (including research students) entering the VQA field in general and LB-VQA in particular. Secondly, new developments in VQA, and LB-VQA in particular, are detailed in this book, which will give peer researchers and engineers new insights into VQA.

  1. Learning frameworks as an alternative to repositories

    DEFF Research Database (Denmark)

    Dalsgaard, Christian

    2005-01-01

    This paper presents the concept of ‘learning frameworks’. The purpose of the paper is to discuss and question collections of digital learning objects in large repositories and to argue for large learning frameworks which organise a number of thematically related digital learning materials. Whereas...

  2. A Fast Reduced Kernel Extreme Learning Machine.

    Science.gov (United States)

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred.
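
    The core idea described in this abstract (pick a random subset of the training samples as support vectors, then obtain the output weights in one non-iterative step) can be sketched in a few lines. The code below is a minimal sketch under stated assumptions, not the authors' RKELM implementation: it assumes an RBF kernel, solves ridge-regularized normal equations with a small hand-written Gaussian elimination, and uses hypothetical names (fit_rkelm, predict, gamma, ridge) and toy data.

```python
import math
import random

def rbf(x, z, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - z||^2) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w

def fit_rkelm(X, t, n_support, gamma=1.0, ridge=1e-3, seed=0):
    """Pick random support samples, then solve for output weights once."""
    rng = random.Random(seed)
    support = rng.sample(X, n_support)                    # random subset
    K = [[rbf(x, s, gamma) for s in support] for x in X]  # N x L kernel
    n = len(X)
    # Normal equations (K^T K + ridge * I) beta = K^T t  (assumed form).
    A = [[sum(K[i][a] * K[i][b] for i in range(n)) + (ridge if a == b else 0.0)
          for b in range(n_support)] for a in range(n_support)]
    rhs = [sum(K[i][a] * t[i] for i in range(n)) for a in range(n_support)]
    return support, solve(A, rhs)

def predict(x, support, beta, gamma=1.0):
    """Model output: weighted sum of kernel values against the support set."""
    return sum(b * rbf(x, s, gamma) for s, b in zip(support, beta))

# Hypothetical toy regression data: six 1-D points, alternating targets.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
t = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

support, beta = fit_rkelm(X, t, n_support=3, gamma=2.0, ridge=1e-9)
for x, y in zip(X, t):
    print(x, y, round(predict(x, support, beta, gamma=2.0), 3))
```

    With only three of the six samples used as support vectors the fit is approximate; raising n_support trades extra computation for accuracy, which is the trade-off the abstract refers to as support vector sufficiency.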

  3. Measure Transformer Semantics for Bayesian Machine Learning

    Science.gov (United States)

    Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen

    The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
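
    The first sentence of this abstract, inferring a posterior from a prior and a set of observations, can be made concrete with a tiny discrete example. The sketch below enumerates a finite prior over a coin's bias and conditions it on observed flips. It is only an illustration of Bayes' rule: it is not the paper's calculus or its measure-transformer semantics, it does not handle the zero-probability observations the paper supports, and the prior, the likelihood and all names are assumptions.

```python
def posterior(prior, likelihood, observations):
    """Condition a finite discrete prior on observations via Bayes' rule."""
    weights = {}
    for theta, p in prior.items():
        w = p
        for obs in observations:
            w *= likelihood(obs, theta)   # multiply in each observation
        weights[theta] = w
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observations have zero probability under the prior")
    return {theta: w / total for theta, w in weights.items()}

def coin_likelihood(flip, bias):
    """Probability of one flip ('H' or 'T') given the coin's bias for heads."""
    return bias if flip == "H" else 1.0 - bias

# Hypothetical model: uniform prior over three candidate biases.
prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
post = posterior(prior, coin_likelihood, ["H", "H", "T"])

for bias in sorted(post):
    print("bias", bias, "posterior", round(post[bias], 3))
```

    Exhaustive enumeration works only for small discrete models; the factor-graph compilation described in the abstract addresses models where this is not feasible.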

  4. Photometric Supernova Classification with Machine Learning

    Science.gov (United States)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
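
    The AUC metric used in this abstract has a simple interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition by comparing all positive/negative pairs. It is a minimal illustration with made-up labels and scores, not the evaluation code used in the paper; the function name roc_auc is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for six objects (1 = target class).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print("AUC =", roc_auc(labels, scores))
```

    The pairwise loop is quadratic in the number of examples, which is fine for a sketch; library implementations typically sort the scores instead.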

  5. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    Science.gov (United States)

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  6. Strategies and Principles of Distributed Machine Learning on Big Data

    Directory of Open Access Journals (Sweden)

    Eric P. Xing

    2016-06-01

    The rise of big data has led to new demands for machine learning (ML) systems to learn complex models, with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions) thereupon. In order to run ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, it is often the case that significant engineering efforts are required, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that “big” ML systems can benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These principles and strategies span a continuum from application, to engineering, and to theoretical research and development of big ML systems and architectures, with the goal of understanding how to make them efficient, generally applicable, and supported with convergence and scaling guarantees. They concern four key questions that traditionally receive little attention in ML research: How can an ML program be distributed over a cluster? How can ML computation be bridged with inter-machine communication? How can such communication be performed? What should be communicated between machines? By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typically seen in traditional computer programs, and by dissecting successful cases to reveal how we have harnessed these principles to design and develop both high-performance distributed ML software as well as general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and enlarge the area

  7. A Design Framework for Online Learning Environments.

    Science.gov (United States)

    Mishra, Sanjaya

    2002-01-01

    Discusses use of the Web for online instruction and presents a design framework for creating online learning environments. Highlights include approaches to instruction, including behaviorism, cognitivism, and constructivism; learning activities; content; learner support; and application of the framework for a graduate course at the Indira Gandhi…

  9. Fundamentals of Machine Learning for Neural Machine Translation

    OpenAIRE

    Kelleher, John

    2016-01-01

    This paper presents a short introduction to neural networks and how they are used for machine translation and concludes with some discussion on the current research challenges being addressed by neural machine translation (NMT) research. The primary goal of this paper is to give a no-tears introduction to NMT to readers that do not have a computer science or mathematical background. The secondary goal is to provide the reader with a deep enough understanding of NMT that they can appreciate th...

  10. Distributed Extreme Learning Machine for Nonlinear Learning over Network

    Directory of Open Access Journals (Sweden)

    Songyan Huang

    2015-02-01

    Distributed data collection and analysis over a network are ubiquitous, especially over a wireless sensor network (WSN). To our knowledge, the data model used in most of the distributed algorithms is linear. However, in real applications, the linearity of systems is not always guaranteed. In nonlinear cases, the single hidden layer feedforward neural network (SLFN) with radial basis function (RBF) hidden neurons has the ability to approximate any continuous function and, thus, may be used as the nonlinear learning system. However, because of the communication cost, directly training the neural network with a distributed version of the conventional algorithms is usually prohibitive. Fortunately, based on the theorems provided in the extreme learning machine (ELM) literature, we only need to compute the output weights of the SLFN. Computing the output weights itself is a linear learning problem, although the input-output mapping of the overall SLFN is still nonlinear. Using a distributed algorithm to cooperatively compute the output weights of the SLFN, we obtain a distributed extreme learning machine (dELM) for nonlinear learning in this paper. This dELM is applied to regression and classification problems to demonstrate its effectiveness and advantages.
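
The point that only the output weights need to be computed, and that this step is linear, can be illustrated with a small sketch. It is not the dELM algorithm from the paper: it assumes, for illustration only, that each node shares the summary statistics H^T H and H^T t of its local data and that a ridge-regularized least-squares system is then solved once. All function names, the ridge value, and the toy sine data are hypothetical.

```python
import math
import random

def rbf_features(xs, centers, width):
    """Hidden-layer output matrix H: one row per sample, one column per RBF neuron."""
    return [[math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]
            for x in xs]

def local_stats(H, t):
    """Statistics a node can compute from its own data only: H^T H and H^T t."""
    n = len(H[0])
    HtH = [[sum(row[i] * row[j] for row in H) for j in range(n)] for i in range(n)]
    Htt = [sum(row[i] * ti for row, ti in zip(H, t)) for i in range(n)]
    return HtH, Htt

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def output_weights(stats, ridge=1e-3):
    """Sum the nodes' statistics and solve the regularized normal equations."""
    n = len(stats[0][1])
    A = [[sum(s[0][i][j] for s in stats) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(s[1][i] for s in stats) for i in range(n)]
    return solve(A, b)

random.seed(0)
centers = [random.uniform(-3, 3) for _ in range(5)]  # random hidden neurons, never trained
xs = [-3.0 + 6.0 * i / 59 for i in range(60)]
ts = [math.sin(x) for x in xs]                       # toy nonlinear target

# Three "nodes", each holding a disjoint slice of the data.
nodes = [(xs[i::3], ts[i::3]) for i in range(3)]
node_stats = [local_stats(rbf_features(x, centers, 1.0), t) for x, t in nodes]
beta_distributed = output_weights(node_stats)

# Reference: the same computation with all data on one machine.
beta_central = output_weights([local_stats(rbf_features(xs, centers, 1.0), ts)])
```

Because the statistics add up across nodes, the pooled solution matches the centralized one up to rounding. A real sensor network would typically replace the central sum with a neighbor-to-neighbor exchange.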

  11. Combining Formal Logic and Machine Learning for Sentiment Analysis

    DEFF Research Database (Denmark)

    Petersen, Niklas Christoffer; Villadsen, Jørgen

    2014-01-01

    This paper presents a formal logical method for deep structural analysis of the syntactical properties of texts using machine learning techniques for efficient syntactical tagging. To evaluate the method it is used for entity level sentiment analysis as an alternative to pure machine learning...

  12. An active role for machine learning in drug development

    Science.gov (United States)

    Murphy, Robert F.

    2014-01-01

    Due to the complexity of biological systems, cutting-edge machine-learning methods will be critical for future drug development. In particular, machine-vision methods to extract detailed information from imaging assays and active-learning methods to guide experimentation will be required to overcome the dimensionality problem in drug development. PMID:21587249

  13. Newton Methods for Large Scale Problems in Machine Learning

    Science.gov (United States)

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  14. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    Science.gov (United States)

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  15. Large-Scale Machine Learning for Classification and Search

    Science.gov (United States)

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  19. IEEE International Workshop on Machine Learning for Signal Processing: Preface

    DEFF Research Database (Denmark)

    Tao, Jianhua

    The 21st IEEE International Workshop on Machine Learning for Signal Processing will be held in Beijing, China, on September 18–21, 2011. The workshop series is the major annual technical event of the IEEE Signal Processing Society's Technical Committee on Machine Learning for Signal Processing...

  20. Proceedings of the IEEE Machine Learning for Signal Processing XVII

    DEFF Research Database (Denmark)

    The seventeenth of a series of workshops sponsored by the IEEE Signal Processing Society and organized by the Machine Learning for Signal Processing Technical Committee (MLSP-TC). The field of machine learning has matured considerably in both methodology and real-world application domains and has...

  2. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    Science.gov (United States)

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  4. A Machine-Learning-Driven Sky Model.

    Science.gov (United States)

    Satylmys, Pynar; Bashford-Rogers, Thomas; Chalmers, Alan; Debattista, Kurt

    2017-01-01

    Sky illumination is responsible for much of the lighting in a virtual environment. A machine-learning-based approach can compactly represent sky illumination from both existing analytic sky models and from captured environment maps. The proposed approach can approximate the captured lighting at a significantly reduced memory cost and enable smooth transitions of sky lighting to be created from a small set of environment maps captured at discrete times of day. The authors' results demonstrate accuracy close to the ground truth for both analytical and capture-based methods. The approach has a low runtime overhead, so it can be used as a generic approach for both offline and real-time applications.

  5. Machine learning of Calabi-Yau volumes

    Science.gov (United States)

    Krefl, Daniel; Seong, Rak-Kyeong

    2017-09-01

    We employ machine learning techniques to investigate the volume minimum of Sasaki-Einstein base manifolds of noncompact toric Calabi-Yau three-folds. We find that the minimum volume can be approximated via a second-order multiple linear regression on standard topological quantities obtained from the corresponding toric diagram. The approximation improves further after invoking a convolutional neural network with the full toric diagram of the Calabi-Yau three-folds as the input. We are thereby able to circumvent any minimization procedure that was previously necessary and find an explicit mapping between the minimum volume and the topological quantities of the toric diagram. Under the AdS/CFT correspondence, the minimum volumes of Sasaki-Einstein manifolds correspond to central charges of a class of 4d N = 1 superconformal field theories. We therefore find empirical evidence for a function that gives values of central charges without the usual extremization procedure.

  6. Optimal sensor placement using machine learning

    CERN Document Server

    Semaan, Richard

    2016-01-01

    A new method for optimal sensor placement based on variable importance of machine learned models is proposed. With its simplicity, adaptivity, and low computational cost, the method offers many advantages over existing approaches. The new method is implemented on an airfoil equipped with a Coanda actuator. The analysis is based on flow field data obtained from 2D unsteady Reynolds averaged Navier-Stokes (URANS) simulations with different actuation conditions. The optimal sensor locations are compared against the current de-facto standard of maximum POD modal amplitude location, and against a brute force approach that scans all possible sensor combinations. The results show that both the flow conditions and the type of sensor have an effect on the optimal sensor placement, whereas the choice of the response function appears to have limited influence.

  7. Machine learning research 1989-90

    Science.gov (United States)

    Porter, Bruce W.; Souther, Arthur

    1990-01-01

    Multifunctional knowledge bases offer a significant advance in artificial intelligence because they can support numerous expert tasks within a domain. As a result they amortize the costs of building a knowledge base over multiple expert systems and they reduce the brittleness of each system. Due to the inevitable size and complexity of multifunctional knowledge bases, their construction and maintenance require knowledge engineering and acquisition tools that can automatically identify interactions between new and existing knowledge. Furthermore, their use requires software for accessing those portions of the knowledge base that coherently answer questions. Considerable progress was made in developing software for building and accessing multifunctional knowledge bases. A language was developed for representing knowledge, along with software tools for editing and displaying knowledge, a machine learning program for integrating new information into existing knowledge, and a question answering system for accessing the knowledge base.

  8. CD process control through machine learning

    Science.gov (United States)

    Utzny, Clemens

    2016-10-01

    For the specific requirements of the 14nm and 20nm site applications a new CD map approach was developed at the AMTC. This approach relies on a well-established machine learning technique called recursive partitioning. Recursive partitioning is a powerful technique which creates a decision tree by successively testing whether the quantity of interest can be explained by one of the supplied covariates. The test performed is generally a statistical test with a pre-supplied significance level. Once the test indicates a significant association between the variable of interest and a covariate, a split is performed at a threshold value which minimizes the variation within the newly attained groups. This partitioning is repeated until either no significant association can be detected or the resulting subgroup size falls below a pre-supplied level.
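
The split step described above can be sketched as follows. This is a simplified illustration, not the AMTC implementation: it covers only the threshold search that minimizes the within-group variation (here taken as the sum of squared deviations from each group mean) and omits the statistical significance test that decides whether to split at all. The function names, the `min_group` parameter, and the data are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (within-group variation)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y, min_group=2):
    """Return (threshold, cost) for the split of covariate x that minimizes
    the total within-group variation of y, or None if no split is allowed."""
    pairs = sorted(zip(x, y))
    best = None
    # Each candidate split keeps at least min_group points on either side.
    for i in range(min_group, len(pairs) - min_group + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical covariate values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sum_sq(left) + sum_sq(right)
        if best is None or cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (threshold, cost)
    return best

# Made-up example: the response jumps between covariate values 3 and 4.
covariate = [1, 2, 3, 4, 5, 6]
response = [10.1, 9.9, 10.0, 14.9, 15.1, 15.0]
threshold, cost = best_split(covariate, response)
```

Applying `best_split` again to each resulting group, and stopping when a group falls below `min_group` or no split is justified, gives the recursive tree construction the abstract describes.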

  9. Lane Detection Based on Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Chao Fan

    2013-09-01

    In order to improve the accuracy and robustness of lane detection in complex conditions, such as shadows and changing illumination, a novel detection algorithm based on machine learning was proposed. After pretreatment, a set of haar-like filters was used to calculate the eigenvalue in the gray image f(x,y) and edge e(x,y). Then these features were trained by using an improved boosting algorithm and the final class function g(x) was obtained, which was used to judge whether the point x belongs to the lane or not. To avoid the overfitting of traditional boosting, Fisher discriminant analysis was used to initialize the weights of samples. Tests on many roads under varied conditions showed that this algorithm had good robustness and real-time performance in recognizing the lane in challenging conditions.

  10. Tracking medical genetic literature through machine learning.

    Science.gov (United States)

    Bornstein, Aaron T; McLoughlin, Matthew H; Aguilar, Jesus; Wong, Wendy S W; Solomon, Benjamin D

    2016-08-01

    There has been remarkable progress in identifying the causes of genetic conditions as well as understanding how changes in specific genes cause disease. Though difficult (and often superficial) to parse, an interesting tension involves emphasis on basic research aimed to dissect normal and abnormal biology versus more clearly clinical and therapeutic investigations. To examine one facet of this question and to better understand progress in Mendelian-related research, we developed an algorithm that classifies medical literature into three categories (Basic, Clinical, and Management) and conducted a retrospective analysis. We built a supervised machine learning classification model using the Azure Machine Learning (ML) Platform and analyzed the literature (1970-2014) from NCBI's Entrez Gene2Pubmed Database (http://www.ncbi.nlm.nih.gov/gene) using genes from the NHGRI's Clinical Genomics Database (http://research.nhgri.nih.gov/CGD/). We applied our model to 376,738 articles: 288,639 (76.6%) were classified as Basic, 54,178 (14.4%) as Clinical, and 24,569 (6.5%) as Management. The average classification accuracy was 92.2%. The rate of Clinical publication was significantly higher than Basic or Management. The rate of publication of article types differed significantly when divided into key eras: Human Genome Project (HGP) planning phase (1984-1990); HGP launch (1990) to publication (2001); following HGP completion to the "Next Generation" advent (2009); the era following 2009. In conclusion, in addition to the findings regarding the pace and focus of genetic progress, our algorithm produced a database that can be used in a variety of contexts including automating the identification of management-related literature.

  11. Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine

    Directory of Open Access Journals (Sweden)

    Ying Yin

    2016-05-01

    Multi-instance multi-label learning is a learning framework, where every object is represented by a bag of instances and associated with multiple labels simultaneously. The existing degeneration strategy-based methods often suffer from some common drawbacks: (1) the user-specific parameter for the number of clusters may hurt effectiveness; (2) SVM may bring a high computational cost when utilized as the classifier builder. In this paper, we propose an algorithm, namely multi-instance multi-label (MIML)-extreme learning machine (ELM), to address the problems. To the best of our knowledge, we are the first to utilize ELM in the MIML problem and to conduct the comparison of ELM and SVM on MIML. Extensive experiments have been conducted on real datasets and synthetic datasets. The results show that MIMLELM tends to achieve better generalization performance at a higher learning speed.

  12. Machine learning in radiation oncology theory and applications

    CERN Document Server

    El Naqa, Issam; Murphy, Martin J

    2015-01-01

    This book provides a complete overview of the role of machine learning in radiation oncology and medical physics, covering basic theory, methods, and a variety of applications in medical physics and radiotherapy. An introductory section explains machine learning, reviews supervised and unsupervised learning methods, discusses performance evaluation, and summarizes potential applications in radiation oncology. Detailed individual sections are then devoted to the use of machine learning in quality assurance; computer-aided detection, including treatment planning and contouring; image-guided rad

  13. Reinforcement and Systemic Machine Learning for Decision Making

    CERN Document Server

    Kulkarni, Parag

    2012-01-01

    There are always difficulties in making machines that learn from experience. Complete information is not always available, or it becomes available in bits and pieces over a period of time. With respect to systemic learning, there is a need to understand the impact of decisions and actions on a system over that period of time. This book takes a holistic approach to addressing that need and presents a new paradigm, creating new learning applications and, ultimately, more intelligent machines. The first book of its kind in this new an

  14. Using Machine Learning to Search for MSSM Higgs Bosons

    CERN Document Server

    Diesing, Rebecca

    2016-01-01

    This paper examines the performance of machine learning in the identification of Minimally Supersymmetric Standard Model (MSSM) Higgs Bosons, and compares this performance to that of traditional cut strategies. Two boosted decision tree algorithms were tested, scikit-learn and XGBoost. These tests indicated that machine learning can perform significantly better than traditional cuts. However, since machine learning in this form cannot be directly implemented in a real MSSM Higgs analysis, this performance information was instead used to better understand the relationships between training variables. Further studies might use this information to construct an improved cut strategy.

  15. Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems

    Science.gov (United States)

    Kuusisto, Finn; Dutra, Inês; Elezaby, Mai; Mendonça, Eneida A.; Shavlik, Jude; Burnside, Elizabeth S.

    2015-01-01

    While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy. PMID:26306246

  17. MLitB: machine learning in the browser

    Directory of Open Access Journals (Sweden)

    Edward Meeds

    2015-07-01

    With few exceptions, the field of Machine Learning (ML) research has largely ignored the browser as a computational engine. Beyond an educational resource for ML, the browser has vast potential to not only improve the state-of-the-art in ML research, but also, inexpensively and on a massive scale, to bring sophisticated ML learning and prediction to the public at large. This paper introduces MLitB, a prototype ML framework written entirely in JavaScript, capable of performing large-scale distributed computing with heterogeneous classes of devices. The development of MLitB has been driven by several underlying objectives whose aim is to make ML learning and usage ubiquitous (by using ubiquitous compute devices), cheap and effortlessly distributed, and collaborative. This is achieved by allowing every internet capable device to run training algorithms and predictive models with no software installation and by saving models in universally readable formats. Our prototype library is capable of training deep neural networks with synchronized, distributed stochastic gradient descent. MLitB offers several important opportunities for novel ML research, including: development of distributed learning algorithms, advancement of web GPU algorithms, novel field and mobile applications, privacy preserving computing, and green grid-computing. MLitB is available as open source software.
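
The idea of synchronized, distributed gradient descent can be shown with a toy sketch. It is written in Python rather than JavaScript, it is not MLitB code, and it makes simplifying assumptions: a one-parameter linear model y = w * x, equally sized data shards standing in for devices, and a full gradient per shard instead of a random mini-batch. All names and values are hypothetical.

```python
def shard_gradient(w, shard):
    """Mean squared-error gradient of the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def synchronized_descent(shards, w=0.0, learning_rate=0.05, steps=200):
    """Each step, every worker reports a gradient for the same w; the
    coordinator waits for all of them, averages, and updates w once."""
    for _ in range(steps):
        gradients = [shard_gradient(w, shard) for shard in shards]  # one per worker
        w -= learning_rate * sum(gradients) / len(gradients)
    return w

# Made-up data generated from y = 3 * x, split across three workers.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
shards = [data[0:2], data[2:4], data[4:6]]
w_hat = synchronized_descent(shards)
```

With equally sized shards, the averaged gradient equals the gradient over the whole data set, which is why the synchronized scheme behaves like single-machine training.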

  18. MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions

    DEFF Research Database (Denmark)

    Jimeno-Yepes, Antonio; Mork, James G.; Wilkowski, Bartlomiej

    2012-01-01

    Map and a k-NN approach called PubMed Related Citations (PRC). Our motivation is to improve the quality of MTI based on machine learning. Typical machine learning approaches fit this indexing task into text categorization. In this work, we have studied some Medical Subject Headings (MeSH) recommended by MTI...... and analyzed the issues when using standard machine learning algorithms. We show that in some cases machine learning can improve the annotations already recommended by MTI, that machine learning based on low variance methods achieves better performance and that each MeSH heading presents a different behavior...

  19. Harnessing Disordered-Ensemble Quantum Dynamics for Machine Learning

    Science.gov (United States)

    Fujii, Keisuke; Nakajima, Kohei

    2017-08-01

    The quantum computer has an amazing potential of fast information processing. However, the realization of a digital quantum computer is still a challenging problem requiring highly accurate controls and key application strategies. Here we propose a platform, quantum reservoir computing, to solve these issues successfully by exploiting the natural quantum dynamics of ensemble systems, which are ubiquitous in laboratories nowadays, for machine learning. This framework enables ensemble quantum systems to universally emulate nonlinear dynamical systems including classical chaos. A number of numerical experiments show that quantum systems consisting of 5-7 qubits possess computational capabilities comparable to conventional recurrent neural networks of 100-500 nodes. This discovery opens up a paradigm for information processing with artificial intelligence powered by quantum physics.

  20. Using Machine Learning for Discovery in Synoptic Survey Imaging

    CERN Document Server

    Brink, Henrik; Poznanski, Dovi; Bloom, Joshua S; Rice, John; Negahban, Sahand; Wainwright, Martin

    2012-01-01

    Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are highly outnumbered by bogus detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work we present a machine learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is an astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30,000 objects from the Palomar Transient Factory, we demonstrate a missed detection r...

  1. Analyzing Learning in Professional Learning Communities: A Conceptual Framework

    Science.gov (United States)

    Van Lare, Michelle D.; Brazer, S. David

    2013-01-01

    The purpose of this article is to build a conceptual framework that informs current understanding of how professional learning communities (PLCs) function in conjunction with organizational learning. The combination of sociocultural learning theories and organizational learning theories presents a more complete picture of PLC processes that has…

  3. Machine Learning and the Traveling Repairman

    CERN Document Server

    Tulabandhula, Theja; Jaillet, Patrick

    2011-01-01

    The goal of the Machine Learning and Traveling Repairman Problem (ML&TRP) is to determine a route for a "repair crew," which repairs nodes on a graph. The repair crew aims to minimize the cost of failures at the nodes, but as in many real situations, the failure probabilities are not known and must be estimated. We introduce two formulations for the ML&TRP, where the first formulation is sequential: failure probabilities are estimated at each node, and then a weighted version of the traveling repairman problem is used to construct the route from the failure cost. We develop two models for the failure cost, based on whether repeat failures are considered, or only the first failure on a node. Our second formulation is a multi-objective learning problem for ranking on graphs. Here, we are estimating failure probabilities simultaneously with determining the graph traversal route; the choice of route influences the estimated failure probabilities. This is in accordance with a prior belief that probabilitie...

  4. Active Learning of Nondeterministic Finite State Machines

    Directory of Open Access Journals (Sweden)

    Warawoot Pacharoen

    2013-01-01

    We consider the problem of learning nondeterministic finite state machines (NFSMs) from systems where their internal structures are implicit and nondeterministic. Recently, an algorithm for inferring observable NFSMs (ONFSMs), which are the potentially learnable subclass of NFSMs, has been proposed based on the hypothesis that the complete testing assumption is satisfied. According to this assumption, with an input sequence (query), the complete set of all possible output sequences is given by the so-called Teacher, so the number of times the same query is asked is not taken into account in the algorithm. In this paper, we propose LNM*, a refined ONFSM learning algorithm that treats the number of repetitions of the same query as a parameter. Unlike the previous work, our approach does not require all possible output sequences in one answer. Instead, it tries to observe the possible output sequences by asking the Teacher the same query many times. We have proved that LNM* can infer the corresponding ONFSMs of the unknown systems when the number of tries for the same query is adequate to guarantee the complete testing assumption. Moreover, the proof shows that our algorithm will eventually terminate whether or not the assumption is fulfilled. We also present the theoretical time complexity analysis of LNM*. In addition, experimental results demonstrate the practical efficiency of our approach.

  5. Security framework for mobile learning environments

    OpenAIRE

    Shonola, Shaibu A.; Joy, Mike

    2014-01-01

    Mobile learning is becoming popular among educators as academic technologies advance. Mobile devices used in mobile learning can potentially become vulnerable if the security aspects are neglected, thereby putting personal information of users at risk. Therefore, for mobile learning applications to work effectively as valuable tools, the security aspects must be given adequate consideration. This paper proposes a security framework for mobile learning applications which is the bedrock for des...

  6. ScaleMT: a free/open-source framework for building scalable machine translation web services

    OpenAIRE

    Sánchez-Cartagena, Víctor M.; Pérez-Ortiz, Juan Antonio

    2009-01-01

    Machine translation web services usage is growing amazingly mainly because of the translation quality and reliability of the service provided by the Google Ajax Language API. To allow the open-source machine translation projects to compete with Google’s service and gain visibility on the internet, we have developed ScaleMT: a free/open-source framework that exposes existing machine translation engines as public web services. This framework is highly scalable as it can run coordinately on many serv...

  7. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography.

    Science.gov (United States)

    Narula, Sukrit; Shameer, Khader; Salem Omar, Alaa Mabrouk; Dudley, Joel T; Sengupta, Partho P

    2016-11-29

    Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH). Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation. Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM and ATH (IG = 0.24) followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p [...] 13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e', and strain. Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience. Copyright © 2016 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
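
The majority voting step mentioned above can be sketched in a few lines. This is only an illustration of the voting rule, not the study's pipeline: the three prediction lists are made-up placeholders for the outputs of a support vector machine, a random forest, and a neural network, and the function name is hypothetical.

```python
from collections import Counter

def majority_vote(model_predictions):
    """Combine per-model label lists (all the same length) into one list
    holding, for each case, the label predicted by most models."""
    combined = []
    for labels in zip(*model_predictions):
        label, _count = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined

# Made-up predictions for four cases; not data from the study.
svm_pred = ["HCM", "ATH", "HCM", "ATH"]
forest_pred = ["HCM", "HCM", "ATH", "ATH"]
network_pred = ["ATH", "HCM", "HCM", "ATH"]

ensemble_pred = majority_vote([svm_pred, forest_pred, network_pred])
```

With three models and two classes a tie cannot occur, so every case receives a conclusive label.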

  8. MLBCD: a machine learning tool for big clinical data.

    Science.gov (United States)

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.

  9. GEOLOGICAL MAPPING USING MACHINE LEARNING ALGORITHMS

    Directory of Open Access Journals (Sweden)

    A. S. Harvey

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rocktypes in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. The percentage of correct classifications was used as the indicator of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral images of bare rock which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases required computational effort and time. With the achievable performance levels in this study, the technique is useful in identifying regions of interest and identifying general rocktype trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.

  10. Geological Mapping Using Machine Learning Algorithms

    Science.gov (United States)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rocktypes in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. The percentage of correct classifications was used as the indicator of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral images of bare rock which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases required computational effort and time. With the achievable performance levels in this study, the technique is useful in identifying regions of interest and identifying general rocktype trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.
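
    The evaluation described above, classifying validation samples and scoring the percentage classified correctly, can be illustrated with one of the four named algorithms, k-nearest neighbour. The sketch below is only an assumption-laden toy: the two-band "spectral" points, the rock-type labels, and the helper names are hypothetical and are not the Sudbury data or the authors' code.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority label among its k nearest calibration points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]


def percent_correct(predicted, actual):
    """Percentage of samples whose predicted class matches the reference class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical calibration samples: (band 1, band 2) reflectance and rock type.
calibration = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
               (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
rock_types = ["granite", "granite", "granite", "basalt", "basalt", "basalt"]

# Hypothetical validation samples with reference ("ground validation") labels.
validation = [(0.12, 0.18), (0.88, 0.82), (0.20, 0.20), (0.60, 0.70)]
reference = ["granite", "basalt", "granite", "granite"]

predicted = [knn_predict(calibration, rock_types, q) for q in validation]
score = percent_correct(predicted, reference)
```

    The last validation point lies closer to the basalt cluster than to the granite cluster, so it is misclassified and the toy score is three out of four.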

  11. Collaborative Learning Framework in Business Management Systems

    Directory of Open Access Journals (Sweden)

    Vladimir GRIGORE

    2008-01-01

    This paper presents a solution based on collaboration with experts and practitioners from universities and ERP companies involved in the process of learning by training and learning by working. The solution uses the CPI test to establish a proper team for the framework modules: Real-Time Chat Room, Discussion Forum, E-mail Support and Learning through Training. We define novice, practitioner and expert competence levels based on the CORONET train methodology. ERP companies have their own roles in providing mentoring services to knowledge workers and in evaluating the performance of the learning process, with teachers’ cooperation, in the learning by teaching and learning by working modules.

  12. Intelligent Machine Learning Approaches for Aerospace Applications

    Science.gov (United States)

    Sathyan, Anoop

    Machine Learning is a type of artificial intelligence that provides machines or networks the ability to learn from data without the need to explicitly program them. There are different kinds of machine learning techniques. This thesis discusses the applications of two of these approaches: Genetic Fuzzy Logic and Convolutional Neural Networks (CNN). Fuzzy Logic System (FLS) is a powerful tool that can be used for a wide variety of applications. FLS is a universal approximator that reduces the need for complex mathematics and replaces it with expert knowledge of the system to produce an input-output mapping using If-Then rules. The expert knowledge of a system can help in obtaining the parameters for small-scale FLSs, but for larger networks we will need to use sophisticated approaches that can automatically train the network to meet the design requirements. This is where Genetic Algorithms (GA) and EVE come into the picture. Both GA and EVE can tune the FLS parameters to minimize a cost function that is designed to meet the requirements of the specific problem. EVE is an artificial intelligence developed by Psibernetix that is trained to tune large scale FLSs. The parameters of an FLS can include the membership functions and rulebase of the inherent Fuzzy Inference Systems (FISs). The main issue with using the GFS is that the number of parameters in a FIS increases exponentially with the number of inputs, thus making it increasingly harder to tune them. To reduce this issue, the FLSs discussed in this thesis consist of 2-input-1-output FISs in cascade (Chapter 4) or as a layer of parallel FISs (Chapter 7). We have obtained extremely good results using GFS for different applications at a reduced computational cost compared to other algorithms that are commonly used to solve the corresponding problems. In this thesis, GFSs have been designed for controlling an inverted double pendulum, a task allocation problem of clustering targets amongst a set of UAVs, a fire

  13. Modelling tick abundance using machine learning techniques and satellite imagery

    DEFF Research Database (Denmark)

    Kjær, Lene Jung; Korslund, L.; Kjelland, V.

    satellite images to run Boosted Regression Tree machine learning algorithms to predict overall distribution (presence/absence of ticks) and relative tick abundance of nymphs and larvae in southern Scandinavia. For nymphs, the predicted abundance had a positive correlation with observed abundance...... the predicted distribution of larvae was mostly even throughout Denmark, it was primarily around the coastlines in Norway and Sweden. Abundance was fairly low overall except in some fragmented patches corresponding to forested habitats in the region. Machine learning techniques allow us to predict for larger...... the collected ticks for pathogens and using the same machine learning techniques to develop prevalence maps of the ScandTick region....

  14. Implementing Machine Learning in Radiology Practice and Research.

    Science.gov (United States)

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  15. Studying depression using imaging and machine learning methods.

    Science.gov (United States)

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  16. Intercultural Historical Learning: A Conceptual Framework

    Science.gov (United States)

    Nordgren, Kenneth; Johansson, Maria

    2015-01-01

    This paper outlines a conceptual framework in order to systematically discuss the meaning of intercultural learning in history education and how it could be advanced. We do so by bringing together theories of historical consciousness, intercultural competence and postcolonial thinking. By combining these theories into one framework, we identify…

  17. Intercultural Historical Learning: A Conceptual Framework

    Science.gov (United States)

    Nordgren, Kenneth; Johansson, Maria

    2015-01-01

    This paper outlines a conceptual framework in order to systematically discuss the meaning of intercultural learning in history education and how it could be advanced. We do so by bringing together theories of historical consciousness, intercultural competence and postcolonial thinking. By combining these theories into one framework, we identify…

  18. Visual tracking based on extreme learning machine and sparse representation.

    Science.gov (United States)

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-10-22

    The existing sparse representation-based visual trackers mostly suffer from both being time consuming and having poor robustness problems. To address these issues, a novel tracking method is presented via combining sparse representation and an emerging learning technique, namely extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. Firstly, ELM is utilized to find the optimal separate hyperplane between the target observations and background ones. Thus, the trained ELM classification function is able to remove most of the candidate samples related to background contents efficiently, thereby reducing the total computational cost of the following sparse representation. Secondly, to further combine ELM and sparse representation, the resultant confidence values (i.e., probabilities to be a target) of samples on the ELM classification function are used to construct a new manifold learning constraint term of the sparse representation framework, which tends to achieve robuster results. Moreover, the accelerated proximal gradient method is used for deriving the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, thereby leading to a higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker.

  19. Visual Tracking Based on Extreme Learning Machine and Sparse Representation

    Directory of Open Access Journals (Sweden)

    Baoxian Wang

    2015-10-01

    The existing sparse representation-based visual trackers mostly suffer from both being time consuming and having poor robustness problems. To address these issues, a novel tracking method is presented via combining sparse representation and an emerging learning technique, namely extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. Firstly, ELM is utilized to find the optimal separate hyperplane between the target observations and background ones. Thus, the trained ELM classification function is able to remove most of the candidate samples related to background contents efficiently, thereby reducing the total computational cost of the following sparse representation. Secondly, to further combine ELM and sparse representation, the resultant confidence values (i.e., probabilities to be a target) of samples on the ELM classification function are used to construct a new manifold learning constraint term of the sparse representation framework, which tends to achieve robuster results. Moreover, the accelerated proximal gradient method is used for deriving the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, thereby leading to a higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker.

  20. Metabolite identification and molecular fingerprint prediction through machine learning.

    Science.gov (United States)

    Heinonen, Markus; Shen, Huibin; Zamboni, Nicola; Rousu, Juho

    2012-09-15

    Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: markus.heinonen@cs.helsinki.fi.

  1. Predicting Increased Blood Pressure Using Machine Learning

    Science.gov (United States)

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Soares, Telma de Jesus; dos Reis, Luciana Araujo

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the highest pseudo R2 (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHR showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the highest pseudo R2 (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313
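
    The sensitivity, specificity, and misclassification figures quoted in this abstract follow from a binary confusion matrix. A minimal sketch of that arithmetic is given below; the function name and the toy labels are hypothetical and do not reproduce the study's data.

```python
def confusion_metrics(predicted, actual):
    """Sensitivity, specificity and misclassification rate for boolean labels.

    True marks the positive class (here, hypothetically, increased blood pressure).
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn)   # share of true positives that were detected
    specificity = tn / (tn + fp)   # share of true negatives that were detected
    misclassification = (fp + fn) / len(actual)
    return sensitivity, specificity, misclassification


# Hypothetical toy labels: 4 positive and 6 negative cases.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, False]

sens, spec, error_rate = confusion_metrics(predicted, actual)
```

    Computing the same three quantities separately on the training and test sets is what exposes the drop in performance reported above between the two.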

  2. Optimal interference code based on machine learning

    Science.gov (United States)

    Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

    2016-10-01

    In this paper, we analyze the characteristics of pseudo-random codes, taking the m-sequence as an example. Based on coding theory, we introduce the jamming methods and simulate the interference effect and its probability model in MATLAB. According to the length of decoding time the adversary spends, we find the optimal formula and optimal coefficients based on machine learning, and from these we obtain the new optimal interference code. First, in the recognition phase, this study judges the effect of interference by simulating the length of the decoding period of the laser seeker. Then, in the tracking phase, we use laser active deception jamming to simulate the interference process. To improve the performance of the interference, the model is simulated in MATLAB. We find the least number of pulse intervals that must be received, from which we determine the precise number of intervals of the laser pointer for m-sequence encoding. To find the shortest interval, we use the greatest common divisor method. Then, combining this with the coding regularity found earlier, we restore the pulse intervals of the pseudo-random code that has already been received. Finally, we can control the time period of laser interference, obtain the optimal interference code, and increase the probability of interference.
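
    For readers unfamiliar with m-sequences, the sketch below shows how such a pseudo-random code can be produced by a linear-feedback shift register, and how a greatest common divisor can recover a base interval from observed pulse spacings. It is an illustration under stated assumptions, not the authors' MATLAB model: the 4-bit register, the recurrence a[n+4] = a[n] XOR a[n+1] (polynomial x^4 + x + 1), the seed, and the interval values are all chosen here for demonstration.

```python
import math
from functools import reduce


def m_sequence(taps, seed, length):
    """Generate `length` bits from a Fibonacci linear-feedback shift register.

    `seed` holds the first len(seed) output bits; each new bit is the XOR of the
    register positions listed in `taps` (0 is the bit about to be output).
    """
    state = list(seed)
    bits = []
    for _ in range(length):
        bits.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]   # shift and append the feedback bit
    return bits


# Assumed 4-bit register with recurrence a[n+4] = a[n] XOR a[n+1].
bits = m_sequence(taps=(0, 1), seed=[1, 0, 0, 0], length=30)
period = bits[:15]   # a maximal-length sequence of a 4-bit register repeats every 15 bits

# Hypothetical observed pulse spacings (arbitrary time units); their greatest
# common divisor estimates the shortest base interval of the code.
observed_intervals = [30, 45, 75]
base_interval = reduce(math.gcd, observed_intervals)
```

    A maximal-length sequence from an n-bit register has period 2^n - 1 and contains 2^(n-1) ones, which gives a quick sanity check on any generator of this kind.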

  3. Perspective: Machine learning potentials for atomistic simulations

    Science.gov (United States)

    Behler, Jörg

    2016-11-01

    Nowadays, computer simulations have become a standard tool in essentially all fields of chemistry, condensed matter physics, and materials science. In order to keep up with state-of-the-art experiments and the ever growing complexity of the investigated problems, there is a constantly increasing need for simulations of more realistic, i.e., larger, model systems with improved accuracy. In many cases, the availability of sufficiently efficient interatomic potentials providing reliable energies and forces has become a serious bottleneck for performing these simulations. To address this problem, currently a paradigm change is taking place in the development of interatomic potentials. Since the early days of computer simulations simplified potentials have been derived using physical approximations whenever the direct application of electronic structure methods has been too demanding. Recent advances in machine learning (ML) now offer an alternative approach for the representation of potential-energy surfaces by fitting large data sets from electronic structure calculations. In this perspective, the central ideas underlying these ML potentials, solved problems and remaining challenges are reviewed along with a discussion of their current applicability and limitations.

  4. Image Segmentation for Connectomics Using Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Tasdizen, Tolga; Seyedhosseini, Mojtaba; Liu, Ting; Jones, Cory; Jurrus, Elizabeth R.

    2014-12-01

    Reconstruction of neural circuits at the microscopic scale of individual neurons and synapses, also known as connectomics, is an important challenge for neuroscience. While an important motivation of connectomics is providing anatomical ground truth for neural circuit models, the ability to decipher neural wiring maps at the individual cell level is also important in studies of many neurodegenerative diseases. Reconstruction of a neural circuit at the individual neuron level requires the use of electron microscopy images due to their extremely high resolution. Computational challenges include pixel-by-pixel annotation of these images into classes such as cell membrane, mitochondria and synaptic vesicles and the segmentation of individual neurons. State-of-the-art image analysis solutions are still far from the accuracy and robustness of human vision and biologists are still limited to studying small neural circuits using mostly manual analysis. In this chapter, we describe our image analysis pipeline that makes use of novel supervised machine learning techniques to tackle this problem.

  5. Predicting Increased Blood Pressure Using Machine Learning

    Directory of Open Access Journals (Sweden)

    Hudson Fernandes Golino

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the highest pseudo R2 (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHR showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the highest pseudo R2 (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power.

  6. Machine learning techniques for gait biometric recognition using the ground reaction force

    CERN Document Server

    Mason, James Eric; Woungang, Isaac

    2016-01-01

    This book focuses on how machine learning techniques can be used to analyze and make use of one particular category of behavioral biometrics known as the gait biometric. A comprehensive Ground Reaction Force (GRF)-based Gait Biometrics Recognition framework is proposed and validated by experiments. In addition, an in-depth analysis of existing recognition techniques that are best suited for performing footstep GRF-based person recognition is also proposed, as well as a comparison of feature extractors, normalizers, and classifiers configurations that were never directly compared with one another in any previous GRF recognition research. Finally, a detailed theoretical overview of many existing machine learning techniques is presented, leading to a proposal of two novel data processing techniques developed specifically for the purpose of gait biometric recognition using GRF. This book introduces novel machine-learning-based temporal normalization techniques and bridges research gaps concerning the effect of ...

  7. Detecting intermediate mass black holes in globular clusters with machine learning

    CERN Document Server

    Pasquato, Mario

    2016-01-01

    Mergers of stellar-mass black holes were recently observed in the gravitational wave window opened by LIGO. This puts the spotlight on dense stellar systems and their ability to create intermediate-mass black holes (IMBHs) through repeated merging. Unfortunately, attempts at direct and indirect IMBH detection in star clusters in the nearby universe have proven inconclusive as of now. Indirect detection methods attempt to constrain IMBHs through their effect on star cluster photometric and kinematic observables. They are usually based on looking for a specific, physically motivated signature. While this approach is justified, it may be suboptimal in its usage of the available data. Here I present a new indirect detection method, based on machine learning, that is unaffected by these restrictions. I reduce the scientific question whether a star cluster hosts an IMBH to a classification problem in the machine learning framework. I present preliminary results to illustrate how machine learning models are trained ...

  8. Towards a Standard-based Domain-specific Platform to Solve Machine Learning-based Problems

    Directory of Open Access Journals (Sweden)

    Vicente García-Díaz

    2015-12-01

    Machine learning is one of the most important subfields of computer science and can be used to solve a variety of interesting artificial intelligence problems. There are different languages, frameworks and tools to define the data needed to solve machine learning-based problems. However, the great number of very diverse alternatives hinders the intercommunication, portability and re-usability of the definitions, designs or algorithms that any developer may create. In this paper, we take the first step towards a language and a development environment independent of the underlying technologies, allowing developers to design solutions to machine learning-based problems in a simple and fast way, automatically generating code for other technologies. This can be considered a transparent bridge among current technologies. We rely on a Model-Driven Engineering approach, focusing on the creation of models to abstract the definition of artifacts from the underlying technologies.

  9. Framework of Digital Machining Process Planning Platform for Cylinder Body Part

    Institute of Scientific and Technical Information of China (English)

    MA Yu-min; FAN Liu-qun; ZHU Zhi-hao; ZHANG Hao

    2007-01-01

    Digital factory technology is an advanced manufacturing technology that serves to establish a bridge between product development and manufacturing. For the application of digital factory technology in machining, especially in the machining of a complicated part such as a cylinder body part, a concept of digital process planning and its framework are proposed. Its components, including the machining domain knowledge model, machining knowledge base, machining resource base and process planning system, are studied. A machining knowledge model in tree form and an object-driven knowledge reasoning mechanism are used for the machining knowledge base. The process planning system is a user interface that leads a planner through the planning process. A case involving a cylinder head part is given to demonstrate how the platform works. The framework of digital process planning is the foundation of some intelligent CAPP systems and helps with production line planning.

  10. The Evaluation Framework for Learning Analytics

    NARCIS (Netherlands)

    Scheffel, Maren

    2017-01-01

    The thesis is structured into three parts that describe the iterative process of creating, applying, evaluating and improving the different versions of the evaluation framework for learning analytics (EFLA). The first part describes the identification of quality indicators for learning analytics as

  11. A Design Framework for Personal Learning Environments

    NARCIS (Netherlands)

    Rahimi, E.

    2015-01-01

    The purpose of our research was to develop a PLE (personal learning environment) design framework for workplace settings. By doing such, the research has answered this research question, how should a technology-based personal learning environment be designed, aiming at supporting learners to gain

  12. A Design Framework for Personal Learning Environments

    NARCIS (Netherlands)

    Rahimi, E.

    2015-01-01

    The purpose of our research was to develop a PLE (personal learning environment) design framework for workplace settings. By doing such, the research has answered this research question, how should a technology-based personal learning environment be designed, aiming at supporting learners to gain co

  13. A Design Framework for Personal Learning Environments

    NARCIS (Netherlands)

    Rahimi, E.

    2015-01-01

    The purpose of our research was to develop a PLE (personal learning environment) design framework for workplace settings. By doing such, the research has answered this research question, how should a technology-based personal learning environment be designed, aiming at supporting learners to gain co

  14. Acceleration of saddle-point searches with machine learning.

    Science.gov (United States)

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  15. Acceleration of saddle-point searches with machine learning

    Science.gov (United States)

    Peterson, Andrew A.

    2016-08-01

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  16. A Protein Classification Benchmark collection for machine learning

    NARCIS (Netherlands)

    Sonego, P.; Pacurar, M.; Dhir, S.; Kertész-Farkas, A.; Kocsor, A.; Gáspári, Z.; Leunissen, J.A.M.; Pongor, S.

    2007-01-01

    Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection (http://hydra.icgeb.trieste.it/benchmark) was created in order to provide standard datasets on which the performance of machin

  17. Probabilistic models and machine learning in structural bioinformatics

    DEFF Research Database (Denmark)

    Hamelryck, Thomas

    2009-01-01

    . Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis...

  18. Sparse Machine Learning Methods for Understanding Large Text Corpora

    Data.gov (United States)

    National Aeronautics and Space Administration — Sparse machine learning has recently emerged as powerful tool to obtain models of high-dimensional data with high degree of interpretability, at low computational...

  19. Parameter Identifiability in Statistical Machine Learning: A Review.

    Science.gov (United States)

    Ran, Zhi-Yong; Hu, Bao-Gang

    2017-05-01

    This review examines the relevance of parameter identifiability for statistical models used in machine learning. In addition to defining main concepts, we address several issues of identifiability closely related to machine learning, showing the advantages and disadvantages of state-of-the-art research and demonstrating recent progress. First, we review criteria for determining the parameter structure of models from the literature. This has three related issues: parameter identifiability, parameter redundancy, and reparameterization. Second, we review the deep influence of identifiability on various aspects of machine learning from theoretical and application viewpoints. In addition to illustrating the utility and influence of identifiability, we emphasize the interplay among identifiability theory, machine learning, mathematical statistics, information theory, optimization theory, information geometry, Riemann geometry, symbolic computation, Bayesian inference, algebraic geometry, and others. Finally, we present a new perspective together with the associated challenges.
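
    The three related issues named in this abstract (identifiability, redundancy, reparameterization) can be illustrated with a small worked example. The toy model below is an assumption chosen for illustration and is not taken from the review.

```latex
% Identifiability: distinct parameters must give distinct distributions.
p(y \mid x; \theta_1) = p(y \mid x; \theta_2) \ \text{for all } x, y
\;\Longrightarrow\; \theta_1 = \theta_2 .

% Toy model (assumed): y = a b x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, 1).
% For any c \neq 0, the pairs (a, b) and (c a, b / c) give the same mean function:
(c a)\,(b / c)\, x = a b x ,
% so (a, b) is not identifiable: one of the two parameters is redundant.

% Reparameterizing with w = a b removes the redundancy:
y = w x + \varepsilon , \qquad w = a b ,
% and w is identifiable, since w_1 x = w_2 x for all x implies w_1 = w_2 .
```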

  20. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    National Research Council Canada - National Science Library

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-01-01

    .... The polymers are 'fingerprinted' as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand...

  1. Scaling Datalog for Machine Learning on Big Data

    CERN Document Server

    Bu, Yingyi; Carey, Michael J; Rosen, Joshua; Polyzotis, Neoklis; Condie, Tyson; Weimer, Markus; Ramakrishnan, Raghu

    2012-01-01

    In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for the use of recursive queries to program a variety of machine learning systems. By taking this approach, database query optimization techniques can be utilized to identify effective execution plans, and the resulting runtime plans can be executed on a single unified data-parallel query processing engine. As a proof of concept, we consider two programming models--Pregel and Iterative Map-Reduce-Update---from the machine learning domain, and show how they can be captured in Datalog, tuned for a specific task, and then compiled into an optimized physical plan. Experiments performed on a large computing cluster with real data demonstrate that this declarative approach can provide very good performance while offering both increased generality and programming ease.

  2. A Machine Learning System for Recognizing Subclasses (Demo)

    Energy Technology Data Exchange (ETDEWEB)

    Vatsavai, Raju [ORNL

    2012-01-01

    Thematic information extraction from remote sensing images is a complex task. In this demonstration, we present the *Miner machine learning system. In particular, we demonstrate an advanced subclass recognition algorithm that is specifically designed to extract finer classes from aggregate classes.

  3. Performance of machine learning methods for classification tasks

    Directory of Open Access Journals (Sweden)

    B. Krithika

    2013-06-01

    In this paper, the performance of various machine learning methods on pattern classification and recognition tasks is evaluated. The evaluation is based on the feature representation, feature selection and setting of model parameters. The nature of the data, the methods of feature extraction and feature representation are discussed. The results of the machine learning algorithms on the classification task are analysed. The performance of machine learning methods on classifying Tamil word patterns, i.e., classification of nouns and verbs, is analysed. The software WEKA (a data mining tool) is used for evaluating the performance. WEKA has several machine learning algorithms such as Bayes, Trees, Lazy, and Rule-based classifiers.

  4. Reduced multiple empirical kernel learning machine.

    Science.gov (United States)

    Wang, Zhe; Lu, MingZhe; Gao, Daqi

    2015-02-01

    Multiple kernel learning (MKL) has been shown to be flexible and effective in depicting heterogeneous data sources, since MKL can introduce multiple kernels rather than a single fixed kernel into applications. However, MKL incurs high time and space complexity compared with single kernel learning, which is undesirable in real-world applications. Meanwhile, kernel mapping in MKL generally takes two forms: implicit kernel mapping and empirical kernel mapping (EKM), of which the latter has attracted less attention. In this paper, we focus on MKL with EKM and propose a reduced multiple empirical kernel learning machine, RMEKLM for short. To the best of our knowledge, it is the first to reduce both the time and space complexity of MKL with EKM. Unlike existing MKL, the proposed RMEKLM adopts the Gauss elimination technique to extract a set of feature vectors, and it is validated that doing so does not lose much information of the original feature space. RMEKLM then uses the extracted feature vectors to span a reduced orthonormal subspace of the feature space, which is visualized in terms of its geometric structure. It can be demonstrated that the spanned subspace is isomorphic to the original feature space, which means that the dot product of two vectors in the original feature space is equal to that of the two corresponding vectors in the generated orthonormal subspace. More importantly, the proposed RMEKLM requires simpler computation and less storage space, especially during testing. Finally, the experimental results show that RMEKLM is both efficient and effective in terms of complexity and classification. The contributions of this paper are as follows: (1) by mapping the input space into an orthonormal subspace, the geometry of the generated subspace is visualized; (2) this paper is the first to reduce both the time and space complexity of the EKM-based MKL; (3

  5. Machine learning techniques for energy optimization in mobile embedded systems

    Science.gov (United States)

    Donohoo, Brad Kyoshi

    Mobile smartphones and other portable battery operated embedded systems (PDAs, tablets) are pervasive computing devices that have emerged in recent years as essential instruments for communication, business, and social interactions. While performance, capabilities, and design are all important considerations when purchasing a mobile device, a long battery lifetime is one of the most desirable attributes. Battery technology and capacity has improved over the years, but it still cannot keep pace with the power consumption demands of today's mobile devices. This key limiter has led to a strong research emphasis on extending battery lifetime by minimizing energy consumption, primarily using software optimizations. This thesis presents two strategies that attempt to optimize mobile device energy consumption with negligible impact on user perception and quality of service (QoS). The first strategy proposes an application and user interaction aware middleware framework that takes advantage of user idle time between interaction events of the foreground application to optimize CPU and screen backlight energy consumption. The framework dynamically classifies mobile device applications based on their received interaction patterns, then invokes a number of different power management algorithms to adjust processor frequency and screen backlight levels accordingly. The second strategy proposes the usage of machine learning techniques to learn a user's mobile device usage pattern pertaining to spatiotemporal and device contexts, and then predict energy-optimal data and location interface configurations. By learning where and when a mobile device user uses certain power-hungry interfaces (3G, WiFi, and GPS), the techniques, which include variants of linear discriminant analysis, linear logistic regression, non-linear logistic regression, and k-nearest neighbor, are able to dynamically turn off unnecessary interfaces at runtime in order to save energy.
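
    Of the techniques listed in this abstract, k-nearest neighbor is the simplest to sketch. The example below is a minimal illustration, not the thesis implementation: the two context features (time of day scaled to [0, 1] and an at-home flag), the usage log and the name `knn_predict` are all invented for this sketch. It predicts whether the WiFi interface should be on for a new context by majority vote among the most similar past contexts.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical usage log: ((hour / 24, at_home), observed WiFi state).
usage_log = [
    ((8 / 24, 1.0), "on"),
    ((9 / 24, 1.0), "on"),
    ((22 / 24, 1.0), "on"),
    ((12 / 24, 0.0), "off"),
    ((13 / 24, 0.0), "off"),
    ((15 / 24, 0.0), "off"),
]

print(knn_predict(usage_log, (10 / 24, 1.0)))  # context: 10:00, at home
print(knn_predict(usage_log, (14 / 24, 0.0)))  # context: 14:00, away
```

    A predicted "off" is the signal a power manager could use to disable the interface at runtime; how that is wired into the device is outside this sketch.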

  6. Machine learning techniques applied to system characterization and equalization

    DEFF Research Database (Denmark)

    Zibar, Darko; Thrane, Jakob; Wass, Jesper

    2016-01-01

    Linear signal processing algorithms are effective in combating linear fibre channel impairments. We demonstrate the ability of machine learning algorithms to combat nonlinear fibre channel impairments and perform parameter extraction from directly detected signals.

  7. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.

  8. Double/Debiased/Neyman Machine Learning of Treatment Effects

    OpenAIRE

    Chernozhukov, Victor; Chetverikov, Denis; Demirer, Mert; Duflo, Esther; Hansen, Christian; Newey, Whitney

    2017-01-01

    Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) provide a generic double/de-biased machine learning (DML) approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using a new generation of nonparametric fitting methods for high-dimensional data, called machine learning methods. In this note, we illustrate the application of this method in the context of ...
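
    Cross-fitting with an orthogonal (residual-on-residual) score can be sketched in a few lines. The setup below is an assumption made for illustration: a partially linear model Y = theta * D + g(X) + noise with simulated data, and a one-variable least-squares line as a stand-in for the machine learning nuisance estimators. The names `fit_line` and `dml_plr` are hypothetical and do not come from the note.

```python
import random


def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for an ML nuisance learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def dml_plr(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of theta."""
    n = len(x)
    num = den = 0.0
    for k in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == k]
        rest = [i for i in range(n) if i % n_folds != k]
        # Nuisance functions are fitted on the other folds only.
        ell = fit_line([x[i] for i in rest], [y[i] for i in rest])  # E[Y | X]
        m = fit_line([x[i] for i in rest], [d[i] for i in rest])    # E[D | X]
        for i in held_out:
            v = d[i] - m(x[i])    # treatment residual
            u = y[i] - ell(x[i])  # outcome residual
            num += v * u
            den += v * v
    return num / den


# Simulated data with a known effect (all numbers are invented).
rng = random.Random(0)
true_theta = 2.0
x = [rng.gauss(0, 1) for _ in range(2000)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [true_theta * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = dml_plr(x, d, y)
print(theta_hat)
```

    The point of the design is that each observation is residualized with nuisance fits that never saw it, and that small errors in those fits affect the final ratio only at second order.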

  9. Designing Contestability: Interaction Design, Machine Learning, and Mental Health.

    Science.gov (United States)

    Hirsch, Tad; Merced, Kritzia; Narayanan, Shrikanth; Imel, Zac E; Atkins, David C

    2017-06-01

    We describe the design of an automated assessment and training tool for psychotherapists to illustrate challenges with creating interactive machine learning (ML) systems, particularly in contexts where human life, livelihood, and wellbeing are at stake. We explore how existing theories of interaction design and machine learning apply to the psychotherapy context, and identify "contestability" as a new principle for designing systems that evaluate human behavior. Finally, we offer several strategies for making ML systems more accountable to human actors.

  10. Machine learning concepts in coherent optical communication systems

    DEFF Research Database (Denmark)

    Zibar, Darko; Schäffer, Christian G.

    2014-01-01

    Powerful statistical signal processing methods, used by the machine learning community, are addressed and linked to current problems in coherent optical communication. Bayesian filtering methods are presented and applied for nonlinear dynamic state tracking. © 2014 OSA.

  11. A machine learning-based automatic currency trading system

    OpenAIRE

    Brvar, Anže

    2012-01-01

    The main goal of this thesis was to develop an automated trading system for Forex trading, which would use machine learning methods and their prediction models for deciding about trading actions. A training data set was obtained from exchange rates and values of technical indicators, which describe conditions on currency market. We estimated selected machine learning algorithms and their parameters with validation with sampling. We have prepared a set of automated trading systems with various...

  12. PCP-ML: protein characterization package for machine learning.

    Science.gov (United States)

    Eickholt, Jesse; Wang, Zheng

    2014-11-18

    Machine Learning (ML) has a number of demonstrated applications in protein prediction tasks such as protein structure prediction. To speed further development of machine learning based tools and their release to the community, we have developed a package which characterizes several aspects of a protein commonly used for protein prediction tasks with machine learning. A number of software libraries and modules exist for handling protein related data. The package we present in this work, PCP-ML, is unique in its small footprint and emphasis on machine learning. Its primary focus is on characterizing various aspects of a protein through sets of numerical data. The generated data can then be used with machine learning tools and/or techniques. PCP-ML is very flexible in how the generated data is formatted and as a result is compatible with a variety of existing machine learning packages. Given its small size, it can be directly packaged and distributed with community developed tools for protein prediction tasks. Source code and example programs are available under a BSD license at http://mlid.cps.cmich.edu/eickh1jl/tools/PCPML/. The package is implemented in C++ and accessible as a Python module.

  13. Machine learning and cosmological simulations - II. Hydrodynamical simulations

    Science.gov (United States)

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2016-04-01

    We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris simulation to train and test various sophisticated ML algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, g - r colour, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The promising reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically robust and physically consistent with Illustris galaxies and follows the same fundamental observational constraints. ML offers an intriguing and promising technique to create quick mock galaxy catalogues in the future.

  14. Machine learning and cosmological simulations - I. Semi-analytical models

    Science.gov (United States)

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2016-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical Universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analysing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated ML algorithms (k-Nearest Neighbors, decision trees, random forests, and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of M > 10^12 M⊙ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon SAMs and demonstrably place ML as a promising and a computationally efficient tool to study small-scale structure formation.

  15. Hydrological data assimilation using Extreme Learning Machines

    Science.gov (United States)

    Boucher, Marie-Amélie; Quilty, John; Adamowski, Jan

    2017-04-01

    Data assimilation refers to any process that allows for updating state variables in a model to represent reality more accurately than the initial (open loop) simulation. In hydrology, data assimilation is often a pre-requisite for forecasting. In practice, many operational agencies rely on "manual" data assimilation: perturbations are added manually to meteorological inputs or directly to state variables based on "expert knowledge" until the simulated streamflow matches the observed streamflow closely. The corrected state variables are then considered as representative of the "true", unknown, state of the watershed just before the forecasting period. However, manual data assimilation raises concerns, mainly regarding reproducibility and high reliance on "expert knowledge". For those reasons, automatic data assimilation methods have been proposed in the literature. Automatic data assimilation also allows for the assessment and reduction of state variable uncertainty, which is predominant for short-term streamflow forecasts (e.g. Thiboult et al. 2016). The goal of this project is to explore the potential of Extreme Learning Machines (ELM, Zang and Liu 2015) for data assimilation. ELMs are an emerging type of neural network that does not require iterative optimisation of their weights and biases and therefore are much faster to calibrate than typical feed-forward backpropagation neural networks. We explore ELM for updating state variables of the lumped conceptual hydrological model GR4J. The GR4J model has two state variables: the level of water in the production and routing reservoirs. Although these two variables are sufficient to describe the state of a snow-free watershed, they are modelling artifices that are not measurable. Consequently, their "true" values can only be verified indirectly through a comparison of simulated and observed streamflow and their values are highly uncertain. GR4J can also be coupled with the snow model CemaNeige, which adds two other
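
    The property that makes ELMs fast to calibrate, namely that the hidden-layer weights are drawn at random and only the output weights are solved for in one linear least-squares step, can be sketched as follows. This is a generic one-input regression toy, not the GR4J assimilation setup of the abstract; the hidden-layer size, the small ridge term and the names `solve` and `train_elm` are assumptions made for this sketch.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train_elm(xs, ys, n_hidden=8, ridge=1e-6, seed=0):
    """Fit a single-hidden-layer ELM; returns a prediction function."""
    rng = random.Random(seed)
    # Hidden weights and biases are random and never trained.
    w = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x):
        return [math.tanh(wi * x + bi) for wi, bi in zip(w, b)]

    h = [hidden(x) for x in xs]
    # Output weights from the regularized normal equations
    # (H^T H + ridge * I) beta = H^T y: one linear solve, no iterations.
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve(gram, rhs)
    return lambda x: sum(bi * hi for bi, hi in zip(beta, hidden(x)))


xs = [i / 10 for i in range(32)]
ys = [math.sin(x) for x in xs]
model = train_elm(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse)
```

    Because the fit is a single linear solve, recalibrating at every assimilation step is cheap compared with retraining a network by backpropagation.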

  16. Generative Modeling for Machine Learning on the D-Wave

    Energy Technology Data Exchange (ETDEWEB)

    Thulasidasan, Sunil [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Information Sciences Group

    2016-11-15

    These are slides on Generative Modeling for Machine Learning on the D-Wave. The following topics are detailed: generative models; Boltzmann machines: a generative model; restricted Boltzmann machines; learning parameters: RBM training; practical ways to train RBM; D-Wave as a Boltzmann sampler; mapping RBM onto the D-Wave; Chimera restricted RBM; mapping binary RBM to Ising model; experiments; data; D-Wave effective temperature, parameters noise, etc.; experiments: contrastive divergence (CD) 1 step; after 50 steps of CD; after 100 steps of CD; D-Wave (experiments 1, 2, 3); D-Wave observations.

  17. Estimating extinction using unsupervised machine learning

    Science.gov (United States)

    Meingast, Stefan; Lombardi, Marco; Alves, João

    2017-05-01

    Dust extinction is the most robust tracer of the gas distribution in the interstellar medium, but measuring extinction is limited by the systematic uncertainties involved in estimating the intrinsic colors to background stars. In this paper we present a new technique, Pnicer, that estimates intrinsic colors and extinction for individual stars using unsupervised machine learning algorithms. This new method aims to be free from any priors with respect to the column density and intrinsic color distribution. It is applicable to any combination of parameters and works in arbitrary numbers of dimensions. Furthermore, it is not restricted to color space. Extinction toward single sources is determined by fitting Gaussian mixture models along the extinction vector to (extinction-free) control field observations. In this way it becomes possible to describe the extinction for observed sources with probability densities, rather than a single value. Pnicer effectively eliminates known biases found in similar methods and outperforms them in cases of deep observational data where the number of background galaxies is significant, or when a large number of parameters is used to break degeneracies in the intrinsic color distributions. This new method remains computationally competitive, making it possible to correctly de-redden millions of sources within a matter of seconds. With the ever-increasing number of large-scale high-sensitivity imaging surveys, Pnicer offers a fast and reliable way to efficiently calculate extinction for arbitrary parameter combinations without prior information on source characteristics. The Pnicer software package also offers access to the well-established Nicer technique in a simple unified interface and is capable of building extinction maps including the Nicest correction for cloud substructure. Pnicer is offered to the community as an open-source software solution and is entirely written in Python.

  18. Machine learning methods for metabolic pathway prediction

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2010-01-01

    Background: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.

  19. Pulsar Search Using Supervised Machine Learning

    Science.gov (United States)

    Ford, John M.

    2017-05-01

    Pulsars are rapidly rotating neutron stars which emit a strong beam of energy through mechanisms that are not entirely clear to physicists. These very dense stars are used by astrophysicists to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, behavior of pulsar-black hole pairs, and tests of general relativity. Many of these tasks require a large ensemble of pulsars to provide enough statistical information to answer the scientific questions posed by physicists. In order to provide more pulsars to study, there are several large-scale pulsar surveys underway, which are generating a huge backlog of unprocessed data. Searching for pulsars is a very labor-intensive process, currently requiring skilled people to examine and interpret plots of data output by analysis programs. An automated system for screening the plots will speed up the search for pulsars by a very large factor. Research to date on using machine learning and pattern recognition has not yielded a completely satisfactory system, as systems with the desired near 100% recall have false positive rates that are higher than desired, causing more manual labor in the classification of pulsars. This work proposed to research, identify, propose and develop methods to overcome the barriers to building an improved classification system with a false positive rate of less than 1% and a recall of near 100% that will be useful for the current and next generation of large pulsar surveys. The results show that it is possible to generate classifiers that perform as needed from the available training data. While a false positive rate of 1% was not reached, recall of over 99% was achieved with a false positive rate of less than 2%. Methods of mitigating the imbalanced training and test data were explored and found to be highly effective in enhancing classification accuracy.
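
    The two figures of merit quoted in this abstract, recall and false positive rate, are simple to compute, and a small example shows why a low false positive rate can still mean substantial manual work when real pulsars are rare. The candidate counts below are invented for illustration and are not taken from the study; `recall_and_fpr` is a hypothetical helper.

```python
def recall_and_fpr(y_true, y_pred):
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Invented, heavily imbalanced example: 10 pulsars among 1000 candidates.
y_true = [1] * 10 + [0] * 990
# The classifier flags every pulsar plus 15 of the 990 non-pulsars.
y_pred = [1] * 10 + [1] * 15 + [0] * 975

recall, fpr = recall_and_fpr(y_true, y_pred)
flagged = sum(y_pred)
print(recall, round(fpr, 4), flagged)
```

    With these invented counts, recall is 100% and the false positive rate is about 1.5%, yet 15 of the 25 flagged candidates are false alarms that a person would still have to inspect.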

  20. Machine Learning Predictions of Flash Floods

    Science.gov (United States)

    Clark, R. A., III; Flamig, Z.; Gourley, J. J.; Hong, Y.

    2016-12-01

    This study concerns the development, assessment, and use of machine learning (ML) algorithms to automatically generate predictions of flash floods around the world from numerical weather prediction (NWP) output. Using an archive of NWP outputs from the Global Forecast System (GFS) model and a historical archive of reports of flash floods across the U.S. and Europe, we developed a set of ML models that output forecasts of the probability of a flash flood given a certain set of atmospheric conditions. Using these ML models, real-time global flash flood predictions from NWP data have been generated in research mode since February 2016. These ML models provide information about which atmospheric variables are most important in the flash flood prediction process. The raw ML predictions can be calibrated against historical events to generate reliable flash flood probabilities. The automatic system was tested in a research-to-operations testbed environment with National Weather Service forecasters. The ML models are quite successful at incorporating large amounts of information in a computationally efficient manner and result in reasonably skillful predictions. The system is largely successful at identifying flash floods resulting from synoptically-forced events, but struggles with isolated flash floods that arise as a result of weather systems largely unresolvable by the coarse resolution of a global NWP system. The results from this collection of studies suggest that automatic probabilistic predictions of flash floods are a plausible way forward in operational forecasting, but that future research could focus upon applying these methods to finer-scale NWP guidance, to NWP ensembles, and to forecast lead times beyond 24 hours.

  1. Framework of Strategic Learning: The PDCA Cycle

    Directory of Open Access Journals (Sweden)

    Michał Pietrzak

    2015-06-01

    Nowadays, strategic planning has to be a permanent process, and organizational learning should support it. Researchers in theories of organizational learning attempt to understand the processes that lead to changes in organizational knowledge, as well as the effects of learning on organizational performance. In the traditional approach, strategy is viewed as a one-shot event. However, in today's turbulent environment this view may no longer be valid. There is a need for flexible strategic management that employs an organizational learning process. The crucial element of such a process is information acquisition, which allows the initial version of the strategic plan to be refined. In this article the authors discuss the PDCA cycle as a framework for the strategic learning process, including both single-loop and double-loop learning. The authors propose ideas for further research in the area of organizational learning and strategic management.

  2. A system framework of inter-enterprise machining quality control based on fractal theory

    Science.gov (United States)

    Zhao, Liping; Qin, Yongtao; Yao, Yiyong; Yan, Peng

    2014-03-01

    In order to meet the quality control requirement of dynamic and complicated product machining processes among enterprises, a system framework of inter-enterprise machining quality control based on fractal was proposed. In this system framework, the fractal-specific characteristic of inter-enterprise machining quality control function was analysed, and the model of inter-enterprise machining quality control was constructed by the nature of fractal structures. Furthermore, the goal-driven strategy of inter-enterprise quality control and the dynamic organisation strategy of inter-enterprise quality improvement were constructed by the characteristic analysis on this model. In addition, the architecture of inter-enterprise machining quality control based on fractal was established by means of Web service. Finally, a case study for application was presented. The result showed that the proposed method was available, and could provide guidance for quality control and support for product reliability in inter-enterprise machining processes.

  3. A review of supervised machine learning applied to ageing research.

    Science.gov (United States)

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  4. Kernel Methods for Machine Learning with Life Science Applications

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie

    Kernel methods refer to a family of widely used nonlinear algorithms for machine learning tasks like classification, regression, and feature extraction. By exploiting the so-called kernel trick, straightforward extensions of classical linear algorithms are enabled as long as the data only appear as inner products in the model formulation. This dissertation presents research on improving the performance of standard kernel methods like kernel Principal Component Analysis and the Support Vector Machine. Moreover, the goal of the thesis has been two-fold. The first part focuses on the use of kernel Principal... models to kernel learning, and means for restoring the generalizability in both kernel Principal Component Analysis and the Support Vector Machine are proposed. Viability is proved on a wide range of benchmark machine learning data sets.
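
    The kernel trick described in this record can be checked numerically. The sketch below is an illustration only, not taken from the dissertation: it assumes a homogeneous degree-2 polynomial kernel on 2-D inputs and shows that evaluating the kernel gives the same value as an inner product in an explicit feature space. All function names are hypothetical.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs matching poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def inner(u, v):
    """Plain Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)                           # never builds the features
k_explicit = inner(poly2_features(x), poly2_features(y))  # builds them explicitly
print(k_implicit, k_explicit)
```

    Because a linear algorithm that touches the data only through inner products can call the kernel instead, it operates in the feature space without ever constructing it.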

  5. 2015 International Conference on Machine Learning and Signal Processing

    CERN Document Server

    Woo, Wai; Sulaiman, Hamzah; Othman, Mohd; Saat, Mohd

    2016-01-01

    This book presents important research findings and recent innovations in the field of machine learning and signal processing. A wide range of topics relating to machine learning and signal processing techniques and their applications are addressed in order to provide both researchers and practitioners with a valuable resource documenting the latest advances and trends. The book comprises a careful selection of the papers submitted to the 2015 International Conference on Machine Learning and Signal Processing (MALSIP 2015), which was held on 15–17 December 2015 in Ho Chi Minh City, Vietnam with the aim of offering researchers, academicians, and practitioners an ideal opportunity to disseminate their findings and achievements. All of the included contributions were chosen by expert peer reviewers from across the world on the basis of their interest to the community. In addition to presenting the latest in design, development, and research, the book provides access to numerous new algorithms for machine learni...

  6. Prediction of antiepileptic drug treatment outcomes using machine learning

    Science.gov (United States)

    Colic, Sinisa; Wither, Robert G.; Lang, Min; Zhang, Liang; Eubanks, James H.; Bardakjian, Berj L.

    2017-02-01

    Objective. Antiepileptic drug (AED) treatments produce inconsistent outcomes, often requiring patients to go through several drug trials until a successful treatment can be found. This study proposes the use of machine learning techniques to predict epilepsy treatment outcomes of commonly used AEDs. Approach. Machine learning algorithms were trained and evaluated using features obtained from intracranial electroencephalogram (iEEG) recordings of the epileptiform discharges observed in a Mecp2-deficient mouse model of Rett syndrome. Previous work has linked the presence of cross-frequency coupling (I CFC) of the delta (2-5 Hz) rhythm with the fast ripple (400-600 Hz) rhythm in epileptiform discharges. Using the I CFC to label post-treatment outcomes, we compared support vector machine (SVM) and random forest (RF) machine learning classifiers for providing likelihood scores of successful treatment outcomes. Main results. (a) There was heterogeneity in AED treatment outcomes, (b) machine learning techniques could be used to rank the efficacy of AEDs by estimating likelihood scores for successful treatment outcome, (c) I CFC features yielded the most effective a priori identification of appropriate AED treatment, and (d) both classifiers performed comparably. Significance. Machine learning approaches yielded predictions of successful drug treatment outcomes, which in turn could reduce the burdens of drug trials and lead to substantial improvements in patient quality of life.

  7. OxLM: A Neural Language Modelling Framework for Machine Translation

    Directory of Open Access Journals (Sweden)

    Paul Baltescu

    2014-09-01

    This paper presents an open source implementation of a neural language model for machine translation. Neural language models deal with the problem of data sparsity by learning distributed representations for words in a continuous vector space. The language modelling probabilities are estimated by projecting a word's context in the same space as the word representations and by assigning probabilities proportional to the distance between the words and the context's projection. Neural language models are notoriously slow to train and test. Our framework is designed with scalability in mind and provides two optional techniques for reducing the computational cost: the so-called class decomposition trick and a training algorithm based on noise contrastive estimation. Our models may be extended to incorporate direct n-gram features to learn weights for every n-gram in the training data. Our framework comes with wrappers for the cdec and Moses translation toolkits, allowing our language models to be incorporated as normalized features in their decoders (inside the beam search).
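
    The class decomposition trick named in this record factors a word probability as p(w | h) = p(c(w) | h) * p(w | c(w), h), so that each softmax normalises over far fewer terms than the full vocabulary. The sketch below is a generic illustration with hypothetical names and made-up scores; it is not OxLM's API or implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_factored_prob(word, word_class, class_scores, word_scores, members):
    """p(word | context) = p(class | context) * p(word | class, context).

    class_scores: one score per class for the current context
    word_scores:  one score per vocabulary word for the current context
    members:      members[c] lists the word ids assigned to class c
    """
    p_class = softmax(class_scores)[word_class]
    in_class = members[word_class]
    # Normalise only over the words that share this class.
    p_word = softmax([word_scores[w] for w in in_class])[in_class.index(word)]
    return p_class * p_word


# Toy vocabulary of 4 words split into 2 classes (assumed assignment).
members = [[0, 1], [2, 3]]
class_of = {0: 0, 1: 0, 2: 1, 3: 1}
class_scores = [0.5, -0.2]
word_scores = [1.0, 0.3, -0.7, 2.0]
probs = [
    class_factored_prob(w, class_of[w], class_scores, word_scores, members)
    for w in range(4)
]
print(probs, sum(probs))
```

    With balanced classes, scoring one word needs roughly |C| + |V|/|C| normalisation terms instead of |V|, and the factored probabilities still sum to one over the vocabulary.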

  8. Performance Evaluation of Machine Learning Algorithms for Urban Pattern Recognition from Multi-spectral Satellite Images

    Directory of Open Access Journals (Sweden)

    Marc Wieland

    2014-03-01

    In this study, a classification and performance evaluation framework for the recognition of urban patterns in medium (Landsat ETM, TM and MSS) and very high resolution (WorldView-2, Quickbird, Ikonos) multi-spectral satellite images is presented. The study aims at exploring the potential of machine learning algorithms in the context of an object-based image analysis and to thoroughly test the algorithm’s performance under varying conditions to optimize their usage for urban pattern recognition tasks. Four classification algorithms, Normal Bayes, K Nearest Neighbors, Random Trees and Support Vector Machines, which represent different concepts in machine learning (probabilistic, nearest neighbor, tree-based, function-based), have been selected and implemented on a free and open-source basis. Particular focus is given to assess the generalization ability of machine learning algorithms and the transferability of trained learning machines between different image types and image scenes. Moreover, the influence of the number and choice of training data, the influence of the size and composition of the feature vector and the effect of image segmentation on the classification accuracy is evaluated.

  9. The use of machine learning with signal- and NLP processing of source code to detect and classify vulnerabilities and weaknesses with MARFCAT

    CERN Document Server

    Mokhov, Serguei A

    2010-01-01

    We present a machine learning approach to static code analysis for weaknesses related to security and others with the open-source MARF framework and its application to the NIST's SATE 2010 static analysis tool exhibition workshop.

  10. Applications of Machine Learning in Cancer Prediction and Prognosis

    Directory of Open Access Journals (Sweden)

    Joseph A. Cruz

    2006-01-01

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

  11. Uncertainty-Aware Estimation of Population Abundance using Machine Learning

    NARCIS (Netherlands)

    Boom, B.J.; Beauxis-Aussalet, E.M.A.L.; Hardman, L.; Fisher, R.B.

    2015-01-01

    Machine Learning is widely used for mining collections, such as images, sounds, or texts, by classifying their elements into categories. Automatic classification based on supervised learning requires ground-truth datasets for modeling the elements to classify, and for testing the quality of the classic

  12. Comparison of Machine Learning Techniques for Target Detection

    NARCIS (Netherlands)

    Vink, J.P.; Haan, G. de

    2013-01-01

    This paper focuses on machine learning techniques for real-time detection. Although many supervised learning techniques have been described in the literature, no technique always performs best. Several comparative studies are available, but have not always been performed carefully, leading to invalid

  14. Distribution Learning in Evolutionary Strategies and Restricted Boltzmann Machines

    DEFF Research Database (Denmark)

    Krause, Oswin

    The thesis is concerned with learning distributions in the two settings of Evolutionary Strategies (ESs) and Restricted Boltzmann Machines (RBMs). In both cases, the distributions are learned from samples, albeit with different goals. Evolutionary Strategies are concerned with finding an optimum ...

  15. Large-scale Machine Learning in High-dimensional Datasets

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen

    Over the last few decades computers have gotten to play an essential role in our daily life, and data is now being collected in various domains at a faster pace than ever before. This dissertation presents research advances in four machine learning fields that all relate to the challenges imposed...... are better at modeling local heterogeneities. In the field of machine learning for neuroimaging, we introduce learning protocols for real-time functional Magnetic Resonance Imaging (fMRI) that allow for dynamic intervention in the human decision process. Specifically, the model exploits the structure of f...

  16. Machine learning in Python essential techniques for predictive analysis

    CERN Document Server

    Bowles, Michael

    2015-01-01

    Learn a simpler and more effective way to analyze data and predict outcomes with Python. Machine Learning in Python shows you how to successfully analyze data using only two core machine learning algorithms, and how to apply them using Python. By focusing on two algorithm families that effectively predict outcomes, this book is able to provide full descriptions of the mechanisms at work, and examples that illustrate the machinery with specific, hackable code. The algorithms are explained in simple terms with no complex math and applied using Python, with guidance on algorithm selection, d

  17. Less is more: regularization perspectives on large scale machine learning

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Deep learning based techniques provide a possible solution at the expense of theoretical guidance and, especially, of computational requirements. It is then a key challenge for large scale machine learning to devise approaches guaranteed to be accurate and yet computationally efficient. In this talk, we will consider a regularization perspective on machine learning, appealing to classical ideas in linear algebra and inverse problems to dramatically scale up nonparametric methods such as kernel methods, often dismissed because of prohibitive costs. Our analysis derives optimal theoretical guarantees while providing experimental results on par with or outperforming state-of-the-art approaches.

  18. A Machine Learning Perspective on Predictive Coding with PAQ

    CERN Document Server

    Knoll, Byron

    2011-01-01

    PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a starting point for discussions that will increase our understanding, lead to improvements to PAQ8, and facilitate a transfer of knowledge from PAQ8 to other machine learning methods, such as recurrent neural networks and stochastic memoizers. Finally, the report presents a broad range of new applications of PAQ to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and compression using features from the field of deep learning.

  19. Proceedings of the IEEE Machine Learning for Signal Processing XVII

    DEFF Research Database (Denmark)

    The seventeenth of a series of workshops sponsored by the IEEE Signal Processing Society and organized by the Machine Learning for Signal Processing Technical Committee (MLSP-TC). The field of machine learning has matured considerably in both methodology and real-world application domains and has... The program featured a Special Session on Genomic Signal Processing, chaired by Prof. Man-Wai Mak from Hong Kong Polytechnic University, Hong Kong. The session included four refereed papers by leading experts in the field. We also continued the tradition of the Data Analysis Competition thanks to the efforts... , and two papers from the winners of the Data Analysis Competition. The program included papers in the following areas: genomic signal processing, pattern recognition and classification, image and video processing, blind signal processing, models, learning algorithms, and applications of machine learning...

  20. Machine Learning Based Diagnosis of Lithium Batteries

    Science.gov (United States)

    Ibe-Ekeocha, Chinemerem Christopher

    The depletion of the world's current petroleum reserve, coupled with the negative effects of carbon monoxide and other harmful petrochemical by-products on the environment, is the driving force behind the movement towards renewable and sustainable energy sources. Furthermore, the growing transportation sector consumes a significant portion of the total energy used in the United States. A complete electrification of this sector would require a significant development in electric vehicles (EVs) and hybrid electric vehicles (HEVs), thus translating to a reduction in the carbon footprint. As the market for EVs and HEVs grows, their battery management systems (BMS) need to be improved accordingly. The BMS is responsible not only for optimally charging and discharging the battery, but also for monitoring the battery's state of charge (SOC) and state of health (SOH). SOC, similar to an energy gauge, is a representation of a battery's remaining charge level as a percentage of its total possible charge at full capacity. Similarly, SOH is a measure of deterioration of a battery; thus it is a representation of the battery's age. Neither SOC nor SOH is measurable, so it is important that these quantities are estimated accurately. An inaccurate estimation could not only be inconvenient for EV consumers, but also potentially detrimental to the battery's performance and life. Such estimations could be implemented either online, while the battery is in use, or offline, when the battery is at rest. This thesis presents intelligent online SOC and SOH estimation methods using machine learning tools such as artificial neural networks (ANNs). ANNs are a powerful generalization tool if programmed and trained effectively. Unlike other estimation strategies, the techniques used require no battery modeling or knowledge of battery internal parameters but rather use the battery's voltage, charge/discharge current, and ambient temperature measurements to accurately estimate the battery's SOC and SOH. The developed

  1. Machine Learning Based Statistical Prediction Model for Improving Performance of Live Virtual Machine Migration

    Directory of Open Access Journals (Sweden)

    Minal Patel

    2016-01-01

    Service can be delivered anywhere and anytime in cloud computing using virtualization. The main issue in handling virtualized resources is balancing ongoing workloads. The migration of virtual machines has two major techniques: (i) reducing dirty pages using CPU scheduling and (ii) compressing memory pages. The available techniques for live migration are not able to predict dirty pages in advance. In the proposed framework, time series based prediction techniques are developed using historical analysis of past data. The time series is generated by iteratively transferring memory pages. Here, two different regression-based time series models are proposed. The first is a statistical probability based regression model built on the ARIMA (autoregressive integrated moving average) model. The second is a statistical learning based regression model that uses the SVR (support vector regression) model. These models are tested on a real Xen data set to compute downtime, total number of pages transferred, and total migration time. The ARIMA model predicts dirty pages with 91.74% accuracy, and the SVR model predicts dirty pages with 94.61% accuracy, which is higher than ARIMA.
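
    The idea of forecasting the next migration iteration from a time series of past iterations can be sketched with a first-order autoregressive fit. This is a deliberately simplified stand-in for the ARIMA and SVR models named in the record, and the dirty-page counts below are synthetic, not the paper's Xen data; all names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b on consecutive pairs."""
    xs = series[:-1]
    ys = series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict_next(series, a, b):
    """One-step-ahead forecast from the last observed value."""
    return a * series[-1] + b


# Synthetic dirty-page counts per pre-copy iteration (made-up numbers
# constructed to follow x[t] = 0.5 * x[t-1] + 100 exactly).
dirty_pages = [1000.0, 600.0, 400.0, 300.0, 250.0]
a, b = fit_ar1(dirty_pages)
forecast = predict_next(dirty_pages, a, b)
print(a, b, forecast)
```

    A forecast of this kind is what lets a migration controller decide in advance whether another pre-copy round is worthwhile or whether to stop and switch over.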

  2. Closed-loop control of an experimental mixing layer using machine learning control

    CERN Document Server

    Parezanović, Vladimir; Cordier, Laurent; Noack, Bernd R; Delville, Joël; Bonnet, Jean-Paul; Segond, Marc; Abel, Markus; Brunton, Steven L

    2014-01-01

    A novel framework for closed-loop control of turbulent flows is tested in an experimental mixing layer flow. This framework, called Machine Learning Control (MLC), provides a model-free method of searching for the best function, to be used as a control law in closed-loop flow control. MLC is based on genetic programming, a function optimization method of machine learning. In this article, MLC is benchmarked against classical open-loop actuation of the mixing layer. Results show that this method is capable of producing sensor-based control laws which can rival or surpass the best open-loop forcing, and be robust to changing flow conditions. Additionally, MLC can detect non-linear mechanisms present in the controlled plant, and exploit them to find a better type of actuation than the best periodic forcing.

  3. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    Science.gov (United States)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  4. Performance Evaluation of Machine Learning Algorithms for Urban Pattern Recognition from Multi-spectral Satellite Images

    OpenAIRE

    Marc Wieland; Massimiliano Pittore

    2014-01-01

    In this study, a classification and performance evaluation framework for the recognition of urban patterns in medium (Landsat ETM, TM and MSS) and very high resolution (WorldView-2, Quickbird, Ikonos) multi-spectral satellite images is presented. The study aims at exploring the potential of machine learning algorithms in the context of an object-based image analysis and to thoroughly test the algorithm’s performance under varying conditions to optimize their usage for urban pattern recognitio...

  5. Forecasting the NOK/USD Exchange Rate with Machine Learning Techniques

    OpenAIRE

    Theophilos Papadimitriou; Periklis Gogas; Vasilios Plakandaras

    2013-01-01

    In this paper, we approximate the empirical findings of Papadamou and Markopoulos (2012) on the NOK/USD exchange rate under a Machine Learning (ML) framework. By applying Support Vector Regression (SVR) on a general monetary exchange rate model and a Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) to extract model structure, we test for the validity of popular monetary exchange rate models. We reach mixed results since the coefficient sign of interest rate differential is in favor o...

  6. Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs

    OpenAIRE

    Boehm, Matthias

    2015-01-01

    Declarative large-scale machine learning (ML) aims at the specification of ML algorithms in a high-level language and automatic generation of hybrid runtime execution plans ranging from single node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks like Spark. The compilation of large-scale ML programs exhibits many opportunities for automatic optimization. Advanced cost-based optimization techniques require---as a fundamental precondition---an accurat...

  7. Machine learning methods without tears: a primer for ecologists.

    Science.gov (United States)

    Olden, Julian D; Lawler, Joshua J; Poff, N LeRoy

    2008-06-01

    Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machine learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

  8. Machine Learning and Conflict Prediction: A Use Case

    Directory of Open Access Journals (Sweden)

    Chris Perry

    2013-10-01

    For at least the last two decades, the international community in general and the United Nations specifically have attempted to develop robust, accurate and effective conflict early warning systems for conflict prevention. One potential and promising component of integrated early warning systems lies in the field of machine learning. This paper aims at giving conflict analysts a basic understanding of machine learning methodology as well as testing the feasibility and added value of such an approach. The paper finds that the selection of appropriate machine learning methodologies can offer substantial improvements in accuracy and performance. It also finds that even at this early stage in testing machine learning on conflict prediction, full models offer more predictive power than simply using a prior outbreak of violence as the leading indicator of current violence. This suggests that a refined data selection methodology combined with strategic use of machine learning algorithms could indeed offer a significant addition to the early warning toolkit. Finally, the paper suggests a number of steps moving forward to improve upon this initial test methodology.

  9. Data Triage of Astronomical Transients: A Machine Learning Approach

    Science.gov (United States)

    Rebbapragada, U.

    This talk presents real-time machine learning systems for triage of big data streams generated by photometric and image-differencing pipelines. Our first system is a transient event detection system in development for the Palomar Transient Factory (PTF), a fully-automated synoptic sky survey that has demonstrated real-time discovery of optical transient events. The system is tasked with discriminating between real astronomical objects and bogus objects, which are usually artifacts of the image differencing pipeline. We performed a machine learning forensics investigation on PTF’s initial system that led to training data improvements that decreased both false positive and negative rates. The second machine learning system is a real-time classification engine of transients and variables in development for the Australian Square Kilometre Array Pathfinder (ASKAP), an upcoming wide-field radio survey with unprecedented ability to investigate the radio transient sky. The goal of our system is to classify light curves into known classes with as few observations as possible in order to trigger follow-up on costlier assets. We discuss the violation of standard machine learning assumptions incurred by this task, and propose the use of ensemble and hierarchical machine learning classifiers that make predictions most robustly.

  10. Machine learning in cell biology - teaching computers to recognize phenotypes.

    Science.gov (United States)

    Sommer, Christoph; Gerlich, Daniel W

    2013-12-15

    Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static and predefined processing rules a cumbersome effort. Machine-learning methods, instead, seek to use intrinsic data structure, as well as the expert annotations of biologists, to infer models that can be used to solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays and we therefore include extensive discussion on how to optimize experimental workflow as well as the data analysis pipeline.

  11. Fast Affinity Propagation Clustering based on Machine Learning

    OpenAIRE

    Shailendra Kumar Shrivastava; J. L. Rana; DR.R.C.JAIN

    2013-01-01

    Affinity propagation (AP) was recently introduced as an unsupervised learning algorithm for exemplar-based clustering. This paper proposes a novel Fast Affinity Propagation clustering Approach based on Machine Learning (FAPML). FAPML tries to put data points into clusters based on the history of the data points belonging to clusters in early stages. In FAPML we introduce an affinity learning constant and a dispersion constant, which supervise the clustering process. FAPML also enforces...

  12. Single-Machine Scheduling with Accelerating Learning Effects

    Directory of Open Access Journals (Sweden)

    T. C. E. Cheng

    2013-01-01

    Scheduling with learning effects has been widely studied. However, there are situations where the learning effect might accelerate. In this paper, we propose a new model where the learning effect accelerates as time goes by. We derive the optimal solutions for the single-machine problems to minimize the makespan, total completion time, total weighted completion time, maximum lateness, maximum tardiness, and total tardiness.

  13. Learning from minimum entropy queries in a large committee machine

    CERN Document Server

    Sollich, P

    1996-01-01

    In supervised learning, the redundancy contained in random examples can be avoided by learning from queries. Using statistical mechanics, we study learning from minimum entropy queries in a large tree-committee machine. The generalization error decreases exponentially with the number of training examples, providing a significant improvement over the algebraic decay for random examples. The connection between entropy and generalization error in multi-layer networks is discussed, and a computationally cheap algorithm for constructing queries is suggested and analysed.

  14. CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

    2017-07-14

    We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances by her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the group of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human and machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study where we applied CHISSL to organize a collection of handwritten digits.

  15. A 3D Human-Machine Integrated Design and Analysis Framework for Squat Exercises with a Smith Machine.

    Science.gov (United States)

    Lee, Haerin; Jung, Moonki; Lee, Ki-Kwang; Lee, Sang Hun

    2017-02-06

    In this paper, we propose a three-dimensional design and evaluation framework and process based on a probabilistic-based motion synthesis algorithm and biomechanical analysis system for the design of the Smith machine and squat training programs. Moreover, we implemented a prototype system to validate the proposed framework. The framework consists of an integrated human-machine-environment model as well as a squat motion synthesis system and biomechanical analysis system. In the design and evaluation process, we created an integrated model in which interactions between a human body and machine or the ground are modeled as joints with constraints at contact points. Next, we generated Smith squat motion using the motion synthesis program based on a Gaussian process regression algorithm with a set of given values for independent variables. Then, using the biomechanical analysis system, we simulated joint moments and muscle activities from the input of the integrated model and squat motion. We validated the model and algorithm through physical experiments measuring the electromyography (EMG) signals, ground forces, and squat motions as well as through a biomechanical simulation of muscle forces. The proposed approach enables the incorporation of biomechanics in the design process and reduces the need for physical experiments and prototypes in the development of training programs and new Smith machines.

  16. A Conceptual Framework for Ambient Learning Displays

    NARCIS (Netherlands)

    Börner, Dirk; Kalz, Marco; Specht, Marcus

    2010-01-01

    Börner, D., Kalz, M., & Specht, M. (2010). A Conceptual Framework for Ambient Learning Displays. In B. Chang, T. Hirashima, & H. Ogata (Eds.), Joint Proceedings of the Work-in-Progress Poster and Invited Young Researcher Symposium for the 18th International Conference on Computers in Education (pp.

  17. A Conceptual Framework for Ambient Learning Displays

    NARCIS (Netherlands)

    Börner, Dirk; Kalz, Marco; Specht, Marcus

    2011-01-01

    Börner, D., Kalz, M., & Specht, M. (2010, 29 November-3 December). A Conceptual Framework for Ambient Learning Displays. Poster presented at the Work-in-Progress Poster and Invited Young Researcher Symposium of the 18th International Conference on Computers in Education, Putrajaya, Malaysia: Asia-Pa

  20. A Multimodal Interaction Framework for Blended Learning

    DEFF Research Database (Denmark)

    Vidakis, Nikolaos; Kalafatis, Konstantinos; Triantafyllidis, Georgios

    2017-01-01

    Humans interact with each other by utilizing the five basic senses as input modalities, whereas sounds, gestures, facial expressions etc. are utilized as output modalities. Multimodal interaction is also used between humans and their surrounding environment, although enhanced with further senses ...... framework enabling deployment of a vast variety of modalities, tailored appropriately for use in blended learning environment....

  1. A Data Protection Framework for Learning Analytics

    Science.gov (United States)

    Cormack, Andrew

    2016-01-01

    Most studies on the use of digital student data adopt an ethical framework derived from human-subject research, based on the informed consent of the experimental subject. However, consent gives universities little guidance on using learning analytics as a routine part of educational provision: which purposes are legitimate and which analyses…

  2. Mississippi Curriculum Framework for Machine Tool Operation/Machine Shop (Program CIP: 48.0503--Machine Shop Assistant). Secondary Programs.

    Science.gov (United States)

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which reflects Mississippi's statutory requirement that instructional programs be based on core curricula and performance-based assessment, contains outlines of the instructional units required in local instructional management plans and daily lesson plans for machine tool operation/machine shop I and II. Presented first are a…

  3. Recognition of printed Arabic text using machine learning

    Science.gov (United States)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress has been achieved, either on-line or off-line, towards the automatic recognition of Arabic characters. This is a result of the lack of adequate support in terms of funding and of other utilities such as Arabic text databases and dictionaries, and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of Arabic printed text using the C4.5 machine learning algorithm. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods. It is fast in training and in recognition, generalizes well, is noise tolerant and the symbolic representation is easy to understand. The technique can be divided into three major steps. First, in pre-processing, the original image is transformed into a binary image utilizing a 300 dpi scanner and the connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters. Finally, C4.5 is used for character classification to generate a decision tree.
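
    As a rough illustration of the final step, C4.5 chooses each decision-tree split by gain ratio (information gain divided by the entropy of the split itself). The sketch below computes that criterion for discrete features. The toy feature vectors (number of subwords, number of peaks) and the class labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, feature):
    """Gain ratio obtained by splitting on one discrete feature index."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Expected label entropy after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Entropy of the split itself penalises many-valued features.
    split_info = entropy([row[feature] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical feature vectors: (number of subwords, number of peaks).
rows = [(1, 2), (1, 3), (2, 2), (2, 3)]
labels = ["A", "A", "B", "B"]

# The feature with the highest gain ratio becomes the root test.
best_feature = max(range(2), key=lambda f: gain_ratio(rows, labels, f))
```

    A full C4.5 implementation would apply this choice recursively, handle continuous features with thresholds, and prune the resulting tree; none of that is shown here.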

  4. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    Science.gov (United States)

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
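
    The iterative loop described above can be sketched as follows. The abstract does not state the authors' selection criterion, so this sketch assumes plain uncertainty sampling with a nearest-centroid classifier. The descriptor vectors, class names and the `oracle` function standing in for the human expert are all hypothetical.

```python
import math


def centroids(labeled):
    """Mean feature vector per class from (vector, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def margin(x, cents):
    """Gap between the two nearest class centroids; a small gap means uncertain."""
    d = sorted(math.dist(x, c) for c in cents.values())
    return d[1] - d[0]


def active_learn(labeled, pool, oracle, budget):
    """Repeatedly ask the oracle to label the most ambiguous pool item."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(budget, len(pool))):
        cents = centroids(labeled)
        query = min(pool, key=lambda x: margin(x, cents))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return labeled


# Hypothetical 2D descriptors with one seed compound per class.
seed = [((0.0, 0.0), "a"), ((10.0, 0.0), "b"), ((0.0, 10.0), "c")]
pool = [(1.0, 1.0), (5.0, 0.5), (9.0, 1.0), (1.0, 9.0)]


def oracle(x):
    """Stand-in for the human expert who supplies the true class."""
    if x[0] >= 5:
        return "b"
    return "c" if x[1] >= 5 else "a"


result = active_learn(seed, pool, oracle, budget=1)
```

    With a budget of one query, the loop spends it on the compound lying midway between two class centroids rather than on compounds the current model already classifies confidently.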

  5. State Machine Framework And Its Use For Driving LHC Operational states

    CERN Document Server

    Misiowiec, M; Solfaroli Camilloci, M

    2011-01-01

    The LHC follows a complex operational cycle with 12 major phases that include equipment tests, preparation, beam injection, ramping and squeezing, finally followed by the physics phase. This cycle is modelled and enforced with a state machine, whereby each operational phase is represented by a state. On each transition, before entering the next state, a series of conditions is verified to make sure the LHC is ready to move on. The State Machine framework was developed to cater for building independent or embedded state machines. They safely drive between the states executing tasks bound to transitions and broadcast related information to interested parties. The framework encourages users to program their own actions. Simple configuration management allows the operators to define and maintain complex models themselves. An emphasis was also put on easy interaction with the remote state machine instances through standard communication protocols. On top of its core functionality, the framework offers a transparen...
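
    A minimal sketch of the pattern described here: conditions are checked before a state is entered, tasks are bound to transitions, and state changes are broadcast to subscribers. The abstract does not show the framework's API, so the class, method and state names below are hypothetical.

```python
class StateMachine:
    """Toy state machine with guarded transitions (hypothetical API)."""

    def __init__(self, initial):
        self.state = initial
        self.transitions = {}  # (source, target) -> (conditions, tasks)
        self.listeners = []

    def add_transition(self, source, target, conditions=(), tasks=()):
        self.transitions[(source, target)] = (list(conditions), list(tasks))

    def subscribe(self, listener):
        self.listeners.append(listener)

    def go(self, target):
        """Try to move to target; return False if a condition fails."""
        key = (self.state, target)
        if key not in self.transitions:
            raise ValueError(f"no transition {self.state} -> {target}")
        conditions, tasks = self.transitions[key]
        if not all(check() for check in conditions):
            return False  # stay in the current state
        for task in tasks:
            task()  # tasks bound to the transition run before entering
        previous, self.state = self.state, target
        for listener in self.listeners:
            listener(previous, target)  # broadcast to interested parties
        return True


# Hypothetical two-phase model with one readiness condition.
ready = {"ok": False}
log = []
machine = StateMachine("INJECTION")
machine.add_transition(
    "INJECTION", "RAMP",
    conditions=[lambda: ready["ok"]],
    tasks=[lambda: log.append("task")],
)
machine.subscribe(lambda old, new: log.append((old, new)))

first_attempt = machine.go("RAMP")   # blocked: condition not met
ready["ok"] = True
second_attempt = machine.go("RAMP")  # allowed
```

    Keeping conditions and tasks as plain callables is one way to let users program their own actions, as the abstract describes; the real framework may do this differently.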

  6. Building Artificial Vision Systems with Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    LeCun, Yann [New York University

    2011-02-23

    Three questions pose the next challenge for Artificial Intelligence (AI), robotics, and neuroscience. How do we learn perception (e.g. vision)? How do we learn representations of the perceptual world? How do we learn visual categories from just a few examples?

  7. An Organizational Learning Framework for Patient Safety.

    Science.gov (United States)

    Edwards, Marc T

    Despite concerted effort to improve quality and safety, high reliability remains a distant goal. Although this likely reflects the challenge of organizational change, persistent controversy over basic issues suggests that weaknesses in conceptual models may contribute. The essence of operational improvement is organizational learning. This article presents a framework for identifying leverage points for improvement based on organizational learning theory and applies it to an analysis of current practice and controversy. Organizations learn from others, from defects, from measurement, and from mindfulness. These learning modes correspond with contemporary themes of collaboration, no blame for human error, accountability for performance, and managing the unexpected. The collaborative model has dominated improvement efforts. Greater attention to the underdeveloped modes of organizational learning may foster more rapid progress in patient safety by increasing organizational capabilities, strengthening a culture of safety, and fixing more of the process problems that contribute to patient harm.

  8. A Novel Framework for Learning Geometry-Aware Kernels.

    Science.gov (United States)

    Pan, Binbin; Chen, Wen-Sheng; Xu, Chen; Chen, Bo

    2016-05-01

    Real-world data usually have nonlinear geometric structure and are often assumed to lie on or close to a low-dimensional manifold in a high-dimensional space. Detecting this nonlinear geometric structure is important for learning algorithms. Recently, there has been a surge of interest in utilizing kernels to exploit the manifold structure of the data. Such kernels are called geometry-aware kernels and are widely used in machine learning algorithms. The performance of these algorithms critically relies on the choice of the geometry-aware kernels. Intuitively, a good geometry-aware kernel should utilize additional information other than the geometric information. In many applications, it is required to compute the out-of-sample data directly. However, most of the geometry-aware kernel methods are restricted to the available data given beforehand, with no straightforward extension for out-of-sample data. In this paper, we propose a framework for more general geometry-aware kernel learning. The proposed framework integrates multiple sources of information and enables us to develop flexible and effective kernel matrices. Then, we theoretically show how the learned kernel matrices are extended to the corresponding kernel functions, in which the out-of-sample data can be computed directly. Under our framework, a novel family of geometry-aware kernels is developed. In particular, some existing geometry-aware kernels can be viewed as instances of our framework. The performance of the kernels is evaluated on dimensionality reduction, classification, and clustering tasks. The empirical results show that our kernels significantly improve the performance.

  9. Machine Learning Principles Can Improve Hip Fracture Prediction

    DEFF Research Database (Denmark)

    Kruse, Christian; Eiken, Pia; Vestergaard, Peter

    2017-01-01

    Apply machine learning principles to predict hip fractures and estimate predictor importance in Dual-energy X-ray absorptiometry (DXA)-scanned men and women. Dual-energy X-ray absorptiometry data from two Danish regions between 1996 and 2006 were combined with national Danish patient data.......89 [0.82; 0.95], but with poor calibration in higher probabilities. A ten predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an "xgbTree" model. Machine learning can improve hip fracture...... prediction beyond logistic regression using ensemble models. Compiling data from international cohorts of longer follow-up and performing similar machine learning procedures has the potential to further improve discrimination and calibration....

  10. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology

    Directory of Open Access Journals (Sweden)

    Jieru Zhang

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  11. Photometric classification of emission line galaxies with Machine Learning methods

    CERN Document Server

    Cavuoti, Stefano; D'Abrusco, Raffaele; Longo, Giuseppe; Paolillo, Maurizio

    2013-01-01

    In this paper we discuss an application of machine learning based methods to the identification of candidate AGN from optical survey data and to the automatic classification of AGNs in broad classes. We applied four different machine learning algorithms, namely the Multi Layer Perceptron (MLP), trained respectively with the Conjugate Gradient, Scaled Conjugate Gradient and Quasi Newton learning rules, and the Support Vector Machines (SVM), to tackle the problem of the classification of emission line galaxies in different classes, mainly AGNs vs non-AGNs, obtained using optical photometry in place of the diagnostics based on line intensity ratios which are classically used in the literature. Using the same photometric features we discuss also the behavior of the classifiers on finer AGN classification tasks, namely Seyfert I vs Seyfert II and Seyfert vs LINER. Furthermore we describe the algorithms employed, the samples of spectroscopically classified galaxies used to train the algorithms, the procedure follow...

  12. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    Directory of Open Access Journals (Sweden)

    Saerom Park

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market via the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model such as the I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving prediction performance.

  13. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    Science.gov (United States)

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market via the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model such as the I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving prediction performance.

  14. Machine learning for Big Data analytics in plants.

    Science.gov (United States)

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences.

  15. Proceedings of IEEE Machine Learning for Signal Processing Workshop XV

    DEFF Research Database (Denmark)

    Larsen, Jan

    These proceedings contain refereed papers presented at the Fifteenth IEEE Workshop on Machine Learning for Signal Processing (MLSP’2005), held in Mystic, Connecticut, USA, September 28-30, 2005. This is a continuation of the IEEE Workshops on Neural Networks for Signal Processing (NNSP) organized...... by the NNSP Technical Committee of the IEEE Signal Processing Society. The name of the Technical Committee, hence of the Workshop, was changed to Machine Learning for Signal Processing in September 2003 to better reflect the areas represented by the Technical Committee. The conference is organized...... by the Machine Learning for Signal Processing Technical Committee with sponsorship of the IEEE Signal Processing Society. Following the practice started two years ago, the bound volume of the proceedings is going to be published by IEEE following the Workshop, and we are pleased to offer to conference attendees...

  17. Machine Learning: A Crucial Tool for Sensor Design

    Directory of Open Access Journals (Sweden)

    Weixiang Zhao

    2008-12-01

    Sensors have been widely used for disease diagnosis, environmental quality monitoring, food quality control, industrial process analysis and control, and other related fields. As a key tool for sensor data analysis, machine learning is becoming a core part of novel sensor design. Dividing a complete machine learning process into three steps: data pre-treatment, feature extraction and dimension reduction, and system modeling, this paper provides a review of the methods that are widely used for each step. For each method, the principles and the key issues that affect modeling results are discussed. After reviewing the potential problems in machine learning processes, this paper gives a summary of current algorithms in this field and provides some feasible directions for future studies.

  18. Galaxy Zoo: Reproducing Galaxy Morphologies Via Machine Learning

    CERN Document Server

    Banerji, Manda; Lintott, Chris J; Abdalla, Filipe B; Schawinski, Kevin; Andreescu, Dan; Bamford, Steven; Murray, Phil; Raddick, M Jordan; Slosar, Anze; Szalay, Alex; Thomas, Daniel; Vandenberg, Jan

    2009-01-01

    We present morphological classifications obtained using machine learning for objects in SDSS DR7 that have been classified by Galaxy Zoo into three classes, namely spirals, ellipticals and stars/unique objects. An artificial neural network is trained on a subset of objects classified by the human eye and we test whether the machine learning algorithm can reproduce the human classifications for the rest of the sample. We find that the success of the neural network in matching the human classifications depends crucially on the set of input parameters chosen for the machine-learning algorithm. The colours, concentrations and parameters associated with profile-fitting are reasonable in separating the stars and galaxies into three classes. However, these results are considerably improved when adding adaptive shape parameters as well as texture. The adaptive moments and texture parameters alone cannot distinguish between stars and elliptical galaxies. Using a set of thirteen distance-independent parameters, the neur...

  19. Fast learning method for convolutional neural networks using extreme learning machine and its application to lane detection.

    Science.gov (United States)

    Kim, Jihun; Kim, Jonghong; Jang, Gil-Jin; Lee, Minho

    2017-03-01

    Deep learning has received significant attention recently as a promising solution to many problems in the area of artificial intelligence. Among several deep learning architectures, convolutional neural networks (CNNs) demonstrate superior performance when compared to other machine learning methods in the applications of object detection and recognition. We use a CNN for image enhancement and the detection of driving lanes on motorways. In general, the process of lane detection consists of edge extraction and line detection. A CNN can be used to enhance the input images before lane detection by excluding noise and obstacles that are irrelevant to the edge detection result. However, training conventional CNNs requires considerable computation and a big dataset. Therefore, we suggest a new learning algorithm for CNNs using an extreme learning machine (ELM). The ELM is a fast learning method used to calculate network weights between output and hidden layers in a single iteration and thus, can dramatically reduce learning time while producing accurate results with minimal training data. A conventional ELM can be applied to networks with a single hidden layer; as such, we propose a stacked ELM architecture in the CNN framework. Further, we modify the backpropagation algorithm to find the targets of hidden layers and effectively learn network weights while maintaining performance. Experimental results confirm that the proposed method is effective in reducing learning time and improving performance. Copyright © 2016 Elsevier Ltd. All rights reserved.
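
    To make the single-iteration claim concrete, the sketch below trains a conventional ELM with one hidden layer: the hidden weights are drawn at random and left fixed, and only the output weights are obtained, in one step, by regularised least squares. It is not the stacked ELM-CNN architecture proposed in the paper. The ridge term, the tanh activation, the hidden-layer size and the toy data are assumptions made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def train_elm(xs, ys, hidden=8, ridge=1e-3, seed=0):
    """Return a predictor whose output weights are fitted in one step."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Hidden weights and biases are random and never trained.
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]

    def features(x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(w, b)]

    h = [features(x) for x in xs]
    # Normal equations: (H^T H + ridge * I) beta = H^T y.
    gram = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0)
             for j in range(hidden)] for i in range(hidden)]
    rhs = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(hidden)]
    beta = solve(gram, rhs)
    return lambda x: sum(bj * fj for bj, fj in zip(beta, features(x)))


# Hypothetical toy regression data (XOR pattern).
xs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ys = [0.0, 1.0, 1.0, 0.0]
model = train_elm(xs, ys)
predictions = [model(x) for x in xs]
```

    No gradient descent is involved: the only fitted parameters are the output weights `beta`, which is why training cost is dominated by one linear solve.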

  20. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    E-mail is one of the most popular and frequently used means of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. E-mail spam is one of the major problems of today's Internet, bringing financial damage to companies and annoying individual users. Spam e-mails reach users without their consent and fill their mailboxes. They consume network capacity as well as the time spent checking and deleting them. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most users want to do the right thing to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, it has not yet been eradicated. Moreover, when the countermeasures are oversensitive, even legitimate e-mails are eliminated. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on the more sophisticated classifier-related issues. Recently, machine learning for spam classification has become an important research issue. The proposed work explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms is also presented.

  1. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    Science.gov (United States)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Utilizing Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360 degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.
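
    A minimal sketch of MST-based clustering on 3D positions, assuming the common rule of building the tree with Prim's algorithm and then dropping the longest edges so that the remaining connected components form the clusters. The abstract does not specify the authors' cutting rule or data layout, so the function names and the toy catalog coordinates are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # cheapest known link from the tree to each point
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best[k][0])
        in_tree[i] = True
        if best[i][1] >= 0:
            edges.append((best[i][0], best[i][1], i))
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best[j][0]:
                    best[j] = (d, i)
    return edges


def mst_clusters(points, k):
    """Drop the k-1 longest MST edges and label the connected components."""
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[:len(points) - k]  # the tree has n-1 edges; keep n-k of them
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _, i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Hypothetical 3D catalog positions forming two well-separated groups.
catalog = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10)]
cluster_ids = mst_clusters(catalog, k=2)
```

    The quadratic scan over all point pairs is fine for a small catalog; larger catalogs would need a spatial index or a sparse neighbour graph before building the tree.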

  2. Machine Learning: When and Where the Horses Went Astray?

    CERN Document Server

    Diamant, Emanuel

    2009-01-01

    Machine Learning is usually defined as a subfield of AI, which is busy with information extraction from raw data sets. Despite its common acceptance and widespread recognition, this definition is wrong and groundless. Meaningful information does not belong to the data that bear it. It belongs to the observers of the data and it is a shared agreement and a convention among them. Therefore, this private information cannot be extracted from the data by any means. Therefore, all further attempts of Machine Learning apologists to justify their funny business are inappropriate.

  3. Oceanic eddy detection and lifetime forecast using machine learning methods

    Science.gov (United States)

    Ashkezari, Mohammad D.; Hill, Christopher N.; Follett, Christopher N.; Forget, Gaël.; Follows, Michael J.

    2016-12-01

    We report a novel altimetry-based machine learning approach for eddy identification and characterization. The machine learning models use daily maps of geostrophic velocity anomalies and are trained according to the phase angle between the zonal and meridional components at each grid point. The trained models are then used to identify the corresponding eddy phase patterns and to predict the lifetime of a detected eddy structure. The performance of the proposed method is examined at two dynamically different regions to demonstrate its robust behavior and region independency.

  4. Energy landscapes for a machine learning application to series data.

    Science.gov (United States)

    Ballard, Andrew J; Stevenson, Jacob D; Das, Ritankar; Wales, David J

    2016-03-28

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  5. Advances in independent component analysis and learning machines

    CERN Document Server

    Bingham, Ella; Laaksonen, Jorma; Lampinen, Jouko

    2015-01-01

    In honour of Professor Erkki Oja, one of the pioneers of Independent Component Analysis (ICA), this book reviews key advances in the theory and application of ICA, as well as its influence on signal processing, pattern recognition, machine learning, and data mining. Examples of topics which have developed from the advances of ICA, which are covered in the book are: a unifying probabilistic model for PCA and ICA; optimization methods for matrix decompositions; insights into the FastICA algorithm; unsupervised deep learning; machine vision and image retrieval; a review of developments in the t

  6. Advances in machine learning and data mining for astronomy

    CERN Document Server

    Way, Michael J

    2012-01-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health

  7. Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning.

    Science.gov (United States)

    Gorban, A N; Mirkes, E M; Zinovyev, A

    2016-12-01

    Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore, a lot of recent applications in machine learning exploited properties of non-quadratic error functionals based on L1 norm or even sub-linear potentials corresponding to quasinorms Lp (0 < p < 1) ... basic universal data approximation algorithms (k-means, principal components, principal manifolds and graphs, regularized and sparse regression), based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials). We develop a new and universal framework to minimize arbitrary sub-quadratic error potentials using an algorithm with guaranteed fast convergence to the local or global error minimum. The theory of PQSQ potentials is based on the notion of the cone of minorant functions, and represents a natural approximation formalism based on the application of min-plus algebra. The approach can be applied in most of existing machine learning methods, including methods of data approximation and regularized and sparse regression, leading to the improvement in the computational cost/accuracy trade-off. We demonstrate that on synthetic and real-life datasets PQSQ-based machine learning methods achieve orders of magnitude faster computational performance than the corresponding state-of-the-art methods, having similar or better approximation accuracy. Copyright © 2016 Elsevier Ltd. All rights reserved.
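
    To make the idea of a piece-wise quadratic error potential concrete, here is a minimal sketch. It is an assumption-laden illustration, not the authors' implementation: on each interval between consecutive thresholds the target function f is replaced by a parabola a*x^2 + b that matches f at both interval ends, and beyond the last threshold the potential is held constant. The function name `pqsq_potential` and the choice of thresholds are hypothetical.

```python
import bisect


def pqsq_potential(f, thresholds):
    """Build u(x), a piece-wise quadratic interpolant of f(|x|).

    thresholds: strictly increasing values starting at 0 (assumed layout).
    On [r_k, r_{k+1}) the result is a_k * x**2 + b_k with
    u(r_k) = f(r_k) and u(r_{k+1}) = f(r_{k+1});
    for |x| >= r_last the result is the constant f(r_last).
    """
    r = list(thresholds)
    if len(r) < 2 or r[0] != 0 or any(lo >= hi for lo, hi in zip(r, r[1:])):
        raise ValueError("thresholds must start at 0 and be strictly increasing")
    coeffs = []
    for lo, hi in zip(r, r[1:]):
        # Solve a*lo^2 + b = f(lo) and a*hi^2 + b = f(hi) for (a, b).
        a = (f(hi) - f(lo)) / (hi ** 2 - lo ** 2)
        b = f(lo) - a * lo ** 2
        coeffs.append((a, b))
    cap = f(r[-1])

    def u(x):
        ax = abs(x)
        if ax >= r[-1]:
            return cap  # flat tail: large residuals stop growing in cost
        k = bisect.bisect_right(r, ax) - 1  # index of the interval holding |x|
        a, b = coeffs[k]
        return a * ax * ax + b

    return u


# Usage: approximate the L1 potential f(x) = |x| with two quadratic pieces.
u = pqsq_potential(abs, [0.0, 1.0, 2.0])
values = [u(x) for x in (0.0, 0.5, 1.0, 1.5, 3.0)]
```

    Because every piece is quadratic in x, minimizing a sum of such potentials can reuse fast quadratic solvers interval by interval, which is the kind of cost/accuracy trade-off the abstract refers to.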

  8. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    Science.gov (United States)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company manufactures airplane parts. One of the machines categorized as a key facility in the company is the Millac 5H6P. As a key facility, the machine must be kept working well and in peak condition; therefore, periodic maintenance is needed. Data gathering showed that maintenance staff lack the competency to maintain machine types other than those assigned to them by the supervisor, which indicates that knowledge is unevenly distributed among maintenance staff. The purpose of this research is to create a knowledge-based e-learning application as a realization of the externalization process in knowledge transfer for machine maintenance. The application features are adjusted for maintenance purposes using an e-learning framework for the maintenance process, and the application content supports multimedia for learning purposes. QFD is used in this research to understand user needs. The application is built using Moodle with an iterative method for the software development cycle and UML diagrams. The result of this research is an e-learning application that serves as a knowledge-sharing medium for maintenance staff in the company. Testing shows that the application makes it easy for maintenance staff to understand the competencies.

  9. The cerebellum: a neuronal learning machine?

    Science.gov (United States)

    Raymond, J. L.; Lisberger, S. G.; Mauk, M. D.

    1996-01-01

    Comparison of two seemingly quite different behaviors yields a surprisingly consistent picture of the role of the cerebellum in motor learning. Behavioral and physiological data about classical conditioning of the eyelid response and motor learning in the vestibulo-ocular reflex suggests that (i) plasticity is distributed between the cerebellar cortex and the deep cerebellar nuclei; (ii) the cerebellar cortex plays a special role in learning the timing of movement; and (iii) the cerebellar cortex guides learning in the deep nuclei, which may allow learning to be transferred from the cortex to the deep nuclei. Because many of the similarities in the data from the two systems typify general features of cerebellar organization, the cerebellar mechanisms of learning in these two systems may represent principles that apply to many motor systems.

  10. Tensor Voting A Perceptual Organization Approach to Computer Vision and Machine Learning

    CERN Document Server

    Mordohai, Philippos

    2006-01-01

    This lecture presents research on a general framework for perceptual organization that was conducted mainly at the Institute for Robotics and Intelligent Systems of the University of Southern California. It is not written as a historical recount of the work, since the sequence of the presentation is not in chronological order. It aims at presenting an approach to a wide range of problems in computer vision and machine learning that is data-driven, local and requires a minimal number of assumptions. The tensor voting framework combines these properties and provides a unified perceptual organiza

  11. Machine learning bandgaps of double perovskites

    National Research Council Canada - National Science Library

    Pilania, G; Mannodi-Kanakkithodi, A; Uberuaga, B P; Ramprasad, R; Gubernatis, J E; Lookman, T

    2016-01-01

    .... While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning...

  12. Machine-Learning Algorithms to Code Public Health Spending Accounts.

    Science.gov (United States)

    Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

    Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.
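
    The consensus-ensemble evaluation described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: a record is labeled only when at least `min_agree` classifiers cast the same vote, coverage is the share of records that receive a label, and precision and recall are computed over the labeled records only. The function names and label strings are hypothetical.

```python
from collections import Counter


def consensus_label(votes, min_agree):
    """Return the most common vote if at least min_agree classifiers chose it.

    Returns None (record left unclassified) when agreement is too low.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def evaluate(predicted, truth, positive):
    """Coverage over all records; precision and recall over classified ones.

    Restricting precision and recall to classified records is an assumption
    of this sketch.
    """
    classified = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    tp = sum(1 for p, t in classified if p == positive and t == positive)
    fp = sum(1 for p, t in classified if p == positive and t != positive)
    fn = sum(1 for p, t in classified if p != positive and t == positive)
    return {
        "coverage": len(classified) / len(truth) if truth else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Usage: three toy classifiers voting on four expenditure records.
votes_per_record = [
    ["public_health", "public_health", "public_health"],
    ["public_health", "not_public_health", "not_public_health"],
    ["public_health", "not_public_health", "maybe"],
    ["not_public_health", "not_public_health", "not_public_health"],
]
manual = ["public_health", "public_health", "not_public_health", "not_public_health"]
predicted = [consensus_label(v, min_agree=2) for v in votes_per_record]
scores = evaluate(predicted, manual, positive="public_health")
```

    Raising `min_agree` generally trades coverage for agreement with the manual labels, which is the trade-off the abstract reports for its ensemble of ≥6 algorithms.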

  13. Optimized Grid Based e-Learning Framework

    Directory of Open Access Journals (Sweden)

    Suresh Jaganathan

    2014-12-01

    E-Learning is the process of extending resources to different locations by using multimedia communications. Many e-Learning methodologies are available, based on client-server, peer-to-peer, and Grid Computing concepts. To establish an e-Learning process, systems should satisfy three needs: (i) high storage capacity, (ii) high network throughput for faster transfer, and (iii) efficient streaming of materials. The first and second needs are satisfied by using Grid and P2P technologies, and the third need can be achieved by an efficient video compression algorithm. This study proposes a framework, called Optimized Grid Based e-Learning (OgBeL), which adopts both Grid and P2P technology. To reduce the e-Learning material size for efficient streaming, a lightweight compression algorithm called dWave is embedded in OgBeL. The behavior of the framework is analyzed in terms of the time taken to transfer files using in-use grid protocols and in networks combining grid and P2P.

  14. Tackling the x-ray cargo inspection challenge using machine learning

    Science.gov (United States)

    Jaccard, Nicolas; Rogers, Thomas W.; Morton, Edward J.; Griffin, Lewis D.

    2016-05-01

    The current infrastructure for non-intrusive inspection of cargo containers cannot accommodate exploding commerce volumes and increasingly stringent regulations. There is a pressing need to develop methods to automate parts of the inspection workflow, enabling expert operators to focus on a manageable number of high-risk images. To tackle this challenge, we developed a modular framework for automated X-ray cargo image inspection. Employing state-of-the-art machine learning approaches, including deep learning, we demonstrate high performance for empty container verification and specific threat detection. This work constitutes a significant step towards the partial automation of X-ray cargo image inspection.

  15. Feature importance for machine learning redshifts applied to SDSS galaxies

    CERN Document Server

    Hoyle, Ben; Zitlau, Roman; Seitz, Stella; Weller, Jochen

    2014-01-01

    We present an analysis of importance feature selection applied to photometric redshift estimation using the machine learning architecture Random Decision Forests (RDF) with the ensemble learning routine Adaboost. We select a list of 85 easily measured (or derived) photometric quantities (or 'features') and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNN) to show that the addition of these features, in combination with the standard magnitudes and colours, improves the machine learning redshift estimate by 18% and decreases the catastrophic outlier rate by 32%. We further compare the redshift estimate from RDF using the ensemble learning routine Adaboost with those from two different aNNs, and with photometric redshifts available from the SDSS. We find that the RDF requires orders of magnitude less computation time than the aNNs to obtain a m...

  16. Refining fuzzy logic controllers with machine learning

    Science.gov (United States)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  17. Stacking for machine learning redshifts applied to SDSS galaxies

    OpenAIRE

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-01-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised...

  18. Inverse Learning Control of Nonlinear Systems Using Support Vector Machines

    Institute of Scientific and Technical Information of China (English)

    HU Zhong-hui; LI Yuan-gui; CAI Yun-ze; XU Xiao-ming

    2005-01-01

    An inverse learning control scheme using the support vector machine (SVM) for regression is proposed. The inverse learning approach was originally studied in the context of neural networks. Compared with neural networks, SVMs overcome the problems of local minima and the curse of dimensionality. Additionally, the good generalization performance of SVMs increases the robustness of the control system. A method for designing the SVM inverse learning controller is presented. The proposed method is demonstrated on tracking problems, and its performance is satisfactory.

  19. Machine Learning Tools for Geomorphic Mapping of Planetary Surfaces

    OpenAIRE

    Stepinski, Tomasz F.; Vilalta, Ricardo

    2010-01-01

    Geomorphic auto-mapping of planetary surfaces is a challenging problem. Here we have described how machine learning techniques, such as clustering or classification, can be utilized to automate the process of geomorphic mapping for exploratory and exploitation purposes. Relatively coarse resolution of planetary topographic data limits the number of features that can be used in the learning process and makes planetary auto-mapping more challenging than terrestrial auto-mapping. With this cavea...

  20. A 128-Channel Extreme Learning Machine-Based Neural Decoder for Brain Machine Interfaces.

    Science.gov (United States)

    Chen, Yi; Yao, Enyi; Basu, Arindam

    2016-06-01

    Currently, state-of-the-art motor intention decoding algorithms in brain-machine interfaces are mostly implemented on a PC and consume a significant amount of power. A machine learning coprocessor in 0.35-μm CMOS for motor intention decoding in brain-machine interfaces is presented in this paper. Using the Extreme Learning Machine algorithm and low-power analog processing, it achieves an energy efficiency of 3.45 pJ/MAC at a classification rate of 50 Hz. The learning in the second stage and the corresponding digitally stored coefficients are used to increase the robustness of the core analog processor. The chip is verified with neural data recorded in a monkey finger movement experiment, achieving a decoding accuracy of 99.3% for movement type. The same coprocessor is also used to decode the time of movement from asynchronous neural spikes. With time-delayed feature dimension enhancement, the classification accuracy can be increased by 5% with a limited number of input channels. Further, a sparsity-promoting training scheme enables a reduction in the number of programmable weights by ≈ 2X.

  1. Parsing learning in networks using brain-machine interfaces.

    Science.gov (United States)

    Orsborn, Amy L; Pesaran, Bijan

    2017-08-24

    Brain-machine interfaces (BMIs) define new ways to interact with our environment and hold great promise for clinical therapies. Motor BMIs, for instance, re-route neural activity to control movements of a new effector and could restore movement to people with paralysis. Increasing experience shows that interfacing with the brain inevitably changes the brain. BMIs engage and depend on a wide array of innate learning mechanisms to produce meaningful behavior. BMIs precisely define the information streams into and out of the brain, but engage wide-spread learning. We take a network perspective and review existing observations of learning in motor BMIs to show that BMIs engage multiple learning mechanisms distributed across neural networks. Recent studies demonstrate the advantages of BMI for parsing this learning and its underlying neural mechanisms. BMIs therefore provide a powerful tool for studying the neural mechanisms of learning that highlights the critical role of learning in engineered neural therapies. Copyright © 2017. Published by Elsevier Ltd.

  2. Machine learning for adaptive many-core machines a practical approach

    CERN Document Server

    Lopes, Noel

    2015-01-01

    The overwhelming data produced every day and the increasing performance and cost requirements of applications are transversal to a wide range of activities in society, from science to industry. In particular, the magnitude and complexity of the tasks that Machine Learning (ML) algorithms have to solve are driving the need to devise adaptive many-core machines that scale well with the volume of data, or in other words, can handle Big Data. This book gives a concise view on how to extend the applicability of well-known ML algorithms in Graphics Processing Unit (GPU) with data scalability in mind.

  3. Automated mapping of building facades by machine learning

    DEFF Research Database (Denmark)

    Höhle, Joachim

    2014-01-01

    Facades of buildings contain various types of objects which have to be recorded for information systems. The article describes a solution for this task focussing on automated classification by means of machine learning techniques. Stereo pairs of oblique images are used to derive 3D point clouds...

  4. Machine learning for network-based malware detection

    DEFF Research Database (Denmark)

    Stevanovic, Matija

    and based on different, mutually complementary, principles of traffic analysis. The proposed approaches rely on machine learning algorithms (MLAs) for automated and resource-efficient identification of the patterns of malicious network traffic. We evaluated the proposed methods through extensive evaluations...

  5. Machine Learning and Data Mining Methods in Diabetes Research.

    Science.gov (United States)

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  6. An efficient learning procedure for deep Boltzmann machines.

    Science.gov (United States)

    Salakhutdinov, Ruslan; Hinton, Geoffrey

    2012-08-01

    We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer pretraining phase that initializes the weights sensibly. The pretraining also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects. We also show that the features discovered by deep Boltzmann machines are a very effective way to initialize the hidden layers of feedforward neural nets, which are then discriminatively fine-tuned.

  7. Predicting single-molecule conductance through machine learning

    Science.gov (United States)

    Lanzillo, Nicholas A.; Breneman, Curt M.

    2016-10-01

    We present a robust machine learning model that is trained on the experimentally determined electrical conductance values of approximately 120 single-molecule junctions used in scanning tunnelling microscope molecular break junction (STM-MBJ) experiments. Quantum mechanical, chemical, and topological descriptors are used to correlate each molecular structure with a conductance value, and the resulting machine-learning model can predict the corresponding value of conductance with correlation coefficients of r² = 0.95 for the training set and r² = 0.78 for a blind testing set. While neglecting entirely the effects of the metal contacts, this work demonstrates that single molecule conductance can be qualitatively correlated with a number of molecular descriptors through a suitably trained machine learning model. The dominant features in the machine learning model include those based on the electronic wavefunction, the geometry/topology of the molecule as well as the surface chemistry of the molecule. This model can be used to identify promising molecular structures for use in single-molecule electronic circuits and can guide synthesis and experiments in the future.

  8. Video Quality Assessment and Machine Learning: Performance and Interpretability

    DEFF Research Database (Denmark)

    Søgaard, Jacob; Forchhammer, Søren; Korhonen, Jari

    2015-01-01

    In this work we compare a simple and a complex Machine Learning (ML) method used for the purpose of Video Quality Assessment (VQA). The simple ML method chosen is the Elastic Net (EN), which is a regularized linear regression model and easier to interpret. The more complex method chosen is Support...

  9. Extracting Information from Spoken User Input. A Machine Learning Approach

    NARCIS (Netherlands)

    Lendvai, P.K.

    2004-01-01

    We propose a module that performs automatic analysis of user input in spoken dialogue systems using machine learning algorithms. The input to the module is material received from the speech recogniser and the dialogue manager of the spoken dialogue system, the output is a four-level

  10. Machine learning versus knowledge based classification of legal texts

    NARCIS (Netherlands)

    de Maat, E.; Krabben, K.; Winkels, R.

    2010-01-01

    This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurately (>90%) as the pattern-based one, but

  11. Runtime Optimizations for Tree-Based Machine Learning Models

    NARCIS (Netherlands)

    N. Asadi; J.J.P. Lin (Jimmy); A.P. de Vries (Arjen)

    2014-01-01

    Tree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression

  12. Analog neural network for support vector machine learning.

    Science.gov (United States)

    Perfetti, Renzo; Ricci, Elisa

    2006-07-01

    An analog neural network for support vector machine learning is proposed, based on a partially dual formulation of the quadratic programming problem. It results in a simpler circuit implementation with respect to existing neural solutions for the same application. The effectiveness of the proposed network is shown through some computer simulations concerning benchmark problems.

  13. Machine Translation-Assisted Language Learning: Writing for Beginners

    Science.gov (United States)

    Garcia, Ignacio; Pena, Maria Isabel

    2011-01-01

    The few studies that deal with machine translation (MT) as a language learning tool focus on its use by advanced learners, never by beginners. Yet, freely available MT engines (i.e. Google Translate) and MT-related web initiatives (i.e. Gabble-on.com) position themselves to cater precisely to the needs of learners with a limited command of a…

  14. Plasma disruption prediction using machine learning methods: DIII-D

    Science.gov (United States)

    Lupin-Jimenez, L.; Kolemen, E.; Eldon, D.; Eidietis, N.

    2016-10-01

    Plasma disruption prediction is becoming more important with the development of larger tokamaks, due to the larger amount of thermal and magnetic energy that can be stored. By accurately predicting an impending disruption, the disruption's impact can be mitigated or, better, prevented. Recent approaches to disruption prediction have been through implementation of machine learning methods, which characterize raw and processed diagnostic data to develop accurate prediction models. Using disruption trials from the DIII-D database, the effectiveness of different machine learning methods is characterized. The real-time disruption prediction approaches developed are focused on tearing and locking modes. Machine learning methods used include random forests, multilayer perceptrons, and traditional regression analysis. The algorithms are trained with data within short time frames, labeled by whether or not a disruption occurs within the time window after the end of the frame. Initial results from the machine learning algorithms will be presented. Work supported by US DOE under the Science Undergraduate Laboratory Internship (SULI) program, DE-FC02-04ER54698, and DE-AC02-09CH11466.
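
    The frame-and-window training setup mentioned in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: the signal is a plain list of samples, `frame_len` and `horizon` are measured in samples, and frames that would extend past the disruption are discarded. The function name `label_frames` is hypothetical.

```python
def label_frames(signal, disruption_index, frame_len, horizon):
    """Cut a diagnostic signal into sliding frames with binary labels.

    A frame covering samples [start, end) gets label 1 when the disruption
    sample falls in [end, end + horizon), else 0. disruption_index is None
    for a shot without disruption. Returns a list of (frame, label) pairs.
    """
    examples = []
    for start in range(len(signal) - frame_len + 1):
        end = start + frame_len  # exclusive end of the frame
        if disruption_index is not None and end > disruption_index:
            break  # do not train on samples at or after the disruption
        label = int(
            disruption_index is not None
            and end <= disruption_index < end + horizon
        )
        examples.append((signal[start:end], label))
    return examples


# Usage: a 10-sample toy shot that disrupts at sample 8.
shot = [0.1, 0.1, 0.2, 0.2, 0.3, 0.5, 0.8, 1.3, 2.1, 0.0]
training_pairs = label_frames(shot, disruption_index=8, frame_len=3, horizon=2)
```

    The resulting pairs could be fed to any of the classifier families the abstract lists; the sketch covers only the labeling step.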

  15. Acquiring Software Design Schemas: A Machine Learning Perspective

    Science.gov (United States)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system are presented. The paper also addresses issues associated with generalizing common features such as biases. The generalization process is illustrated using an example.

  16. Learning Pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data

    NARCIS (Netherlands)

    Di Mitri, Daniele; Scheffel, Maren; Drachsler, Hendrik; Börner, Dirk; Ternier, Stefaan; Specht, Marcus

    2017-01-01

    Learning Pulse explores whether using a machine learning approach on multimodal data such as heart rate, step count, weather condition and learning activity can be used to predict learning performance in self-regulated learning settings. An experiment was carried out lasting eight weeks involving Ph

  17. Online learning control using adaptive critic designs with sparse kernel machines.

    Science.gov (United States)

    Xu, Xin; Hou, Zhongsheng; Lian, Chuanqiang; He, Haibo

    2013-05-01

    In the past decade, adaptive critic designs (ACDs), including heuristic dynamic programming (HDP), dual heuristic programming (DHP), and their action-dependent ones, have been widely studied to realize online learning control of dynamical systems. However, because neural networks with manually designed features are commonly used to deal with continuous state and action spaces, the generalization capability and learning efficiency of previous ACDs still need to be improved. In this paper, a novel framework of ACDs with sparse kernel machines is presented by integrating kernel methods into the critic of ACDs. To improve the generalization capability as well as the computational efficiency of kernel machines, a sparsification method based on the approximately linear dependence analysis is used. Using the sparse kernel machines, two kernel-based ACD algorithms, that is, kernel HDP (KHDP) and kernel DHP (KDHP), are proposed and their performance is analyzed both theoretically and empirically. Because of the representation learning and generalization capability of sparse kernel machines, KHDP and KDHP can obtain much better performance than previous HDP and DHP with manually designed neural networks. Simulation and experimental results of two nonlinear control problems, that is, a continuous-action inverted pendulum problem and a ball and plate control problem, demonstrate the effectiveness of the proposed kernel ACD methods.

  18. Committee of machine learning predictors of hydrological models uncertainty

    Science.gov (United States)

    Kayastha, Nagendra; Solomatine, Dimitri

    2014-05-01

    In prediction of uncertainty based on machine learning methods, the results of various sampling schemes, namely Monte Carlo sampling (MCS), generalized likelihood uncertainty estimation (GLUE), Markov chain Monte Carlo (MCMC), shuffled complex evolution metropolis algorithm (SCEMUA), differential evolution adaptive metropolis (DREAM), particle swarm optimization (PSO) and adaptive cluster covering (ACCO) [1], are used to build predictive models. These models predict the uncertainty (quantiles of the pdf) of a deterministic output from a hydrological model [2]. Inputs to these models are the specially identified representative variables (past precipitation events and flows). The trained machine learning models are then employed to predict the model output uncertainty that is specific to the new input data. For each sampling scheme, three machine learning methods, namely artificial neural networks, model trees, and locally weighted regression, are applied to predict output uncertainties. The problem here is that different sampling algorithms result in different data sets used to train different machine learning models, which leads to several models (21 predictive uncertainty models). There is no clear evidence which model is the best, since there is no basis for comparison. A solution could be to form a committee of all models and to use a dynamic averaging scheme to generate the final output [3]. This approach is applied to estimate the uncertainty of streamflow simulation from the conceptual hydrological model HBV in the Nzoia catchment in Kenya. [1] N. Kayastha, D. L. Shrestha and D. P. Solomatine. Experiments with several methods of parameter uncertainty estimation in hydrological modeling. Proc. 9th Intern. Conf. on Hydroinformatics, Tianjin, China, September 2010. [2] D. L. Shrestha, N. Kayastha, D. P. Solomatine, and R. Price. Encapsulation of parametric uncertainty statistics by various predictive machine learning models: MLUE method, Journal of Hydroinformatics, in press
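
    To make the committee idea concrete, here is a minimal sketch. The abstract does not say how the dynamic averaging scheme is defined, so the weighting rule below (weights inversely proportional to each member's recent absolute error) is purely an assumption for illustration, as are the function name and the toy numbers. Only the scheme and method names, and the count of 21 models, come from the abstract.

```python
# Seven sampling schemes and three learners, as listed in the abstract.
SAMPLING_SCHEMES = ["MCS", "GLUE", "MCMC", "SCEMUA", "DREAM", "PSO", "ACCO"]
ML_METHODS = ["ANN", "model tree", "locally weighted regression"]

# One predictive uncertainty model per (scheme, method) pair: 7 * 3 = 21.
n_models = len(SAMPLING_SCHEMES) * len(ML_METHODS)


def committee_predict(predictions, recent_errors, eps=1e-9):
    """Combine member predictions with error-dependent weights.

    ASSUMPTION: each weight is 1 / (recent absolute error + eps), so members
    that were recently more accurate count for more. This is a stand-in for
    the unspecified dynamic averaging scheme, not the authors' method.
    """
    weights = [1.0 / (err + eps) for err in recent_errors]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Two hypothetical members predicting the same quantile of streamflow.
combined = committee_predict(predictions=[10.0, 20.0], recent_errors=[1.0, 3.0])
```

    Because the weights are recomputed from recent errors at each step, the committee output shifts toward whichever members are currently performing best.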

  19. Learning to learn in the European Reference Framework for lifelong learning

    NARCIS (Netherlands)

    Pirrie, Anne; Thoutenhoofd, Ernst D.

    2013-01-01

    This article explores the construction of learning to learn that is implicit in the document Key Competences for Lifelong Learning: European Reference Framework and related education policy from the European Commission. The authors argue that the hallmark of learning to learn is the development of a f

  1. A Framework to Support Mobile Learning in Multilingual Environments

    Science.gov (United States)

    Jantjies, Mmaki E.; Joy, Mike

    2014-01-01

    This paper presents a multilingual mobile learning framework that can be used to support the pedagogical development of mobile learning systems which can support learning in under-resourced multilingual schools. The framework has been developed following two empirical mobile learning studies. Both studies were conducted in multilingual South…

  2. The immune system, adaptation, and machine learning

    Science.gov (United States)

    Farmer, J. Doyne; Packard, Norman H.; Perelson, Alan S.

    1986-10-01

    The immune system is capable of learning, memory, and pattern recognition. By employing genetic operators on a time scale fast enough to observe experimentally, the immune system is able to recognize novel shapes without preprogramming. Here we describe a dynamical model for the immune system that is based on the network hypothesis of Jerne, and is simple enough to simulate on a computer. This model has a strong similarity to an approach to learning and artificial intelligence introduced by Holland, called the classifier system. We demonstrate that simple versions of the classifier system can be cast as a nonlinear dynamical system, and explore the analogy between the immune and classifier systems in detail. Through this comparison we hope to gain insight into the way they perform specific tasks, and to suggest new approaches that might be of value in learning systems.

  3. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Directory of Open Access Journals (Sweden)

    Ickwon Choi

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  4. Machine learning techniques in dialogue act recognition

    Directory of Open Access Journals (Sweden)

    Mark Fišel

    2007-05-01

    This report addresses dialogue acts, their existing applications and techniques for automatically recognizing them, in Estonia as well as elsewhere. Three main applications are described: in dialogue systems to determine the intention of the speaker, in dialogue systems with machine translation to resolve ambiguities in the possible translation variants, and in speech recognition to reduce word recognition error rate. Several recognition techniques are described on the surface level: how they work and how they are trained. A summary of the corresponding representation methods is provided for each technique. The paper also includes examples of applying the techniques to dialogue act recognition. The author concludes that, using the current evaluation metric, it is impossible to compare dialogue act recognition techniques when these are applied to different dialogue act tag sets. Dialogue acts remain an open research area, with space and need for developing new recognition techniques and methods of evaluation.

  5. Machine Learning Approaches for Modeling Spammer Behavior

    CERN Document Server

    Islam, Md Saiful; Islam, Md Rafiqul

    2010-01-01

    Spam is commonly known as unsolicited or unwanted email messages on the Internet, posing a potential threat to Internet security. Users spend a valuable amount of time deleting spam emails. More importantly, ever-increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful at modeling spammer behavior, as spammers constantly change their tricks to circumvent these filters. The evasive tactics that spammers use are patterns, and these patterns can be modeled to combat spam. This paper investigates the possibilities of modeling spammer behavioral patterns with well-known classification algorithms such as the Naïve Bayesian classifier (Naïve Bayes), Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, which is a considerable enhancement of performance compared to similar spammer behavior modeling research.

  6. Machine learning models in breast cancer survival prediction.

    Science.gov (United States)

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With early diagnosis of breast cancer, survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is a combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of breast cancer survival. We use a dataset with eight attributes that includes the records of 900 patients, of whom 876 (97.3%) were female and 24 (2.7%) were male. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-fold cross-validation were used with the proposed model for the prediction of breast cancer survival. The performance of the machine learning techniques was evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients were alive and 97 were dead. In this study, the Trees Random Forest (TRF) technique showed better results in comparison to the other techniques (NB, 1NN, AD, SVM, RBFN, and MLP). The accuracy, sensitivity and area under ROC curve of TRF are 96%, 96%, and 93%, respectively. However, the 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of
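
    For readers unfamiliar with the evaluation measures named above, the sketch below shows how accuracy, precision, sensitivity and specificity follow from the four cells of a binary confusion matrix. The counts are toy values chosen for illustration; they are not the study's results, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly;
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "precision": tp / (tp + fp),       # share of predicted positives that are real
        "sensitivity": tp / (tp + fn),     # share of real positives that are found
        "specificity": tn / (tn + fp),     # share of real negatives that are found
    }


# Toy counts (not from the paper): 10 positives and 10 negatives.
toy = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

    With heavily imbalanced classes, accuracy alone can look high even when one class is poorly predicted, which is one reason to report it alongside sensitivity, specificity and the area under the ROC curve.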

  7. Stacking for machine learning redshifts applied to SDSS galaxies

    Science.gov (United States)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organizing maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9 per cent and 21 per cent on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost) the ratio of improvement shrinks, but still remains positive and is between 0.4 per cent and 2.5 per cent for the explored metrics and comes at almost no additional computational cost.
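
    The stacking loop described here (an estimate fed back into the same algorithm as an extra input feature) can be sketched as below. The base learner is a small k-nearest-neighbour regressor chosen only to keep the example self-contained; the paper uses SOMs, decision trees and AdaBoost. The leave-one-out step for the training estimates, like every function name and the toy data, is an assumption made for this illustration.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Average the targets of the k nearest training points (squared Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def leave_one_out_estimates(xs, ys, k=2):
    """Estimate each training target without letting the point see itself."""
    estimates = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        estimates.append(knn_predict(rest_x, rest_y, xs[i], k))
    return estimates


def stack_fit_predict(train_x, train_y, test_x, rounds=1, k=2):
    """Run `rounds` stacking layers, then predict the test points.

    In each round the current estimate is appended to every feature vector,
    and the same base algorithm is applied again to the widened features.
    """
    train_feats = [list(x) for x in train_x]
    test_feats = [list(x) for x in test_x]
    for _ in range(rounds):
        train_est = leave_one_out_estimates(train_feats, train_y, k)
        test_est = [knn_predict(train_feats, train_y, q, k) for q in test_feats]
        train_feats = [f + [e] for f, e in zip(train_feats, train_est)]
        test_feats = [f + [e] for f, e in zip(test_feats, test_est)]
    return [knn_predict(train_feats, train_y, q, k) for q in test_feats]


# Toy one-feature data standing in for photometry -> redshift.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]
plain = stack_fit_predict(X, y, [[1.5]], rounds=0)
stacked = stack_fit_predict(X, y, [[1.5]], rounds=1)
```

    Setting rounds=0 recovers the plain base algorithm, which makes it straightforward to compare metrics with and without a stacking layer.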

  8. Study on Applying Hybrid Machine Learning into Family Apparel Expenditure

    Institute of Scientific and Technical Information of China (English)

    SHEN Lei

    2008-01-01

    Hybrid Machine Learning (HML) is a kind of advanced algorithm in the field of intelligent information processing. It combines inductive learning based on decision trees with the blocking neural network. It provides a useful intelligent knowledge-based data mining technique. Its core algorithms are ID3 and Field Theory based ART (FTART). The paper first introduces the principles of hybrid machine learning, and then applies it to systematically analyze family apparel expenditures and their influencing factors. Finally, compared with those from traditional statistical methods, the results from HML are friendlier and easier to understand. Besides, forecasting by HML is more accurate than by the traditional methods.

  9. Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

    CERN Document Server

    Poczos, Barnabas; Schneider, Jeff

    2012-01-01

    Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real-world images, and astronomical data sets.

  10. Machine learning on geospatial big data

    CSIR Research Space (South Africa)

    Van Zyl, T

    2014-02-01

    Full Text Available learning, a model may be trained so as to do automated classification of new unlabelled observations, to forecast future observations of some system or automatically spot anomalous events (Vatsavai et al. 2012). Geospatial big data present two opportunities...

  11. Unsupervised process monitoring and fault diagnosis with machine learning methods

    CERN Document Server

    Aldrich, Chris

    2013-01-01

    This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data

  12. Collaborative learning framework for online stakeholder engagement.

    Science.gov (United States)

    Khodyakov, Dmitry; Savitsky, Terrance D; Dalal, Siddhartha

    2016-08-01

    Public and stakeholder engagement can improve the quality of both research and policy decision making. However, such engagement poses significant methodological challenges in terms of collecting and analysing input from large, diverse groups. The objectives were to explain how online approaches can facilitate iterative stakeholder engagement, to describe how input from large and diverse stakeholder groups can be analysed and to propose a collaborative learning framework (CLF) to interpret stakeholder engagement results. We use 'A National Conversation on Reducing the Burden of Suicide in the United States' as a case study of online stakeholder engagement and employ a Bayesian data modelling approach to develop a CLF. Our data modelling results identified six distinct stakeholder clusters that varied in the degree of individual articulation and group agreement and exhibited one of the three learning styles: learning towards consensus, learning by contrast and groupthink. Learning by contrast was the most common, or dominant, learning style in this study. Study results were used to develop a CLF, which helps explore a multitude of stakeholder perspectives; identifies clusters of participants with similar shifts in beliefs; offers an empirically derived indicator of engagement quality; and helps determine the dominant learning style. The ability to detect learning by contrast helps illustrate differences in stakeholder perspectives, which may help policymakers, including the Patient-Centered Outcomes Research Institute, make better decisions by soliciting and incorporating input from patients, caregivers, health-care providers and researchers. Study results have important implications for soliciting and incorporating input from stakeholders with different interests and perspectives. © 2015 The Authors. Health Expectations Published by John Wiley & Sons Ltd.

  13. Assessing Implicit Knowledge in BIM Models with Machine Learning

    DEFF Research Database (Denmark)

    Krijnen, Thomas; Tamke, Martin

    2015-01-01

    The promise, which comes along with Building Information Models, is that they are information rich, machine readable and represent the insights of multiple building disciplines within single or linked models. However, this knowledge has to be stated explicitly in order to be understood. Trained architects and engineers are able to deduce non-explicitly stated information, which is often the core of the transported architectural information. This paper investigates how machine learning approaches allow a computational system to deduce implicit knowledge from a set of BIM models.

  14. Machine Learning for Vision-Based Motion Analysis

    CERN Document Server

    Wang, Liang; Cheng, Li; Pietikainen, Matti

    2011-01-01

    Techniques of vision-based motion analysis aim to detect, track, identify, and generally understand the behavior of objects in image sequences. With the growth of video data in a wide range of applications from visual surveillance to human-machine interfaces, the ability to automatically analyze and understand object motions from video footage is of increasing importance. Among the latest developments in this field is the application of statistical machine learning algorithms for object tracking, activity modeling, and recognition. Developed from expert contributions to the first and second In

  16. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    Science.gov (United States)

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
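
    The percentage improvements quoted above are relative changes in the C statistic, which can be checked with a few lines of arithmetic. The sketch below also includes a plain pairwise definition of the C statistic for a binary outcome; the function names and the four-patient toy example are illustrative and are not taken from the study.

```python
def c_statistic(scores, outcomes):
    """Fraction of (event, non-event) pairs in which the event has the higher
    predicted risk; tied scores count as half a concordant pair."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def relative_improvement(new, baseline):
    """Relative change of `new` over `baseline`, in per cent."""
    return 100.0 * (new - baseline) / baseline


# Mean C statistics reported in the abstract.
rf_vs_lr = relative_improvement(0.628, 0.533)        # random forests vs LR, about 17.8
boosting_vs_lr = relative_improvement(0.678, 0.543)  # boosting vs LR, about 24.9

# Toy example: four hypothetical patients, two of whom were readmitted.
toy_c = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

    A C statistic of 0.5 corresponds to ranking patients at random, so the logistic regression baselines of 0.533 and 0.543 sit only slightly above chance.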

  17. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  18. Graphs in machine learning: an introduction

    CERN Document Server

    Latouche, Pierre

    2015-01-01

    Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globall...

  19. A kernel-based framework for learning graded relations from data

    CERN Document Server

    Waegeman, Willem; Airola, Antti; Salakoski, Tapio; Stock, Michiel; De Baets, Bernard

    2011-01-01

    Driven by a large number of potential applications in areas like bioinformatics, information retrieval and social network analysis, the problem setting of inferring relations between pairs of data objects has recently been investigated quite intensively in the machine learning community. To this end, current approaches typically consider datasets containing crisp relations, so that standard classification methods can be adopted. However, relations between objects like similarities and preferences are often expressed in a graded manner in real-world applications. A general kernel-based framework for learning relations from data is introduced here. It extends existing approaches because both crisp and graded relations are considered, and it unifies existing approaches because different types of graded relations can be modeled, including symmetric and reciprocal relations. This framework establishes important links between recent developments in fuzzy set theory and machine learning. Its usefulness is demonstrat...

  20. Interactive Algorithms for Unsupervised Machine Learning

    Science.gov (United States)

    2015-06-01

    Copetas, and Diane Stidle who greatly enriched my life at CMU. I am thankful to Zeeshan Syed and Eu-Jin Goh who supported me during my internship at Google...for a fun and productive internship. I am looking forward to spending another year at MSR and continuing to collaborate with and learn from everyone at...the nuclear norm minimization program to exactly 1As before this could equivalently be the column space with assumption on the maximal row coherence.

  1. Machine Learning for Galaxy Morphology Classification

    CERN Document Server

    Gauci, Adam; Abela, John; Magro, Alessio

    2010-01-01

    In this work, decision tree learning algorithms and fuzzy inferencing systems are applied for galaxy morphology classification. In particular, the CART, the C4.5, the Random Forest and fuzzy logic algorithms are studied and reliable classifiers are developed to distinguish between spiral galaxies, elliptical galaxies or star/unknown galactic objects. Morphology information for the training and testing datasets is obtained from the Galaxy Zoo project while the corresponding photometric and spectra parameters are downloaded from the SDSS DR7 catalogue.

  2. A robotic framework for semantic concept learning.

    Energy Technology Data Exchange (ETDEWEB)

    Squire, Kevin M. (University of Illinois at Urbana-Champaign, Urbana, IL.); Levinson, Stephen E. (University of Illinois at Urbana-Champaign, Urbana, IL.); Xavier, Patrick Gordon

    2004-09-01

    This report describes work carried out under a Sandia National Laboratories Excellence in Engineering Fellowship in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. Our research group (at UIUC) is developing an intelligent robot and attempting to teach it language. While there are many aspects of this research, for the purposes of this report the most important are the following ideas. Language is primarily based on semantics, not syntax. To truly learn meaning, the language engine must be part of an embodied intelligent system, one capable of using associative learning to form concepts from the perception of experiences in the world, and further capable of manipulating those concepts symbolically. In the work described here, we explore the use of hidden Markov models (HMMs) in this capacity. HMMs are capable of automatically learning and extracting the underlying structure of continuous-valued inputs and representing that structure in the states of the model. These states can then be treated as symbolic representations of the inputs. We describe a composite model consisting of a cascade of HMMs that can be embedded in a small mobile robot and used to learn correlations among sensory inputs to create symbolic concepts. These symbols can then be manipulated linguistically and used for decision making. This is the project final report for the University Collaboration LDRD project, 'A Robotic Framework for Semantic Concept Learning'.

  3. A Teaching - Learning Framework for MEMS Education

    Science.gov (United States)

    Sheeparamatti, B. G.; Angadi, S. A.; Sheeparamatti, R. B.; Kadadevaramath, J. S.

    2006-04-01

    Micro-Electro-Mechanical Systems (MEMS) technology has been identified as one of the most promising technologies of the 21st century. MEMS technology has opened up a wide array of unforeseen applications. Hence it is necessary to train the technocrats of tomorrow in this emerging field to meet industrial and societal demands. The drive behind the fostering of MEMS technology is the reduction in the cost, size, weight, and power consumption of sensors, actuators, and associated electronics. MEMS is a multidisciplinary engineering and basic science area which includes electrical engineering, mechanical engineering, material science and biomedical engineering. Hence MEMS education needs a special approach to prepare the technocrats for a career in MEMS. The modern education methodology using computer based training systems (CBTS) with embedded modeling and simulation tools will help in this direction. The availability of computer based learning resources such as MATLAB, ANSYS/Multiphysics and rapid prototyping tools has contributed to the efficient teaching-learning framework for MEMS education presented in this paper. This paper proposes a conceptual framework for teaching/learning MEMS in the current technical education scenario.

  4. A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Hayden Wimmer

    2014-11-01

    While research has been conducted in machine learning algorithms and in privacy preserving in data mining (PPDM), a gap in the literature exists which combines the aforementioned areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Bayesian classifiers. This applied research reveals practical implications for applying PPDM to data mining and machine learning and serves as a critical first step in learning how to apply PPDM to machine learning algorithms and the effects of PPDM on machine learning. Results indicate that certain machine learning algorithms are more suited for use with PPDM techniques.
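
    As an illustration of the K-Anonymity property discussed in this record, the following is a minimal sketch, not the author's procedure. The record fields (`age`, `zip`, `label`), the helper names and the fixed-width age generalization are all hypothetical choices made for the example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a coarse range (one simple generalization step)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy records; "age" and "zip" play the role of quasi-identifiers.
records = [
    {"age": 23, "zip": "130", "label": 1},
    {"age": 27, "zip": "130", "label": 0},
    {"age": 31, "zip": "148", "label": 1},
    {"age": 38, "zip": "148", "label": 0},
]
generalized = [generalize_age(r) for r in records]
```

    A classifier trained on `generalized` sees coarser inputs than one trained on `records`, which is the kind of trade-off the study examines.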

  5. Screening for Prediabetes Using Machine Learning Models

    Directory of Open Access Journals (Sweden)

    Soo Beom Choi

    2014-01-01

    The global prevalence of diabetes is rapidly increasing. Studies support the necessity of screening and interventions for prediabetes, which could result in serious complications and diabetes. This study aimed at developing an intelligence-based screening model for prediabetes. Data from the Korean National Health and Nutrition Examination Survey (KNHANES) were used, excluding subjects with diabetes. The KNHANES 2010 data (n=4685) were used for training and internal validation, while data from KNHANES 2011 (n=4566) were used for external validation. We developed two models to screen for prediabetes using an artificial neural network (ANN) and a support vector machine (SVM) and performed a systematic evaluation of the models using internal and external validation. We compared the performance of our models with that of a previously developed screening score model for prediabetes based on logistic regression analysis. The SVM model achieved an area under the curve of 0.731 on the external dataset, higher than those of the ANN model (0.729) and the screening score model (0.712). The prescreening methods developed in this study performed better than the previously developed screening score model and may be a more effective method for prediabetes screening.
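
    The area under the ROC curve used to compare the models above can be computed from scores and binary labels as the fraction of positive/negative pairs that are ranked correctly. The sketch below is a generic illustration with made-up scores, not the study's evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up screening scores: three of the four positive/negative pairs are ordered correctly.
example = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```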

  6. Stacking for machine learning redshifts applied to SDSS galaxies

    CERN Document Server

    Zitlau, Roman; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-01-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organising maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9% and 21% on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When appl...
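
    The feedback step described in this record, where a base algorithm's estimate becomes an extra input feature for the next round, can be sketched as follows. This is an assumption-laden illustration: a small k-nearest-neighbour regressor stands in for the paper's SOM and decision-tree base learners, and leave-one-out estimates are used so that a training point never predicts itself.

```python
def knn_predict(train_x, train_y, query, k=2, skip=None):
    """Average the targets of the k nearest training points (optionally skipping one index)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
        if i != skip
    )
    nearest = [i for _, i in dists[:k]]
    return sum(train_y[i] for i in nearest) / len(nearest)


def stacked_features(train_x, train_y, rounds=1, k=2):
    """Append the previous round's leave-one-out estimate to each feature vector."""
    feats = [list(x) for x in train_x]
    for _ in range(rounds):
        preds = [knn_predict(feats, train_y, x, k, skip=i) for i, x in enumerate(feats)]
        feats = [x + [p] for x, p in zip(feats, preds)]
    return feats


# Hypothetical 1-D toy data: two stacking rounds add two estimate columns.
feats = stacked_features([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 2.0, 3.0], rounds=2)
```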

  7. On Machine Learning

    Institute of Scientific and Technical Information of China (English)

    赵玉鹏

    2011-01-01

    Machine learning is an important branch of artificial intelligence and an important route to realizing it; it is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. Research in machine learning has two purposes: one is to improve the performance of robots, the other is to discover new knowledge from databases. The machine learning research community is divided into two groups: one works on improving classification algorithms, the other on the analysis of computational learning theory.

  8. Machine learning strategies for systems with invariance properties

    Science.gov (United States)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.
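
    The two strategies compared in this record can be illustrated on a toy case: a scalar target that is invariant under rotations of a 2-D input vector. This is a simplified sketch under that assumption; the paper's turbulence and crystal elasticity cases involve richer invariant bases, and the function names here are hypothetical.

```python
import math


def invariant_features(v):
    """Method 1: map a 2-D vector to a rotation-invariant input (its norm)."""
    return [math.hypot(v[0], v[1])]


def rotate(v, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def augment(samples, targets, n_rot=8):
    """Method 2: replicate each raw sample under n_rot rotations, keeping its target."""
    xs, ys = [], []
    for v, t in zip(samples, targets):
        for j in range(n_rot):
            xs.append(rotate(v, 2 * math.pi * j / n_rot))
            ys.append(t)
    return xs, ys
```

    Method 1 keeps the training set at its original size, while Method 2 multiplies it by `n_rot`, which is consistent with the training-cost difference the abstract reports.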

  9. A Comparison Study of Extreme Learning Machine and Least Squares Support Vector Machine for Structural Impact Localization

    OpenAIRE

    Qingsong Xu

    2014-01-01

    Extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feedforward neural networks designed for extremely fast learning. However, the performance of ELM in structural impact localization is not yet known. In this paper, a comparison study of ELM with least squares support vector machine (LSSVM) is presented for the application on impact localization of a plate structure with surface-mounted piezoelectric sensors. Both basic and kernel-based ELM regression models have b...
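
    The basic ELM idea, random fixed hidden weights with output weights obtained from a linear least-squares solve, can be sketched as below. This is a minimal illustration, not the paper's model: the tanh activation, the small ridge term, the weight range and the helper names are assumptions made for the example.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def elm_fit(xs, ys, n_hidden=10, ridge=1e-6, seed=0):
    """Fit a basic ELM regressor and return it as a prediction function."""
    rng = random.Random(seed)
    d = len(xs[0])
    # Hidden-layer weights and biases are drawn once at random and never trained.
    w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(wr, x)) + br)
                for wr, br in zip(w, b)]

    h = [hidden(x) for x in xs]
    # Output weights from the regularized normal equations (H^T H + ridge I) beta = H^T y.
    hth = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    hty = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve(hth, hty)
    return lambda x: sum(bi * hi for bi, hi in zip(beta, hidden(x)))
```

    Because only the output weights are solved for, training reduces to one linear solve, which is where the speed claimed for ELM comes from.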

  10. Nonlinear programming for classification problems in machine learning

    Science.gov (United States)

    Astorino, Annabella; Fuduli, Antonio; Gaudioso, Manlio

    2016-10-01

    We survey some nonlinear models for classification problems arising in machine learning. In recent years this field has become more and more relevant due to many practical applications, such as text and web classification, object recognition in machine vision, gene expression profile analysis, DNA and protein analysis, medical diagnosis, customer profiling etc. Classification deals with the separation of sets by means of appropriate separation surfaces, which are generally obtained by solving a numerical optimization model. While linear separability is the basis of the most popular approach to classification, the Support Vector Machine (SVM), the use of nonlinear separating surfaces has received some attention in recent years. The objective of this work is to recall some such proposals, mainly in terms of the numerical optimization models. In particular we tackle the polyhedral, ellipsoidal, spherical and conical separation approaches and, for some of them, we also consider the semisupervised versions.

  11. Analysing CMS transfers using Machine Learning techniques

    CERN Document Server

    Diotalevi, Tommaso

    2016-01-01

    LHC experiments transfer more than 10 PB/week between all grid sites using the FTS transfer service. In particular, CMS manages almost 5 PB/week of FTS transfers with PhEDEx (Physics Experiment Data Export). FTS sends metrics about each transfer (e.g. transfer rate, duration, size) to a central HDFS storage at CERN. The work done during three months as a Summer Student involved using ML techniques, within a CMS framework called DCAFPilot, to process this new data and generate predictions of transfer latencies on all links between Grid sites. This analysis will provide, as a future service, the information needed to proactively identify, and possibly fix, transfers with latency issues over the WLCG.

  12. Learning surface molecular structures via machine vision

    Science.gov (United States)

    Ziatdinov, Maxim; Maksov, Artem; Kalinin, Sergei V.

    2017-08-01

    Recent advances in high resolution scanning transmission electron and scanning probe microscopies have allowed researchers to perform measurements of materials' structural parameters and functional properties in real space with picometre precision. In many technologically relevant atomic and/or molecular systems, however, the information of interest is distributed spatially in a non-uniform manner and may have a complex multi-dimensional nature. One of the critical issues, therefore, lies in being able to accurately identify ('read out') all the individual building blocks in different atomic/molecular architectures, as well as more complex patterns that these blocks may form, on a scale of hundreds and thousands of individual atomic/molecular units. Here we employ machine vision to read and recognize complex molecular assemblies on surfaces. Specifically, we combine a Markov random field model and convolutional neural networks to classify structural and rotational states of all individual building blocks in a molecular assembly on a metallic surface visualized in high-resolution scanning tunneling microscopy measurements. We show how the obtained full decoding of the system allows us to directly construct a pair density function (a centerpiece of the disorder-property relationship paradigm), to analyze spatial correlations between multiple order parameters at the nanoscale, and to elucidate a reaction pathway involving molecular conformation changes. The method represents a significant shift in our way of analyzing atomically and/or molecularly resolved microscopic images and can be applied to a variety of other microscopic measurements of structural, electronic, and magnetic orders in different condensed matter systems.

  13. A Framework for Mobile Learning for Enhancing Learning in Higher Education

    Science.gov (United States)

    Barreh, Kadar Abdillahi; Abas, Zoraini Wati

    2015-01-01

    As mobile learning becomes increasingly pervasive, many higher education institutions have initiated a number of mobile learning initiatives to support their traditional learning modes. This study proposes a framework for mobile learning for enhancing learning in higher education. This framework for mobile learning is based on research conducted…

  14. A Framework for Developing Sustainable E-Learning Programmes

    Science.gov (United States)

    Chipere, Ngoni

    2017-01-01

    A framework was created at the University of the West Indies to guide the development of 18 e-learning programmes. The framework is based on three principles for sustainable e-learning design: (1) stakeholder-centredness; (2) cost-effectiveness and (3) high operational efficiency. These principles give rise to nine framework elements: (1) a labour…

  16. Remotely sensed data assimilation technique to develop machine learning models for use in water management

    Science.gov (United States)

    Zaman, Bushra

    Increasing population and water conflicts are making water management one of the most important issues of the present world. It has become absolutely necessary to find ways to manage water more efficiently. Technological advancement has introduced various techniques for data acquisition and analysis, and these tools can be used to address some of the critical issues that challenge water resource management. This research used learning machine techniques and information acquired through remote sensing, to solve problems related to soil moisture estimation and crop identification on large spatial scales. In this dissertation, solutions were proposed in three problem areas that can be important in the decision making process related to water management in irrigated systems. A data assimilation technique was used to build a learning machine model that generated soil moisture estimates commensurate with the scale of the data. The research was taken further by developing a multivariate machine learning algorithm to predict root zone soil moisture both in space and time. Further, a model was developed for supervised classification of multi-spectral reflectance data using a multi-class machine learning algorithm. The procedure was designed for classifying crops but the model is data dependent and can be used with other datasets and hence can be applied to other landcover classification problems. The dissertation compared the performance of relevance vector and the support vector machines in estimating soil moisture. A multivariate relevance vector machine algorithm was tested in the spatio-temporal prediction of soil moisture, and the multi-class relevance vector machine model was used for classifying different crop types. It was concluded that the classification scheme may uncover important data patterns contributing greatly to knowledge bases, and to scientific and medical research. The results for the soil moisture models would give a rough idea to farmers

  17. Machine Learning Techniques for Optical Performance Monitoring from Directly Detected PDM-QAM Signals

    DEFF Research Database (Denmark)

    Thrane, Jakob; Wass, Jesper; Piels, Molly

    2017-01-01

    Linear signal processing algorithms are effective in dealing with linear transmission channel and linear signal detection, while the nonlinear signal processing algorithms, from the machine learning community, are effective in dealing with nonlinear transmission channel and nonlinear signal detection. In this paper, a brief overview of the various machine learning methods and their application in optical communication is presented and discussed. Moreover, supervised machine learning methods, such as neural networks and support vector machine, are experimentally demonstrated for in-band optical...

  18. Weka machine learning for predicting the phospholipidosis inducing potential.

    Science.gov (United States)

    Ivanciuc, Ovidiu

    2008-01-01

    The drug discovery and development process is lengthy and expensive: bringing a drug to market may take up to 18 years and may cost up to US$2 billion. The extensive use of computer-assisted drug design techniques may considerably increase the chances of finding valuable drug candidates, thus decreasing the drug discovery time and costs. The most important computational approach is represented by structure-activity relationships that can discriminate between sets of chemicals that are active/inactive towards a certain biological receptor. An adverse effect of some cationic amphiphilic drugs is phospholipidosis, which manifests as an intracellular accumulation of phospholipids and formation of concentric lamellar bodies. Here we present structure-activity relationships (SAR) computed with a wide variety of machine learning algorithms trained to identify drugs that have phospholipidosis inducing potential. All SAR models are developed with the machine learning software Weka, and include both classical algorithms, such as k-nearest neighbors and decision trees, as well as recently introduced methods, such as support vector machines and artificial immune systems. The best predictions are obtained with support vector machines, followed by perceptron artificial neural network, logistic regression, and k-nearest neighbors.

  19. Active Learning Framework for Non-Intrusive Load Monitoring: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Xin

    2016-05-16

    Non-Intrusive Load Monitoring (NILM) is a set of techniques that estimate the electricity usage of individual appliances from power measurements taken at a limited number of locations in a building. One of the key challenges in NILM is having too much data without class labels while being unable to label the data manually because of cost or time constraints. This paper presents an active learning framework that helps existing NILM techniques to overcome this challenge. Active learning is an advanced machine learning method that interactively queries a user for the class label information. Unlike most existing NILM systems that heuristically request user inputs, the proposed method only needs minimally sufficient information from a user to build a compact and yet highly representative load signature library. Initial results indicate the proposed method can reduce the user inputs by up to 90% while still achieving similar disaggregation performance compared to a heuristic method. Thus, the proposed method can substantially reduce the burden on the user, improve the performance of a NILM system with limited user inputs, and overcome the key market barriers to the wide adoption of NILM technologies.
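
    The querying step of active learning can be sketched with uncertainty sampling, where the user is asked to label only the samples the current model is least sure about. The abstract does not state which query strategy the paper uses, so the entropy criterion, the function names and the example probabilities below are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(class_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled samples."""
    ranked = sorted(
        range(len(class_probs)),
        key=lambda i: entropy(class_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical appliance-class probabilities for four unlabeled load events.
probs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [1.0, 0.0]]
to_label = select_queries(probs, budget=2)
```

    Only the events in `to_label` would be shown to the user, which is how such a scheme limits the number of requested labels.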

  20. Machine learning techniques for razor triggers

    CERN Document Server

    Kolosova, Marina

    2015-01-01

    My project focused on the development of a neural network which can predict whether an event passes a razor trigger. Using synthetic data containing jets and missing transverse energy, we built and trained a razor network by supervised learning. We achieved ∼91% agreement between the output of the neural network and the target, while the remaining roughly 10% was due to the noise of the neural network. We could apply such networks during the L1 trigger using neuromorphic hardware. Neuromorphic chips are electronic systems that function in a way similar to an actual brain; they are faster than GPUs or CPUs, but they can only be used with spiking neural networks.