Muhammad Adi Khairul Anshary
Full Text Available Target market classification aims to focus marketing activities on the right targets. Target markets can be classified through data mining, using data from social media such as Twitter. The end result of data mining is a set of learning models that can classify new data. Ensemble methods can improve the accuracy of these models and therefore provide better results. In this study, target-market classification was conducted on a dataset of 3000 tweets from which features were extracted. Classification models were constructed by manipulating the training data with two ensemble methods (bagging and boosting). To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree) algorithm for comparison. Three categories of consumer goods (computers, mobile phones, and cameras) and three categories of sentiment (positive, negative, and neutral) were classified into three target-market categories. Machine learning was performed using Weka 3.6.9. The results on the test data showed that the bagging method improved the accuracy of CART by 1.9% (to 85.20%). For sentiment classification, on the other hand, the ensemble methods did not increase the accuracy of CART. The results of this study may be of interest to companies that approach their customers through social media, especially Twitter.
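Bagging's accuracy gain over a single tree, as reported above, comes from majority voting over base learners trained on bootstrap resamples of the training data. A minimal, self-contained sketch of that mechanism, using an invented toy 1-D dataset and 1-nearest-neighbour base learners as stand-ins for CART:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # resample the training set with replacement (same size as the original)
    return [rng.choice(data) for _ in data]

class OneNN:
    # 1-nearest-neighbour on a 1-D feature: a minimal stand-in base learner
    def fit(self, data):
        self.data = data
        return self

    def predict(self, x):
        return min(self.data, key=lambda p: abs(p[0] - x))[1]

def bagged_predict(models, x):
    # majority vote over the bagged base learners
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
train = [(x, x >= 5) for x in range(10)]   # toy data: label is x >= 5
ensemble = [OneNN().fit(bootstrap(train, rng)) for _ in range(11)]
print(bagged_predict(ensemble, 8), bagged_predict(ensemble, 1))
```

An odd number of base learners (11 here) avoids tied votes in the binary case; real bagging would use full decision trees and out-of-bag estimates rather than this toy setup.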
Full Text Available Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a new genetic programming (GP)-based ensemble system (named GPES) that can effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and these individuals become more and more accurate as the evolutionary process proceeds. Feature selection and balanced subsampling techniques are applied to increase the diversity within each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary-class and six multiclass microarray datasets, and the results show that the algorithm achieves better results in most cases compared with some other ensemble systems. By using more elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
Abuassba, Adnan O M; Zhang, Dezheng; Luo, Xiong; Shaheryar, Ahmad; Ali, Hazrat
Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden-layer feedforward neural network (SLFN). It often has good generalization performance. However, it may overfit the training data due to having more hidden nodes than needed. To address generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of the training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function that increases diversity and accuracy within the final ensemble. Finally, the class label of unseen data is predicted using a majority-vote approach. Splitting the training data into subsets and incorporating heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and fewer base classifiers, as compared to other models (AdaBoost, Bagging, dynamic ELM ensemble, data-splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.
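The core ELM step referenced above, random hidden-layer weights plus a closed-form least-squares solve for the output weights, can be sketched in a few lines. This is a generic single-output ELM illustration on synthetic XOR-like data, not the authors' AELME; the hidden-layer size and dataset are invented for the example:

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    # ELM: hidden weights are random and fixed; only the output weights
    # are learned, via the Moore-Penrose pseudo-inverse (least squares)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    beta = np.linalg.pinv(H) @ y      # closed-form output weights
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like target
model = elm_fit(X, y, n_hidden=40, rng=rng)
acc = np.mean((elm_predict(model, X) > 0.5) == (y > 0.5))
print(acc)
```

AELME would train several such networks (with regularized, L2-optimized, and kernel variants) on random resamples and combine them by majority vote; only the single base learner is shown here.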
Chen, Puhua; Jiao, Licheng; Liu, Fang; Zhao, Jiaqi; Zhao, Zhiqiang
Hyperspectral data are the spectral responses of landcovers across different spectral bands, and different band sets can be treated as different views of the landcovers, each of which may contain different structure information. Therefore, multiview graphs ensemble-based graph embedding is proposed to improve the performance of graph embedding for hyperspectral image classification. By integrating multiview graphs, richer and more accurate structure information can be utilized in graph embedding to achieve better results than traditional graph embedding methods. In addition, multiview graphs ensemble-based graph embedding can be treated as a framework to be extended to different graph-based methods. Experimental results demonstrate that the proposed method can improve the performance of traditional graph embedding methods significantly.
Pourashraf, Payam; Tomuro, Noriko; Apostolova, Emilia
This paper presents an image classification model developed to classify images embedded in commercial real estate flyers. It is a component in a larger, multimodal system which uses the texts as well as the images in the flyers to automatically classify them by property type. The role of the image classifier in the system is to provide the genres of the embedded images (map, schematic drawing, aerial photo, etc.), which are then combined with the texts in the flyer for the overall classification. In this work, we used an ensemble learning approach and developed a model in which the outputs of an ensemble of support vector machines (SVMs) are combined by a k-nearest neighbor (KNN) classifier. In this model, the classifiers in the ensemble are strong classifiers, each of which is trained to predict a given/assigned genre. Not only is our model intuitive, taking advantage of the mutual distinctness of the image genres; it is also scalable. We tested the model using over 3000 images extracted from online real estate flyers. The results showed that our model outperformed the baseline classifiers by a large margin.
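The stacked design described above, an ensemble of per-genre classifiers whose output vector feeds a KNN meta-classifier, can be illustrated with a toy sketch. The per-genre scorers below are hypothetical stand-ins for the trained SVMs; only the combination scheme mirrors the idea:

```python
from collections import Counter

# Hypothetical per-genre scorers standing in for trained SVMs: each emits a
# confidence that an item belongs to its assigned genre (here, closeness to
# an invented genre "center" on a 1-D feature).
def make_scorer(center):
    return lambda x: 1.0 / (1.0 + abs(x - center))

scorers = [make_scorer(c) for c in (0.0, 5.0, 10.0)]   # one per genre

def to_meta_features(x):
    # the ensemble's score vector becomes the meta-classifier's input
    return [s(x) for s in scorers]

def knn_predict(train, x, k=3):
    # k-nearest-neighbour vote in the space of ensemble outputs
    feats = to_meta_features(x)
    dist = lambda f: sum((a - b) ** 2 for a, b in zip(f, feats))
    nearest = sorted(train, key=lambda t: dist(t[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(to_meta_features(x), genre)
         for genre, xs in enumerate([(0, 1, 2), (4, 5, 6), (9, 10, 11)])
         for x in xs]
print(knn_predict(train, 4.6))
```

The appeal of this layout, as the abstract notes, is scalability: adding a genre means adding one scorer and retraining only the cheap meta-classifier.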
Akbar, Shahid; Hayat, Maqsood; Iqbal, Muhammad; Jan, Mian Ahmad
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies such as chemotherapy and radiation are highly expensive, susceptible to errors, and often ineffective, and they induce severe side effects on human cells. Due to the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic-algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined using an optimized genetic algorithm and a simple majority technique in order to enhance the true classification rate. It is observed that genetic-algorithm-based ensemble classification outperforms both the individual classifiers and the simple majority-voting-based ensemble. The performance of genetic-algorithm-based ensemble classification is highest on the hybrid feature space, with an accuracy of 96.45%. In comparison to existing techniques, the 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, the 'iACP-GAEnsC' model might become a leading tool in the field of drug design and proteomics. Copyright © 2017 Elsevier B.V. All rights reserved.
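Optimizing ensemble vote weights with a genetic algorithm, as described above, can be sketched in miniature. The validation predictions below are fabricated toy data, and the GA (mutation-only, with elitist selection) is far simpler than a production implementation:

```python
import random

# Hypothetical base-classifier predictions on a validation set (stand-ins for
# real trained classifiers): rows = samples, columns = classifiers.
preds = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
truth = [1, 1, 0, 0, 1, 0]

def weighted_vote(weights, row):
    # predict 1 when the weighted votes for 1 exceed half the total weight
    score = sum(w for w, p in zip(weights, row) if p == 1)
    return 1 if score * 2 > sum(weights) else 0

def fitness(weights):
    # number of validation samples the weighted ensemble gets right
    return sum(weighted_vote(weights, r) == t for r, t in zip(preds, truth))

def evolve(pop_size=30, gens=40, rng=random.Random(1)):
    pop = [[rng.random() for _ in preds[0]] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = [[max(w + rng.gauss(0, 0.1), 0.0)   # Gaussian mutation
                     for w in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

In this toy setup the GA can beat plain majority voting by up-weighting the most reliable classifier, which is exactly the advantage the abstract reports for the GA-based ensemble over simple majority voting.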
Full Text Available Gene microarray analysis and classification have proved an effective approach to the diagnosis of diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks for achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques, and in turn preserves both desirable features of an ensemble architecture: accuracy and diversity. To select a concise subset of informative genes, five different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble/ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results reveal that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used ensemble learning methods, Bagging and AdaBoost.
Fu, Bin; Wang, Zhihai; Pan, Rong
Ensemble pruning is an important issue in the field of ensemble learning. Diversity is a key criterion in determining how the pruning process is done and in measuring the result. However, there are few formal definitions of diversity yet. Hence, three important factors that should...
Bensema, Kevin; Gosink, Luke; Obermaier, Harald; Joy, Kenneth I.
Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.
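A crude version of the modality classification described above is counting peaks in a histogram of the ensemble's values at a single location; real analyses would use kernel density estimates or statistical dip tests, but the idea can be sketched as follows (the two value lists are invented toy ensembles):

```python
def histogram(values, bins):
    # fixed-width binning over the observed range
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def count_modes(values, bins=5):
    # a mode is a histogram bin strictly higher than both neighbours
    c = [0] + histogram(values, bins) + [0]
    return sum(1 for i in range(1, len(c) - 1) if c[i - 1] < c[i] > c[i + 1])

unimodal = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]   # ensemble agrees on one tendency
bimodal = [1, 2, 2, 3, 7, 8, 8, 9]          # divergent trends: two clusters
print(count_modes(unimodal), count_modes(bimodal))
```

A location whose ensemble values yield two or more modes would be flagged as reflecting divergent trends, exactly the information that mean and variance alone cannot convey.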
Jin, Lin-Peng; Dong, Jun
Ensemble learning has been proven to improve generalization ability effectively in both theory and practice. In this paper, we first briefly outline the current status of research on it. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database, which contains a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.
Ebadi, Ashkan; Dalboni da Rocha, Josué L.; Nagaraju, Dushyanth B.; Tovar-Moll, Fernanda; Bramati, Ivanei; Coutinho, Gabriel; Sitaram, Ranganatha; Rashidi, Parisa
The human brain is a complex network of interacting regions. The gray matter regions of the brain are interconnected by white matter tracts, together forming one integrative complex network. In this article, we report our investigation of the potential of applying brain connectivity patterns as an aid in diagnosing Alzheimer's disease and Mild Cognitive Impairment (MCI). We performed pattern analysis of graph theoretical measures derived from Diffusion Tensor Imaging (DTI) data representing the structural brain networks of 45 subjects, consisting of 15 patients with Alzheimer's disease (AD), 15 patients with MCI, and 15 healthy subjects (CT). We considered pair-wise class combinations of subjects, defining three separate classification tasks, i.e., AD-CT, AD-MCI, and CT-MCI, and used an ensemble classification module to perform the classification tasks. Our ensemble framework with feature selection shows promising performance, with classification accuracies of 83.3% for AD vs. MCI, 80% for AD vs. CT, and 70% for MCI vs. CT. Moreover, our findings suggest that AD can be related to graph measure abnormalities at Brodmann areas in the sensorimotor cortex and piriform cortex. In particular, node redundancy coefficient and load centrality in the primary motor cortex were recognized as good indicators of AD in contrast to MCI. In general, load centrality, betweenness centrality, and closeness centrality were found to be the most relevant network measures, as they were the top identified features at different nodes. Due to the small and not well-defined groups of AD and MCI patients, the present study should be regarded as a "proof of concept" for a procedure classifying MRI markers between AD dementia, MCI, and normal old individuals. Future studies with larger samples of subjects and more sophisticated patient exclusion criteria are necessary for the development of a more precise technique for clinical diagnosis. PMID:28293162
Full Text Available Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, each with its own advantages and drawbacks. There are some basic issues that affect the accuracy of a classifier when solving a supervised learning problem, such as the bias-variance tradeoff, the dimensionality of the input space, and noise in the input data. All these problems affect the accuracy of the classifier and are the reason that there is no globally optimal method for classification. Nor is there any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models by 1% to 30%, depending upon the algorithm complexity.
Nabhan Homsi, Masun; Warrick, Philip
Heart sound classification and analysis play an important role in the early diagnosis and prevention of cardiovascular disease. To this end, this paper introduces a novel method for the automatic classification of normal and abnormal heart sound recordings. Signals are first preprocessed to extract a total of 131 features in the time, frequency, wavelet, and statistical domains from the entire signal and from the timings of the states. Outlier signals are then detected and separated from those with a standard range using an interquartile-range algorithm. After that, feature extreme values are given special consideration, and finally the features are reduced to the most significant ones using a feature reduction technique. In the classification stage, the selected features, for either standard or outlier signals, are fed separately into an ensemble of 20 two-step classifiers. The first step of each classifier is a nested set of ensemble algorithms cross-validated on the training dataset provided by the PhysioNet Challenge 2016, while the second applies a voting rule on the class labels. The results show that this method is able to recognize heart sound recordings efficiently, achieving an overall score of 96.30% for standard signals and 90.18% for outlier signals in a cross-validated experiment using the available training data. Our proposed approach helped reduce overfitting and improved classification performance, achieving an overall score on the hidden test set of 80.1% (79.6% sensitivity and 80.6% specificity).
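The interquartile-range step used above to separate outlier signals from standard ones is a classical rule: anything beyond 1.5 × IQR from the quartiles is flagged. A minimal sketch, with invented scalar values standing in for per-signal summary features:

```python
def quartiles(xs):
    s = sorted(xs)
    def q(p):
        # linear interpolation between closest ranks
        k = (len(s) - 1) * p
        f = int(k)
        return s[f] + (k - f) * (s[min(f + 1, len(s) - 1)] - s[f])
    return q(0.25), q(0.75)

def split_by_iqr(xs, factor=1.5):
    # flag values outside [Q1 - factor*IQR, Q3 + factor*IQR] as outliers
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    standard = [x for x in xs if lo <= x <= hi]
    outliers = [x for x in xs if x < lo or x > hi]
    return standard, outliers

signals = [10, 11, 12, 12, 13, 14, 15, 40]   # 40 is clearly atypical
standard, outliers = split_by_iqr(signals)
print(outliers)
```

Routing the two groups to separately trained classifier ensembles, as the paper does, lets each model specialize in its own value range.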
Cacha, L A; Parida, S; Dehuri, S; Cho, S-B; Poznanski, R R
The huge number of voxels in fMRI over time poses a major challenge for effective analysis. Fast, accurate, and reliable classifiers are required for estimating the decoding accuracy of brain activities. Although machine-learning classifiers seem promising, individual classifiers have their own limitations. To address this limitation, the present paper proposes a method based on an ensemble of neural networks to analyze fMRI data for cognitive state classification, applicable across multiple subjects. The fuzzy integral (FI) approach is employed as an efficient tool for combining the different classifiers. The FI approach led to the development of a classifier-ensemble technique that performs better than any single classifier by reducing misclassification, bias, and variance. The proposed method successfully classified the different cognitive states for multiple subjects with high classification accuracy. A comparison of the performance improvement of the ensemble neural network method vs. that of the individual neural networks strongly points toward the usefulness of the proposed method.
Simoes, Rita; Slump, Cornelis H.; Cappellen van Walsum, Anne-Marie van
Classification methods have been proposed to detect Alzheimer's disease (AD) using magnetic resonance images. Most rely on features such as the shape/volume of brain structures that need to be defined a priori. In this work, we propose a method that requires neither the segmentation of specific brain regions nor nonlinear alignment to a template. Besides classification, we also analyze which brain regions are discriminative between a group of normal controls and a group of AD patients. We perform 3D texture analysis using Local Binary Patterns computed at local image patches in the whole brain, combined in a classifier ensemble. We evaluate our method in a publicly available database including very mild-to-mild AD subjects and healthy elderly controls. For the subject cohort including only mild AD subjects, the best results are obtained using a combination of large (30 x 30 x 30 and 40 x 40 x 40 voxels) patches. A spatial analysis of the best performing patches shows that these are located in the medial-temporal lobe and in the periventricular regions. When very mild AD subjects are included in the dataset, the small (10 x 10 x 10 voxels) patches perform best, with the most discriminative ones being located near the left hippocampus. We show that our method is able not only to perform accurate classification, but also to localize discriminative brain regions, which are in accordance with the medical literature. This is achieved without the need to segment specific brain structures and without performing nonlinear registration to a template, indicating that the method may be suitable for a clinical implementation that can help to diagnose AD at an earlier stage.
Full Text Available Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation-based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high-performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation-based joint learning, high-quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high-quality knowledge for the ensemble classifier and improve the performance of classification.
Kim, Edward J.; Brunner, Robert J.; Carrasco Kind, Matias
There exist a variety of star-galaxy classification techniques, each with its own strengths and weaknesses. In this paper, we present a novel meta-classification framework that combines and fully exploits different techniques to produce a more robust star-galaxy classification. To demonstrate this hybrid, ensemble approach, we combine a purely morphological classifier, a supervised machine learning method based on random forest, an unsupervised machine learning method based on self-organizing maps, and a hierarchical Bayesian template-fitting method. Using data from the CFHTLenS survey (Canada-France-Hawaii Telescope Lensing Survey), we consider different scenarios: when a high-quality training set is available with spectroscopic labels from DEEP2 (Deep Extragalactic Evolutionary Probe Phase 2), SDSS (Sloan Digital Sky Survey), VIPERS (VIMOS Public Extragalactic Redshift Survey), and VVDS (VIMOS VLT Deep Survey), and when the demographics of sources in a low-quality training set do not match the demographics of objects in the test data set. We demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method in these scenarios. Thus, strategies that combine the predictions of different classifiers may prove to be optimal in currently ongoing and forthcoming photometric surveys, such as the Dark Energy Survey and the Large Synoptic Survey Telescope.
Olesen, Alexander Neergaard; Christensen, Julie A E; Sorensen, Helge B D; Jennum, Poul J
Reducing the number of recording modalities for sleep staging research can benefit both researchers and patients, under the condition that the reduced set provides results as accurate as those of conventional systems. This paper investigates the possibility of exploiting the multisource nature of electrooculography (EOG) signals by presenting a method for automatic sleep staging using the complete ensemble empirical mode decomposition with adaptive noise algorithm and a random forest classifier. It achieves a high overall accuracy of 82% and a Cohen's kappa of 0.74, indicating substantial agreement between automatic and manual scoring.
Full Text Available Audio segmentation is a basis for multimedia content analysis, which is one of the most important and widely used applications today. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure speech, music, environment sound, and silence. The proposed algorithm preserves important audio content and reduces the misclassification rate without using a large amount of training data; it handles noise and is suitable for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used: bagged support vector machines (SVMs) with artificial neural networks (ANNs). The audio stream is classified, firstly, into speech and nonspeech segments by using bagged SVMs; nonspeech segments are further classified into music and environment sound by using ANNs; and lastly, speech segments are classified into silence and pure-speech segments by a rule-based classifier. Minimal data is used for training the classifiers; ensemble methods are used to minimize the misclassification rate, and approximately 98% accurate segments are obtained. The resulting algorithm is fast and efficient and can be used with real-time multimedia applications.
Full Text Available Kernel-based methods and ensemble learning are two important paradigms for the classification of hyperspectral remote sensing images. However, they were developed in parallel with different principles. In this paper, we aim to combine the advantages of kernel and ensemble methods by proposing a kernel supervised ensemble classification method. The proposed method, namely RoF-KOPLS, combines the merits of ensemble feature learning (i.e., Rotation Forest (RoF)) and kernel supervised learning (i.e., Kernel Orthonormalized Partial Least Square (KOPLS)). Specifically, the feature space is randomly split into K disjoint subspaces, and KOPLS is applied to each subspace to produce a new feature set for training a decision tree classifier. The final classification result is assigned to the corresponding class by the majority-voting rule. Experimental results on two hyperspectral airborne images demonstrate that RoF-KOPLS with the radial basis function (RBF) kernel yields the best classification accuracies, due to its ability to improve the accuracy of the base classifiers and the diversity within the ensemble, especially for very limited training sets. Furthermore, the proposed method is insensitive to the number of subsets.
Full Text Available It is important to identify and prevent disease risk as early as possible through regular physical examinations. We formulate disease risk prediction as a multilabel classification problem. A novel Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD) method is proposed in this work. First, we transform the multilabel classification into a multiclass classification. Then, we propose the pruned datasets and joint decomposition methods to deal with the imbalanced learning problem. Two strategies, size balanced (SB) and label similarity (LS), are designed to decompose the training dataset. In the experiments, the dataset comes from real physical examination records. We contrast the performance of the ELPPJD method with the two different decomposition strategies. Moreover, a comparison between ELPPJD and the classic multilabel classification methods RAkEL and HOMER is carried out. The experimental results show that the ELPPJD method with the label similarity strategy has outstanding performance.
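The first ELPPJD step, turning a multilabel problem into a multiclass one, follows the label power-set idea: each distinct combination of labels becomes one class. A direct sketch (the record labels are invented, and the subsequent pruning and joint decomposition are not shown):

```python
# Label power-set transform: each distinct label set becomes one class id.
def power_set_transform(y_multilabel):
    classes = {}          # frozenset of labels -> class id (first-seen order)
    y_multiclass = []
    for labels in y_multilabel:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        y_multiclass.append(classes[key])
    return y_multiclass, classes

# toy physical-examination records, each tagged with a set of disease risks
records = [{"diabetes"}, {"diabetes", "hypertension"}, {"hypertension"},
           {"diabetes"}, {"diabetes", "hypertension"}]
y, mapping = power_set_transform(records)
print(y)
```

The weakness this creates, many rare combination classes and a badly imbalanced multiclass dataset, is precisely what the pruning and decomposition strategies (SB and LS) are designed to counter.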
Idil Isikli Esener
Full Text Available A new and effective feature ensemble with multistage classification is proposed for implementation in a computer-aided diagnosis (CAD) system for breast cancer diagnosis. A publicly available mammogram image dataset collected during the Image Retrieval in Medical Applications (IRMA) project is utilized to verify the suggested feature ensemble and multistage classification. In the CAD system, feature extraction is performed on mammogram region-of-interest (ROI) images, which are preprocessed by applying histogram equalization followed by nonlocal means filtering. The proposed feature ensemble is formed by concatenating the local-configuration-pattern-based, statistical, and frequency domain features. The classification of these features is implemented in three cases: a one-stage study, a two-stage study, and a three-stage study. Eight well-known classifiers are used in all cases of this multistage classification scheme. Additionally, the results of the classifiers that provide the top three performances are combined via a majority voting technique to improve the recognition accuracy in both the two- and three-stage studies. Maximum classification accuracies of 85.47%, 88.79%, and 93.52% are attained by the one-, two-, and three-stage studies, respectively. The proposed multistage classification scheme is more effective than single-stage classification for breast cancer diagnosis.
Budiman Putra Asmaur Rohman
Full Text Available Target detection is a mandatory task of a radar system, so radar system performance is mainly determined by its detection rate. Constant False Alarm Rate (CFAR) is a detection algorithm commonly used in radar systems. This method is divided into several approaches, which perform differently in different environments. Therefore, this paper proposes an ensemble neural-network-based classifier with a varying number of hidden neurons for classifying radar environments. The result of this research will support improved target detection in radar systems by enabling an adaptive CFAR. A multi-layer perceptron network (MLPN) with a single hidden layer is employed as the structure of the base classifiers. The first step of this research is the evaluation of the hidden-neuron number that gives the highest classification accuracy with the simplest computation. According to the result of this step, the three best structures are selected to build an ensemble classifier. In the ensemble structure, the outputs of those three MLPNs are collected, and the majority vote determines the final classification. The three radar environments investigated are homogeneous, multiple-target, and clutter-boundary. According to the simulation results, the ensemble MLPN provides a higher detection rate than the conventional single MLPNs. Moreover, in the multiple-target and clutter-boundary environments, the proposed method shows its highest performance.
Full Text Available Abstract Background Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases, or that are fragmentary, remains one of the more difficult analytical problems in bioinformatics. Results We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. Conclusion HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can represent multiple sequence alignments better than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatics tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.
Full Text Available In supervised learning-based classification, ensembles have been successfully employed in different application domains. In the literature, many researchers have proposed different ensembles by considering different combination methods, training datasets, base classifiers, and many other factors. Artificial-intelligence (AI)-based techniques play a prominent role in the development of ensembles for intrusion detection (ID) and have many benefits over other techniques. However, there is no comprehensive review of ensembles in general, or of AI-based ensembles for ID in particular, that examines their current research status with respect to the ID problem. Here, an updated review of ensembles and their taxonomies is presented in general. The paper also presents an updated review of various AI-based ensembles for ID, in particular during the last decade. The related studies of AI-based ensembles are compared using a set of evaluation metrics derived from (1) the architecture and approach followed; (2) the different methods utilized in different phases of ensemble learning; and (3) other measures used to evaluate the classification performance of the ensembles. The paper also provides future directions for research in this area. The paper will help readers better understand the different directions in which research on ensembles has been done in general and, specifically, in the field of intrusion detection systems (IDSs).
Schclar, Alon; Rokach, Lior; Amit, Amir
We present a novel approach for the construction of ensemble classifiers based on dimensionality reduction. Dimensionality reduction methods represent datasets using a small number of attributes while preserving the information conveyed by the original dataset. The ensemble members are trained based on dimension-reduced versions of the training set. These versions are obtained by applying dimensionality reduction to the original training set using different values of the input parameters. Thi...
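The construction described above can be sketched as follows. Since the abstract is truncated, random projections stand in for a generic dimensionality-reduction method run with different parameter values, and a nearest-centroid rule stands in for the trained ensemble members; all data and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class dataset in 10 dimensions (illustrative stand-in for a real training set).
X0 = rng.normal(0.0, 0.5, size=(30, 10))
X1 = rng.normal(3.0, 0.5, size=(30, 10))
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)

def fit_centroids(Xr, y):
    # One centroid per class in the reduced space.
    return {c: Xr[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(centroids, Xr):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(Xr - centroids[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

# Each ensemble member sees a different dimension-reduced view of the data;
# random projections stand in for a reduction applied with varied input parameters.
votes = []
for seed in range(5):
    P = np.random.default_rng(seed).normal(size=(10, 3)) / np.sqrt(3)
    Xr = X @ P
    votes.append(predict_centroids(fit_centroids(Xr, y), Xr))

# Majority vote over the two classes.
ensemble_pred = (np.mean(votes, axis=0) > 0.5).astype(int)
accuracy = float((ensemble_pred == y).mean())
```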
Full Text Available Abstract Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we also devised an entropy plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods, including a recently developed ensemble clustering algorithm, in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
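The co-membership idea behind MULTI-K can be sketched as follows. The three labelings are hand-written stand-ins for actual k-means runs with varying k, and the 2/3 robustness threshold is an assumption for illustration:

```python
import numpy as np

# Labelings of six samples from three clustering runs with different k
# (hand-written stand-ins for k-means runs on microarray samples).
runs = np.array([
    [0, 0, 0, 1, 1, 1],  # k = 2
    [0, 0, 1, 2, 2, 2],  # k = 3
    [0, 0, 1, 2, 2, 3],  # k = 4
])

n = runs.shape[1]
# Co-association matrix: fraction of runs in which samples i and j co-cluster.
coassoc = np.zeros((n, n))
for labels in runs:
    coassoc += labels[:, None] == labels[None, :]
coassoc /= len(runs)

# Keep only robust co-memberships and read clusters off the connected components.
robust = coassoc >= 2 / 3  # robustness threshold (an assumption for illustration)

def components(adj):
    seen, groups = set(), []
    for i in range(len(adj)):
        if i in seen:
            continue
        stack, group = [i], []
        while stack:
            j = stack.pop()
            if j in seen:
                continue
            seen.add(j)
            group.append(j)
            stack.extend(k for k in range(len(adj)) if adj[j][k] and k not in seen)
        groups.append(sorted(group))
    return groups

clusters = components(robust)  # -> [[0, 1], [2], [3, 4, 5]]
```

Sample 2 co-clusters with 0 and 1 in only one of three runs, so it drops out as a singleton; this is the kind of separation the entropy plot is meant to control.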
Kumar, Ashnil; Kim, Jinman; Lyndon, David; Fulham, Michael; Feng, Dagan
The availability of medical imaging data from clinical archives, research literature, and clinical manuals, coupled with recent advances in computer vision, offers the opportunity for image-based diagnosis, teaching, and biomedical research. However, the content and semantics of an image can vary depending on its modality, and as such the identification of image modality is an important preliminary step. The key challenge for automatically classifying the modality of a medical image lies in the visual characteristics of the different modalities: some are visually distinct while others may have only subtle differences. This challenge is compounded by variations in the appearance of images based on the diseases depicted and a lack of sufficient training data for some modalities. In this paper, we introduce a new method for classifying medical images that uses an ensemble of different convolutional neural network (CNN) architectures. CNNs are a state-of-the-art image classification technique that learns the optimal image features for a given classification task. We hypothesise that different CNN architectures learn different levels of semantic image representation, and thus an ensemble of CNNs will enable higher quality features to be extracted. Our method develops a new feature extractor by fine-tuning CNNs that have been initialized on a large dataset of natural images. The fine-tuning process leverages the generic image features from natural images that are fundamental for all images and optimizes them for the variety of medical imaging modalities. These features are used to train numerous multiclass classifiers whose posterior probabilities are fused to predict the modalities of unseen images. Our experiments on the ImageCLEF 2016 medical image public dataset (30 modalities; 6776 training images, and 4166 test images) show that our ensemble of fine-tuned CNNs achieves a higher accuracy than established CNNs. Our ensemble also achieves a higher accuracy than methods in
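The posterior-fusion step can be sketched as simple late fusion: average the class-probability matrices produced by the member networks and take the arg max per image. All numbers here are illustrative, not outputs of the paper's CNNs:

```python
import numpy as np

# Posterior probabilities from three hypothetical fine-tuned CNNs for two
# images over three modality classes (all values illustrative).
p_cnn1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]])
p_cnn2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]])
p_cnn3 = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])

# Late fusion: average the posteriors, then take the arg max per image.
fused = (p_cnn1 + p_cnn2 + p_cnn3) / 3
modality = fused.argmax(axis=1)  # -> array([0, 2])
```

Averaging is only one fusion rule; weighted or product combinations are common alternatives when member networks differ in reliability.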
Chellasamy, M.; Ferré, T. P. A.; Humlekrog Greve, M.; Larsen, R.; Chinnasamy, U.
Change Detection (CD) methods based on post-classification comparison approaches are claimed to provide potentially reliable results. They are considered the most obvious quantitative method in the analysis of Land Use Land Cover (LULC) changes, as they provide 'from-to' change information. However, the performance of post-classification comparison approaches depends strongly on the accuracy of the classification of the individual images used for comparison. Hence, we present a classification approach that produces accurate classification results and thereby improves change detection. Machine learning is part of a broader framework in change detection, in which neural networks have drawn much attention. Neural network algorithms adaptively estimate continuous functions from input data without a mathematical representation of how the output depends on the input. A common practice for classification is to use a Multi-Layer Perceptron (MLP) neural network with the backpropagation learning algorithm for prediction. To increase the ability of learning and prediction, multiple inputs (spectral, texture, topography, and multi-temporal information) are generally stacked to incorporate diverse information. On the other hand, the literature reports that the backpropagation algorithm exhibits weak and unstable learning when multiple inputs are used with complex datasets characterized by mixed uncertainty levels. To address the problem of learning complex information, we propose an ensemble classification technique that incorporates multiple inputs for classification, unlike the traditional stacking of multiple input data. In this paper, we present an Endorsement Theory based ensemble classification that integrates multiple sources of information, in terms of prediction probabilities, to produce the final classification results. Three different input datasets are used in this study: spectral, texture and indices, from SPOT-4 multispectral imagery captured in 1998 and 2003. Each SPOT image is classified
Wong G William
Full Text Available Abstract Background Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology, where many of the standard methods cannot be applied directly. Here, we propose a novel machine learning framework in which decision-tree-based classifier ensembles, coupled with feature selection methods, are applied to proteomics data generated from premalignant pancreatic cancer. Results This study explores the utility of three different feature selection schemas (Student t test, Wilcoxon rank sum test and genetic algorithm) to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected by each method, we compared the prediction performance of a single decision tree algorithm, C4.5, with six different decision-tree based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that the ensemble classifiers always outperform the single decision tree classifier, giving greater accuracies and smaller prediction errors when applied to the pancreatic cancer proteomics dataset. Conclusion In our cross validation framework, classifier ensembles generally achieved better classification accuracies than a single decision tree when applied to the pancreatic cancer proteomic dataset, suggesting their utility in future proteomics data analysis. Additionally, the use of feature selection
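The t-test feature selection schema mentioned above amounts to ranking features by a two-sample t-statistic and keeping the top-scoring ones. A minimal sketch on synthetic data (group sizes, feature count, and the injected effect are all illustrative, not the paper's dataset):

```python
import numpy as np

def t_statistics(Xa, Xb):
    """Welch t-statistic per feature between two sample groups (rows = samples)."""
    ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)
    va, vb = Xa.var(axis=0, ddof=1), Xb.var(axis=0, ddof=1)
    return (ma - mb) / np.sqrt(va / len(Xa) + vb / len(Xb))

rng = np.random.default_rng(1)
# 10 + 10 synthetic spectra with 50 features; only feature 7 is informative.
control = rng.normal(0.0, 1.0, size=(10, 50))
disease = rng.normal(0.0, 1.0, size=(10, 50))
disease[:, 7] += 4.0  # injected group difference

scores = np.abs(t_statistics(control, disease))
top_features = np.argsort(scores)[::-1][:5]  # keep the 5 highest-|t| features
```

The reduced feature matrix would then feed the tree ensembles compared in the study.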
Sørensen, Lauge; Nielsen, Mads
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
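The combination of bagging without replacement and per-member feature selection can be sketched as follows. A nearest-centroid rule stands in for the SVM base learner (the real study used linear/RBF SVMs), and all data and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data standing in for MRI-derived features (illustrative).
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 20)),
               rng.normal(2.0, 1.0, size=(40, 20))])
y = np.array([0] * 40 + [1] * 40)

def train_member(X, y, rng, n_sub=50, n_feat=8):
    # Bagging *without* replacement plus a random feature subset per member.
    idx = rng.choice(len(X), size=n_sub, replace=False)
    feats = rng.choice(X.shape[1], size=n_feat, replace=False)
    Xs, ys = X[idx][:, feats], y[idx]
    # Nearest-centroid rule as a lightweight stand-in for the SVM base learner.
    centroids = np.stack([Xs[ys == c].mean(axis=0) for c in (0, 1)])
    return feats, centroids

def predict_member(member, X):
    feats, centroids = member
    diff = X[:, feats][:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

members = [train_member(X, y, rng) for _ in range(11)]
votes = np.stack([predict_member(m, X) for m in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
accuracy = float((pred == y).mean())
```

An odd member count avoids voting ties in the two-class case; the post-challenge finding that more members helped is consistent with this variance-reduction view.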
Mkuzangwe, Nenekazi NP
Full Text Available This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently, researchers rely on implementing the ensemble-of-classifiers-based NIDS before they can determine its performance...
Park, Sang-Hoon; Lee, David; Lee, Sang-Goog
For the last few years, many feature extraction methods have been proposed based on biological signals. Among these, brain signals have the advantage that they can be obtained even from people with peripheral nervous system damage. Motor imagery electroencephalograms (EEG) are inexpensive to measure, offer a high temporal resolution, and are intuitive. Therefore, they have received a significant amount of attention in various fields, including signal processing, cognitive science, and medicine. The common spatial pattern (CSP) algorithm is a useful method for feature extraction from motor imagery EEG. However, performance degradation occurs in a small-sample setting (SSS), because the CSP depends on sample-based covariance. Since the active frequency range is different for each subject, it is also inconvenient to set the frequency range differently every time. In this paper, we propose a feature extraction method based on a filter bank to solve these problems. The proposed method consists of five steps. First, the motor imagery EEG is divided by using a filter bank. Second, the regularized CSP (R-CSP) is applied to the divided EEG. Third, we select the features according to mutual information based on the individual feature algorithm. Fourth, parameter sets are selected for the ensemble. Finally, we classify using an ensemble based on the selected features. The brain-computer interface competition III data set IVa is used to evaluate the performance of the proposed method. The proposed method improves the mean classification accuracy by 12.34%, 11.57%, 9%, 4.95%, and 4.47% compared with CSP, SR-CSP, R-CSP, filter bank CSP (FBCSP), and SR-FBCSP, respectively. Compared with the filter bank R-CSP ( , ), which is a parameter selection version of the proposed method, the classification accuracy is improved by 3.49%. In particular, the proposed method shows a large improvement in performance in the SSS.
DiFranco, Matthew D
We present a tile-based approach for producing clinically relevant probability maps of prostatic carcinoma in histological sections from radical prostatectomy. Our methodology incorporates ensemble learning for feature selection and classification on expert-annotated images. Random forest feature selection performed over varying training sets provides a subset of generalized CIEL*a*b* co-occurrence texture features, while sample selection strategies with minimal constraints reduce training data requirements to achieve reliable results. Ensembles of classifiers are built using expert-annotated tiles from training images, and scores for the probability of cancer presence are calculated from the responses of each classifier in the ensemble. Spatial filtering of tile-based texture features prior to classification results in increased heat-map coherence as well as AUC values of 95% using ensembles of either random forests or support vector machines. Our approach is designed for adaptation to different imaging modalities, image features, and histological decision domains.
Tan, C. M.; Lyon, R. J.; Stappers, B. W.; Cooper, S.; Hessels, J. W. T.; Kondratiev, V. I.; Michilli, D.; Sanidas, S.
One of the biggest challenges arising from modern large-scale pulsar surveys is the number of candidates generated. Here, we implemented several improvements to the machine learning (ML) classifier previously used by the LOFAR Tied-Array All-Sky Survey (LOTAAS) to look for new pulsars via filtering the candidates obtained during periodicity searches. To assist the ML algorithm, we have introduced new features which capture the frequency and time evolution of the signal and improved the signal-to-noise calculation accounting for broad profiles. We enhanced the ML classifier by including a third class characterizing RFI instances, allowing candidates arising from RFI to be isolated, reducing the false positive return rate. We also introduced a new training data set used by the ML algorithm that includes a large sample of pulsars misclassified by the previous classifier. Lastly, we developed an ensemble classifier comprised of five different Decision Trees. Taken together, these updates improve the pulsar recall rate by 2.5 per cent, while also improving the ability to identify pulsars with wide pulse profiles, often misclassified by the previous classifier. The new ensemble classifier is also able to reduce the percentage of false positive candidates identified from each LOTAAS pointing from 2.5 per cent (~500 candidates) to 1.1 per cent (~220 candidates).
Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph
Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. ex Mirb. within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.
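The True Skill Statistic used to evaluate the species classification above is sensitivity plus specificity minus one, computed from a 2x2 confusion matrix. A minimal sketch with hypothetical counts (the numbers are not from the study):

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = sensitivity + specificity - 1, from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall on the positive species
    specificity = tn / (tn + fp)   # recall on the negative species
    return sensitivity + specificity - 1.0

# Hypothetical counts for P. sylvestris (positive) vs. P. uncinata (negative).
tss = true_skill_statistic(tp=45, fn=5, fp=8, tn=42)  # 0.90 + 0.84 - 1 = 0.74
```

Unlike raw accuracy, TSS is insensitive to class prevalence, which makes it a common choice for comparing species-classification models.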
G KISHOR KUMAR
large volumes of data by applying data analysis and discovery algorithms [1, 2]. Classification, a major data mining functionality, is a supervised learning method where the example set called the training set is used to classify the given query data item into one of the predefined classes, where a classifier derived from the ...
Full Text Available Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories of images. Second, we exploit the PageRank idea from Internet applications and apply it to the image segmentation task. This can improve the final segmentation results by combining the spatial information of the image and the semantic similarity of regions. Our experiments on four public image databases validate the superiority of our algorithm over conventional single type of feature or multiple types of features-based algorithms, since our algorithm can fuse multiple types of features effectively for better segmentation results. Moreover, our method is also proved to be very competitive in comparison with other state-of-the-art segmentation algorithms.
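The PageRank idea borrowed from Internet applications can be sketched by power iteration over a region-similarity graph. The four-node graph and its weights are illustrative, not derived from actual image regions:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """Power-iteration PageRank over a similarity graph (rows = out-links)."""
    n = len(adj)
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalize; dangling rows fall back to a uniform distribution.
    M = np.divide(adj, out, out=np.full_like(adj, 1.0 / n), where=out > 0).T
    r = np.full(n, 1.0 / n)
    while True:
        r_new = (1 - damping) / n + damping * M @ r
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Toy similarity graph over four image regions (weights illustrative).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
ranks = pagerank(adj)  # region 2, the best-connected node, ranks highest
```

In a segmentation setting, edge weights would encode the semantic similarity of adjacent regions, so high-rank regions act as anchors when merging the ensemble's partitions.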
Benkeser, David; Ju, Cheng; Lendle, Sam; van der Laan, Mark
Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example where we make streaming predictions of infectious disease incidence using data from a large database. Copyright © 2017 John Wiley & Sons, Ltd.
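The online cross-validation selector can be sketched as follows: each candidate is scored on every incoming batch before it is allowed to update on that batch, and the candidate with the lowest cumulative loss is selected. The two toy candidates and the data stream are illustrative stand-ins for a real library of online learners:

```python
import numpy as np

rng = np.random.default_rng(0)
# A stream of batches from a stationary series (stand-in for incidence data).
stream = [rng.normal(5.0, 1.0, size=20) for _ in range(30)]

class RunningMean:
    def __init__(self):
        self.s, self.n = 0.0, 0
    def predict(self):
        return self.s / self.n if self.n else 0.0
    def update(self, batch):
        self.s += float(batch.sum())
        self.n += batch.size

class LastValue:
    def __init__(self):
        self.last = 0.0
    def predict(self):
        return self.last
    def update(self, batch):
        self.last = float(batch[-1])

candidates = {"running_mean": RunningMean(), "last_value": LastValue()}
cv_loss = {name: 0.0 for name in candidates}

for batch in stream:
    # Online CV: score each candidate on the new batch *before* updating it,
    # so every batch serves as out-of-sample validation data.
    for name, est in candidates.items():
        cv_loss[name] += float(((batch - est.predict()) ** 2).sum())
        est.update(batch)

best = min(cv_loss, key=cv_loss.get)  # the discrete selector over the library
```

The paper's extension replaces this discrete pick with an optimal convex combination of the candidates.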
Full Text Available In this paper, we propose a supervised classification algorithm for Polarimetric Synthetic Aperture Radar (PolSAR) images using multiple-feature fusion and ensemble learning. First, we extract different polarimetric features, including the extended polarimetric feature space, Hoekman, Huynen, H/alpha/A, and four-component scattering features of PolSAR images. Next, we randomly select two types of features each time from all feature sets to guarantee the reliability and diversity of the later ensembles, and use a support vector machine as the basic classifier for predicting classification results. Finally, we concatenate all prediction probabilities of the basic classifiers as the final feature representation and employ the random forest method to obtain the final classification results. Experimental results at the pixel and region levels show the effectiveness of the proposed algorithm.
Man, Jun [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China; Zhang, Jiangjiang [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China; Li, Weixuan [Pacific Northwest National Laboratory, Richland Washington USA; Zeng, Lingzao [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China; Wu, Laosheng [Department of Environmental Sciences, University of California, Riverside California USA
The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, larger ensemble size improves the parameter estimation and convergence of optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied in any other hydrological problems.
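The EnKF analysis step underlying the approach above can be sketched for a scalar parameter. This is the common stochastic (perturbed-observation) variant, not necessarily the exact scheme of the study, and the prior, observation, and noise level are illustrative:

```python
import numpy as np

def enkf_update(ensemble, obs, obs_op, obs_var, rng):
    """One stochastic (perturbed-observation) EnKF analysis step.
    ensemble: (n_ens, n_par) array of parameter vectors."""
    n_ens = len(ensemble)
    y_pred = ensemble @ obs_op.T                       # predicted observations
    X = ensemble - ensemble.mean(axis=0)               # parameter deviations
    Y = y_pred - y_pred.mean(axis=0)                   # observation deviations
    C_xy = X.T @ Y / (n_ens - 1)                       # cross-covariance
    C_yy = Y.T @ Y / (n_ens - 1) + obs_var * np.eye(len(obs))
    K = C_xy @ np.linalg.inv(C_yy)                     # Kalman gain
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), size=(n_ens, len(obs)))
    return ensemble + (perturbed - y_pred) @ K.T

rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(500, 1))   # prior ensemble of one parameter
H = np.array([[1.0]])                          # direct observation of the parameter
posterior = enkf_update(prior, np.array([2.0]), H, obs_var=0.25, rng=rng)
```

With a N(0, 1) prior and an observation of 2.0 with variance 0.25, the posterior mean should land near the analytic value of 1.6, with a visibly reduced spread; the SEOD contribution is then to choose *where* such observations are collected.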
Full Text Available Photogrammetric UAVs are seeing a surge in use for high-resolution mapping, but their use to map terrain under dense vegetation cover remains challenging due to a lack of exposed ground surfaces. This paper presents a novel object-oriented classification ensemble algorithm to leverage height, texture and contextual information of UAV data to improve landscape classification and terrain estimation. Its implementation incorporates multiple heuristics, such as multi-input machine learning-based classification, object-oriented ensemble, and integration of UAV and GPS surveys for terrain correction. Experiments based on a densely vegetated wetland restoration site showed classification improvement from 83.98% to 96.12% in overall accuracy and from 0.7806 to 0.947 in kappa value. Use of standard and existing UAV terrain mapping algorithms and software produced a reliable digital terrain model only over exposed bare grounds (mean error = −0.019 m and RMSE = 0.035 m) but severely overestimated the terrain by ~80% of mean vegetation height in vegetated areas. The terrain correction method successfully reduced the mean error from 0.302 m to −0.002 m (RMSE from 0.342 m to 0.177 m) in low vegetation and from 1.305 m to 0.057 m (RMSE from 1.399 m to 0.550 m) in tall vegetation. Overall, this research validated a feasible solution to integrate UAV and RTK GPS for terrain mapping in densely vegetated environments.
Full Text Available This paper presents a review of ensemble-based data assimilation for strongly nonlinear problems in the characterization of heterogeneous reservoirs with different production histories. It concentrates on the ensemble Kalman filter (EnKF) and ensemble smoother (ES) as representative frameworks, discusses their pros and cons, and investigates recent progress to overcome their drawbacks. The typical weaknesses of ensemble-based methods are non-Gaussian parameters, improper prior ensembles and finite population size. Three categories of approaches to mitigate these limitations are reviewed with recent accomplishments: improvement of Kalman gains, add-on of transformation functions, and independent evaluation of observed data. The data assimilation in heterogeneous reservoirs, applying the improved ensemble methods, is discussed with respect to predicting unknown dynamic data in reservoir characterization.
Nielsen, Andreas Brinch; Hansen, Lars Kai; Kjems, U
A sound classification model is presented that can classify signals into music, noise and speech. The model extracts the pitch of the signal using the harmonic product spectrum. Based on the pitch estimate and a pitch error measure, features are created and used in a probabilistic model with a soft-max output function. Both linear and quadratic inputs are used. The model is trained on 2 hours of sound and tested on publicly available data. A test classification error below 0.05 with 1 s classification windows is achieved. Furthermore, it is shown that linear input performs as well as a quadratic...
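The harmonic product spectrum step can be sketched as follows: multiply the magnitude spectrum with its downsampled copies so that energy at the fundamental and its harmonics reinforces a single peak. The synthetic tone, sample rate, and harmonic count are illustrative, not the paper's data:

```python
import numpy as np

def hps_pitch(signal, fs, n_harmonics=4):
    """Estimate the fundamental frequency via the harmonic product spectrum:
    multiply the magnitude spectrum with its downsampled copies and pick
    the resulting peak."""
    spectrum = np.abs(np.fft.rfft(signal))
    hps = spectrum.copy()
    for h in range(2, n_harmonics + 1):
        hps[: len(spectrum) // h] *= spectrum[::h][: len(spectrum) // h]
    hps[:2] = 0.0  # suppress the DC region
    return np.fft.rfftfreq(len(signal), 1.0 / fs)[hps.argmax()]

fs = 8000
t = np.arange(fs) / fs  # one second of samples
# Synthetic 200 Hz tone with decaying harmonics (no real audio assumed).
sig = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 5))
pitch = hps_pitch(sig, fs)  # ≈ 200 Hz
```

The abstract's pitch error measure could then compare the peak's strength against its competitors to flag unreliable (noise-like) frames.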
Oong, Tatt Hee; Isa, Nor Ashidi Mat
This brief presents a new ordering algorithm for data presentation of fuzzy ARTMAP (FAM) ensembles. The proposed ordering algorithm manipulates the presentation order of the training data for each member of a FAM ensemble such that the categories created in each ensemble member are biased toward the vector of the chosen input feature. Diversity is created by varying the training presentation order based on the ascending order of the values from the most uncorrelated input features. Analysis shows that the categories created in two FAMs are compulsively diverse when the chosen input features used to determine the presentation order of the training data are uncorrelated. The proposed ordering algorithm was tested on 10 classification benchmark problems from the University of California, Irvine, machine learning repository and a cervical cancer problem as a case study. The experimental results show that the proposed method can produce a diverse, yet well generalized, FAM ensemble.
Ehrendorfer, M. [Dept. of Meteorology and Geophysics, The Univ. of Reading (United Kingdom)
Ensemble-based data assimilation methods related to the fundamental theory of Kalman filtering have been explored in a variety of mostly non-operational data assimilation contexts over the past decade with increasing intensity. While promising properties have been reported, a number of issues that arise in the development and application of ensemble-based data assimilation techniques, such as in the basic form of the ensemble Kalman filter (EnKF), still deserve particular attention. The necessity of employing an ensemble of small size represents a fundamental issue which in turn leads to several related points that must be carefully considered. In particular, the need to correct for sampling noise in the covariance structure estimated from the finite ensemble must be mentioned. Covariance inflation, localization through a Schur/Hadamard product, preventing the occurrence of filter divergence and inbreeding, as well as the loss of dynamical balances, are all issues directly related to the use of small ensemble sizes. Attempts to reduce effectively the sampling error due to small ensembles and at the same time maintaining an ensemble spread that realistically describes error structures have given rise to the development of variants of the basic form of the EnKF. These include, for example, the Ensemble Adjustment Kalman Filter (EAKF), the Ensemble Transform Kalman Filter (ETKF), the Ensemble Square-Root Filter (EnSRF), and the Local Ensemble Kalman Filter (LEKF). Further important considerations within ensemble-based Kalman filtering concern issues such as the treatment of model error, stochastic versus deterministic updating algorithms, the ease of implementation and computational cost, serial processing of observations, avoiding the appearance of undesired dynamic imbalances, and the treatment of non-Gaussianity and nonlinearity. The discussion of the above issues within ensemble-based Kalman filtering forms the central topic of this article, which starts out with a
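The two small-ensemble corrections named above, covariance inflation and Schur-product localization, can be sketched as follows. A Gaussian taper stands in for the Gaspari-Cohn function common in practice, and the ensemble size, state dimension, and length scale are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ens, n = 10, 40
ensemble = rng.normal(size=(n_ens, n))  # small ensemble of state vectors

# Multiplicative covariance inflation: scale deviations from the ensemble mean
# to counteract the spread lost to sampling error and inbreeding.
inflation = 1.1
mean = ensemble.mean(axis=0)
inflated = mean + inflation * (ensemble - mean)

# Localization: taper the sampled covariance with a distance-based correlation
# function through a Schur (element-wise) product. A Gaussian taper stands in
# for the compactly supported Gaspari-Cohn function used in practice.
P = np.cov(inflated, rowvar=False)
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = 5.0  # localization length scale (illustrative)
taper = np.exp(-(dist / L) ** 2)
P_loc = P * taper  # spurious long-range covariances are damped
```

The taper leaves variances (the diagonal) untouched while shrinking distant covariances toward zero, which is exactly the sampling noise a 10-member ensemble cannot resolve.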
Fonseca, R.M.; Stordahl, A.; Leeuwenburgh, O.; Van den Hof, P.M.J.; Jansen, J.D.
We consider robust ensemble-based multi-objective optimization using a hierarchical switching algorithm for combined long-term and short-term water flooding optimization. We apply a modified formulation of the ensemble gradient, which results in improved performance compared to earlier formulations.
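A minimal sketch of the general ensemble-gradient idea (not the authors' exact modified formulation): perturb the control variable with an ensemble, evaluate the objective for each member, and estimate the gradient by regressing objective values on the control perturbations. The objective function here is invented for illustration:

```python
import random
from statistics import mean

random.seed(0)

def objective(x):
    # Hypothetical objective; its true gradient everywhere is 3.0
    return 3.0 * x + 1.0

x0 = 2.0
# Control ensemble: small Gaussian perturbations around the current control
xs = [x0 + random.gauss(0.0, 0.1) for _ in range(30)]
js = [objective(x) for x in xs]

# Regression-based ensemble gradient estimate: cov(x, j) / var(x)
mx, mj = mean(xs), mean(js)
grad = (sum((x - mx) * (j - mj) for x, j in zip(xs, js))
        / sum((x - mx) ** 2 for x in xs))
print(grad)  # close to 3.0
```

For a linear objective the regression recovers the gradient exactly; for the nonlinear reservoir objectives in the paper it yields a stochastic approximation whose quality depends on ensemble size and perturbation variance.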
Peterson, Leif E; Coleman, Matthew A
Random Spherical Linear Oracles (RSLO) for DNA microarray gene expression data are proposed for classifier fusion. RSLO employs random hyperplane splits of samples in the principal component score space based on the first three principal components (X, Y, Z) of the input feature set. Hyperplane splits are used to assign training (testing) samples to separate logistic regression mini-classifiers, which increases the diversity of voting results since errors are not shared across mini-classifiers. We recommend using RSLO with 3-4 repetitions of 10-fold cross-validation (CV), re-partitioning the samples randomly every ten iterations prior to each 10-fold CV; this equates to a total of 30-40 iterations.
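The core splitting step can be sketched as follows, with invented principal-component scores: each sample is routed to one of two mini-classifiers according to the side of a random hyperplane (through the origin of the 3-D score space) on which it falls:

```python
import random

random.seed(1)

# Hypothetical PCA scores (X, Y, Z) for a handful of samples
scores = [(0.5, -1.2, 0.3), (-0.7, 0.4, 1.1), (2.0, 0.1, -0.5),
          (-1.5, -0.3, 0.2), (0.9, 1.8, -1.0)]

# A random hyperplane through the origin of the score space,
# defined by its normal vector w
w = [random.gauss(0.0, 1.0) for _ in range(3)]

def side(s):
    # The sign of the projection decides which mini-classifier gets the sample
    return 0 if sum(wi * si for wi, si in zip(w, s)) >= 0.0 else 1

groups = {0: [], 1: []}
for s in scores:
    groups[side(s)].append(s)
print(len(groups[0]), len(groups[1]))
```

Because each mini-classifier sees a disjoint subset, their errors are uncorrelated, which is the source of the voting diversity the abstract describes.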
Sert, Egemen; Ertekin, Seyda; Halici, Ugur
Human-level recall performance in detecting breast cancer from microcalcifications in mammograms lies between 74.5% and 92.3%. In this research, we approach the breast microcalcification classification problem using convolutional neural networks along with various preprocessing methods, such as contrast scaling, dilation and cropping, and decision fusion using an ensemble of networks. Various experiments on the Digital Database for Screening Mammography dataset showed that preprocessing has a great impact on classification performance. The stand-alone models using the dilation and cropping preprocessing techniques achieved the highest recall value of 91.3%. The ensembles of the stand-alone models surpass this value, achieving a recall of 97.3%. The ensemble with the highest F1 score (the harmonic mean of precision and recall), 94.5%, has a recall of 94.0% and a precision of 95.0%. This recall is still above human-level performance, and the models achieve competitive results in terms of accuracy, precision, recall and F1 score.
Full Text Available Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality is increasing rapidly, and such methods are not directly applicable to high-dimensional datasets. In this paper, we propose an ensemble method for the classification of high-dimensional data, with each classifier constructed from a different set of features determined by a partitioning of redundant features. In our method, the redundancy of features is considered in dividing the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of the individual classifiers are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms the others.
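The scheme can be sketched as below. For brevity, nearest-centroid mini-classifiers stand in for the SVMs, the features are partitioned by index rather than by redundancy, and the data are synthetic:

```python
import random
from statistics import mean
from collections import Counter

random.seed(3)

# Toy data: class 0 clusters near 0, class 1 near 1, in a 6-dimensional space
def sample(label):
    return [label + random.gauss(0.0, 0.2) for _ in range(6)], label

train = [sample(l) for l in (0, 1) for _ in range(20)]

# Partition the feature space into disjoint subsets (here by index;
# the paper partitions by feature redundancy instead)
subsets = [(0, 1), (2, 3), (4, 5)]

def centroid_classifier(subset):
    # Nearest-centroid classifier trained only on this feature subset
    cent = {}
    for label in (0, 1):
        rows = [[x[i] for i in subset] for x, l in train if l == label]
        cent[label] = [mean(col) for col in zip(*rows)]
    def predict(x):
        proj = [x[i] for i in subset]
        return min((sum((a - b) ** 2 for a, b in zip(proj, cent[l])), l)
                   for l in (0, 1))[1]
    return predict

classifiers = [centroid_classifier(s) for s in subsets]

def ensemble_predict(x):
    # Combine the subset classifiers by majority voting
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

x_new, y_new = sample(1)
print(ensemble_predict(x_new))
```

Each base classifier only ever sees its own feature subset, so the ensemble scales to high-dimensional inputs by construction.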
Full Text Available Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring, in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and the tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, a homogeneous ensemble learning model and a majority voting strategy were also adopted for comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.
Jaspart, J.P.; Wald, F.; Weynand, K.; Gresnigt, A.M.
The influence of the rotational characteristics of the column bases on the structural frame response is discussed and specific design criteria for stiffness classification into semi-rigid and rigid joints are derived. The particular case of an industrial portal frame is then considered.
Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock
We propose a cluster-based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases ... the classifier is trained on each cluster, which has reduced dimensionality and fewer examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups ... datasets. Our model also outperforms the A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset.
Lo, Siaw Ling; Chiong, Raymond; Cornforth, David
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
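The over-sampling idea can be sketched as follows, with synthetic labels standing in for labeled tweets: bootstrapping, i.e. sampling the minority class with replacement, is used to construct a balanced training set for each member of the SVM ensemble:

```python
import random
from collections import Counter

random.seed(7)

# Hypothetical imbalanced labels: few "target" (audience) positives
labels = ["target"] * 10 + ["general"] * 90
data = list(range(len(labels)))  # integer stand-ins for tweet records

positives = [d for d, l in zip(data, labels) if l == "target"]
negatives = [d for d, l in zip(data, labels) if l == "general"]

# Bootstrap the minority class (sample with replacement) up to the
# majority-class size to build a balanced training set
boot_pos = random.choices(positives, k=len(negatives))
balanced = ([(d, "target") for d in boot_pos]
            + [(d, "general") for d in negatives])
random.shuffle(balanced)

print(Counter(l for _, l in balanced))
```

Repeating this draw with different seeds yields diverse balanced training sets, one per base classifier in the ensemble.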
PANIGRAHY, P. S.
Full Text Available A data-driven approach for multi-class fault diagnosis of induction motors using MCSA at steady-state condition is a complex pattern classification problem. This investigation exploits the built-in ensemble process of non-iterative classifiers to resolve the most challenging issues in this area, including bearing and stator fault detection. Non-iterative techniques exhibit an average 15% increase in fault classification accuracy over their iterative counterparts. In particular, random forest (RF) showed outstanding performance even with a small number of training samples and a noisy feature space because of its distributive feature model. The robustness of the results, backed by experimental verification, shows that non-iterative individual classifiers like RF are the optimum choice in the area of automatic fault diagnosis of induction motors.
Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong
Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
Tuarob, Suppawong; Tucker, Conrad S; Salathe, Marcel; Ram, Nilam
The role of social media as a source of timely and massive information has become more apparent since the era of Web 2.0. Multiple studies have illustrated the use of information in social media to discover biomedical and health-related knowledge. Most methods proposed in the literature employ traditional document classification techniques that represent a document as a bag of words. These techniques work well when documents are rich in text and conform to standard English; however, they are not optimal for social media data, where sparsity and noise are the norm. This paper aims to address the limitations posed by the traditional bag-of-words based methods, and proposes to use heterogeneous features in combination with ensemble machine learning techniques to discover health-related information, which could prove useful to multiple biomedical applications, especially those needing to discover health-related knowledge in large-scale social media data. Furthermore, the proposed methodology could be generalized to discover different types of information in various kinds of textual data. Social media data are characterized by an abundance of short social-oriented messages that do not conform to standard languages, either grammatically or syntactically. The problem of discovering health-related knowledge in social media data streams is then transformed into a text classification problem, where a text is identified as positive if it is health-related and negative otherwise. We first identify the limitations of the traditional methods, which train machines with N-gram word features, then propose to overcome these limitations by utilizing a collaboration of machine learning based classifiers, each of which is trained to learn a semantically different aspect of the data. The parameter analysis for tuning each classifier is also reported. Three data sets are used in this research. The first data set comprises approximately 5000 hand-labeled tweets, and is used for cross validation of the
Zhang, Jianhua; Yin, Zhong; Wang, Rubin
The identification of temporal variations in human operator cognitive task-load (CTL) is crucial for preventing possible accidents in human-machine collaborative systems. Recent literature has shown that the change of discrete CTL level during human-machine system operations can be objectively recognized using neurophysiological data and supervised learning techniques. The objective of this work is to design a subject-specific multi-class CTL classifier to reveal the complex unknown relationship between the operator's task performance and neurophysiological features by combining target class labeling, physiological feature reduction and selection, and ensemble classification techniques. The psychophysiological data acquisition experiments were performed under multiple human-machine process control tasks. Four or five target classes of CTL were determined by using a Gaussian mixture model and three human performance variables. By using a Laplacian eigenmap, a few salient EEG features were extracted and, together with heart rates, used as the input features of the CTL classifier. Then, multiple support vector machines were aggregated via majority voting to create an ensemble classifier for recognizing the CTL classes. Finally, the obtained CTL classification results were compared with those of several existing methods. The results showed that the proposed methods are capable of deriving a reasonable number of target classes and low-dimensional optimal EEG features for individual human operator subjects.
Choi, Jae Young; Kim, Dae Hoe; Ro, Yong Man; Plataniotis, Konstantinos N
We propose a novel computer-aided detection (CAD) framework for breast masses in mammography. To increase detection sensitivity for various types of mammographic masses, we propose the combined use of different detection algorithms. In particular, we develop a region-of-interest combination mechanism that integrates detection information gained from unsupervised and supervised detection algorithms. Also, to significantly reduce the number of false-positive (FP) detections, a new ensemble classification algorithm is developed. Extensive experiments have been conducted on a benchmark mammogram database. Results show that our combined detection approach can considerably improve detection sensitivity with a small loss in FP rate, compared to representative detection algorithms previously developed for mammographic CAD systems. The proposed ensemble classification solution also has a dramatic impact on the reduction of FP detections: as much as 70% (from 15 to 4.5 per image) at a cost of only 4.6% sensitivity loss (from 90.0% to 85.4%). Moreover, our proposed CAD method performs as well as or better than mammography CAD algorithms previously reported in the literature (70.7% and 80.0% at 1.5 and 3.5 FPs per image, respectively).
Full Text Available TIGGE (THORPEX International Grand Global Ensemble) was a major part of THORPEX (The Observing System Research and Predictability Experiment). It integrates ensemble precipitation products from all the major forecast centers in the world and provides systematic evaluation of the multimodel ensemble prediction system. Development of a meteorologic-hydrologic coupled flood forecasting and early warning model based on the TIGGE precipitation ensemble forecast can provide probabilistic flood forecasts, extend the lead time of the flood forecast, and gain more time for decision-makers to make the right decision. In this study, precipitation ensemble forecast products from ECMWF, NCEP, and CMA are used to drive the distributed hydrologic model TOPX. We focus on the Yi River catchment and aim to build a flood forecast and early warning system. The results show that the meteorologic-hydrologic coupled model can satisfactorily predict the flow process of four flood events. The predicted occurrence times of peak discharges are close to the observations. However, the magnitudes of the peak discharges differ significantly due to the varying performance of the ensemble prediction systems. The coupled forecasting model can accurately predict the occurrence of the peak time and the corresponding risk probability of peak discharge based on the probability distribution of peak time and flood warning, which provides users a strong theoretical foundation and valuable information for a promising new approach.
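In its simplest form, the probability forecast described here reduces to the fraction of ensemble members exceeding a warning threshold; the discharge values and threshold below are invented for illustration:

```python
# Hypothetical ensemble of peak-discharge forecasts (m^3/s) for one event
members = [820.0, 640.0, 910.0, 700.0, 1050.0,
           760.0, 880.0, 590.0, 930.0, 810.0]
warning_level = 800.0  # assumed flood-warning threshold

# Probability forecast: fraction of ensemble members above the threshold
p_exceed = sum(m > warning_level for m in members) / len(members)
print(p_exceed)  # -> 0.6
```

Operational systems refine this raw ensemble frequency (for example with calibration against past forecasts), but the risk probability communicated to decision-makers is built on this quantity.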
Full Text Available Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible) has been gaining popularity in order to minimize this labor-intensive task. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this is not an issue that has received much attention in the literature. Intuitively, the choice of training samples having varying quality will affect classification accuracy. In this paper a Diversity Index (DI) is proposed that characterizes the diversity of data quality (Qi) among selected training samples required for constructing a classification model of tree genera. The training sample is diversified in terms of data quality as opposed to the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and, therefore, has a higher classification accuracy in discriminating the “unknown” class samples from the “known” samples. Our algorithm is implemented within the Random Forests base classifiers with six derived geometric features from LiDAR data. The training sample contains three tree genera (pine, poplar, and maple) and the validation samples contain four labels (pine, poplar, maple, and “unknown”). Classification accuracy improved from 72.8% when training samples were
Liu, Fang; Cao, San-xing; Lu, Rui
This paper proposes a user credit assessment model based on clustering ensemble, aiming to solve the problem that users illegally spread pirated and pornographic media contents within user self-service oriented broadband network new media platforms. The idea is to assess new media users' credit by establishing an indices system based on user credit behaviors; illegal users can then be identified according to the assessment results, thereby curbing the bad videos and audios transmitted on the network. The proposed clustering ensemble model integrates the advantages of swarm intelligence clustering, which is well suited to user credit behavior analysis, and K-means clustering, which can eliminate the scattered users left in the swarm intelligence clustering result, thus classifying all users' credit automatically. Verification experiments were performed on a standard credit application dataset from the UCI machine learning repository, and the statistical results of a comparative experiment with a single swarm intelligence clustering model indicate that this clustering ensemble model has a stronger ability to distinguish creditworthiness, especially in finding the user clusters with the best and worst credit, which helps operators take incentive or punitive measures accurately. Besides, compared with the experimental results of a Logistic-regression-based model under the same conditions, this clustering ensemble model is robust and has better prediction accuracy.
Full Text Available Recognition of transportation modes can be used in different applications, including human behavior research, transport management and traffic control. Previous work on transportation mode recognition has often relied on using multiple sensors or matching Geographic Information System (GIS) information, which is not possible in many cases. In this paper, an approach based on ensemble learning is proposed to infer hybrid transportation modes using only Global Positioning System (GPS) data. First, in order to distinguish between different transportation modes, we used a statistical method to generate global features and extract several local features from sub-trajectories after trajectory segmentation, before these features were combined in the classification stage. Second, to obtain a better performance, we used tree-based ensemble models (Random Forest, Gradient Boosting Decision Tree, and XGBoost) instead of traditional methods (K-Nearest Neighbor, Decision Tree, and Support Vector Machines) to classify the different transportation modes. The experimental results on the latter have shown the efficacy of our proposed approach. Among them, the XGBoost model produced the best performance, with a classification accuracy of 90.77% obtained on the GEOLIFE dataset, and we used a tree-based ensemble method to ensure accurate feature selection and reduce model complexity.
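A minimal illustration of the tree-ensemble principle behind these models, using bagged decision stumps on synthetic one-dimensional data rather than full Random Forest/GBDT/XGBoost implementations:

```python
import random
from collections import Counter

random.seed(5)

# Toy 1-D dataset: true class is 1 beyond x = 0.5, with 10% label noise
train = [(x, int(x > 0.5) if random.random() > 0.1 else int(x <= 0.5))
         for x in (random.random() for _ in range(100))]

def fit_stump(samples):
    # Best single-threshold classifier (depth-1 tree) on this sample
    best = None
    for t, _ in samples:
        err = sum((x > t) != y for x, y in samples)
        if best is None or err < best[0]:
            best = (err, t)
    return best[1]

# Bagging: fit each stump on a bootstrap resample, then majority-vote
stumps = [fit_stump(random.choices(train, k=len(train))) for _ in range(25)]

def predict(x):
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]

# Accuracy against the noise-free true rule
acc = sum(predict(x) == int(x > 0.5) for x, _ in train) / len(train)
print(acc)
```

Boosting-based ensembles such as GBDT and XGBoost instead fit trees sequentially to the residual errors, but the same vote/sum-of-weak-learners structure applies.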
We consider the Bayesian filtering problem for data assimilation following the kernel-based ensemble Gaussian-mixture filtering (EnGMF) approach introduced by Anderson and Anderson (1999). In this approach, the posterior distribution of the system state is propagated with the model using the ensemble Monte Carlo method, providing a forecast ensemble that is then used to construct a prior Gaussian-mixture (GM) based on the kernel density estimator. This results in two update steps: a Kalman filter (KF)-like update of the ensemble members and a particle filter (PF)-like update of the weights, followed by a resampling step to start a new forecast cycle. After formulating EnGMF for any observational operator, we analyze the influence of the bandwidth parameter of the kernel function on the covariance of the posterior distribution. We then focus on two aspects: i) the efficient implementation of EnGMF with (relatively) small ensembles, where we propose a new deterministic resampling strategy preserving the first two moments of the posterior GM to limit the sampling error; and ii) the analysis of the effect of the bandwidth parameter on contributions of KF and PF updates and on the weights variance. Numerical results using the Lorenz-96 model are presented to assess the behavior of EnGMF with deterministic resampling, study its sensitivity to different parameters and settings, and evaluate its performance against ensemble KFs. The proposed EnGMF approach with deterministic resampling suggests improved estimates in all tested scenarios, and is shown to require less localization and to be less sensitive to the choice of filtering parameters.
Chen, Hui; Tan, Chao; Wu, Tong; Wang, Li; Zhu, Wanping
Chinese liquor is one of the famous distilled spirits, and counterfeit liquor is becoming a serious problem in the market. In particular, aged liquor is facing a crisis of confidence because it is difficult for consumers to verify the marked age, which prompts unscrupulous traders to pass off low-grade liquors as high-grade liquors. An ideal method for authenticity confirmation of liquors should be non-invasive, non-destructive and timely. The combination of near-infrared (NIR) spectroscopy with chemometrics proves to be a good way to meet these premises. A new strategy is proposed for classification and verification of the adulteration of liquors by using NIR spectroscopy and chemometric classification, i.e., ensemble support vector machines (SVM). Three measures, i.e., accuracy, sensitivity and specificity, were used for performance evaluation. The results confirmed that the strategy can serve as a screening tool for verifying adulteration of the liquor, that is, a prior step that subjects a sample to deeper analysis only when a positive result for adulteration is obtained by the proposed methodology.
Azami, Hamed; Escudero, Javier
Breast cancer is one of the most common types of cancer in women all over the world. Early diagnosis of this kind of cancer can significantly increase the chances of long-term survival. Since the diagnosis of breast cancer is a complex problem, neural network (NN) approaches have been used as a promising solution. Considering the low speed of the back-propagation (BP) algorithm in training a feed-forward NN, we consider a number of improved NN trainings for the Wisconsin breast cancer dataset: BP with momentum, BP with adaptive learning rate, BP with adaptive learning rate and momentum, the Polak-Ribière conjugate gradient algorithm (CGA), Fletcher-Reeves CGA, Powell-Beale CGA, scaled CGA, resilient BP (RBP), one-step secant and quasi-Newton methods. An NN ensemble, which is a learning paradigm to combine a number of NN outputs, is used to improve the accuracy of the classification task. Results demonstrate that NN ensemble-based classification methods have better performance than single-NN-based algorithms. The highest overall average accuracy is 97.68%, obtained by an NN ensemble trained by RBP under the 50%-50% training-test evaluation method.
Cong, Zihan; Zhang, Xingming; Wang, Haoxiang; Xu, Hongjie
Aiming at the problem of “information overload” in the human resources industry, this paper proposes a human resource recommendation algorithm based on ensemble learning. The algorithm considers the characteristics and behaviours of both job seekers and jobs in real business circumstances. Firstly, the algorithm uses two ensemble learning methods, Bagging and Boosting. The outputs from both learning methods are then merged to form a user interest model. Based on the user interest model, job recommendations can be generated for users. The algorithm is implemented as a parallelized recommendation system on Spark. A set of experiments has been done and analysed. The proposed algorithm achieves significant improvements in accuracy, recall rate and coverage, compared with recommendation algorithms such as UserCF and ItemCF.
Zhang, Song; Pruett, William Andrew; Hester, Robert
In an emergency situation such as hemorrhage, doctors need to predict which patients need immediate treatment and care. This task is difficult because of the diverse responses to hemorrhage in the human population. Ensemble physiological simulations provide a means to sample a diverse range of subjects and may have a better chance of containing the correct solution. However, revealing the patterns and trends in an ensemble simulation is a challenging task. We have developed a visualization framework for ensemble physiological simulations. The visualization helps users identify trends among ensemble members, classify ensemble members into subpopulations for analysis, and provide predictions of future events by matching a new patient's data to existing ensembles. We demonstrated the effectiveness of the visualization on simulated physiological data. The lessons learned here can be applied to clinically collected physiological data in the future.
A good classifier can correctly predict new data for which the class label is unknown, so it is important to construct a high-accuracy classifier. Hence, classification techniques are very useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches. However, the approach also has two major deficiencies. First, it generates a very large number of association classification rules, especially when t...
Wang, Yanhua; Liu, Xiyu; Xiang, Laisheng
Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it has become the focus of current clustering research. Ensemble clustering aims at finding a consensus partition that agrees as much as possible with the base clusterings. The genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanisms of biology. In this paper, an improved genetic algorithm is designed by improving the coding of the chromosome. A new membrane evolutionary algorithm is constructed by using genetic mechanisms as evolution rules, combined with the communication mechanism of a cell-like P system. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system make the membrane evolutionary algorithm perform better than several state-of-the-art techniques on six real-world UCI data sets.
Full Text Available This paper illustrates the use of a combined neural network model based on the Stacked Generalization method for classification of electrocardiogram (ECG) beats. In the conventional Stacked Generalization method, the combiner learns to map the base classifiers' outputs to the target data. We claim that adding the input pattern to the base classifiers' outputs helps the combiner obtain knowledge about the input space and, as a result, perform better on the same task. Experimental results support our claim that the additional knowledge about the input space improves the performance of the proposed method, which is called Modified Stacked Generalization. In particular, for classification of 14966 ECG beats that were not previously seen during the training phase, the Modified Stacked Generalization method reduced the error rate by 12.41% in comparison with the best of ten popular classifier fusion methods, including Max, Min, Average, Product, Majority Voting, Borda Count, Decision Templates, Weighted Averaging based on Particle Swarm Optimization, and Stacked Generalization.
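The modification described above amounts to a change in the level-1 feature vector. A minimal stdlib-Python sketch — the function names are made up here, and the perceptron combiner is an illustrative stand-in, not the network the authors used:

```python
def stack_features(x, base_classifiers, include_input=True):
    """Build the combiner's input: base-classifier outputs, optionally
    concatenated with the raw input pattern (the 'modified' variant)."""
    z = [clf(x) for clf in base_classifiers]
    return z + list(x) if include_input else z

def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Plain perceptron used as the level-1 combiner (illustrative choice)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Switching `include_input` on or off reproduces the modified vs. conventional stacking comparison on any base-classifier pool.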
Zhang, Jianhua; Li, Sunan; Wang, Rubin
In this paper, we deal with the Mental Workload (MWL) classification problem based on the measured physiological data. First we discussed the optimal depth (i.e., the number of hidden layers) and parameter optimization algorithms for the Convolutional Neural Networks (CNN). The base CNNs designed were tested according to five classification performance indices, namely Accuracy, Precision, F-measure, G-mean, and required training time. Then we developed an Ensemble Convolutional Neural Network (ECNN) to enhance the accuracy and robustness of the individual CNN model. For the ECNN design, three model aggregation approaches (weighted averaging, majority voting and stacking) were examined and a resampling strategy was used to enhance the diversity of individual CNN models. The results of MWL classification performance comparison indicated that the proposed ECNN framework can effectively improve MWL classification performance and is featured by entirely automatic feature extraction and MWL classification, when compared with traditional machine learning methods.
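Two of the three aggregation approaches named for the ECNN (weighted averaging and majority voting) can be sketched in a few lines; stacking needs a trained combiner and is omitted. This is a generic stdlib-Python illustration with assumed function names, not the authors' code:

```python
def weighted_average(prob_vectors, weights):
    """Combine per-model class-probability vectors by weighted averaging,
    then return the argmax class of the fused vector."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

def majority_vote(prob_vectors):
    """Each model votes for its own argmax class; ties go to the lowest index."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_vectors]
    return max(set(votes), key=lambda c: (votes.count(c), -c))
```

Note the two rules can disagree: a single highly weighted model can overturn a majority, which is exactly the trade-off weighted averaging introduces.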
The most significant ocean acoustic propagation occurs over tens of kilometers, at scales small compared to basin scales and to most fine-scale ocean modeling. To address the increased emphasis on uncertainty quantification, for example transmission loss (TL) probability density functions (PDF) within some radius, a polynomial chaos (PC) based method is utilized. In order to capture uncertainty in ocean modeling, the Navy Coastal Ocean Model (NCOM) now includes ensembles distributed to reflect the ocean analysis statistics. Since the ensembles are included in the data assimilation for the new forecast ensembles, the acoustic modeling uses the ensemble predictions in a similar fashion for creating sound speed distributions over an acoustically relevant domain. Within an acoustic domain, singular value decomposition over the combined time-space structure of the sound speeds can be used to create Karhunen-Loève expansions of sound speed, subject to multivariate normality testing. These sound speed expansions serve as a basis for Hermite polynomial chaos expansions of derived quantities, in particular TL. The PC expansion coefficients result from so-called non-intrusive methods, involving evaluation of TL at multi-dimensional Gauss-Hermite quadrature collocation points. Traditional TL calculation from standard acoustic propagation modeling could be prohibitively time consuming at all multi-dimensional collocation points. This method employs Smolyak order and gridding methods to allow adaptive sub-sampling of the collocation points to determine only the most significant PC expansion coefficients to within a preset tolerance. Practically, the Smolyak order and grid sizes grow only polynomially in the number of Karhunen-Loève terms, alleviating the curse of dimensionality. The resulting TL PC coefficients allow the determination of TL PDF normality and its mean and standard deviation. In the non-normal case, PC Monte Carlo methods are used to rapidly establish the PDF. This work was …
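The core mechanics of a Hermite PC expansion can be shown in one dimension. As an assumption-laden sketch (Monte Carlo projection instead of the abstract's sparse Gauss-Hermite quadrature, truncation at order 2, invented function names): for a quantity f(X) with X ~ N(0,1), the coefficients are c_k = E[f(X) He_k(X)] / k!, the mean is c_0, and the variance is the weighted sum of squared higher coefficients:

```python
import random

def hermite_pc_coeffs(f, order=2, n_samples=200_000, seed=1):
    """Project f(X), X ~ N(0,1), onto probabilists' Hermite polynomials
    He0 = 1, He1 = x, He2 = x^2 - 1, estimating c_k = E[f(X) He_k(X)] / k!
    by plain Monte Carlo (the non-intrusive spirit, without sparse grids)."""
    rng = random.Random(seed)
    he = [lambda x: 1.0, lambda x: x, lambda x: x * x - 1.0]
    fact = [1, 1, 2]  # k! for k = 0, 1, 2
    sums = [0.0] * (order + 1)
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        fx = f(x)
        for k in range(order + 1):
            sums[k] += fx * he[k](x)
    return [s / n_samples / fact[k] for k, s in enumerate(sums)]

def pc_mean_var(coeffs):
    """Mean and variance implied by the PC coefficients:
    mean = c0, var = sum_{k>=1} k! * c_k^2 (orthogonality of the He_k)."""
    fact = [1, 1, 2]
    mean = coeffs[0]
    var = sum(fact[k] * c * c for k, c in enumerate(coeffs) if k >= 1)
    return mean, var
```

For f(x) = x² the exact coefficients are c0 = 1, c1 = 0, c2 = 1, giving mean 1 and variance 2, which the estimator recovers to Monte Carlo accuracy.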
Abou-Zleikha, Mohamed; Tan, Zheng-Hua; Christensen, Mads Græsbøll
… for a certain condition, the model becomes biased to the data used for training, limiting the model's generalisation ability. In this paper, we propose a BIC-based tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each … points. The proposed approach is tested on artificially created conversations from the TIMIT database. The proposed approach shows very accurate results, comparable to those achieved by state-of-the-art methods, with a 9% (absolute) higher F1 compared with the standard ΔBIC with optimally tuned penalty …
Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò
Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors has been organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified in four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD, and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on the MLNeCh dataset. Since the automatic classification of AD is based on the use of feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem; the classification method is then obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone methods tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be publicly available to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.
Barzan I Khayatt
Full Text Available There is a growing interest in the Non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) of microbes, fungi and plants because they can produce bioactive peptides such as antibiotics. The ability to identify the substrate specificity of the enzyme's adenylation (A) and acyl-transferase (AT) domains is essential to rationally deduce or engineer new products. We here report on a Hidden Markov Model (HMM) based ensemble method to predict the substrate specificity at high quality. We collected a new reference set of experimentally validated sequences. An initial classification based on alignment and Neighbor Joining was performed in line with most of the previously published prediction methods. We then created and tested single substrate specific HMMs and found that their use improved the correct identification significantly for A as well as for AT domains. A major advantage of the use of HMMs is that it abolishes the dependency on multiple sequence alignment and residue selection that is hampering the alignment-based clustering methods. Using our models we obtained a high prediction quality for the substrate specificity of the A domains similar to two recently published tools that make use of HMMs or Support Vector Machines (NRPSsp and NRPSpredictor2, respectively). Moreover, replacement of the single substrate specific HMMs by ensembles of models caused a clear increase in prediction quality. We argue that the superiority of the ensemble over the single model is caused by the way substrate specificity evolves for the studied systems. It is likely that this also holds true for other protein domains. The ensemble predictor has been implemented in a simple web-based tool that is available at http://www.cmbi.ru.nl/NRPS-PKS-substrate-predictor/.
Stanescu, Ana; Caragea, Doina
Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.
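The key idea above — self-training that dynamically balances the labeled set while growing it — can be sketched compactly. This is a stdlib-Python illustration under stated assumptions: a nearest-centroid base learner instead of the classifiers the study used, invented function names, and one classifier rather than an ensemble:

```python
def centroid(points):
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labeled, unlabeled, iterations=5, per_class=1):
    """Self-training with a nearest-centroid base learner: each round,
    pseudo-label the pool and promote the same (small) number of
    most-confident examples per class, keeping the labeled set balanced."""
    labeled = {0: list(labeled[0]), 1: list(labeled[1])}
    pool = list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        c0, c1 = centroid(labeled[0]), centroid(labeled[1])
        # confidence = margin between the two centroid distances (>0 favors class 0)
        scored = [(dist(x, c1) - dist(x, c0), x) for x in pool]
        scored.sort(key=lambda s: s[0])
        for _ in range(per_class):            # most-confident class-1 candidates
            if scored and scored[0][0] < 0:
                _, x = scored.pop(0)
                labeled[1].append(x)
                pool.remove(x)
        for _ in range(per_class):            # most-confident class-0 candidates
            if scored and scored[-1][0] > 0:
                _, x = scored.pop()
                labeled[0].append(x)
                pool.remove(x)
    c0, c1 = centroid(labeled[0]), centroid(labeled[1])
    return lambda x: 0 if dist(x, c0) <= dist(x, c1) else 1
```

Promoting equal numbers per class is what keeps a 1-to-99 imbalance from flooding the labeled set with pseudo-labeled negatives.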
Pan, M; Zhang, J
Microarray gene expression technology provides a systematic approach to patient classification. However, microarray data pose a great computational challenge owing to their large dimensionality, small sample sizes, and potential correlations among genes. A recent study has shown that gene-gene correlations have a positive effect on the accuracy of classification models, in contrast to some previous results. In this study, a recently developed correlation-based classifier, the ensemble of random subspace (RS) Fisher linear discriminants (FLDs), was utilized. The impact of gene-gene correlations on the performance of this classifier and other classifiers was studied using simulated datasets and real datasets. A cross-validation framework was used to evaluate the performance of each classifier using the simulated datasets or real datasets, and misclassification rates (MRs) were computed. Using the simulated data, the average MRs of the correlation-based classifiers decreased as the correlations increased when there were more correlated genes. Using real data, the correlation-based classifiers outperformed the non-correlation-based classifiers, especially when the gene-gene correlations were high. The ensemble RS-FLD classifier is a potential state-of-the-art computational method. The correlation-based ensemble RS-FLD classifier was effective and benefited from gene-gene correlations, particularly when the correlations were high.
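A random-subspace ensemble like the RS-FLD can be sketched in stdlib Python. Assumptions are flagged loudly: the full Fisher discriminant's pooled covariance is simplified here to per-feature variances (a diagonal, "naive" discriminant), and all names and settings are illustrative, not the paper's method:

```python
import random

def rs_diag_fld(X, y, n_members=11, subspace=2, seed=0):
    """Random-subspace ensemble of diagonal ('naive') Fisher-style
    discriminants: each member sees a random feature subset and votes.
    The per-feature 'var' is a crude pooled spread around the class-mean
    midpoint, standing in for the FLD's within-class covariance."""
    rng = random.Random(seed)
    d = len(X[0])
    X0 = [x for x, t in zip(X, y) if t == 0]
    X1 = [x for x, t in zip(X, y) if t == 1]
    mu0 = [sum(x[f] for x in X0) / len(X0) for f in range(d)]
    mu1 = [sum(x[f] for x in X1) / len(X1) for f in range(d)]
    var = [max(1e-9, sum((x[f] - (mu0[f] + mu1[f]) / 2) ** 2 for x in X) / len(X))
           for f in range(d)]
    members = [rng.sample(range(d), subspace) for _ in range(n_members)]

    def predict(x):
        votes = 0
        for feats in members:
            score = sum(((x[f] - mu0[f]) ** 2 - (x[f] - mu1[f]) ** 2) / var[f]
                        for f in feats)
            votes += 1 if score > 0 else 0   # positive score: closer to class-1 mean
        return 1 if votes > n_members / 2 else 0
    return predict
```

The diagonal simplification deliberately ignores gene-gene correlations, which is exactly the information the abstract reports the full RS-FLD exploits; the sketch shows only the subspace-plus-voting scaffolding.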
Full Text Available The impacts of the assimilated observations on the 24-hour forecasts are estimated with the ensemble-based method proposed by Kalnay et al. using an ensemble Kalman filter (EnKF). This method estimates the relative impact of observations in data assimilation similar to the adjoint-based method proposed by Langland and Baker, but without using the adjoint model. It is implemented on the National Centers for Environmental Prediction Global Forecast System EnKF that has been used as part of the operational global data assimilation system at NCEP since May 2012. The result quantifies the overall positive impacts of the assimilated observations and the relative importance of the satellite radiance observations compared to other types of observations, especially for the moisture fields. A simple moving localisation based on the average wind, although not optimal, seems to work well. The method is also used to identify the cause of local forecast failure cases in the 24-hour forecasts. Data-denial experiments of the observations identified as producing a negative impact are performed, and forecast errors are reduced as estimated, thus validating the impact estimation.
Chellasamy, Menaka; Ferre, Ty; Greve, Mogens Humlekrog
… (field vectors containing crop information and the field boundary of each crop). Crop discrimination is performed by considering a set of conclusions derived from individual neural networks based on ET. Verification of the diversification rules is performed by incorporating pixel-based classification uncertainty … Beginning in 2015, Danish farmers are obliged to meet specific crop diversification rules based on total land area and number of crops cultivated to be eligible for new greening subsidies. Hence, there is a need for the Danish government to extend their subsidy control system to verify farmers' declarations to warrant greening payments under the new crop diversification rules. Remote Sensing (RS) technology has been used since 1992 to control farmers' subsidies in Denmark. However, a proper RS-based approach is yet to be finalised to validate new crop diversity requirements designed for assessing …
Chellasamy, Menaka; Ferre, Ty; Greve, Mogens Humlekrog
An approach to use the available agricultural parcel information to automatically select training samples for crop classification is investigated. Previous research addressed the multi-evidence crop classification approach using an ensemble classifier. This first produced confidence measures using … three Multi-Layer Perceptron (MLP) neural networks trained separately with spectral, texture and vegetation indices; classification labels were then assigned based on Endorsement Theory. The present study proposes an approach to feed this ensemble classifier with automatically selected training samples … The approach is named ECRA (Ensemble-based Cluster Refinement Approach). ECRA first automatically removes mislabeled samples and then selects the refined training samples in an iterative training-reclassification scheme. Mislabel removal is based on the expectation that mislabels in each class will be far …
Ali, Safdar; Majid, Abdul; Khan, Asifullah
Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our
Roberts, Eric; Smith, Spencer; Sindi, Suzanne; Smith, Kevin
Topological entropy is a measurement of orbit complexity in a dynamical system that can be estimated in 2D by embedding an initial material curve L0 in the fluid and estimating its growth under the evolution of the flow. This growth is given by L(t) = |L0| e^(h t), where L(t) is the length of the curve as a function of t and h is the topological entropy. In order to develop a method for computing this growth rate that will efficiently scale up in both system size and modeling time, one must be clever about extracting the maximum information from the limited trajectories available. The relative motion of trajectories through phase space encodes global information that is not contained in any individual trajectory. That is, extra information is ''hiding'' in an ensemble of classical trajectories, which is not exploited in a trajectory-by-trajectory approach. Using tools from computational geometry, we introduce a new algorithm designed to take advantage of such additional information that requires only potentially sparse sets of particle trajectories as input and no reliance on any detailed knowledge of the velocity field: the Ensemble-Based Topological Entropy Calculation, or E-tec.
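Given measured curve lengths, the entropy h in L(t) = |L0| e^(h t) is just the least-squares slope of log L(t) versus t. A small stdlib-Python sketch (the function name is an assumption; E-tec itself computes the lengths from sparse trajectories, which is not reproduced here):

```python
import math

def entropy_from_lengths(times, lengths):
    """Least-squares slope of log L(t) versus t, i.e. the topological
    entropy h in L(t) = |L0| * exp(h * t)."""
    logs = [math.log(L) for L in lengths]
    n = len(times)
    tbar = sum(times) / n
    lbar = sum(logs) / n
    num = sum((t - tbar) * (l - lbar) for t, l in zip(times, logs))
    den = sum((t - tbar) ** 2 for t in times)
    return num / den
```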
Full Text Available Calls from 14 species of bat were classified to genus and species using discriminant function analysis (DFA), support vector machines (SVM) and ensembles of neural networks (ENN). Both SVMs and ENNs outperformed DFA for every species, while ENNs (mean identification rate 97%) consistently outperformed SVMs (mean identification rate 87%). Correct classification rates produced by the ENNs varied from 91% to 100%; calls from six species were correctly identified with 100% accuracy. Calls from the five species of Myotis, a genus whose species are considered difficult to distinguish acoustically, had correct identification rates that varied from 91% to 100%. Five parameters were most important for classifying calls correctly while seven others contributed little to classification performance.
Full Text Available The degradation of photovoltaic (PV) modules remains a major concern on the control and the development of the photovoltaic field, particularly in regions with difficult climatic conditions. The main degradation modes of the PV modules are corrosion, discoloration, glass breaks, and cracks of cells. However, corrosion and discoloration remain the predominant degradation modes that still require further investigations. In this paper, a model-based PV corrosion prognostic approach, based on an ensemble Kalman filter (EnKF), is introduced to identify the PV corrosion parameters and then estimate the remaining useful life (RUL). Simulations have been conducted using a measured data set, and results are reported to show the efficiency of the proposed approach.
Doyle, Suzanne R.; Donovan, Dennis M.
Aims The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion that there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038
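Subagging differs from bagging only in drawing subsamples without replacement. A stdlib-Python sketch under stated assumptions — a decision stump stands in for the full classification trees, and all names and settings are illustrative:

```python
import random

def fit_stump(X, y):
    """Best single-feature threshold split minimizing training misclassifications."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):                     # x > thr  or  x < thr
                pred = [1 if sign * (x[f] - thr) > 0 else 0 for x in X]
                err = sum(p != t for p, t in zip(pred, y))
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: 1 if sign * (x[f] - thr) > 0 else 0

def subagging(X, y, n_members=25, frac=0.5, seed=0):
    """Subagging: fit base learners on random subsamples drawn WITHOUT
    replacement, then aggregate by majority vote."""
    rng = random.Random(seed)
    m = max(2, int(frac * len(X)))
    stumps = []
    for _ in range(n_members):
        idx = rng.sample(range(len(X)), m)           # subsample, no replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: 1 if sum(s(x) for s in stumps) > n_members / 2 else 0
```

Averaging over subsamples is what tames the instability of individual trees that the Conclusions paragraph refers to.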
Aalstad, K.; Westermann, S.; Bertino, L.
Snow, with high albedo, low thermal conductivity and large water storing capacity strongly modulates the surface energy and water balance. At the same time, the estimation of the seasonal snow cycle at the kilometer scale is a major hydrometeorological challenge and thus represents a significant source of uncertainty in land surface schemes. To constrain this uncertainty we are developing a modular ensemble-based subgrid snow data assimilation framework (ESSDA) for satellite-era snow distribution (SD) reanalyses. ESSDA makes use of an ensemble Kalman smoother (EnKS) approach to assimilate MODIS and LandSat retrievals of snow cover fraction into the subgrid snow distribution submodel (SSNOWD). Our problem is particularly challenging as snow cover is doubly bounded in physical space, which violates the Gaussian error assumption in the EnKS and thus compromises the performance of the reanalysis. In an attempt to alleviate this issue we make use of the technique of analytical Gaussian anamorphosis (GA) and apply the EnKS in a transformed space. The potential of the framework is presented through both synthetic (twin) and real experiments. The latter are carried out for the Bayelva catchment near Ny-Ålesund (79°N, Svalbard, Norway) where independent and accurate ground-based observations of snow cover and snow depth distributions are available. Our ensuing evaluation demonstrates that ESSDA provides robust estimates of the evolution of the SD over almost a decade long validation period. The use of GA, and our emphasis on the subgrid SD, is what sets ESSDA apart from previously proposed snow reanalysis frameworks. ESSDA has been developed to aid in satellite-based permafrost mapping efforts. Nonetheless, being modular and given the improved error treatment through GA, the ESSDA approach may also prove valuable for broader hydrometeorological reanalysis and forecast initialization efforts.
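The Gaussian anamorphosis idea — update a doubly bounded variable in a transformed, unbounded space so the analysis respects the bounds — can be shown with a logit transform and a scalar ensemble Kalman update. This is a deliberately simplified sketch (scalar state, deterministic update, logit instead of the paper's analytical anamorphosis function, invented names; the observation error variance is taken in transformed space):

```python
import math

def logit(p, eps=1e-6):
    """Map a fraction in (0, 1) to the real line; eps guards the bounds."""
    p = min(1 - eps, max(eps, p))
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def anamorphosed_update(ensemble, obs, obs_err_var):
    """Scalar ensemble Kalman update performed in logit space, so the
    analysis members stay inside [0, 1] when mapped back."""
    zs = [logit(x) for x in ensemble]
    zo = logit(obs)
    zbar = sum(zs) / len(zs)
    var = sum((z - zbar) ** 2 for z in zs) / (len(zs) - 1)
    gain = var / (var + obs_err_var)            # Kalman gain in transformed space
    return [inv_logit(z + gain * (zo - z)) for z in zs]
```

A plain Gaussian update on the raw fractions can push members outside [0, 1]; the transform makes that impossible by construction, which is the motivation stated in the abstract.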
Sangouard, Nicolas; Simon, Christoph; de Riedmatten, Hugues; Gisin, Nicolas
The distribution of quantum states over long distances is limited by photon loss. Straightforward amplification as in classical telecommunications is not an option in quantum communication because of the no-cloning theorem. This problem could be overcome by implementing quantum repeater protocols, which create long-distance entanglement from shorter-distance entanglement via entanglement swapping. Such protocols require the capacity to create entanglement in a heralded fashion, to store it in quantum memories, and to swap it. One attractive general strategy for realizing quantum repeaters is based on the use of atomic ensembles as quantum memories, in combination with linear optical techniques and photon counting to perform all required operations. Here the theoretical and experimental status quo of this very active field are reviewed. The potentials of different approaches are compared quantitatively, with a focus on the most immediate goal of outperforming the direct transmission of photons.
In this paper, we propose a new and accurate technique for uncertainty analysis and uncertainty visualization based on fiber orientation distribution function (ODF) glyphs, associated with high angular resolution diffusion imaging (HARDI). Our visualization applies volume rendering techniques to an ensemble of 3D ODF glyphs, which we call SIP functions of diffusion shapes, to capture their variability due to underlying uncertainty. This rendering elucidates the complex heteroscedastic structural variation in these shapes. Furthermore, we quantify the extent of this variation by measuring the fraction of the volume of these shapes, which is consistent across all noise levels, the certain volume ratio. Our uncertainty analysis and visualization framework is then applied to synthetic data, as well as to HARDI human-brain data, to study the impact of various image acquisition parameters and background noise levels on the diffusion shapes. © 2012 IEEE.
Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop
The Smart Water Grid (SWG) concept has emerged globally over the last decade and has gained significant recognition in South Korea. In particular, there has been growing interest in water demand forecasting and optimal pump operation, and this has led to various studies regarding energy saving and improvement of water supply reliability. Existing water demand forecasting models are categorized into two groups in view of modeling and predicting their behavior in time series. One is to consider embedded patterns such as seasonality, periodicity and trends, and the other is an autoregressive model that uses short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of the abovementioned models is that there is a limit to the predictability of water demands at about sub-daily scale because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme that is based on a combination of independent prediction models. The other is a cross-validation scheme named the Bagging approach, introduced by Breiman (1996), to derive weighting factors corresponding to individual models. The individual forecasting models used in this study are a linear regression analysis model, polynomial regression, multivariate adaptive regression splines (MARS), and support vector machines (SVM). The concepts are demonstrated through application to data observed at water plants at several locations in South Korea. Keywords: water demand, non-linear model, the ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)".
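Once the individual models have been scored on held-out data, the multi-model combination itself is simple. A stdlib-Python sketch with an assumed weighting rule (weights inversely proportional to each member's validation mean-squared error — one common choice, not necessarily the weighting the study derives from its Bagging scheme):

```python
def ensemble_forecast(model_preds, val_errors):
    """Combine member forecasts (one list of predictions per model) with
    weights inversely proportional to each model's validation MSE."""
    inv = [1.0 / max(e, 1e-12) for e in val_errors]
    total = sum(inv)
    weights = [w / total for w in inv]
    return [sum(w * p[t] for w, p in zip(weights, model_preds))
            for t in range(len(model_preds[0]))]
```

With equal validation errors this reduces to a plain average; a model with a much larger error is effectively ignored.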
A new hybrid ensemble Kalman filter/four-dimensional variational data assimilation (EnKF/4D-VAR) approach is introduced to mitigate background covariance limitations in the EnKF. The work is based on the adaptive EnKF (AEnKF) method, which bears a strong resemblance to the hybrid EnKF/three-dimensional variational data assimilation (3D-VAR) method. In the AEnKF, the representativeness of the EnKF ensemble is regularly enhanced with new members generated after back projection of the EnKF analysis residuals to state space using a 3D-VAR [or optimal interpolation (OI)] scheme with a preselected background covariance matrix. The idea here is to reformulate the transformation of the residuals as a 4D-VAR problem, constraining the new member with model dynamics and the previous observations. This should provide more information for the estimation of the new member and reduce dependence of the AEnKF on the assumed stationary background covariance matrix. This is done by integrating the analysis residuals backward in time with the adjoint model. Numerical experiments are performed with the Lorenz-96 model under different scenarios to test the new approach and to evaluate its performance with respect to the EnKF and the hybrid EnKF/3D-VAR. The new method leads to the least root-mean-square estimation errors as long as the linear assumption guaranteeing the stability of the adjoint model holds. It is also found to be less sensitive to choices of the assimilation system inputs and parameters.
Pinson, Pierre; Madsen, Henrik
For management and trading purposes, information on short-term wind generation (from a few hours to a few days ahead) is even more crucial at large offshore wind farms, since they concentrate a large capacity at a single location. The most complete information that can be provided today consists … The obtained ensemble forecasts of wind power are then converted into predictive distributions with an original adaptive kernel dressing method. The shape of the kernels is driven by a mean-variance model, the parameters of which are recursively estimated in order to maximize the overall skill of obtained …
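Kernel dressing turns a finite ensemble into a continuous predictive distribution by placing a kernel on each member. A minimal stdlib-Python sketch with fixed-width Gaussian kernels — the adaptive mean-variance model that drives the kernel shape in the abstract is not reproduced, and the names are illustrative:

```python
import math

def dressed_density(ensemble, sigma):
    """Predictive density as an equal-weight mixture of Gaussian kernels
    of width sigma centered on the ensemble members ('kernel dressing')."""
    def pdf(x):
        return sum(math.exp(-0.5 * ((x - m) / sigma) ** 2) /
                   (sigma * math.sqrt(2 * math.pi)) for m in ensemble) / len(ensemble)
    return pdf
```

The recursive estimation mentioned above would tune sigma (and a possible mean shift) over time to maximize forecast skill, rather than fixing it as here.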
Re, Matteo; Valentini, Giorgio
… proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [4,177], Kleinberg in the context of stochastic discrimination theory, and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [21,70]. Empirical studies showed that both in classification and regression problems, ensembles improve on single learning machines, and moreover large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [10,11,49,188]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost that allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms. However, as explained in Section 26.2, there are deeper reasons to use ensembles of learning machines, motivated by the intrinsic characteristics of the ensemble methods. The main aim of this chapter is to introduce ensemble methods and to provide an overview and a bibliography of the main areas of research, without pretending to be exhaustive or to explain the detailed characteristics of each ensemble method. The paper is organized as follows. In the next section, the main theoretical and practical reasons for combining multiple learners are introduced. Section 26.3 depicts the main taxonomies of ensemble methods proposed in the literature. In Sections 26.4 and 26.5, we present an overview of the main supervised ensemble methods reported in the literature, adopting a simple taxonomy originally proposed in Ref. . Applications of ensemble methods are only marginally considered, but a specific section on some relevant applications of ensemble methods in astronomy and
Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo
Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Markov chain Monte Carlo (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best performing sampler for a given application, and a poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features that often occur in real posteriors, namely high dimensionality, strong non-linear correlations, or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient. Therefore, we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness, and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move with respect to all three aspects.
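As a concrete illustration of the stretch move assessed above, the following is a minimal sketch of the affine-invariant update of Goodman and Weare that underlies EMCEE. The standard-Gaussian target, the walker count, and the scale parameter a=2 are illustrative assumptions, not the study's configuration:

```python
import numpy as np

def log_post(x):
    # Illustrative target: standard Gaussian log-density (up to a constant).
    return -0.5 * np.sum(x ** 2)

def stretch_step(walkers, rng, a=2.0):
    """One serial sweep of the affine-invariant stretch move.

    Each walker k proposes a move along the line through itself and a
    randomly chosen complementary walker j, stretched by a factor z drawn
    from g(z) proportional to 1/sqrt(z) on [1/a, a].
    """
    n, d = walkers.shape
    for k in range(n):
        j = rng.choice([i for i in range(n) if i != k])   # complementary walker
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a     # stretch factor ~ g(z)
        prop = walkers[j] + z * (walkers[k] - walkers[j])
        log_accept = (d - 1) * np.log(z) + log_post(prop) - log_post(walkers[k])
        if np.log(rng.random()) < log_accept:
            walkers[k] = prop                              # accept in place
    return walkers

rng = np.random.default_rng(0)
w = rng.normal(size=(20, 2))          # 20 walkers in 2 dimensions
for _ in range(500):
    w = stretch_step(w, rng)
print(w.shape)  # (20, 2)
```

In EMCEE the walkers are updated in parallel half-ensembles rather than serially, but the proposal and acceptance rule are the same.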
Ahmadi, Sepehr; El-Ella, Haitham A.R.; Hansen, Jørn B.
We demonstrate magnetic field sensing by recording the variation in the pump light absorption with a nitrogen-vacancy center ensemble. At a frequency of 10 mHz we obtain a noise floor of ~30 nT/√Hz.
Heart disease is one of the most common diseases in the world. The objective of this study is to aid the diagnosis of heart disease using a hybrid classification system based on the ReliefF and Rough Set (RFRS) method. The proposed system contains two subsystems: the RFRS feature selection system and a classification system with an ensemble classifier. The first system includes three stages: (i) data discretization, (ii) feature extraction using the ReliefF algorithm, and (iii) feature reduction using the heuristic Rough Set reduction algorithm that we developed. In the second system, an ensemble classifier is proposed based on the C4.5 classifier. The Statlog (Heart) dataset, obtained from the UCI database, was used for experiments. A maximum classification accuracy of 92.59% was achieved according to a jackknife cross-validation scheme. The results demonstrate that the performance of the proposed system is superior to the performances of previously reported classification techniques.
Theis, S.; Gebhardt, C.; Buchhold, M.; Ben Bouallègue, Z.; Ohl, R.; Paulat, M.; Peralta, C.
The numerical weather prediction model COSMO-DE is a configuration of the COSMO model with a horizontal grid size of 2.8 km. It has been running operationally at DWD since 2007, it covers the area of Germany and produces forecasts with a lead time of 0-21 hours. The model COSMO-DE is convection-permitting, which means that it dispenses with a parametrisation of deep convection and simulates deep convection explicitly. One aim is an improved forecast of convective heavy rain events. Convection-permitting models are in operational use at several weather services, but currently not in ensemble mode. It is expected that an ensemble system could reveal the advantages of a convection-permitting model even better. The probabilistic approach is necessary, because the explicit simulation of convective processes for more than a few hours can no longer be viewed as a deterministic forecast. This is due to the chaotic behaviour and short life cycle of the processes which are now simulated explicitly. In the framework of the project COSMO-DE-EPS, DWD is developing and implementing an ensemble prediction system (EPS) for the model COSMO-DE. The project COSMO-DE-EPS comprises the generation of ensemble members, as well as the verification and visualization of the ensemble forecasts and also statistical postprocessing. A pre-operational mode of the EPS with 20 ensemble members is foreseen to start in 2010. Operational use is envisaged to start in 2012, after an upgrade to 40 members and inclusion of statistical postprocessing. The presentation introduces the project COSMO-DE-EPS and describes the design of the ensemble as it is planned for the pre-operational mode. In particular, the currently implemented method for the generation of ensemble members will be explained and discussed. The method includes variations of initial conditions, lateral boundary conditions, and model physics. At present, pragmatic methods are applied which resemble the basic ideas of a multi-model approach
Chellasamy, Menaka; Ferre, Ty; Greve, Mogens Humlekrog
three Multi-Layer Perceptron (MLP) neural networks trained separately with spectral, texture and vegetation indices; classification labels were then assigned based on Endorsement Theory. The present study proposes an approach to feed this ensemble classifier with automatically selected training samples...
Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc
In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
Pinson, Pierre; Madsen, Henrik
forecasting methodology. In a first stage, ensemble forecasts of meteorological variables are converted to power through a suitable power curve model. This model employs local polynomial regression, and is adaptively estimated with an orthogonal fitting method. The obtained ensemble forecasts of wind power are then converted into predictive distributions with an original adaptive kernel dressing method. The shape of the kernels is driven by a mean-variance model, the parameters of which are recursively estimated in order to maximize the overall skill of the obtained predictive distributions. Such a methodology has...
Antal, Bálint; Hajdu, András
In this paper, an ensemble-based method for the screening of diabetic retinopathy (DR) is proposed. This approach is based on features extracted from the output of several retinal image processing algorithms, such as image-level (quality assessment, pre-screening, AM/FM), lesion-specific (microaneurysms, exudates) and anatomical (macula, optic disc) components. The actual decision about the presence of the disease is then made by an ensemble of machine learning classifiers. We have tested our...
Horton, D. E.; Mankin, J. S.; Singh, D.; Diffenbaugh, N. S.; Swain, D. L.
Synoptic- to regional-scale circulation patterns drive daily surface weather conditions, while the occurrence and persistence of particular circulation patterns can have an outsized influence on the likelihood of hot, cold, wet, and/or dry extremes. Recent theoretical work has posited that increasing atmospheric greenhouse gas concentrations may exert some influence on mid-latitude circulation pattern behavior, though detection of such changes in observational analyses is likely to be challenging due to the natural variability of the climate system. To assess the influence of altered levels of radiative forcing on the behavior of atmospheric circulation patterns in the context of internal variability, we utilize pre-industrial, historical, and future simulations from the CESM1 Large Ensemble (LENS) Community Project single-model, multi-realization framework. Using self-organizing map clustering analysis, we classify summer mid-atmospheric circulation patterns over select regional mid-latitude Northern Hemisphere domains in each LENS experiment. For each region in each LENS experiment we calculate and compare the full distribution of summer pattern occurrence and duration, in addition to high-impact extreme (maximum) pattern persistence for each modeled year. Our results indicate little to no change in the left-tails and mean occurrence and duration of circulation patterns as increasing levels of radiative forcing are assessed. However, robust changes in the extreme right-tails of the duration and maximum persistence distributions are prevalent. Patterns in which robust right-tail changes are identified are diverse, with cyclonic, anticyclonic, zonal, and dipole patterns all represented. Our results indicate that in CESM1 LENS circulation changes due to increasing radiative forcing exist, but only in the potentially high impact tails of the distribution.
Ensembles are a well-established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles that sample domain-specific knowledge instead of sampling data. We apply the proposed method to, and evaluate its performance on, a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.
Limbach, F; Hauswald, C; Lähnemann, J; Wölz, M; Brandt, O; Trampert, A; Hanke, M; Jahn, U; Calarco, R; Geelhaar, L; Riechert, H
Light emitting diodes (LEDs) have been fabricated using ensembles of free-standing (In, Ga)N/GaN nanowires (NWs) grown on Si substrates in the self-induced growth mode by molecular beam epitaxy. Electron-beam-induced current analysis, cathodoluminescence as well as biased μ-photoluminescence spectroscopy, transmission electron microscopy, and electrical measurements indicate that the electroluminescence of such LEDs is governed by the differences in the individual current densities of the single-NW LEDs operated in parallel, i.e. by the inhomogeneity of the current path in the ensemble LED. In addition, the optoelectronic characterization leads to the conclusion that these NWs exhibit N-polarity and that the (In, Ga)N quantum well states in the NWs are subject to a non-vanishing quantum confined Stark effect. (paper)
Qin, Xiwen; Li, Qiaoling; Dong, Xiaogang; Lv, Siqi
Accurate diagnosis of rolling bearing faults has a very important significance for the normal operation of machinery and equipment. A method combining Ensemble Empirical Mode Decomposition (EEMD) and Random Forest (RF) is proposed. Firstly, the original signal is decomposed into several intrinsic mode functions (IMFs) by EEMD, and the effective IMFs are selected. Then their energy entropy is calculated as the feature. Finally, the classification is performed by RF. In addition, the wavelet meth...
Liao, James C. [Univ. of California, Los Angeles, CA (United States)
Ensemble modeling of kinetic systems addresses the challenges of kinetic model construction, with respect to parameter value selection, and still allows for the rich insights possible from kinetic models. This project aimed to show that constructing, implementing, and analyzing such models is a useful tool for the metabolic engineering toolkit, and that they can result in actionable insights from models. Key concepts are developed and deliverable publications and results are presented.
The current study proposes an integrated uncertainty and ensemble-based data assimilation framework (ICEA) and evaluates its viability in providing operational streamflow predictions via assimilating snow water equivalent (SWE) data. This step-wise framework applies a parameter uncertainty analysis algorithm (ISURF) to identify the uncertainty structure of sensitive model parameters, which is subsequently formulated into an Ensemble Kalman Filter (EnKF) to generate updated snow states for streamflow prediction. The framework is coupled to the US National Weather Service (NWS) snow and rainfall-runoff models. Its applicability is demonstrated for an operational basin of a western River Forecast Center (RFC) of the NWS. Performance of the framework is evaluated against the existing operational baseline (RFC predictions), the stand-alone ISURF, and the stand-alone EnKF. Results indicate that the ensemble-mean prediction of ICEA considerably outperforms predictions from the other three scenarios investigated, particularly in the context of predicting high flows (top 5th percentile). The ICEA streamflow ensemble predictions capture the variability of the observed streamflow well; however, the ensemble is not wide enough to consistently contain the range of streamflow observations in the study basin. Our findings indicate that the ICEA has the potential to supplement the current operational (deterministic) forecasting method in terms of providing improved single-valued (e.g., ensemble mean) streamflow predictions as well as meaningful ensemble predictions.
Komori, N.; Enomoto, T.; Miyoshi, T.; Yamazaki, A.; Kuwano-Yoshida, A.; Taguchi, B.
To enhance the capability of the local ensemble transform Kalman filter (LETKF) with the Atmospheric general circulation model (GCM) for the Earth Simulator (AFES), a new system has been developed by replacing AFES with the Coupled atmosphere-ocean GCM for the Earth Simulator (CFES). An initial test of the prototype of the CFES-LETKF system has been completed successfully, assimilating atmospheric observational data (NCEP PREPBUFR, archived at UCAR) every 6 hours to update the atmospheric variables, whereas the oceanic variables are kept unchanged throughout the assimilation procedure. An experimental retrospective analysis-forecast cycle with the coupled system (CLERA-A) starts on August 1, 2008, and the atmospheric initial conditions (63 members) are taken from the second generation of the AFES-LETKF experimental ensemble reanalysis (ALERA2). The ALERA2 analyses are also used as forcing for stand-alone 63-member ensemble simulations with the Ocean GCM for the Earth Simulator (EnOFES), from which the oceanic initial conditions for CLERA-A are taken. The ensemble spread of SST is larger in CLERA-A than in EnOFES, suggesting positive feedback between the ocean and the atmosphere. Although SST in CLERA-A suffers from the biases common among many coupled GCMs, the ensemble spreads of air temperature and specific humidity in the lower troposphere are larger in CLERA-A than in ALERA2. Thus the replacement of AFES with CFES successfully contributes to mitigating an underestimation of the ensemble spread near the surface resulting from the single boundary condition for all ensemble members and the lack of atmosphere-ocean interaction. In addition, the basin-scale structure of surface atmospheric variables over the tropical Pacific is well reconstructed from the ensemble correlation in CLERA-A but not in ALERA2. This suggests that the use of a coupled GCM rather than an atmospheric GCM could be important even for atmospheric reanalysis with an ensemble-based data assimilation system.
Measuring stride variability and dynamics in children is useful for the quantitative study of gait maturation and neuromotor development in childhood and adolescence. In this paper, we computed the sample entropy (SampEn) and average stride interval (ASI) parameters to quantify the stride series of 50 gender-matched children participants in three age groups. We also normalized the SampEn and ASI values by leg length and body mass for each participant, respectively. Results show that the original and normalized SampEn values consistently decrease (significant at the p<0.01 level of the Mann-Whitney U test) in children of 3–14 years old, which indicates that stride irregularity is significantly ameliorated with body growth. The original and normalized ASI values also change significantly when comparing any two of the young (aged 3–5 years), middle (aged 6–8 years), and elder (aged 10–14 years) groups of children. Such results suggest that healthy children may better modulate their gait cadence rhythm with the development of their musculoskeletal and neurological systems. In addition, the AdaBoost.M2 and Bagging algorithms were used to effectively distinguish the children's gait patterns. These ensemble learning algorithms both provided excellent gait classification results in terms of overall accuracy (≥90%), recall (≥0.8), and precision (≥0.8077).
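The sample entropy measure used above can be sketched as follows. The defaults m=2 and r=0.2 follow common practice and are assumptions, as the paper's exact settings are not reproduced here:

```python
import numpy as np

def sampen(x, m=2, r=0.2):
    """Sample entropy: -log(A/B), where B counts pairs of length-m templates
    matching within tolerance r*std (Chebyshev distance), and A counts the
    same for templates of length m+1. Self-matches are excluded."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    def pairs(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        c = 0
        for i in range(len(templ) - 1):
            dist = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            c += int(np.sum(dist <= tol))
        return c
    return -np.log(pairs(m + 1) / pairs(m))

rng = np.random.default_rng(42)
s = sampen(rng.normal(size=200))   # white noise is highly irregular
```

A strictly periodic series yields a SampEn near zero, while white noise yields a large value, which is the contrast exploited when comparing stride regularity across age groups.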
Wright, D.; Kirschbaum, D.; Yatheendradas, S.
The considerable uncertainties associated with quantitative precipitation estimates (QPE), whether from satellite platforms, ground-based weather radar, or numerical weather models, suggest that such QPE should be expressed as distributions or ensembles of possible values, rather than as single values. In this research, we borrow a framework from the weather forecast verification community to "correct" satellite precipitation and generate ensemble QPE. This approach is based on the censored shifted gamma distribution (CSGD). The probability of precipitation, the central tendency (i.e. mean), and the uncertainty can be captured by the three parameters of the CSGD. The CSGD can then be applied for simulation of rainfall ensembles using a flexible nonlinear regression framework, whereby the CSGD parameters can be conditioned on one or more reference rainfall datasets and on other time-varying covariates such as modeled or measured estimates of precipitable water and relative humidity. We present the framework and initial results by generating precipitation ensembles based on the Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA) dataset, using both NLDAS and PERSIANN-CDR precipitation datasets as references. We also incorporate a number of covariates from the MERRA2 reanalysis, including model-estimated precipitation, precipitable water, relative humidity, and lifting condensation level. We explore the prospects for applying the framework and other ensemble error models globally, including in regions where high-quality "ground truth" rainfall estimates are lacking. We compare the ensemble outputs against those of an independent rain gage-based ensemble rainfall dataset. "Pooling" of regional rainfall observations is explored as one option for improving ensemble estimates of rainfall extremes. The approach has potential applications in near-realtime, retrospective, and scenario modeling of rainfall-driven hazards such as floods and landslides.
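The censoring-and-shifting mechanism of the CSGD can be sketched as below. The shape, scale, and shift values are illustrative assumptions; in the actual framework these parameters are conditioned on reference datasets and covariates through the regression described above:

```python
import numpy as np

def csgd_ensemble(k, theta, delta, size, rng):
    """Draw an ensemble from a censored shifted gamma distribution:
    a Gamma(shape=k, scale=theta) variable is shifted by delta and
    censored at zero, so the mass below zero becomes the probability
    of zero precipitation."""
    g = rng.gamma(shape=k, scale=theta, size=size) + delta
    return np.maximum(g, 0.0)   # censor negative values to "no rain"

rng = np.random.default_rng(1)
ens = csgd_ensemble(k=1.2, theta=2.0, delta=-1.0, size=1000, rng=rng)
p_dry = np.mean(ens == 0.0)     # estimated probability of zero precipitation
```

The negative shift delta is what lets a single continuous distribution carry both the dry probability (the censored mass) and the wet-day intensity distribution.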
Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences (report; dates covered 6 Jul 08 – 11 Jul 08). The codes are based on DNA sequences which are generalizations of the Fibonacci sequences. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization.
Schnass, Karin; Vandergheynst, Pierre
In this article we present a signal model for classification based on a low dimensional dictionary embedded into the high dimensional signal space. We develop an alternate projection algorithm to find the embedding and the dictionary and finally test the classification performance of our scheme in comparison to Fisher’s LDA.
Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…
Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping
Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprising an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from the TIGGE dataset is constructed for the Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel programmed ε-NSGA II multi-objective algorithm. According to the solutions by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. Then a simple yet effective modular approach is proposed to combine these daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination of parameter estimation, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single-model ensembles, and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at the three stations can be satisfactory up to ten days; however, the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in the forecasting of flows. For
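A multimodel combination weighted on skill scores, as mentioned above, can be sketched as follows. The inverse-RMSE weighting rule and all numbers are assumptions for illustration; the study's exact weighting scheme is not reproduced here:

```python
import numpy as np

def skill_weighted_mean(forecasts, train_fc, train_obs):
    """Combine member forecasts with weights proportional to inverse RMSE
    over a training period.

    forecasts: (n_models, n_lead) array of forecasts to combine
    train_fc:  (n_models, n_train) array of past forecasts
    train_obs: (n_train,) array of matching observations
    """
    rmse = np.sqrt(np.mean((train_fc - train_obs) ** 2, axis=1))
    w = (1.0 / rmse) / np.sum(1.0 / rmse)   # normalized inverse-RMSE weights
    return w @ forecasts                     # weighted multimodel forecast

obs = np.array([1.0, 2.0, 3.0])
train = np.array([[1.1, 2.1, 3.1],           # model A: small past errors
                  [2.0, 3.0, 4.0]])          # model B: large past errors
fc = np.array([[4.0], [8.0]])
combined = skill_weighted_mean(fc, train, obs)
```

The combined value lies much closer to the historically skillful model A, which is the intended behavior of weighting on skill rather than averaging members equally.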
Ahmadi, Sepehr; El-Ella, Haitham A. R.; Wojciechowski, Adam M.
We demonstrate magnetic-field sensing using an ensemble of nitrogen-vacancy centers by recording the variation in the pump-light absorption due to the spin-polarization dependence of the total ground-state population. Using a 532 nm pump laser, we measure the absorption of native nitrogen-vacancy centers in a chemical-vapor-deposited diamond placed in a resonant optical cavity. For a laser pump power of 0.4 W and a cavity finesse of 45, we obtain a noise floor of ∼100 nT/√Hz spanning a bandwidth up to 125 Hz. We project a photon shot-noise-limited sensitivity of ∼1 pT/√Hz by optimizing...
Baraldi, P.; Compare, M.; Sauco, S.; Zio, E.
Particle Filtering (PF) is used in prognostics applications by reason of its capability of robustly predicting the future behavior of equipment and, on this basis, its Residual Useful Life (RUL). It is a model-driven approach, as it resorts to analytical models of both the degradation process and the measurement acquisition system. This prevents its applicability to the cases, very common in industry, in which reliable models are lacking. In this work, we propose an original method to extend PF to the case in which an analytical measurement model is not available but a dataset containing pairs «state-measurement» is. The dataset is used to train a bagged ensemble of Artificial Neural Networks (ANNs), which is then embedded in the PF as an empirical measurement model.
Elias D. Nino-Ruiz
In this paper, a matrix-free posterior ensemble Kalman filter implementation based on a modified Cholesky decomposition is proposed. The method works as follows: the precision matrix of the background error distribution is estimated based on a modified Cholesky decomposition. The resulting estimator can be expressed in terms of Cholesky factors, which can be updated based on a series of rank-one matrices in order to approximate the precision matrix of the analysis distribution. By using this matrix, the posterior ensemble can be built by either sampling from the posterior distribution or using synthetic observations. Furthermore, the computational effort of the proposed method is linear with regard to the model dimension and the number of observed components from the model domain. Experimental tests are performed making use of the Lorenz-96 model. The results reveal that the accuracy of the proposed implementation, in terms of root-mean-square error, is similar to, and in some cases better than, that of a well-known ensemble Kalman filter (EnKF) implementation: the local ensemble transform Kalman filter. In addition, the results are comparable to those obtained by the EnKF with large ensemble sizes.
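For orientation, the analysis step of a textbook stochastic (perturbed-observation) EnKF can be sketched as below. This is the standard covariance-based formulation that the matrix-free modified-Cholesky variant above avoids; dimensions and values are illustrative:

```python
import numpy as np

def enkf_update(X, H, y, R, rng):
    """Stochastic EnKF analysis step for a linear observation operator H.

    X: (n_state, n_ens) background ensemble
    H: (n_obs, n_state) observation operator
    y: (n_obs,) observation vector
    R: (n_obs, n_obs) observation-error covariance
    """
    n, m = X.shape
    A = X - X.mean(axis=1, keepdims=True)        # ensemble anomalies
    Pf = A @ A.T / (m - 1)                        # sample background covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    # Perturb the observation once per member, then update each member.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=m).T
    return X + K @ (Y - H @ X)

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(3, 50))   # background ensemble far from the obs
H = np.eye(3)
y = np.zeros(3)                          # observation at the origin
R = 0.1 * np.eye(3)
Xa = enkf_update(X, H, y, R, rng)        # analysis mean pulled toward y
```

Forming and inverting Pf explicitly is what becomes infeasible at high model dimension, motivating the linear-cost Cholesky-factor updates described in the paper.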
Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.
To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. © 2007 IEEE.
Liu, Richen; Guo, Hanqi; Zhang, Jiang; Yuan, Xiaoru
We propose a longest common subsequence (LCS) based approach to compute the distance among vector field ensembles. By measuring how many common blocks the ensemble pathlines pass through, the LCS distance defines the similarity among vector field ensembles by counting the number of shared domain data blocks. Compared to traditional methods (e.g. point-wise Euclidean distance or dynamic time warping distance), the proposed approach is robust to outliers, missing data, and the sampling rate of pathline timesteps. Taking advantage of smaller, reusable intermediate output, visualization based on the proposed LCS approach reveals temporal trends in the data at low storage cost and avoids tracing pathlines repeatedly. Finally, we evaluate our method on both synthetic data and simulation data, which demonstrates the robustness of the proposed approach.
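The LCS idea can be sketched by representing each pathline as the sequence of domain-block IDs it traverses (an assumed representation of the paper's block decomposition) and normalizing the common-subsequence length into a distance:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1],
                                                               dp[i + 1][j])
    return dp[-1][-1]

def lcs_distance(a, b):
    """Distance in [0, 1]: 0 means one block sequence contains the other."""
    return 1.0 - lcs_len(a, b) / max(len(a), len(b))

# Two pathlines sharing blocks 1, 2, 4 but diverging at block 3:
print(lcs_distance([1, 2, 3, 4], [1, 2, 4]))  # 0.25
```

Because only block IDs are compared, a dropped or extra timestep shifts the sequence without destroying the shared subsequence, which is the robustness property claimed above.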
Accurate diagnosis of rolling bearing faults has a very important significance for the normal operation of machinery and equipment. A method combining Ensemble Empirical Mode Decomposition (EEMD) and Random Forest (RF) is proposed. Firstly, the original signal is decomposed into several intrinsic mode functions (IMFs) by EEMD, and the effective IMFs are selected. Then their energy entropy is calculated as the feature. Finally, the classification is performed by RF. In addition, the wavelet method is also used in the proposed process, in the same way as EEMD. The results of the comparison show that the EEMD method is more accurate than the wavelet method.
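The energy-entropy feature described above can be sketched as follows. The random surrogate components stand in for real EEMD output, which is an assumption made here purely for illustration:

```python
import numpy as np

def energy_entropy(imfs):
    """Shannon entropy of the normalized per-IMF energies: low when one IMF
    dominates the signal energy, high when energy is spread evenly."""
    e = np.array([np.sum(np.asarray(imf) ** 2) for imf in imfs])
    p = e / e.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
# Surrogate "IMFs" with decreasing amplitude, mimicking a typical decomposition.
imfs = [rng.normal(scale=s, size=1024) for s in (1.0, 0.5, 0.25)]
h = energy_entropy(imfs)   # scalar feature that would be fed to the RF
```

A fault signature that concentrates energy in a few IMFs lowers this entropy, which is what makes it a discriminative input for the Random Forest classifier.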
Lu, Wei; Li, Zhe; Chu, Jinghui
Breast cancer is a common cancer among women. With the development of modern medical science and information technology, medical imaging techniques have an increasingly important role in the early detection and diagnosis of breast cancer. In this paper, we propose an automated computer-aided diagnosis (CADx) framework for magnetic resonance imaging (MRI). The scheme consists of an ensemble of several machine learning-based techniques, including ensemble under-sampling (EUS) for imbalanced data processing, the Relief algorithm for feature selection, the subspace method for providing data diversity, and Adaboost for improving the performance of base classifiers. We extracted morphological, various texture, and Gabor features. To clarify the feature subsets' physical meaning, subspaces are built by combining morphological features with each kind of texture or Gabor feature. We tested our proposal using a manually segmented Region of Interest (ROI) data set, which contains 438 images of malignant tumors and 1898 images of normal tissues or benign tumors. Our proposal achieves an area under the ROC curve (AUC) value of 0.9617, which outperforms most other state-of-the-art breast MRI CADx systems. Compared with other methods, our proposal significantly reduces the false-positive classification rate. Copyright © 2017 Elsevier Ltd. All rights reserved.
Miguel, E.; Dykema, J.; Satyanath, S.; Anderson, J. G.
Accurate forecasts of regional climate, including temperature and precipitation, have significant implications for human activities, not just economically but socially. Sub-Saharan Africa is a region that has displayed an exceptional propensity for devastating civil wars. Recent research in political economy has revealed a strong statistical relationship between year-to-year fluctuations in precipitation and civil conflict in this region in the 1980s and 1990s. Investigating how climate change may modify the regional risk of civil conflict in the future requires a probabilistic regional forecast that explicitly accounts for the community's uncertainty in the evolution of rainfall under anthropogenic forcing. We approach the regional climate prediction aspect of this question through the application of a recently demonstrated method called generalized scalar prediction (Leroy et al. 2009), which predicts arbitrary scalar quantities of the climate system. This method can predict change in any variable or linear combination of variables of the climate system averaged over a wide range of spatial scales, from regional to hemispheric to global. Generalized scalar prediction utilizes an ensemble of model predictions to represent the community's uncertainty range in climate modeling, in combination with a time series of any type of observational data that exhibits sensitivity to the scalar of interest. It is not necessary to prioritize models in deriving the final prediction. We present the results of the application of generalized scalar prediction for regional forecasts of temperature and precipitation in Sub-Saharan Africa. We use the climate predictions along with the established statistical relationship between year-to-year rainfall variability and civil conflict in Sub-Saharan Africa to investigate the potential impact of climate change on civil conflict within that region.
Landsgesell, Jonas; Holm, Christian; Smiatek, Jens
We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.
Aakerberg, Andreas; Nasrollahi, Kamal; Heder, Thomas
Augmenting RGB images with depth information is a well-known method to significantly improve the recognition accuracy of object recognition models. Another method to improve the performance of visual recognition models is ensemble learning. However, this method has not been widely explored in combination with deep convolutional neural network based RGB-D object recognition models. Hence, in this paper, we form different ensembles of complementary deep convolutional neural network models, and show that this can be used to increase the recognition performance beyond existing limits. Experiments on the Washington RGB-D Object Dataset show that our best performing ensemble improves the recognition performance by 0.7% compared to using the baseline model alone.
Qi, Chengming; Wang, Yuping; Tian, Wenjie; Wang, Qun
The performance of kernel-based methods, such as the support vector machine (SVM), is greatly affected by the choice of kernel function. Multiple kernel learning (MKL) is a promising family of machine learning algorithms that has attracted much attention in recent years. MKL combines multiple sub-kernels to seek better results than single kernel learning. In order to improve the efficiency of SVM and MKL, in this paper the Kullback–Leibler kernel function is derived to develop the SVM. The proposed method employs an improved ensemble learning framework, named KLMKB, which applies Adaboost to learn multiple kernel-based classifiers. In the experiment on hyperspectral remote sensing image classification, we employ features selected through the Optimum Index Factor (OIF) to classify the satellite image. We extensively examine the performance of our approach in comparison to some relevant and state-of-the-art algorithms on a number of benchmark classification data sets and a hyperspectral remote sensing image data set. Experimental results show that our method has a stable behavior and a noticeable accuracy on different data sets.
Full Text Available The ideal state of continuous one-piece flow may never be achieved. Still, the logistics manager can improve the flow by carefully positioning inventory to buffer against variations. Strategies such as lean, postponement, mass customization, and outsourcing all rely on strategic positioning of decoupling points to separate forecast-driven from customer-order-driven flows. Planning and scheduling of the flow are also based on classification of decoupling points as master scheduled or not. A comprehensive classification scheme for these types of decoupling points is introduced. The approach rests on identification of flows as being either demand based or supply based. The demand or supply is then combined with exogenous factors, classified as independent, or endogenous factors, classified as dependent. As a result, eight types of strategic as well as tactical decoupling points are identified, resulting in a process-based framework for inventory classification that can be used for flow design.
Leeuwenburgh, O.; Brouwer, J.; Trani, M.
Assisted history matching methods are beginning to offer the possibility to use 4D seismic data in quantitative ways for reservoir characterization. We use the waveform data without any explicit inversion or interpretation step directly in an ensemble-based assisted history matching scheme with a 3D
Kosek, Anna Magdalena; Gehrke, Oliver
on an ensemble of non-linear artificial neural network DER models which detect and evaluate anomalies in DER operation. The proposed method is validated against measurement data which yields a precision of 0.947 and an accuracy of 0.976. This improves the precision and accuracy of a classic model-based anomaly...
Full Text Available Objective. This study aims to establish a model to analyze the clinical experience of veteran TCM doctors. We propose an ensemble learning based framework to analyze clinical records with ICD-10 label information for effective diagnosis and acupoint recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision trees (DT) and support vector machines (SVM) are trained by bootstrapping the training dataset. The base learners are sorted by accuracy and diversity through a nondominated sort (NDS) algorithm and combined through a deep ensemble learning strategy. Results. We evaluate the proposed method in comparison with two currently successful methods on a clinical diagnosis dataset with manually labeled ICD-10 information. ICD-10 label annotation and acupoint recommendation are evaluated for the three methods. The proposed method achieves an accuracy rate of 88.2% ± 2.8% measured by zero-one loss for the first evaluation session and 79.6% ± 3.6% measured by Hamming loss, which are superior to the other two methods. Conclusion. The proposed ensemble model can effectively model the implied knowledge and experience in historic clinical data records. The computational cost of training a set of base learners is relatively low.
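The bootstrapped DT/SVM base-learner scheme described in the Methods can be sketched roughly as below. The iris data, the accuracy-only ranking (a simplification of the NDS accuracy-and-diversity sort), and the plain majority vote are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
learners = []
for i in range(10):
    # Bootstrap the training set and alternate DT / SVM base learners.
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    base = DecisionTreeClassifier(random_state=i) if i % 2 == 0 else SVC()
    base.fit(X_tr[idx], y_tr[idx])
    learners.append((base.score(X_tr, y_tr), base))

# Keep the most accurate half (a stand-in for the NDS-based selection),
# then combine the kept learners by majority vote.
learners.sort(key=lambda t: t[0], reverse=True)
kept = [m for _, m in learners[:5]]
votes = np.stack([m.predict(X_te) for m in kept])
pred = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
acc = float(np.mean(pred == y_te))
```

Bootstrapping plus heterogeneous base models is what supplies the diversity that the subsequent selection step can then trade off against accuracy.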
Full Text Available Knowledge on the contribution of observations to forecast accuracy is crucial for the refinement of observing and data assimilation systems. Several recent publications highlighted the benefits of efficiently approximating this observation impact using adjoint methods or ensembles. This study proposes a modification of an existing method for computing observation impact in an ensemble-based data assimilation and forecasting system and applies the method to a pre-operational, convective-scale regional modelling environment. Instead of the analysis, the modified approach uses observation-based verification metrics to mitigate the effect of correlation between the forecast and its verification norm. Furthermore, a peculiar property in the distribution of individual observation impact values is used to define a reliability indicator for the accuracy of the impact approximation. Applying this method to a 3-day test period shows that a well-defined observation impact value can be approximated for most observation types and the reliability indicator successfully depicts where results are not significant.
Yu, Lean; Wang, Shouyang; Lai, Kin Keung
In this study, an empirical mode decomposition (EMD) based neural network ensemble learning paradigm is proposed for world crude oil spot price forecasting. For this purpose, the original crude oil spot price series were first decomposed into a finite, and often small, number of intrinsic mode functions (IMFs). Then a three-layer feed-forward neural network (FNN) model was used to model each of the extracted IMFs, so that the tendencies of these IMFs could be accurately predicted. Finally, the prediction results of all IMFs are combined with an adaptive linear neural network (ALNN), to formulate an ensemble output for the original crude oil price series. For verification and testing, two main crude oil price series, West Texas Intermediate (WTI) crude oil spot price and Brent crude oil spot price, are used to test the effectiveness of the proposed EMD-based neural network ensemble learning methodology. Empirical results obtained demonstrate attractiveness of the proposed EMD-based neural network ensemble learning paradigm. (author)
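A minimal sketch of the decompose-model-combine paradigm, with two caveats: a moving-average split stands in for the actual EMD (a real implementation would extract IMFs with an EMD library), and the synthetic series, lag length, and network sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
t = np.arange(400, dtype=float)
# Synthetic "price" series: slow and fast oscillations plus noise.
price = (10 + 2 * np.sin(2 * np.pi * t / 30)
         + 0.5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.2, 400))

# Stand-in for EMD: split into a smooth component and a fast residual.
kernel = np.ones(15) / 15
smooth = np.convolve(price, kernel, mode="same")
components = [price - smooth, smooth]

def to_xy(series, lag):
    X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
    return X, series[lag:]

lag, split = 5, 300
comp_preds = []
for comp in components:
    X, y = to_xy(comp, lag)
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                       random_state=0)
    net.fit(X[:split], y[:split])  # one small feed-forward net per component
    comp_preds.append(net.predict(X))

# Adaptive linear combiner over the component forecasts
# (stand-in for the ALNN), fitted on the training segment only.
stacked = np.stack(comp_preds, axis=1)
target = price[lag:]
combiner = LinearRegression().fit(stacked[:split], target[:split])
forecast = combiner.predict(stacked[split:])
rmse = float(np.sqrt(np.mean((forecast - target[split:]) ** 2)))
```

The point of the decomposition is that each component is smoother and easier to model than the raw series, so the per-component errors that the combiner aggregates are small.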
Full Text Available Individual tree crown segmentation from Airborne Laser Scanning data is a central problem in forest remote sensing. Focusing on single-layered spruce and fir dominated coniferous forests, this article addresses the problem of directly estimating 3D segment shape uncertainty (i.e., without field/reference surveys) using a probabilistic approach. First, a coarse segmentation (marker-controlled watershed) is applied. Then, the 3D alpha hull and several descriptors are computed for each segment. Based on these descriptors, the alpha hulls are grouped to form ensembles (i.e., groups of similar tree shapes). By examining how frequently regions of a shape occur within an ensemble, it is possible to assign a shape probability to each point within a segment. The shape probability can subsequently be thresholded to obtain improved (filtered) tree segments. Results indicate this approach can be used to produce segmentation reliability maps. A comparison to manually segmented tree crowns also indicates that the approach is able to produce more reliable tree shapes than the initial (unfiltered) segmentation.
Jobez, Pierre; Laplane, Cyril; Timoney, Nuala; Gisin, Nicolas; Ferrier, Alban; Goldner, Philippe; Afzelius, Mikael
Long-lived quantum memories are essential components of a long-standing goal of remote distribution of entanglement in quantum networks. These can be realized by storing the quantum states of light as single-spin excitations in atomic ensembles. However, spin states are often subjected to different dephasing processes that limit the storage time, which in principle could be overcome using spin-echo techniques. Theoretical studies suggest this to be challenging due to unavoidable spontaneous emission noise in ensemble-based quantum memories. Here, we demonstrate spin-echo manipulation of a mean spin excitation of 1 in a large solid-state ensemble, generated through storage of a weak optical pulse. After a storage time of about 1 ms we optically read-out the spin excitation with a high signal-to-noise ratio. Our results pave the way for long-duration optical quantum storage using spin-echo techniques for any ensemble-based memory.
Full Text Available Alzheimer's is a neurodegenerative disease caused by the destruction and death of brain neurons, resulting in memory loss, impaired thinking ability, and certain behavioral changes. Alzheimer's disease is a major cause of dementia, and eventually death, all around the world. Early diagnosis of the disease is crucial, as it can help the victims maintain their level of independence for a comparatively longer time and live the best life possible. For early detection of Alzheimer's disease, we propose a novel approach based on the fusion of multiple types of features, including hemodynamic, volumetric and textural features of the brain. Our approach uses non-invasive fMRI with an ensemble of classifiers for the classification of normal controls and Alzheimer's patients. For performance evaluation, ten-fold cross validation is used. Individual feature sets and fusion of features have been investigated with ensemble classifiers for successful classification of Alzheimer's patients from normal controls. It is observed that fusion of features resulted in improved accuracy, specificity and sensitivity.
Avogadri, Roberto; Valentini, Giorgio
Two major problems related to the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work is the exploration of new strategies and the development of new clustering methods to improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in the context of gene expression data analysis. We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and bio-medical gene expression data. We applied random projections that obey the Johnson-Lindenstrauss lemma to obtain several instances of lower dimensional gene expression data from the original high-dimensional ones, approximately preserving the information and the metric structure of the original data. Then we adopt a double fuzzy approach to obtain a consensus ensemble clustering, by first applying a fuzzy k-means algorithm to the different instances of the projected low-dimensional data and then by using a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithms are proposed, according to different techniques to combine the base clusterings and to obtain the final consensus clustering. We applied our proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and we compared the results with other state of the art ensemble methods. Results show that in some cases, taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at bio-molecular level. The reduction of the dimension of the data, achieved through random
Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model.
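The random-partition idea can be sketched with scikit-learn's multinomial logistic regression; the digits data, the number of partitions, and probability averaging as the combination rule are illustrative assumptions rather than the article's exact setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_parts = 4
perm = rng.permutation(X.shape[1])
blocks = np.array_split(perm, n_parts)  # mutually exclusive feature subsets

# One multinomial logit model per feature block, no variable selection;
# combine by averaging their predicted class probabilities.
models = [
    LogisticRegression(max_iter=2000).fit(X_tr[:, b], y_tr) for b in blocks
]
proba = np.mean(
    [m.predict_proba(X_te[:, b]) for m, b in zip(models, blocks)], axis=0
)
acc = float(np.mean(proba.argmax(axis=1) == y_te))
```

Because the feature blocks are disjoint, each base model is small regardless of the total dimensionality, and the blocks' predictions are only weakly correlated, which is what the averaging exploits.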
Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.
Gender identification is one of the major problems in speech analysis today: the gender of a speaker can be traced from acoustic features such as pitch, median and frequency. Machine learning gives promising results for classification problems across research domains, and several performance metrics exist for evaluating algorithms. We propose a comparative model for evaluating five different machine learning algorithms, Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF) and Support Vector Machine (SVM), on gender classification from acoustic data against eight different metrics. The main criterion in evaluating any algorithm is its performance: in classification problems the misclassification rate must be low, which is to say the accuracy rate must be high. The location and gender of a person have become crucial in economic markets such as AdSense. With this comparative model we assess the different ML algorithms and find the best fit for gender classification of acoustic data.
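A hedged sketch of such a five-algorithm comparison using scikit-learn, with cross-validated accuracy as a single stand-in metric (the study uses eight metrics and real acoustic features; synthetic data stands in here):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the acoustic gender data: a synthetic binary problem.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}
# 5-fold cross-validated mean accuracy per algorithm.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

Extending this loop to the other metrics amounts to passing different `scoring` arguments (e.g. `"precision"`, `"recall"`, `"roc_auc"`) to `cross_val_score`.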
Blanco, A; Rodriguez, R; Martinez-Maranon, I
Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on gonads extracted from mackerel females and males (fresh and defrosted) to find differences between the sexes. Several linear and non-linear classifiers, such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances, which fail to reflect the sample proximities accurately. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers, inducing diversity by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
Yamada, Ikuya; Sato, Motoki; Shindo, Hiroyuki
This paper describes our approach for the triple scoring task at the WSDM Cup 2017. The task required participants to assign a relevance score for each pair of entities and their types in a knowledge base in order to enhance the ranking results in entity retrieval tasks. We propose an approach wherein the outputs of multiple neural network classifiers are combined using a supervised machine learning model. The experimental results showed that our proposed method achieved the best performance ...
Zheng, Jinde; Pan, Haiyang; Cheng, Junsheng
To detect incipient rolling bearing failures in time and locate faults accurately, a novel rolling bearing fault diagnosis method is proposed based on composite multiscale fuzzy entropy (CMFE) and ensemble support vector machines (ESVMs). Fuzzy entropy (FuzzyEn), an improvement of sample entropy (SampEn), is a new nonlinear method for measuring the complexity of time series. Since FuzzyEn (or SampEn) at a single scale cannot reflect the complexity effectively, multiscale fuzzy entropy (MFE) is developed by defining the FuzzyEns of coarse-grained time series, which represent the system dynamics at different scales. However, MFE values are affected by the data length, especially when the data are not long enough. By combining the information of multiple coarse-grained time series at the same scale, the CMFE algorithm proposed in this paper enhances MFE as well as FuzzyEn. Compared with MFE, as the scale factor increases, CMFE obtains much more stable and consistent values for short-term time series. In this paper CMFE is employed to measure the complexity of vibration signals of rolling bearings and to extract the nonlinear features hidden in the vibration signals. The physical reasons why CMFE is suitable for rolling bearing fault diagnosis are also explored. Based on these, to fulfill automatic fault diagnosis, an ensemble-SVM-based multi-classifier is constructed for the intelligent classification of fault features. Finally, the proposed fault diagnosis method is applied to experimental data, and the results indicate that it can effectively distinguish different fault categories and severities of rolling bearings.
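The CMFE computation described above can be sketched directly in NumPy. The parameter choices (m = 2, r = 0.15·SD, a squared-exponential membership function) follow common FuzzyEn conventions and may differ from the paper's exact settings; the composite step averages FuzzyEn over all shifted coarse-grainings at a scale.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.15, n=2):
    """FuzzyEn: similarity of embedded vectors under a fuzzy membership."""
    x = np.asarray(x, dtype=float)
    r = r * np.std(x)
    def phi(m):
        # Embed, then remove each vector's own mean (baseline removal).
        emb = np.stack([x[i:i + m] for i in range(len(x) - m)])
        emb = emb - emb.mean(axis=1, keepdims=True)
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        sim = np.exp(-(d ** n) / r)       # fuzzy membership degree
        np.fill_diagonal(sim, 0.0)
        return sim.sum() / (len(emb) * (len(emb) - 1))
    return np.log(phi(m) / phi(m + 1))

def cmfe(x, scale, m=2, r=0.15):
    """Composite MFE: average FuzzyEn over all shifted coarse-grainings."""
    x = np.asarray(x, dtype=float)
    vals = []
    for k in range(scale):
        n_pts = (len(x) - k) // scale
        cg = x[k:k + n_pts * scale].reshape(n_pts, scale).mean(axis=1)
        vals.append(fuzzy_entropy(cg, m, r))
    return float(np.mean(vals))

rng = np.random.default_rng(0)
noise = rng.normal(size=600)
tone = np.sin(2 * np.pi * np.arange(600) / 25)
# White noise is more complex than a pure tone at every scale.
e_noise = cmfe(noise, scale=3)
e_tone = cmfe(tone, scale=3)
```

Averaging over the `scale` shifted coarse-grainings is precisely what stabilizes the entropy estimate for short records, since a single coarse-graining discards all but every `scale`-th starting offset.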
Räisänen, J.; Ruokolainen, L.
Probabilistic forecasts of near-term climate change are derived by using a multimodel ensemble of climate change simulations and a simple resampling technique that increases the number of realizations for the possible combination of anthropogenic climate change and internal climate variability. The technique is based on the assumption that the probability distribution of local climate changes is only a function of the all-model mean global average warming. Although this is unlikely to be exac...
Mons, Vincent; Wang, Qi; Zaki, Tamer
Reconstructing the characteristics of a scalar source from limited remote measurements in a turbulent flow is a problem of great interest for environmental monitoring, and is challenging due to several aspects. Firstly, the numerical estimation of the scalar dispersion in a turbulent flow requires significant computational resources. Secondly, in actual practice, only a limited number of observations are available, which generally makes the corresponding inverse problem ill-posed. Ensemble-based variational data assimilation techniques are adopted to solve the problem of scalar source localization in a turbulent channel flow at Reτ = 180 . This approach combines the components of variational data assimilation and ensemble Kalman filtering, and inherits the robustness of the former and the ease of implementation of the latter. An ensemble-based methodology for optimal sensor placement is also proposed in order to improve the conditioning of the inverse problem, which enhances the performance of the data assimilation scheme. This work has been partially funded by the Office of Naval Research (Grant N00014-16-1-2542) and by the National Science Foundation (Grant 1461870).
Mohammad S. Islam
Full Text Available Decoding neural activities related to voluntary and involuntary movements is fundamental to understanding human brain motor circuits and neuromotor disorders, and can lead to the development of neuromotor prosthetic devices for neurorehabilitation. This study explores using recorded deep brain local field potentials (LFPs) for robust movement decoding in Parkinson's disease (PD) and dystonia patients. The LFP data from voluntary movement activities, such as left- and right-hand index finger clicking, were recorded from patients who underwent surgery for implantation of deep brain stimulation electrodes. Movement-related LFP signal features were extracted by computing instantaneous power related to the motor response in different neural frequency bands. An innovative neural network ensemble classifier has been proposed and developed for accurate prediction of finger movement and its forthcoming laterality. The ensemble classifier contains three base neural network classifiers, namely, feedforward, radial basis, and probabilistic neural networks. The majority voting rule is used to fuse the decisions of the three base classifiers into the final decision of the ensemble classifier. The overall decoding performance reaches a level of agreement (kappa value) of about 0.729±0.16 for decoding movement from the resting state and about 0.671±0.14 for decoding left and right visually cued movements.
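The three-network majority vote can be approximated with scikit-learn stand-ins: an MLP for the feedforward net, an RBF-kernel SVM for the radial basis net, and Gaussian naive Bayes as a rough proxy for the probabilistic neural network. The data are synthetic; everything here is illustrative, not the study's actual decoder.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for band-power LFP features (two movement classes).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

vote = VotingClassifier(
    estimators=[
        ("ff", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1500,
                             random_state=0)),
        ("rbf", SVC(kernel="rbf")),
        ("pnn", GaussianNB()),
    ],
    voting="hard",  # majority voting rule, as in the study
)
vote.fit(X_tr, y_tr)
acc = float(vote.score(X_te, y_te))
```

With `voting="hard"`, each base classifier contributes one vote per sample and ties are broken by class order, mirroring a plain majority rule over three decoders.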
Full Text Available Correlated information between different views is useful for learning from multi-view data, and canonical correlation analysis (CCA) plays an important role in extracting it. However, CCA only extracts the correlated information between paired data and cannot preserve the correlated information between within-class samples. In this paper, we propose a two-view semi-supervised learning method called semi-supervised random correlation ensemble based on spectral clustering (SS_RCE). SS_RCE uses a multi-view method based on spectral clustering, which takes advantage of the discriminative information in multiple views to estimate the labels of unlabeled samples. In order to enhance the discriminative power of CCA features, we incorporate the label information of both unlabeled and labeled samples into CCA. Then, we use random correlation between within-class samples across views to extract diverse correlated features for training component classifiers. Furthermore, we extend a general model, named SSMV_RCE, to construct an ensemble method that tackles semi-supervised learning in the presence of multiple views. Finally, we compare the proposed methods with existing multi-view feature extraction methods using multi-view semi-supervised ensembles. Experimental results on various multi-view data sets demonstrate the effectiveness of the proposed methods.
Barnard, P. A.; Arellano, A. F.
Data assimilation has emerged as an integral part of numerical weather prediction (NWP). More recently, atmospheric chemistry processes have been incorporated into NWP models to provide forecasts and guidance on air quality. There is, however, a unique opportunity within this coupled system to investigate the additional benefit of constraining model dynamics and physics due to chemistry. Several studies have reported the strong interaction between chemistry and meteorology through radiation, transport, emission, and cloud processes. To examine its importance to NWP, we conduct an ensemble-based sensitivity analysis of meteorological fields to the chemical and aerosol fields within the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) and the Data Assimilation Research Testbed (DART) framework. In particular, we examine the sensitivity of the forecasts of surface temperature and related dynamical fields to the initial conditions of dust and aerosol concentrations in the model over the continental United States within the summer 2008 time period. We use an ensemble of meteorological and chemical/aerosol predictions within WRF-Chem/DART to calculate the sensitivities. This approach is similar to recent ensemble-based sensitivity studies in NWP. The use of an ensemble prediction is appealing because the analysis does not require the adjoint of the model, which to a certain extent becomes a limitation due to the rapidly evolving models and the increasing number of different observations. Here, we introduce this approach as applied to atmospheric chemistry. We also show our initial results of the calculated sensitivities from joint assimilation experiments using a combination of conventional meteorological observations from the National Centers for Environmental Prediction, retrievals of aerosol optical depth from NASA's Moderate Resolution Imaging Spectroradiometer, and retrievals of carbon monoxide from NASA's Measurements of Pollution in the
Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro
An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values for n result in sparse frequency vectors that represent similar metagenomic fragments differently, also leading to low configuration scores. Regarding the similarity measure, in
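The n-mer frequency encoding at the heart of this analysis can be sketched as follows; the fragment lengths, base compositions, and the use of cosine similarity are illustrative assumptions.

```python
from itertools import product

import numpy as np

def nmer_profile(seq, n):
    """Frequency vector of overlapping n-mers over the DNA alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=n)]
    index = {k: i for i, k in enumerate(kmers)}
    v = np.zeros(len(kmers))
    for i in range(len(seq) - n + 1):
        v[index[seq[i:i + n]]] += 1
    return v / max(v.sum(), 1)  # normalize to frequencies

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Two fragments from one (AT-rich) composition, one from a GC-rich one.
frag_a = "".join(rng.choice(list("ACGT"), 300, p=[0.4, 0.1, 0.1, 0.4]))
frag_b = "".join(rng.choice(list("ACGT"), 300, p=[0.4, 0.1, 0.1, 0.4]))
frag_c = "".join(rng.choice(list("ACGT"), 300, p=[0.1, 0.4, 0.4, 0.1]))

# Fragments drawn from the same composition are closer in n-mer space.
sim_same = cosine(nmer_profile(frag_a, 3), nmer_profile(frag_b, 3))
sim_diff = cosine(nmer_profile(frag_a, 3), nmer_profile(frag_c, 3))
```

The n-versus-sparsity trade-off discussed above is visible here: with 4^n dimensions, n = 3 already gives 64 bins for a 300-base fragment, and much larger n would leave most bins at zero.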
Orosco, Eugenio; Lopez, Natalia; Soria, Carlos; di Sciascio, Fernando
In this paper the bispectrum is used to classify human arm movements and to control a robotic arm based on surface electromyogram (sEMG) signals of the upper limb. We use the bispectrum based on the third-order cumulant to parameterize sEMG signals and to classify elbow flexion and extension, forearm pronation and supination, and rest states with an artificial neural network (ANN). Finally, a robotic manipulator is controlled based on the classification and the parameters extracted from the signals. The entire process runs in real time under the QNX® operating system.
Mechanism-based classification of drug exposure in pharmacoepidemiological studies. In pharmacoepidemiology and pharmacovigilance, the relation between drug exposure and clinical outcomes is crucial. Exposure classification in pharmacoepidemiological studies is traditionally based on
Li, Zan; Geng, Zhi-Rong; Zhang, Cui; Wang, Xiao-Bo; Wang, Zhi-Lin
Considering the significant role of plasma homocysteine in physiological processes, two ensembles (F465-Cu(2+) and F508-Cu(2+)) were constructed based on a BODIPY (4,4-difluoro-1,3,5,7-tetramethyl-4-bora-3a,4a-diaza-s-indacene) scaffold conjugated with an azamacrocyclic (1,4,7-triazacyclononane and 1,4,7,10-tetraazacyclododecane) Cu(2+) complex. The results of this effort demonstrated that the F465-Cu(2+) ensemble could be employed to detect homocysteine in the presence of other biologically relevant species, including cysteine and glutathione, under physiological conditions with high selectivity and sensitivity in the turn-on fluorescence mode, while the F508-Cu(2+) ensemble showed no fluorescence responses toward biothiols. A possible mechanism for this specificity toward homocysteine, involving the formation of a homocysteine-induced six-membered ring sandwich structure, was proposed and confirmed for the first time by time-dependent fluorescence spectra, ESI-MS and EPR. The detection limit of homocysteine in deproteinized human serum was calculated to be 241.4 nM with a linear range of 0-90.0 μM, and the detection limit of F465 for Cu(2+) is 74.7 nM with a linear range of 0-6.0 μM (F508, 80.2 nM, 0-7.0 μM). We have demonstrated the application of the F465-Cu(2+) ensemble for detecting homocysteine in human serum and monitoring the activity of cystathionine β-synthase in vitro. Copyright © 2015 Elsevier B.V. All rights reserved.
Nandagopalan, S; Suryanarayana, Adiga B; Sudarshan, T S B; Chandrashekar, Dhanalakshmi; Manjunath, C N
This paper proposes a novel method to analyze and classify cardiovascular ultrasound echocardiographic images using a Naïve-Bayesian model via database OLAP-SQL. Efficient data mining algorithms based on a tightly-coupled model are used to extract features. Three algorithms are proposed for classification, namely Naïve-Bayesian Classifier for Discrete variables (NBCD) with SQL, NBCD with OLAP-SQL, and Naïve-Bayesian Classifier for Continuous variables (NBCC) using OLAP-SQL. The proposed model is trained with 207 patient images containing normal and abnormal categories. Out of the three proposed algorithms, the highest classification accuracy of 96.59% was achieved by NBCC, which is better than earlier methods.
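A minimal sketch of the NBCC idea, a Naïve-Bayes classifier for continuous variables built from per-class mean/variance aggregates (exactly the statistics an OLAP-SQL GROUP BY with AVG/VARIANCE could supply); the data layout and names here are assumptions of this sketch:

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate per-class prior and per-feature (mean, variance)."""
    model = {}
    for c in set(labels):
        cols = list(zip(*[r for r, l in zip(rows, labels) if l == c]))
        stats = []
        for col in cols:
            m = sum(col) / len(col)
            v = sum((x - m) ** 2 for x in col) / len(col) or 1e-9
            stats.append((m, v))
        model[c] = (labels.count(c) / len(labels), stats)
    return model

def predict(model, row):
    """Pick the class with the highest Gaussian log-posterior."""
    def log_post(prior, stats):
        lp = math.log(prior)
        for x, (m, v) in zip(row, stats):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return lp
    return max(model, key=lambda c: log_post(*model[c]))
```

The sufficient statistics could equally be computed server-side in SQL, which is the design choice the tightly-coupled model exploits.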
Lazar, Dora; Ihasz, Istvan
Short- and medium-range operational forecasts, warnings and alarms for severe weather are among the most important activities of the Hungarian Meteorological Service. Our study provides a comprehensive summary of newly developed methods based on ECMWF ensemble forecasts to assist successful prediction of convective weather situations. The first part of the study gives a brief overview of the components of atmospheric convection: the atmospheric lifting force, convergence and vertical wind shear. Atmospheric instability is often characterized by so-called instability indices; one of the most popular and frequently used is the convective available potential energy. Heavy convective events, like intensive storms, supercells and tornadoes, require vertical instability, adequate moisture and vertical wind shear. As a first step, statistical studies of these three parameters were performed, based on a nine-year time series of the 51-member ensemble forecasting model for the convective summer period. The relationship between the ratio of convective to total precipitation and the above three parameters was studied by different statistical methods. Four new visualization methods were applied to support successful forecasts of severe weather. Two of the four, the ensemble meteogram and the ensemble vertical profiles, had been available at the beginning of our work. Both methods show the probability of the meteorological parameters for a selected location. Additionally, two new methods have been developed. The first provides a probability map of the event exceeding predefined values, so the spatial uncertainty of the incident is well defined. Since convective weather events often occur sporadically in space, the event area can be selected so that the ensemble forecasts give very good support. Another new visualization tool shows time
Zhao, Xu-Jun; Cai, Jiang-Hui; Zhang, Ji-Fu; Yang, Hai-Feng; Ma, Yang
Frequent patterns, which appear frequently in a data set, play an important role in data mining. For stellar spectrum classification tasks, a classification rule mining method based on a classification pattern tree is presented on the basis of frequent patterns. The procedure is as follows. Firstly, a new tree structure, the classification pattern tree, is introduced, based on the different frequencies of stellar spectral attributes in the database and their different importance for classification. The related concepts and the construction method of the classification pattern tree are also described in this paper. Then, the characteristics of the stellar spectrum are mapped to the classification pattern tree. Two traversal modes, top-down and bottom-up, are used to traverse the classification pattern tree and extract the classification rules. Meanwhile, the concept of pattern capability is introduced to adjust the number of classification rules and improve the construction efficiency of the classification pattern tree. Finally, SDSS (Sloan Digital Sky Survey) stellar spectral data provided by the National Astronomical Observatory are used to verify the accuracy of the method. The results show that a higher classification accuracy is achieved.
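The frequency-ordered tree construction can be sketched as an FP-tree-style prefix tree; the node layout and the ordering tie-break below are illustrative assumptions, not the paper's exact structure:

```python
from collections import Counter

def build_pattern_tree(transactions):
    """Insert each attribute set into a prefix tree, ordering attributes
    by descending global frequency, so frequent attributes share prefixes
    near the root (the FP-tree-style idea a classification pattern tree
    builds on)."""
    freq = Counter(a for t in transactions for a in t)
    root = {}
    for t in transactions:
        items = sorted(t, key=lambda a: (-freq[a], a))  # tie-break by name
        node = root
        for a in items:
            child = node.setdefault(a, {"count": 0, "children": {}})
            child["count"] += 1
            node = child["children"]
    return root
```

Rule extraction would then traverse this tree top-down or bottom-up, reading off paths whose counts pass a support threshold.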
Edwin Jayasingh Mariarputham
Accurate classification of Pap smear images is a challenging task in medical image processing. It can be improved in two ways: by selecting suitable, well-defined specific features, and by selecting the best classifier. This paper presents a nominated texture based cervical cancer (NTCC) classification system which classifies Pap smear images into one of seven classes. This is achieved by extracting well-defined texture features and selecting the best classifier. Seven sets of texture features (24 features in total) are extracted, including relative size of nucleus and cytoplasm, dynamic range and first four moments of intensities of nucleus and cytoplasm, relative displacement of nucleus within the cytoplasm, gray level co-occurrence matrix, local binary pattern histogram, Tamura features, and edge orientation histogram. Several types of support vector machine (SVM) and neural network (NN) classifiers are used for the classification. The performance of the NTCC algorithm is tested and compared to other algorithms on the public image database of Herlev University Hospital, Denmark, with 917 Pap smear images. The output of the SVM is found to be the best for most of the classes, with good results for the remaining classes.
Object recognition has shown a tremendous increase in the field of image analysis. The required set of image objects is identified and retrieved on the basis of object recognition. In this paper, we propose a novel classification technique called substance based image classification (SIC) using a wavelet neural network. The foremost task of SIC is to remove the surrounding regions from an image, to reduce the misclassified portion and to effectively reflect the shape of an object. First, the image is segmented by the SIC system. Next, in order to attain more accurate information from the extracted set of regions, the wavelet transform is applied to extract the configured set of features. Finally, using the neural network classifier model, misclassification over the given natural images is reduced and background regions are removed from the natural image using LSEG segmentation. Moreover, to increase the accuracy of object classification, the SIC system involves the removal of the surrounding regions of the image. Performance evaluation reveals that the proposed SIC system reduces the occurrence of misclassification and reflects the exact shape of an object to within approximately 10–15%.
Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder
Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interactions can be done via the drug synergy score. This requires efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to meet the requirement mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop the ensemble-based machine learning model. These models are Random Forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weight to the model with a higher prediction score) of the data predicted by the selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.
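The biased weighted aggregation step can be sketched as a score-proportional weighting of model outputs; the exact weighting rule is an assumption of this sketch, not necessarily the paper's formula:

```python
def weighted_ensemble(predictions, scores):
    """Aggregate per-model regression outputs with weights proportional
    to each model's validation score, so stronger models contribute more
    (the 'biased weighted aggregation' idea).

    predictions[m][i]: model m's prediction for sample i.
    scores[m]: model m's validation score (higher is better).
    """
    total = sum(scores)
    weights = [s / total for s in scores]
    return [sum(w * p for w, p in zip(weights, sample))
            for sample in zip(*predictions)]
```

With scores `[1, 3]`, the second model's predictions get three times the weight of the first's.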
Morozov, Yu. V.; Spektor, A. A.
A method for classifying moving objects having a seismic effect on the ground surface is proposed which is based on statistical analysis of the envelopes of received signals. The values of the components of the amplitude spectrum of the envelopes obtained applying Hilbert and Fourier transforms are used as classification criteria. Examples illustrating the statistical properties of spectra and the operation of the seismic classifier are given for an ensemble of objects of four classes (person, group of people, large animal, vehicle). It is shown that the computational procedures for processing seismic signals are quite simple and can therefore be used in real-time systems with modest requirements for computational resources.
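A minimal sketch of the envelope-spectrum features: a rectify-and-smooth envelope stands in here for the Hilbert-transform envelope, followed by the amplitude spectrum of the envelope via a naive DFT; window length and normalization are illustrative assumptions:

```python
import cmath

def envelope(signal, window=5):
    """Rectify-and-smooth envelope (a simple stand-in for the Hilbert
    envelope used in the paper)."""
    rect = [abs(v) for v in signal]
    half = window // 2
    return [sum(rect[max(0, i - half):i + half + 1]) /
            len(rect[max(0, i - half):i + half + 1])
            for i in range(len(rect))]

def amplitude_spectrum(env):
    """Magnitudes of the DFT of the envelope: the classification features."""
    n = len(env)
    return [abs(sum(env[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for k in range(n))) / n for j in range(n // 2)]
```

The resulting spectral components would then be compared against the per-class statistics (person, group, animal, vehicle); an FFT would replace the naive DFT in any real-time setting.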
Wang, Hao; Yang, Yang; Peng, Bo; Chen, Qin
The Thyroid Imaging Reporting and Data System (TI-RADS) is a valuable tool for differentiating benign and malignant thyroid nodules. In the clinic, doctors can determine the extent to which a nodule is benign or malignant in terms of different classes by using TI-RADS. The classification represents the degree of malignancy of thyroid nodules. TI-RADS as a classification standard can be used to guide the ultrasound doctor to examine thyroid nodules more accurately and reliably. In this paper, we aim to classify thyroid nodules with the help of TI-RADS. To this end, four ultrasound signs, i.e., cystic and solid composition, echo pattern, boundary feature and calcification of thyroid nodules, are extracted and converted into feature vectors. Then a semi-supervised fuzzy C-means ensemble (SS-FCME) model is applied to obtain the classification results. The experimental results demonstrate that the proposed method can help doctors diagnose thyroid nodules effectively.
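The membership-update step at the core of fuzzy C-means can be sketched as follows (1-D features for brevity; the semi-supervised and ensemble parts of SS-FCME are omitted, and the fuzzifier value is an assumption):

```python
def fcm_memberships(points, centers, m=2.0):
    """One membership-update step of fuzzy C-means: u[k][i] is inversely
    related to the distance from point k to center i, with fuzzifier m.
    Each row of memberships sums to 1."""
    u = []
    for x in points:
        d = [abs(x - c) or 1e-12 for c in centers]   # guard zero distance
        row = [1.0 / sum((d[i] / d[j]) ** (2 / (m - 1))
                         for j in range(len(centers)))
               for i in range(len(centers))]
        u.append(row)
    return u
```

A full FCM run alternates this step with recomputing centers as membership-weighted means until convergence.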
Dumbović, Mateja; Čalogović, Jaša; Vršnak, Bojan; Temmer, Manuela; Mays, M. Leila; Veronig, Astrid; Piantschitsch, Isabell
The drag-based model for heliospheric propagation of coronal mass ejections (CMEs) is a widely used analytical model that can predict CME arrival time and speed at a given heliospheric location. It is based on the assumption that the propagation of CMEs in interplanetary space is solely under the influence of magnetohydrodynamical drag, where CME propagation is determined based on CME initial properties as well as the properties of the ambient solar wind. We present an upgraded version, the drag-based ensemble model (DBEM), that covers ensemble modeling to produce a distribution of possible ICME arrival times and speeds. Multiple runs using uncertainty ranges for the input values can be performed in almost real-time, within a few minutes. This allows us to define the most likely ICME arrival times and speeds, quantify prediction uncertainties, and determine forecast confidence. The performance of the DBEM is evaluated and compared to that of ensemble WSA-ENLIL+Cone model (ENLIL) using the same sample of events. It is found that the mean error is ME = ‑9.7 hr, mean absolute error MAE = 14.3 hr, and root mean square error RMSE = 16.7 hr, which is somewhat higher than, but comparable to ENLIL errors (ME = ‑6.1 hr, MAE = 12.8 hr and RMSE = 14.4 hr). Overall, DBEM and ENLIL show a similar performance. Furthermore, we find that in both models fast CMEs are predicted to arrive earlier than observed, most likely owing to the physical limitations of models, but possibly also related to an overestimation of the CME initial speed for fast CMEs.
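The drag-based kinematics and the ensemble-of-perturbed-inputs idea can be sketched as follows; the input uncertainty ranges and starting distance are illustrative assumptions, not the DBEM defaults:

```python
import math, random

def dbm_distance(t, r0, v0, w, gamma):
    """Heliocentric distance under the drag-based model (decelerating
    case, v0 > w): r(t) = r0 + w*t + ln(1 + gamma*(v0 - w)*t) / gamma.
    Units: km, km/s, s; gamma in km^-1."""
    return r0 + w * t + math.log(1 + gamma * (v0 - w) * t) / gamma

def arrival_time(target, r0, v0, w, gamma, t_max=5e6):
    """Bisection on the monotone r(t) to find the arrival time at target."""
    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = (lo + hi) / 2
        if dbm_distance(mid, r0, v0, w, gamma) < target:
            lo = mid
        else:
            hi = mid
    return lo

def dbem_arrivals(n=200, seed=1):
    """Ensemble of runs with perturbed inputs; the resulting spread of
    arrival times gives the forecast uncertainty."""
    rng = random.Random(seed)
    au = 1.496e8                        # 1 AU in km
    r0 = 20 * 6.96e5                    # start at 20 solar radii
    return [arrival_time(au, r0,
                         rng.gauss(900, 100),       # CME speed, km/s
                         rng.gauss(400, 30),        # solar wind, km/s
                         rng.uniform(1e-8, 3e-8))   # drag parameter
            for _ in range(n)]
```

The most likely arrival time and the forecast confidence would then be read off the distribution of the returned times.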
Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus
The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved. PMID:28066339
Ensemble prediction has become an essential part of numerical weather forecasting. In this paper we investigate the ability of ensemble forecasts to provide an a priori estimate of the expected forecast skill. Several quantities derived from the local ensemble distribution are investigated for a two-year data set of European Centre for Medium-Range Weather Forecasts (ECMWF) temperature and wind speed ensemble forecasts at 30 German stations. The results indicate that the population of the ensemble mode provides useful information on the uncertainty in temperature forecasts. The ensemble entropy is a similarly good measure. This is not true for the spread if it is simply calculated as the variance of the ensemble members with respect to the ensemble mean. The number of clusters in the C regions is almost unrelated to the local skill. For wind forecasts, the results are less promising.
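Two of the quantities investigated, the ensemble entropy and the population of the ensemble mode, can be sketched over binned members; the bin width is an assumption of this sketch:

```python
import math
from collections import Counter

def ensemble_entropy(members, bin_width=1.0):
    """Shannon entropy of the binned ensemble distribution; low entropy
    means the members agree, suggesting a more certain forecast."""
    bins = Counter(int(m // bin_width) for m in members)
    n = len(members)
    return -sum((c / n) * math.log(c / n) for c in bins.values())

def mode_population(members, bin_width=1.0):
    """Fraction of members falling into the most populated bin: high
    values indicate a sharp, confident ensemble."""
    bins = Counter(int(m // bin_width) for m in members)
    return max(bins.values()) / len(members)
```

A sharp ensemble (members clustered in one bin) has low entropy and a high mode population; a flat ensemble has the opposite.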
Edouard, Simon; Vincendon, Béatrice; Ducrocq, Véronique
Intense precipitation events in the Mediterranean often lead to devastating flash floods (FF). FF modelling is affected by several kinds of uncertainties, and Hydrological Ensemble Prediction Systems (HEPS) are designed to take those uncertainties into account. The major source of uncertainty comes from rainfall forcing, and convective-scale meteorological ensemble prediction systems can manage it for forecasting purposes. But other sources are related to the hydrological modelling part of the HEPS. This study focuses on the uncertainties arising from the hydrological model parameters and initial soil moisture, with the aim of designing an ensemble-based version of a hydrological model dedicated to simulations of Mediterranean fast-responding rivers, the ISBA-TOP coupled system. The first step consists in identifying the parameters that have the strongest influence on FF simulations, assuming perfect precipitation. A sensitivity study is carried out, first using a synthetic framework and then for several real events and several catchments. Perturbation methods varying the most sensitive parameters as well as the initial soil moisture allow the design of an ensemble-based version of ISBA-TOP. The first results of this system on some real events are presented. The direct perspective of this work is to drive this ensemble-based version with the members of a convective-scale meteorological ensemble prediction system, to design a complete HEPS for FF forecasting.
Haaf, Ezra; Barthel, Roland
When assessing hydrogeological conditions at the regional scale, the analyst is often confronted with uncertainty of structures, inputs and processes while having to base inference on scarce and patchy data. Haaf and Barthel (2015) proposed a concept for handling this predicament by developing a groundwater systems classification framework, where information is transferred from similar, but well-explored and better understood to poorly described systems. The concept is based on the central hypothesis that similar systems react similarly to the same inputs and vice versa. It is conceptually related to PUB (Prediction in ungauged basins) where organization of systems and processes by quantitative methods is intended and used to improve understanding and prediction. Furthermore, using the framework it is expected that regional conceptual and numerical models can be checked or enriched by ensemble generated data from neighborhood-based estimators. In a first step, groundwater hydrographs from a large dataset in Southern Germany are compared in an effort to identify structural similarity in groundwater dynamics. A number of approaches to group hydrographs, mostly based on a similarity measure - which have previously only been used in local-scale studies, can be found in the literature. These are tested alongside different global feature extraction techniques. The resulting classifications are then compared to a visual "expert assessment"-based classification which serves as a reference. A ranking of the classification methods is carried out and differences shown. Selected groups from the classifications are related to geological descriptors. Here we present the most promising results from a comparison of classifications based on series correlation, different series distances and series features, such as the coefficients of the discrete Fourier transform and the intrinsic mode functions of empirical mode decomposition. Additionally, we show examples of classes
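A correlation-based series distance and a nearest-neighbour transfer step, one of the similarity measures compared above, can be sketched as follows (function names are illustrative):

```python
def correlation_distance(a, b):
    """1 - Pearson correlation: a common series-similarity measure for
    grouping hydrographs by their dynamics (0 = identical dynamics)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return 1 - cov / (va * vb)

def nearest_hydrograph(query, catalogue):
    """Neighbourhood-based transfer: index of the most similar
    well-observed hydrograph, from which information is borrowed."""
    return min(range(len(catalogue)),
               key=lambda i: correlation_distance(query, catalogue[i]))
```

Feature-based variants would replace the raw series with, e.g., Fourier coefficients before computing distances.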
Eelsalu, Maris; Soomere, Tarmo
The risks and damages associated with coastal flooding, which naturally increase with the magnitude of extreme storm surges, are one of the largest concerns of countries with extensive low-lying nearshore areas. The relevant risks are even more pronounced for semi-enclosed water bodies such as the Baltic Sea, where subtidal (weekly-scale) variations in the water volume of the sea substantially contribute to the water level and lead to a large spread in projections of future extreme water levels. We explore the options for using large ensembles of projections to more reliably evaluate return periods of extreme water levels. Single projections of the ensemble are constructed by fitting several sets of block maxima with various extreme value distributions. The ensemble is based on two simulated data sets produced by the Swedish Meteorological and Hydrological Institute: a hindcast by the Rossby Centre Ocean model, sampled with a resolution of 6 h, and a similar hindcast by the circulation model NEMO with a resolution of 1 h. As the annual maxima of water levels in the Baltic Sea are not always uncorrelated, we employ maxima for calendar years and for stormy seasons. As the shape parameter of the Generalised Extreme Value distribution changes its sign and varies substantially in magnitude along the eastern coast of the Baltic Sea, the use of a single distribution for the entire coast is inappropriate. The ensemble involves projections based on the Generalised Extreme Value, Gumbel and Weibull distributions. The parameters of these distributions are evaluated in three different ways: the maximum likelihood method and the method of moments based on both biased and unbiased estimates. The total number of projections in the ensemble is 40. As some of the resulting estimates contain limited additional information, the members of pairs of projections that are highly correlated are assigned weights of 0.6. A comparison of the ensemble-based projection of
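One member of such an ensemble, a Gumbel fit by the method of moments together with its return-level curve, can be sketched as follows (GEV and Weibull members would be built analogously; 0.5772… is the Euler–Mascheroni constant):

```python
import math

def gumbel_fit_moments(maxima):
    """Method-of-moments Gumbel fit: scale beta = s * sqrt(6) / pi and
    location mu = mean - euler_gamma * beta, from the sample mean and
    (unbiased) standard deviation of block maxima."""
    n = len(maxima)
    mean = sum(maxima) / n
    s = (sum((x - mean) ** 2 for x in maxima) / (n - 1)) ** 0.5
    beta = s * math.sqrt(6) / math.pi
    mu = mean - 0.5772156649 * beta
    return mu, beta

def return_level(mu, beta, period_years):
    """Water level expected to be exceeded once per return period:
    the Gumbel quantile at non-exceedance probability 1 - 1/T."""
    p = 1 - 1 / period_years
    return mu - beta * math.log(-math.log(p))
```

Each of the 40 projections would contribute one such return-level curve, and the (weighted) ensemble of curves gives the final estimate.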
Accurate staging of hepatic cirrhosis is important in investigating the cause and slowing down the effects of cirrhosis. Computer-aided diagnosis (CAD) can provide doctors with an alternative second opinion and assist them in choosing a specific treatment based on an accurate cirrhosis stage. MRI has many advantages, including high resolution for soft tissue, no radiation, and multiparameter imaging modalities. So in this paper, multisequence MRIs, including T1-weighted, T2-weighted, arterial, portal venous, and equilibrium phases, are applied. However, CAD does not yet meet the clinical needs of cirrhosis assessment, and few researchers are concerned with it at present. Cirrhosis is characterized by the presence of widespread fibrosis and regenerative nodules in the liver, leading to different texture patterns at different stages. So, extracting texture features is the primary task. Compared with typical gray level co-occurrence matrix (GLCM) features, texture classification from random features provides an effective way, and we adopt it and propose CCTCRF for triple classification (normal, early, and middle and advanced stages). CCTCRF does not need strong assumptions except the sparse character of the image, contains sufficient texture information, follows a concise and effective process, and makes case decisions with high accuracy. Experimental results also illustrate the satisfying performance, and comparisons are made with a typical NN with GLCM.
Petrini, Paula Andreia; Lopes da Silva, Ricardo Magno; de Oliveira, Rafael Furlan; Merces, Leandro; Bufon, Carlos César Bof
Considerable advances in the field of molecular electronics have been achieved over the recent years. One persistent challenge, however, is the exploitation of the electronic properties of molecules fully integrated into devices. Typically, the molecular electronic properties are investigated using sophisticated techniques incompatible with a practical device technology, such as the scanning tunneling microscope (STM). The incorporation of molecular materials in devices is not a trivial task since the typical dimensions of electrical contacts are much larger than the molecular ones. To tackle this issue, we report on hybrid capacitors using mechanically-compliant nanomembranes to encapsulate ultrathin molecular ensembles for the investigation of molecular dielectric properties. As the prototype material, copper (II) phthalocyanine (CuPc) has been chosen as information on its dielectric constant (kCuPc) at the molecular scale is missing. Here, hybrid nanomembrane-based capacitors containing metallic nanomembranes, insulating Al2O3 layers, and the CuPc molecular ensemble have been fabricated and evaluated. The Al2O3 is used to prevent short circuits through the capacitor plates as the molecular layer is considerably thin (< 30 nm). From the electrical measurements of devices with molecular layers of different thicknesses, the CuPc dielectric constant has been reliably determined (kCuPc = 4.5 ± 0.5). These values suggest a mild contribution of molecular orientation in the CuPc dielectric properties. The reported nanomembrane-based capacitor is a viable strategy for the dielectric characterization of ultrathin molecular ensembles integrated into a practical, real device technology. © 2018 IOP Publishing Ltd.
Introduction: Dengue fever has been one of the most concerning endemic diseases of recent times. Every year, 50-100 million people get infected by the dengue virus across the world. Historically, it has been most prevalent in Southeast Asia and the Pacific Islands. In recent years, frequent dengue epidemics have started occurring in Latin America as well. This study focused on assessing the impact of different short- and long-term lagged climatic predictors on dengue cases. Additionally, it assessed the impact of building an ensemble model using multiple time series and regression models on prediction accuracy. Materials and Methods: Experimental data were based on two Latin American cities, viz. San Juan (Puerto Rico) and Iquitos (Peru). Due to weather and geographic differences, San Juan recorded higher dengue incidence than Iquitos. Using lagged cross-correlations, this study confirmed the impact of temperature and vegetation on the number of dengue cases for both cities, though to varying degrees and at varying time lags. An ensemble of multiple predictive models using an elaborate set of derived predictors was built and validated. Results: The proposed ensemble prediction achieved a mean absolute error of 21.55, 4.26 points lower than the 25.81 obtained by a standard negative binomial model. Changes in climatic conditions and urbanization were found to be strong predictors, as established empirically in other studies. Some of the predictors were new and informative, and have not been explored in any other relevant studies yet. Discussion and Conclusions: Two original contributions were made in this research. Firstly, a focused and extensive feature engineering aligned with the mosquito lifecycle. Secondly, a novel covariate pattern-matching based prediction approach using the past time series trend of the predictor variables. The increased accuracy of the proposed model over the benchmark model proved the appropriateness of the analytical approach.
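The lagged cross-correlation screening of climatic predictors can be sketched as follows; treating the series as weekly and scanning lags up to a fixed horizon are assumptions of this sketch:

```python
def lagged_correlation(cases, predictor, lag):
    """Pearson correlation between case counts and a climate predictor
    shifted back by `lag` steps (positive lag: predictor leads cases)."""
    x = predictor[:len(predictor) - lag]
    y = cases[lag:]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def best_lag(cases, predictor, max_lag=8):
    """Lag with the strongest absolute correlation: the candidate
    lead time for that predictor."""
    return max(range(max_lag + 1),
               key=lambda l: abs(lagged_correlation(cases, predictor, l)))
```

Predictors whose best lag shows a strong correlation would then enter the ensemble's feature set at that lag.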
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
Data mining has important application value in today's industry and academia, and text classification is a very important technology within it. At present, there are many mature algorithms for text classification; KNN, NB, AdaBoost, SVM, decision trees and other classification methods all show good classification performance. The support vector machine (SVM) is a good classifier in machine learning research. This paper studies the classification effect of the SVM method on Chinese text data and uses the support vector machine to classify Chinese text, aiming to combine academic research with practical application.
Huang, Meiyan; Yang, Wei; Jiang, Jun; Wu, Yao; Zhang, Yu; Chen, Wufan; Feng, Qianjin
Brain extraction is an important procedure in brain image analysis. Although numerous brain extraction methods have been presented, enhancing brain extraction methods remains challenging because brain MRI images exhibit complex characteristics, such as anatomical variability and intensity differences across different sequences and scanners. To address this problem, we present a Locally Linear Representation-based Classification (LLRC) method for brain extraction. A novel classification framework is derived by introducing the locally linear representation to the classical classification model. Under this classification framework, a common label fusion approach can be considered as a special case and thoroughly interpreted. Locality is important to calculate fusion weights for LLRC; this factor is also considered to determine that Local Anchor Embedding is more applicable in solving locally linear coefficients compared with other linear representation approaches. Moreover, LLRC supplies a way to learn the optimal classification scores of the training samples in the dictionary to obtain accurate classification. The International Consortium for Brain Mapping and the Alzheimer's Disease Neuroimaging Initiative databases were used to build a training dataset containing 70 scans. To evaluate the proposed method, we used four publicly available datasets (IBSR1, IBSR2, LPBA40, and ADNI3T, with a total of 241 scans). Experimental results demonstrate that the proposed method outperforms the four common brain extraction methods (BET, BSE, GCUT, and ROBEX), and is comparable to the performance of BEaST, while being more accurate on some datasets compared with BEaST. Copyright © 2014 Elsevier Inc. All rights reserved.
Delaney, C.; Mendoza, J.; Whitin, B.; Hartman, R. K.
Ensemble Forecast Operations (EFO) is a risk based approach of reservoir flood operations that incorporates ensemble streamflow predictions (ESPs) made by NOAA's California-Nevada River Forecast Center (CNRFC). With the EFO approach, each member of an ESP is individually modeled to forecast system conditions and calculate risk of reaching critical operational thresholds. Reservoir release decisions are computed which seek to manage forecasted risk to established risk tolerance levels. A water management model was developed for Lake Mendocino, a 111,000 acre-foot reservoir located near Ukiah, California, to evaluate the viability of the EFO alternative to improve water supply reliability but not increase downstream flood risk. Lake Mendocino is a dual use reservoir, which is owned and operated for flood control by the United States Army Corps of Engineers and is operated for water supply by the Sonoma County Water Agency. Due to recent changes in the operations of an upstream hydroelectric facility, this reservoir has suffered from water supply reliability issues since 2007. The EFO alternative was simulated using a 26-year (1985-2010) ESP hindcast generated by the CNRFC, which approximates flow forecasts for 61 ensemble members for a 15-day horizon. Model simulation results of the EFO alternative demonstrate a 36% increase in median end of water year (September 30) storage levels over existing operations. Additionally, model results show no increase in occurrence of flows above flood stage for points downstream of Lake Mendocino. This investigation demonstrates that the EFO alternative may be a viable approach for managing Lake Mendocino for multiple purposes (water supply, flood mitigation, ecosystems) and warrants further investigation through additional modeling and analysis.
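The risk computation at the heart of EFO, the fraction of ensemble members crossing a critical threshold at each forecast step, can be sketched as follows; the release decision rule is illustrative, not the operational logic:

```python
def forecast_risk(ensemble_storages, threshold):
    """EFO-style risk: for each forecast step, the fraction of ensemble
    members whose simulated reservoir storage exceeds the critical
    threshold. ensemble_storages[m][t] is member m's storage at step t."""
    n = len(ensemble_storages)
    horizon = len(ensemble_storages[0])
    return [sum(m[t] > threshold for m in ensemble_storages) / n
            for t in range(horizon)]

def release_needed(risk_profile, tolerance):
    """Flag the steps where forecasted risk exceeds the tolerance level,
    i.e. where a pre-release would be scheduled (illustrative rule)."""
    return [r > tolerance for r in risk_profile]
```

In practice each of the 61 ESP members would be routed through the full water-management model before this threshold check.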
Vosselman, George; Coenen, Maximilian; Rottensteiner, Franz
Classification of point clouds is needed as a first step in the extraction of various types of geo-information from point clouds. We present a new approach to contextual classification of segmented airborne laser scanning data. Potential advantages of segment-based classification are easily offset
Li, Dingfang; Luo, Longqiang; Zhang, Wen; Liu, Feng; Luo, Fei
Predicting piwi-interacting RNA (piRNA) is an important topic in the small non-coding RNAs, which provides clues for understanding the generation mechanism of gamete. To the best of our knowledge, several machine learning approaches have been proposed for the piRNA prediction, but there is still room for improvements. In this paper, we develop a genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs. We construct datasets for three species: Human, Mouse and Drosophila. For each species, we compile the balanced dataset and imbalanced dataset, and thus obtain six datasets to build and evaluate prediction models. In the computational experiments, the genetic algorithm-based weighted ensemble method achieves 10-fold cross validation AUC of 0.932, 0.937 and 0.995 on the balanced Human dataset, Mouse dataset and Drosophila dataset, respectively, and achieves AUC of 0.935, 0.939 and 0.996 on the imbalanced datasets of three species. Further, we use the prediction models trained on the Mouse dataset to identify piRNAs of other species, and the models demonstrate the good performances in the cross-species prediction. Compared with other state-of-the-art methods, our method can lead to better performances. In conclusion, the proposed method is promising for the transposon-derived piRNA prediction. The source codes and datasets are available in https://github.com/zw9977129/piRNAPredictor .
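A toy version of a genetic algorithm searching for ensemble weights can be sketched as follows; the operators, hyperparameters and fitness function are illustrative assumptions, not those of the paper:

```python
import random

def ensemble_accuracy(weights, probs, labels):
    """Accuracy of a weighted ensemble: probs[m][i] is model m's score
    that sample i is a piRNA; the weighted mean is thresholded at 0.5."""
    total = sum(weights)
    preds = [sum(w * p[i] for w, p in zip(weights, probs)) / total > 0.5
             for i in range(len(labels))]
    return sum(p == bool(l) for p, l in zip(preds, labels)) / len(labels)

def ga_weights(probs, labels, pop=20, gens=30, seed=0):
    """Evolve the weight vector: keep the fitter half, then fill the
    population with averaged-crossover children plus Gaussian mutation."""
    rng = random.Random(seed)
    m = len(probs)
    population = [[rng.random() for _ in range(m)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: -ensemble_accuracy(w, probs, labels))
        parents = population[:pop // 2]
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]        # crossover
            j = rng.randrange(m)
            child[j] = min(1.0, max(0.0,
                           child[j] + rng.gauss(0, 0.1)))      # mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: ensemble_accuracy(w, probs, labels))
```

In the paper the fitness would be cross-validated AUC rather than raw accuracy, but the evolutionary loop has the same shape.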
Thielen-Del Pozo, J.; Pappenberger, F.; Bogner, K.; Kalas, M.; de Roo, A.
The hydrological community is looking increasingly at the use of ensemble prediction systems (EPS) instead of single forecasts to increase flood warning times. International initiatives and research projects such as THORPEX, HEPEX, PREVIEW, or MAP-DPHASE successfully foster the interdisciplinary dialogue between the meteorological and hydrological communities. The European Flood Alert System (EFAS) is a pre-operational example of an early flood warning system based on multiple EPS and poor man's ensemble weather inputs. EFAS research focuses on the exploration of EPS streamflow information, its visualisation for different end-user communities, and its application in risk-based decision-making. EFAS further provides a platform for research on flash floods, droughts and climate change. Here a case study of the Romanian floods of October 2007 is analysed with multiple EPS at different spatial resolutions and lead times. While monthly forecasts are explored as first indicators of potential floods, the early flood warning capacity of EFAS is drawn mainly from medium-range (15-day) EPS forecasts. In addition, information from the limited-area model EPS (COSMO-LEPS), with much higher spatial resolution but only 5 days lead time, is explored for better quantitative forecasts. An outlook contrasting the computational demands with the apparent benefit is given, together with a few thoughts on developments that should be addressed in the near future.
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software package for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta Design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods, with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences, as assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum, our results demonstrate that MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.
Full Text Available Knowledge-based planning (KBP) utilizes experienced planners' knowledge embedded in prior plans to estimate the optimal achievable dose volume histogram (DVH) of new cases. In the regression-based KBP framework, previously planned patients' anatomical features and DVHs are extracted, and prior knowledge is summarized as the regression coefficients that transform features into organ-at-risk DVH predictions. We find that different regression methods work better in different settings. To improve the robustness of KBP models, we propose an ensemble method that combines the strengths of various linear regression models, including stepwise, lasso, elastic net, and ridge regression. In the ensemble approach, we first obtain individual model prediction metadata using in-training-set leave-one-out cross validation. A constrained optimization is subsequently performed to decide the individual model weights. The metadata is also used to filter out impactful training set outliers. We evaluate our method on a fresh set of retrospectively retrieved anonymized prostate intensity-modulated radiation therapy (IMRT) cases and head and neck IMRT cases. The proposed approach is more robust against small training set size, wrongly labeled cases, and dosimetrically inferior plans, compared with the individual models. In summary, we believe the improved robustness makes the proposed method more suitable for clinical settings than individual models.
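The weight-deciding step can be illustrated with a toy version of the constrained optimization: given leave-one-out prediction metadata from three models, search the simplex of convex weights for the combination minimizing squared error. The grid search and the data below are illustrative stand-ins for the paper's actual optimizer and clinical features:

```python
# Toy version of the ensemble-weighting step: leave-one-out predictions
# ("metadata") from three hypothetical regression models are blended with
# convex weights (non-negative, summing to 1); a coarse grid search over
# the simplex stands in for the paper's constrained optimization.

def sse(weights, model_preds, targets):
    """Sum of squared errors of the weighted blend of model predictions."""
    total = 0.0
    for i, y in enumerate(targets):
        blend = sum(w * preds[i] for w, preds in zip(weights, model_preds))
        total += (blend - y) ** 2
    return total

def best_convex_weights(model_preds, targets, step=0.05):
    """Grid search over the 3-model probability simplex."""
    best_err, best_w = float("inf"), None
    n = int(round(1 / step))
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i * step, j * step, 1.0 - (i + j) * step)
            err = sse(w, model_preds, targets)
            if err < best_err:
                best_err, best_w = err, w
    return best_w

# made-up LOO metadata: model A tracks the target, B is biased, C is constant
targets = [1.0, 2.0, 3.0, 4.0]
preds_a = [1.1, 2.0, 2.9, 4.1]
preds_b = [0.5, 1.0, 1.5, 2.0]
preds_c = [2.0, 2.0, 2.0, 2.0]
weights = best_convex_weights([preds_a, preds_b, preds_c], targets)
```

On this toy data the search concentrates all weight on the accurate model; with noisier, complementary models the optimum spreads across several of them, which is the robustness the paper is after.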
Hansen, Lars Kai; Salamon, Peter
We propose several means for improving the performance and training of neural networks for classification. We use cross-validation as a tool for optimizing network parameters and architecture. We show further that the remaining generalization error can be reduced by invoking ensembles of similar networks.
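The variance-reduction effect of ensembles can be demonstrated without training real networks. In the sketch below, each "network" is a stand-in predictor with its own bias and noise; for squared error, the ensemble average provably does at least as well as the average member (Jensen's inequality):

```python
# Demonstration of why averaging similar predictors reduces error: for
# squared loss, the ensemble mean's MSE never exceeds the mean MSE of its
# members. The "networks" are synthetic noisy predictors, not trained nets.
import random

random.seed(7)

xs = [x / 10 for x in range(20)]
truth = [2.0 * x + 1.0 for x in xs]

# Five "networks": each predicts the truth plus its own bias and noise.
members = []
for _ in range(5):
    bias = random.uniform(-0.5, 0.5)
    members.append([t + bias + random.gauss(0, 0.3) for t in truth])

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# pointwise average of the member predictions
ensemble = [sum(col) / len(col) for col in zip(*members)]

mean_member_mse = sum(mse(m, truth) for m in members) / len(members)
ensemble_mse = mse(ensemble, truth)
```

The inequality `ensemble_mse <= mean_member_mse` holds for any data and any members; the gap equals the average disagreement among members, which is why diverse-but-similar networks help most.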
Arnbjerg-Nielsen, Karsten; Funder, S. G.; Madsen, H.
Climate analogues, also denoted Space-For-Time, may be used to identify regions where the present climatic conditions resemble conditions of a past or future state of another location or region, based on robust climate variable statistics in combination with projections of how these statistics change over time. The study focuses on assessing climate analogues for Denmark based on current climate data set (E-OBS) observations as well as the ENSEMBLES database of future climates, with the aim of projecting future precipitation extremes. The local present precipitation extremes are assessed by means of intensity-duration-frequency curves for urban drainage design for the relevant locations, being France, the Netherlands, Belgium, Germany, the United Kingdom, and Denmark. Based on this approach, projected increases of extreme precipitation by 2100 of 9 and 21% are expected for 2 and 10 year...
Full Text Available In order to lower the dependence on textual annotations for image searches, content-based image retrieval (CBIR) has become a popular topic in computer vision. A wide range of CBIR applications consider classification techniques, such as artificial neural networks (ANN) and support vector machines (SVM), to understand the query image content and retrieve relevant output. However, in multi-class search environments, the retrieval results are far from optimal due to overlapping semantics amongst subjects of various classes. Classification through multiple classifiers generates better results, but as the number of negative examples increases due to highly correlated semantic classes, classification bias occurs towards the negative class; hence, the combination of classifiers becomes even more unstable, particularly in one-against-all classification scenarios. To resolve this issue, a genetic algorithm (GA) based classifier comity learning (GCCL) method is presented in this paper to generate stable classifiers by combining ANN with SVMs through asymmetric and symmetric bagging. The proposed approach resolves the classification disagreement amongst different classifiers and also resolves the class imbalance problem in CBIR. Once the stable classifiers are generated, the query image is presented to the trained model to understand the underlying semantic content of the query image for association with the precise semantic class. Afterwards, the feature similarity is computed within the obtained class to generate the semantic response of the system. The experiments reveal that the proposed method outperforms various state-of-the-art methods and significantly improves image retrieval performance.
Raposo, Letícia M; Nobre, Flavio F
Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms have been developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, making clinical decisions about which treatment to use difficult. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from the Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to develop a classifier with a single resistance profile: stacked generalization, a simple plurality vote scheme, and selection of the interpretation system with the best performance. The strategies were compared with Friedman's test, and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performance for the selected antiretrovirals. In some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making, since they provide a single resistance profile from the most commonly used resistance interpretation systems.
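The simple plurality vote scheme can be sketched in a few lines. The resistance calls below are hypothetical, not actual ANRS/Rega/HIVdb outputs, and the tie-break rule (fall back to a designated system) is an illustrative choice:

```python
# Plurality-vote ensemble over rule-based interpretation systems.
# Each system labels a sequence "R" (resistant) or "S" (susceptible);
# the ensemble returns the majority label. With an even number of voters
# a tie is possible, so we fall back to a designated system (index 0).
from collections import Counter

def plurality_vote(labels, fallback_index=0):
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return labels[fallback_index]          # tie-break rule
    return counts[0][0]

# hypothetical calls from three systems for four sequences
calls = [("R", "R", "S"), ("S", "S", "S"), ("R", "S", "R"), ("S", "R", "S")]
consensus = [plurality_vote(c) for c in calls]
```

Stacked generalization replaces this fixed rule with a meta-learner (naïve Bayes in the study) trained on the systems' outputs, which is why it can outperform plain voting when one system is systematically more reliable for a given drug.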
Full Text Available This study introduces the One-Class K-means with Randomly-projected features Algorithm (OCKRA). OCKRA is an ensemble of one-class classifiers built over multiple projections of a dataset according to random feature subsets. Ensembles of one-class classifiers have been satisfactorily applied across a wide range of applications in the literature; however, none is oriented to the area under study here: personal risk detection. OCKRA has been designed with the aim of improving detection performance on the problem posed by the Personal RIsk DEtection (PRIDE) dataset. PRIDE was built based on 23 test subjects, where the data for each user were captured using a set of sensors embedded in a wearable band. The performance of OCKRA was compared against a support vector machine and three versions of the Parzen window classifier. On average, experimental results show that OCKRA outperformed the other classifiers by at least 0.53% in area under the curve (AUC). In addition, OCKRA achieved an AUC above 90% for more than 57% of the users.
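A heavily reduced sketch of the OCKRA idea: train several one-class models, each on a random subset of the features, and average their similarity scores. Each "model" here is a single centroid (k-means with k = 1) over its feature subset, and the data are synthetic; the real algorithm uses k-means proper and dataset-specific thresholds:

```python
# Reduced OCKRA-style ensemble: one-class models over random feature
# subsets, scores averaged. A "model" is just the centroid of the training
# data restricted to its subset; similarity is exp(-distance). Synthetic data.
import math
import random

random.seed(1)

def train_ockra(train_rows, n_models=5, subset_size=2):
    models = []
    n_feats = len(train_rows[0])
    for _ in range(n_models):
        feats = random.sample(range(n_feats), subset_size)   # random projection
        centroid = [sum(r[f] for r in train_rows) / len(train_rows)
                    for f in feats]
        models.append((feats, centroid))
    return models

def score(models, row):
    """Average similarity to the centroids; higher = more 'normal'."""
    sims = []
    for feats, centroid in models:
        d = math.dist([row[f] for f in feats], centroid)
        sims.append(math.exp(-d))
    return sum(sims) / len(sims)

# "normal" behaviour clusters near (1, 1, 1, 1); an anomaly sits far away
normal = [[1 + random.gauss(0, 0.1) for _ in range(4)] for _ in range(50)]
models = train_ockra(normal)
normal_score = score(models, [1.0, 1.0, 1.0, 1.0])
anomaly_score = score(models, [5.0, 5.0, 5.0, 5.0])
```

The random subsets give the ensemble members different views of the sensor features, which is the source of diversity that makes the averaged score more stable than any single one-class model.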
Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn
The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks for learning hyper-representations by using cascaded ensembles of decision trees. The deep forest model has been shown to have performance competitive with, or sometimes better than, deep neural networks. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample sizes and high-dimensional biology data. In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. BCDForest is distinguished from the standard deep forest model by the following two main contributions: First, a multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage ensemble diversity. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thereby propagating the benefits of discriminative features among cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes
Zahra Nazari; Dongshik Kang
Support Vector Machines (SVM) are among the most successful algorithms for classification problems. SVM learns the decision boundary from two classes (for binary classification) of training points. However, training sets sometimes contain less meaningful samples that are corrupted by noise or misplaced on the wrong side, called outliers. These outliers affect the margin and classification performance, and the machine would do better to discard them. SVM as a popular and widely used cl...
Zameer, Aneela; Arshad, Junaid; Khan, Asifullah; Raja, Muhammad Asif Zahoor
Highlights: • Genetic programming based ensemble of neural networks is employed for short-term wind power prediction. • The proposed predictor shows resilience against abrupt changes in weather. • Genetic programming evolves a nonlinear mapping between meteorological measures and wind power. • The proposed approach gives mathematical expressions of wind power in terms of its independent variables. • The proposed model shows relatively accurate and steady wind power prediction performance. - Abstract: The inherent instability of wind power production leads to critical problems for smooth power generation from wind turbines, which in turn requires an accurate forecast of wind power. In this study, an effective short-term wind power prediction methodology is presented, which uses an intelligent ensemble regressor that comprises artificial neural networks and genetic programming. In contrast to existing series-based combinations of wind power predictors, whereby the error or variation in the leading predictor is propagated downstream to the next predictors, the proposed intelligent ensemble predictor avoids this shortcoming by introducing a Genetic Programming based semi-stochastic combination of neural networks. It is observed that the decisions of the individual base regressors may vary due to the frequent and inherent fluctuations in atmospheric conditions and thus meteorological properties. The novelty of the reported work lies in creating an ensemble to generate an intelligent, collective and robust decision space, thereby avoiding large errors due to the sensitivity of the individual wind predictors. The proposed ensemble-based regressor, a Genetic Programming based ensemble of artificial neural networks, has been implemented and tested on data taken from five different wind farms located in Europe. Obtained numerical results of the proposed model in terms of various error measures are compared with recent artificial intelligence based strategies to demonstrate the
Zhang, Wen; Niu, Yanqing; Zou, Hua; Luo, Longqiang; Liu, Qianchao; Wu, Weijian
T-cell epitopes play an important role in the T-cell immune response, and they are critical components in epitope-based vaccine design. Immunogenicity is the ability to trigger an immune response. The accurate prediction of immunogenic T-cell epitopes is significant for designing useful vaccines and understanding the immune system. In this paper, we attempt to differentiate immunogenic epitopes from non-immunogenic epitopes based on their primary structures. First, we explore a variety of sequence-derived features and analyze their relationship with epitope immunogenicity. To effectively utilize the various features, a genetic algorithm (GA)-based ensemble method is proposed to determine the optimal feature subset and develop a high-accuracy ensemble model. In the GA optimization, a chromosome represents a feature subset in the search space. For each feature subset, the selected features are utilized to construct base predictors, and an ensemble model is developed by taking the average of the outputs of the base predictors. The objective of the GA is to search for the optimal feature subset, which leads to the ensemble model with the best cross-validation AUC (area under the ROC curve) on the training set. Two datasets named 'IMMA2' and 'PAAQD' are adopted as benchmark datasets. Compared with the state-of-the-art methods POPI, POPISK, PAAQD and our previous method, the GA-based ensemble method produces much better performance, achieving an AUC score of 0.846 on the IMMA2 dataset and an AUC score of 0.829 on the PAAQD dataset. The statistical analysis demonstrates that the performance improvements of the GA-based ensemble method are statistically significant. The proposed method is a promising tool for predicting immunogenic epitopes. The source codes and datasets are available in S1 File.
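The GA search over feature subsets can be sketched with a toy fitness function. Here a chromosome is a bit vector over features, as in the paper, but the fitness is a made-up stand-in for cross-validation AUC (features 0-2 are "informative" and each extra feature costs a small penalty), so the known optimum is the subset {0, 1, 2}:

```python
# Toy genetic algorithm for feature-subset search. A chromosome is a bit
# vector over features; the synthetic fitness stands in for the ensemble's
# cross-validation AUC, so no actual classifiers are trained here.
import random

random.seed(3)

N_FEATURES = 8
GAIN = [0.3, 0.25, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical feature value
PENALTY = 0.02                                    # cost per selected feature

def fitness(chrom):
    return sum(g for g, bit in zip(GAIN, chrom) if bit) - PENALTY * sum(chrom)

def mutate(chrom, rate=0.1):
    return [1 - b if random.random() < rate else b for b in chrom]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=30, generations=60):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # elitist selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's setting, evaluating `fitness` means building the base predictors on the selected features and averaging their outputs to get a cross-validation AUC, which is far more expensive than this synthetic surrogate but structurally identical.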
Suh, Jong Hwan
In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four writing style features (lexical, syntactic, structural, and content-specific) and eight classification techniques, namely four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on the four base learners. With South Korea's Web forum, Daum Agora, as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features with RS-SVM (combining RS and SVM) gives the best classification accuracy when the test bed's poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations from the literature: first, the feature set including content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations (like, dislike, sum, and portfolio), the results show that the portfolio approach gives the highest accuracy.
Song Chenxiu; Zhu Lixin
Research reactors differ in reactor type, use, power level, design principle, operation model and safety performance, and they also differ significantly in terms of nuclear safety regulation. This paper introduces a classification of research reactors and discusses approaches to safety regulation based on this classification. (authors)
He, Hui; Li, Zhigang; Yao, Chongchong; Zhang, Weizhe
With diverse online media emerging, the sentiment classification problem has attracted growing attention. At present, text sentiment classification mainly utilizes supervised machine learning methods, which exhibit a degree of domain dependency. On the basis of Markov logic networks (MLNs), this study proposes a cross-domain multi-task text sentiment classification method rooted in transfer learning. Through many-to-one knowledge transfer, labeled text sentiment classification knowledge was successfully transferred to other domains, and the precision of sentiment classification analysis in the text tendency domain was improved. The experimental results revealed the following: (1) the model based on an MLN demonstrated higher precision than the single individual learning plan model; (2) multi-task transfer learning based on Markov logic networks could acquire more knowledge than self-domain learning. The cross-domain text sentiment classification model could significantly improve the precision and efficiency of text sentiment classification.
A method frequently used in classification systems to improve classification accuracy is to combine the outputs of several classifiers. Among the various types of classifiers, fuzzy ones are tempting because they use intelligible fuzzy if-then rules. In this paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bond input and output fuzzy linguistic values by a binary relation; thus, compared to traditional fuzzy systems, fuzzy rules have additional weights: the elements of a fuzzy relation matrix. This makes the system more adjustable to the data during learning. An ensemble of relational fuzzy systems is proposed here. The problem is that such an ensemble contains separate rule bases which cannot be directly merged. As the systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. The problem is addressed by a novel design of the fuzzy systems constituting the ensemble, resulting in normalization of the individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.
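The AdaBoost scheme itself is easy to sketch. Below, decision stumps stand in for the relational neuro-fuzzy base classifiers (which are far more elaborate); labels are +1/-1, and each boosting round reweights the samples the previous stump misclassified:

```python
# Minimal AdaBoost with decision stumps as weak learners. Each round picks
# the stump with the lowest weighted error, gives it a vote proportional to
# its accuracy, and upweights the samples it got wrong.
import math

def stump_predict(threshold, direction, x):
    return direction if x >= threshold else -direction

def best_stump(xs, ys, weights):
    best = None
    for threshold in xs:
        for direction in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(threshold, direction, x) != y)
            if best is None or err < best[0]:
                best = (err, threshold, direction)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1 / n] * n
    model = []
    for _ in range(rounds):
        err, threshold, direction = best_stump(xs, ys, weights)
        err = max(err, 1e-10)                    # avoid log(0) on perfect fit
        alpha = 0.5 * math.log((1 - err) / err)  # stump's voting weight
        model.append((alpha, threshold, direction))
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, direction, x))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]       # renormalize
    return model

def predict(model, x):
    score = sum(alpha * stump_predict(t, d, x) for alpha, t, d in model)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, 1, 1, 1, 1, 1]   # separable at x >= 4
model = adaboost(xs, ys)
preds = [predict(model, x) for x in xs]
```

In the paper the weak learners are whole relational fuzzy systems rather than stumps, but the reweighting loop and the weighted vote are exactly this schema.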
Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.
In the context of a national energy company (EDF: Électricité de France), hydro-meteorological forecasts are necessary to ensure the safety and security of installations, meet environmental standards, and improve water resources management and decision making. Hydrological ensemble forecasts allow a better representation of meteorological and hydrological forecast uncertainties and improve the human expertise of hydrological forecasts, which is essential to synthesize available information coming from different meteorological and hydrological models and human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and has been used since 2010 on more than 30 watersheds in France. This ensemble forecasting chain is characterized by ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where a large amount of human expertise is solicited. The aim of this paper is to compare two hydrological ensemble post-processing methods developed at EDF in order to improve ensemble forecast reliability (similar to Montanari & Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called the empirical approach) is based on statistical modeling of the empirical error of perfect forecasts, using streamflow sub-samples by quantile class and lead time. The second method (called the dynamical approach) is based on streamflow sub-samples by quantile class, streamflow variation, and lead time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure good post-processing of the hydrological ensemble, yielding a clear improvement in the reliability, skill and sharpness of the ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach, which is not able to take into account hydrological
Binder, Moritz; Barthel, Thomas
Based on DMRG, strongly correlated quantum many-body systems at finite temperatures can be simulated by sampling over a certain class of pure matrix product states (MPS) called minimally entangled typical thermal states (METTS). Here, we show how symmetries of the system can be exploited to considerably reduce computation costs in the METTS algorithm. While this is straightforward for the canonical ensemble, we introduce a modification of the algorithm to efficiently simulate the grand-canonical ensemble under utilization of symmetries. In addition, we construct novel symmetry-conserving collapse bases for the transitions in the Markov chain of METTS that improve the speed of convergence of the algorithm by reducing autocorrelations.
An ensemble forecasting scheme for tsunami inundation is presented. The scheme consists of three elemental methods. The first is a hierarchical Bayesian inversion using Akaike's Bayesian Information Criterion (ABIC). The second is Monte Carlo sampling from a probability density function of a multidimensional normal distribution. The third is ensemble analysis of tsunami inundation simulations with multiple tsunami sources. Simulation-based validation of the model was conducted. A tsunami scenario of an M9.1 Nankai earthquake was chosen as the validation target. Tsunami inundation around Nagoya Port was estimated using synthetic tsunami waveforms at offshore GPS buoys. The error in the estimated tsunami inundation area was about 10% even when only ten minutes of observation data were used. The estimation accuracy of waveforms on/off land and the spatial distribution of maximum tsunami inundation depth are also demonstrated.
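The three-step scheme can be caricatured with toy numbers: a source estimate with uncertainty stands in for the Bayesian inversion, Monte Carlo draws from a normal distribution provide the source ensemble, and ensemble analysis of a trivially simple surrogate "inundation model" yields an exceedance probability. Every number and the linear surrogate below are invented for illustration:

```python
# Caricature of the ensemble tsunami-forecast pipeline:
# (1) a slip estimate with uncertainty (stand-in for the ABIC inversion),
# (2) Monte Carlo sampling of sources from a normal distribution,
# (3) ensemble analysis of a toy surrogate model -> exceedance probability.
import random

random.seed(42)

def inundation_depth(slip_m):
    """Toy surrogate: inundation depth grows linearly with fault slip."""
    return 0.4 * slip_m

mean_slip, slip_sd = 10.0, 2.0          # hypothetical inversion result (m)
samples = [random.gauss(mean_slip, slip_sd) for _ in range(5000)]
depths = [inundation_depth(s) for s in samples]

# ensemble statistics: probability that inundation depth exceeds 3 m
p_exceed_3m = sum(1 for d in depths if d > 3.0) / len(depths)
mean_depth = sum(depths) / len(depths)
```

In the actual scheme each sampled source drives a full inundation simulation, and the multidimensional normal is over the inverted source parameters rather than a single slip value; the ensemble statistics are computed the same way.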
Wang Yonggang; Chang Baosheng
Nuclear safety is the most critical target of nuclear power plant operation. Besides the rigid operating procedures established, evaluation of the suppliers working with plants is another important aspect. Supplier selection and evaluation can be approached through qualitative analysis and quantitative management. The indicators involved are coupled with each other in a very complicated manner, and therefore the relevant data are strongly non-linear. This article is based on research and analysis of the real conditions of operation management at the Daya Bay nuclear power plant. Through study and analysis of information from home and abroad, and with reference to neural network ensemble technology, the supplier evaluation system and model are established as illustrated in the paper, thereby increasing the objectivity of supplier selection. (authors)
Lei, Baiying; Hou, Wen; Zou, Wenbin; Li, Xia; Zhang, Cishen; Wang, Tianfu
Neuroimaging data has been widely used to predict clinical scores for automatic diagnosis of Alzheimer's disease (AD). For accurate clinical score prediction, one of the major challenges is high feature dimension of the imaging data. To address this issue, this paper presents an effective framework using a novel feature selection model via sparse learning. In contrast to previous approaches focusing on a single time point, this framework uses information at multiple time points. Specifically, a regularized correntropy with the spatial-temporal constraint is used to reduce the adverse effect of noise and outliers, and promote consistent and robust selection of features by exploring data characteristics. Furthermore, ensemble learning of support vector regression (SVR) is exploited to accurately predict AD scores based on the selected features. The proposed approach is extensively evaluated on the Alzheimer's disease neuroimaging initiative (ADNI) dataset. Our experiments demonstrate that the proposed approach not only achieves promising regression accuracy, but also successfully recognizes disease-related biomarkers.
Smith, Amy F.
© 2014 The Authors. Microcirculation published by John Wiley & Sons Ltd. Objective: Recent developments in high-resolution imaging techniques have enabled digital reconstruction of three-dimensional sections of microvascular networks down to the capillary scale. To better interpret these large data sets, our goal is to distinguish branching trees of arterioles and venules from capillaries. Methods: Two novel algorithms are presented for classifying vessels in microvascular anatomical data sets without requiring flow information. The algorithms are compared with a classification based on observed flow directions (considered the gold standard), and with an existing resistance-based method that relies only on structural data. Results: The first algorithm, developed for networks with one arteriolar and one venular tree, performs well in identifying arterioles and venules and is robust to parameter changes, but incorrectly labels a significant number of capillaries as arterioles or venules. The second algorithm, developed for networks with multiple inlets and outlets, correctly identifies more arterioles and venules, but is more sensitive to parameter changes. Conclusions: The algorithms presented here can be used to classify microvessels in large microvascular data sets lacking flow information. This provides a basis for analyzing the distinct geometrical properties and modelling the functional behavior of arterioles, capillaries, and venules.
Tervonen, Tommi; Linkov, Igor; Figueira, Jose Rui; Steevens, Jeffery; Chappell, Mark; Merad, Myriam
Various stakeholders are increasingly interested in the potential toxicity and other risks associated with nanomaterials throughout the different stages of a product's life cycle (e.g., development, production, use, disposal). Risk assessment methods and tools developed and applied to chemical and biological materials may not be readily adaptable for nanomaterials because of the current uncertainty in identifying the relevant physico-chemical and biological properties that adequately describe the materials. Such uncertainty is further driven by the substantial variations in the properties of the original material due to variable manufacturing processes employed in nanomaterial production. To guide scientists and engineers in nanomaterial research and application as well as to promote the safe handling and use of these materials, we propose a decision support system for classifying nanomaterials into different risk categories. The classification system is based on a set of performance metrics that measure both the toxicity and physico-chemical characteristics of the original materials, as well as the expected environmental impacts through the product life cycle. Stochastic multicriteria acceptability analysis (SMAA-TRI), a formal decision analysis method, was used as the foundation for this task. This method allowed us to cluster various nanomaterials in different ecological risk categories based on our current knowledge of nanomaterial physico-chemical characteristics, variation in produced material, and best professional judgments. SMAA-TRI uses Monte Carlo simulations to explore all feasible values for weights, criteria measurements, and other model parameters to assess the robustness of nanomaterial grouping for risk management purposes.
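The core SMAA computation, estimating rank acceptability by Monte Carlo sampling over feasible criterion weights, can be sketched as follows. The three "nanomaterials" and their criterion scores are invented; in this toy example nano_A dominates on every criterion, so it lands in the riskiest rank for every weight sample:

```python
# Sketch of the SMAA idea: when criterion weights are uncertain, sample many
# feasible weight vectors and record how often each alternative lands in each
# risk rank. Scores are made-up 0-1 measurements (higher = worse).
import random

random.seed(0)

scores = {                      # criteria: (toxicity, env_impact, variability)
    "nano_A": (0.9, 0.8, 0.7),
    "nano_B": (0.4, 0.5, 0.6),
    "nano_C": (0.2, 0.1, 0.3),
}

def random_weights(k):
    """Random point in the k-simplex (normalized exponential draws)."""
    raw = [random.expovariate(1.0) for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rank_counts = {name: [0, 0, 0] for name in scores}
for _ in range(2000):
    w = random_weights(3)
    ranked = sorted(scores,
                    key=lambda n: -sum(wi * si for wi, si in zip(w, scores[n])))
    for rank, name in enumerate(ranked):
        rank_counts[name][rank] += 1

# rank acceptability: fraction of samples placing nano_A in rank 0 (riskiest)
acceptability_A = rank_counts["nano_A"][0] / 2000
```

SMAA-TRI proper sorts alternatives into predefined categories against profile thresholds rather than ranking them against each other, but the Monte Carlo exploration of weight space shown here is the shared mechanism.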
Chen, Jinying; Yu, Hong
Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on the medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care. The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word co-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines. FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers. FIT can thus help patients focus on the terms that matter most to them and may help develop future interventions to improve quality of care. By using unsupervised learning as well as a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.
Rautenhaus, M.; Grams, C. M.; Schäfler, A.; Westermann, R.
We present the application of interactive 3-D visualization of ensemble weather predictions to forecasting warm conveyor belt situations during aircraft-based atmospheric research campaigns. Motivated by forecast requirements of the T-NAWDEX-Falcon 2012 campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the ECMWF ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and forecast wind field resolution. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (three to seven days before take-off).
Rasmussen, Christoffer Bøgelund; Nasrollahi, Kamal; Moeslund, Thomas B.
detectors. Ensemble strategies explored were firstly data sampling and selection and secondly combination strategies. Data sampling and selection aimed to create different subsets of data with respect to object size and image quality such that expert R-FCN ensemble members could be trained. Two combination...
Full Text Available A weighted accuracy and diversity (WAD) method is presented: a novel measure used to evaluate the quality of a classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis: a robust classifier ensemble should not only be accurate, each member should also be different from every other member. In fact, accuracy and diversity are mutually constraining factors: an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble on unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and to two threshold measures that consider only accuracy or diversity, using two heuristic search algorithms (genetic algorithm and forward hill-climbing) in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to the others in most cases.
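The weighted harmonic-mean scoring described above can be sketched in a few lines. This is a minimal illustration, assuming both accuracy and diversity are scaled to [0, 1]; the function name `wad` and the default weights are assumptions, not the paper's notation:

```python
def wad(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted accuracy-and-diversity score for a classifier ensemble.

    Combines ensemble accuracy and pairwise diversity (both assumed to
    lie in [0, 1]) via a weighted harmonic mean, so a low value on
    either axis drags the whole score down.
    """
    if accuracy <= 0.0 or diversity <= 0.0:
        return 0.0
    return (w_acc + w_div) / (w_acc / accuracy + w_div / diversity)

# The harmonic mean penalizes imbalance: an accurate but non-diverse
# ensemble scores well below the arithmetic mean of the two values.
balanced = wad(0.8, 0.8)   # -> 0.8
lopsided = wad(0.9, 0.3)   # -> 0.45, under (0.9 + 0.3) / 2 = 0.6
```

Raising `w_acc` relative to `w_div` biases the selection toward accurate ensembles, which is how the two weight parameters balance the two criteria.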
Friedel, Michael; Buscema, Massimo
A training scheme is proposed for the real-time classification of soil and vegetation (landscape) components in EO-1 Hyperion hyperspectral images. First, an auto-contractive map is used to compute connectivity of reflectance values for spectral bands (N=200) from independent laboratory spectral library components. Second, a minimum spanning tree is used to identify optimal grouping of training components from the connectivity values. Third, the reflectance values for optimal landscape component signatures are sorted. Fourth, empirical distribution functions (EDFs) are computed for each landscape component. Fifth, the Monte Carlo technique is used to generate realizations (N=30) for each landscape EDF. The correspondence of component realizations to original signatures validates the stochastic procedure. The realizations are presented to the self-organizing map (SOM) using three different map sizes: 14x10, 28x20, and 40x30. In each case, SOM training proceeds first with a rough phase (20 iterations using a Gaussian neighborhood with an initial radius of 11 units and a final radius of 3 units) and then a fine phase (400 iterations using a Gaussian neighborhood with an initial radius of 3 units and a final radius of 1 unit). The initial and final learning rates of 0.5 and 0.05 decay linearly down to 10^-5, and the Gaussian neighborhood function decreases exponentially (decay rate of 10^-3 iteration^-1), providing reasonable convergence. Following training of the three networks, each corresponding SOM is used to independently classify the original spectral library signatures. In comparing the different SOM networks, the 28x20 map size is chosen for independent reproducibility and processing speed. The corresponding unified distance matrix reveals separation of the seven component classes for this map size, thereby supporting its use as a Hyperion classifier.
Hu, Xiaohua; Park, E K; Zhang, Xiaodan
Generating high-quality gene clusters and identifying the underlying biological mechanisms of the gene clusters are important goals of clustering gene expression analysis. To get high-quality cluster results, most current approaches rely on choosing the best cluster algorithm, whose design biases and assumptions meet the underlying distribution of the dataset. There are two issues with this approach: 1) usually, the underlying data distribution of gene expression datasets is unknown, and 2) there are so many clustering algorithms available that it is very challenging to choose the proper one. To provide a textual summary of the gene clusters, the most explored approach is the extractive approach, which essentially builds upon techniques borrowed from information retrieval, in which the objective is to provide terms to be used for query expansion, not to act as a stand-alone summary for the entire document set. Another drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. In this paper, we design and develop a unified system, Gene Expression Miner, to address these challenging issues in a principled and general manner by integrating cluster ensemble, text clustering, and multidocument summarization, and provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene clusters. In our text summarization module, given a gene cluster, our expectation-maximization based algorithm can automatically identify subtopics and extract the most probable terms for each topic. Then, the extracted top k topical terms from each subtopic are combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.
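A common way to build a cluster ensemble of the kind described above is to accumulate a co-association matrix across base clusterings and then recluster that matrix. A minimal sketch, assuming each partition is given as a flat list of cluster ids per item (the function name and representation are illustrative, not the system's actual implementation):

```python
def co_association(partitions):
    """partitions: list of clusterings, each a list of cluster ids,
    one id per item. Returns an n x n matrix whose (i, j) entry is
    the fraction of base clusterings that put items i and j in the
    same cluster -- the consensus evidence a final clustering uses.
    """
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for partition in partitions:
        for i in range(n):
            for j in range(n):
                if partition[i] == partition[j]:
                    matrix[i][j] += 1.0 / len(partitions)
    return matrix

# Two base clusterings of three genes that disagree about gene 1:
m = co_association([[0, 0, 1], [0, 1, 1]])
# m[0][1] and m[1][2] are 0.5 (co-clustered once), m[0][2] is 0.0
```

Any standard clustering algorithm can then be run on `1 - matrix` as a distance to obtain the consensus gene clusters.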
On the NYYD Ensemble duo Traksmann - Lukk and E.-S. Tüür's work "Symbiosis", which has also been recorded on the recently released NYYD Ensemble CD. On 2 March in the small hall of the Rakvere Theatre and on 3 March at the Rotermann Salt Storage; the programme includes Tüür, Kaumann, Berio, Reich, Yun, Hauta-aho and Buckinx.
Arjona, Cristian; Pentácolo, José; Gareis, Iván; Atum, Yanina; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo
The Brain Computer Interface (BCI) translates brain activity into computer commands. To increase the performance of a BCI in decoding user intentions, it is necessary to improve the feature extraction and classification techniques. In this article, the performance of an ensemble of three linear discriminant analysis (LDA) classifiers is studied. A system based on an ensemble can theoretically achieve better classification results than its individual counterparts, depending on the algorithm used to generate the individual classifiers and the procedures for combining their outputs. Classic ensemble algorithms such as bagging and boosting are discussed here. For the application to BCI, it was concluded that the results generated using ER and AUC as performance indices do not give enough information to establish which configuration is better.
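Bagging, one of the classic ensemble algorithms discussed above, trains each member on a bootstrap resample of the data and combines the members by majority vote. A minimal pure-Python sketch, with a one-dimensional threshold stump standing in for the LDA base classifiers of the study (all names and the toy data are illustrative):

```python
import random

def train_stump(sample):
    """Base learner: threshold halfway between the two class means."""
    m0 = [x for x, y in sample if y == 0]
    m1 = [x for x, y in sample if y == 1]
    threshold = (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2.0
    flip = sum(m1) / len(m1) < sum(m0) / len(m0)
    return lambda x: int((x > threshold) != flip)

def bagging(data, n_models=11, seed=0):
    """Train n_models stumps on bootstrap resamples of data
    (list of (x, label) pairs); predict by majority vote."""
    rng = random.Random(seed)
    def bootstrap():
        while True:  # resample until both classes are present
            s = [rng.choice(data) for _ in data]
            if len({y for _, y in s}) == 2:
                return s
    models = [train_stump(bootstrap()) for _ in range(n_models)]
    return lambda x: int(sum(m(x) for m in models) * 2 > n_models)

data = [(0.0, 0), (0.2, 0), (0.4, 0), (1.0, 1), (1.2, 1), (1.4, 1)]
predict = bagging(data)
```

Boosting differs in that resampling is not uniform: each round reweights the data toward the examples the previous members misclassified.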
Seetha, M.; Ravi, G.
Image classification is the process of assigning classes to the pixels in remotely sensed images and is important for GIS applications, since a classified image is much easier to incorporate than the original unclassified image. To resolve misclassification in traditional parametric classifiers like the Maximum Likelihood Classifier, a neural network classifier is implemented using the back propagation algorithm. The extra spectral and spatial knowledge acquired from ancillary information is required to improve the accuracy and remove spectral confusion. To build the knowledge base automatically, this paper explores a non-parametric decision tree classifier to extract knowledge from the spatial data in the form of classification rules. A new method is proposed using a data structure called the Peano Count Tree (P-tree) for decision tree classification. The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates more efficient classification than other data mining techniques. The accuracy is assessed using overall accuracy, user's accuracy, and producer's accuracy for the image classification methods of Maximum Likelihood Classification, neural network classification using back propagation, knowledge base classification, post-classification, and the P-tree classifier. The results reveal that the knowledge extracted from the decision tree classifier and the P-tree data structure in the proposed approach removes the problem of spectral confusion to a great extent. It is ascertained that the P-tree classifier surpasses the other classification techniques.
Full Text Available The representation based classification method (RBCM) has shown huge potential for face recognition since it first emerged. The linear regression classification (LRC) method and the collaborative representation classification (CRC) method are two well-known RBCMs. LRC and CRC exploit the training samples of each class and all the training samples, respectively, to represent the testing sample, and subsequently conduct classification on the basis of the representation residual. The LRC method can be viewed as a "locality representation" method because it uses only the training samples of each class to represent the testing sample, so it cannot embody the effectiveness of "globality representation." Conversely, the CRC method cannot enjoy the locality benefit of the general RBCM. Thus we propose to integrate CRC and LRC to perform more robust representation based classification. The experimental results on benchmark face databases substantially demonstrate that the proposed method achieves high classification accuracy.
Full Text Available Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine whether a given individual has already appeared over the camera network. Individual recognition often relies on faces and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitations of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for the ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art systems.
El Alem, A.; Chokmani, K.; Laurion, I.; El-Adlouni, S. E.
Because of the sensitivity of inland freshwaters to harmful algal bloom (HAB) development and the limited coverage of standard monitoring programs, remote sensing data have become increasingly used for monitoring HAB extension. Usually, HAB monitoring using remote sensing data is based on empirical and semi-empirical models. Development of such models requires a great number of continuous in situ measurements to reach an acceptable accuracy. However, ministries and water management organizations often use two thresholds, established by the World Health Organization, to determine water quality. Consequently, the available data are ordinal ("semi-qualitative") and mostly unexploited. Use of such databases with remote sensing data and statistical classification algorithms can produce hazard management maps linked to the presence of cyanobacteria. Unlike standard classification algorithms, which are generally unstable, classifiers based on ensemble systems are more general and stable. In the present study, an ensemble-based classifier was developed and compared to a standard classification method called CART (Classification and Regression Tree) in a context of HAB monitoring in freshwaters, using MODIS images downscaled to 250 m spatial resolution and ordinal in situ data. Calibration and validation data on cyanobacteria densities were collected by the Ministère du Développement durable, de l'Environnement et de la Lutte contre les changements climatiques on 22 water bodies between 2000 and 2010. These data comprise three density classes, from waters poorly loaded to waters highly loaded (above 100,000 cells mL-1) in cyanobacteria. The results were very interesting and highlighted that inland waters exhibit different spectral responses, allowing them to be classified into the three classes above for water quality monitoring. On the other hand, even if the accuracy (Kappa index = 0.86) of the proposed approach is slightly lower than that of the CART algorithm (Kappa index = 0.87), its robustness is higher.
Full Text Available Aiming at the problems of low categorization accuracy and uneven distribution in traditional text classification algorithms, a text classification algorithm based on deep learning is put forward. Deep belief networks have a very strong feature learning ability and can extract features from the high-dimensional original representation, which are then used to train the classification model. The TF-IDF formula is used to compute text feature values, and deep belief networks are used to construct the classifier. The experimental results show that, compared with commonly used classification algorithms such as support vector machines, neural networks and extreme learning machines, the algorithm has higher accuracy and practicability, and it opens up new ideas for research on text classification.
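The TF-IDF weighting used above for the text features can be computed directly. A minimal sketch assuming already-tokenized documents; note that several TF-IDF variants exist, and the tf = count/length, idf = log(N/df) form here is one common choice, not necessarily the paper's:

```python
import math

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per
    document, where tf = term count / document length and
    idf = log(number of documents / documents containing the term)."""
    n = len(docs)
    df = {}  # document frequency of each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        weights = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            weights[term] = tf * math.log(n / df[term])
        vectors.append(weights)
    return vectors

# A term that appears in every document gets idf = log(1) = 0,
# so only discriminative terms carry weight into the classifier.
vecs = tf_idf([["a", "b"], ["a", "c"]])
```

These sparse weight vectors (padded to a fixed vocabulary) are the kind of input a deep belief network classifier would be trained on.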
Full Text Available Forecasting the Indian summer monsoon is a challenging task due to its complex and nonlinear behavior. A large number of global climatic variables with varying interaction patterns over the years influence the monsoon. Various statistical and neural prediction models have been proposed for forecasting the monsoon, but many of them fail to capture variability over the years. The skill of the predictor variables of the monsoon also evolves over time. In this article, we propose a joint clustering of monsoon years and predictors for understanding and predicting the monsoon, achieved by a subspace clustering algorithm. It groups the years based on the prevailing global climatic conditions using a statistical clustering technique and subsequently, for each such group, identifies significant climatic predictor variables which assist in better prediction. A prediction model is designed for each individual cluster using a random forest of regression trees. Prediction of aggregate and regional monsoon is attempted. A mean absolute error of 5.2% is obtained for forecasting the aggregate Indian summer monsoon. Errors in predicting the regional monsoons are also comparable given the high variation of regional precipitation. The proposed joint-clustering based ensemble model is observed to be superior to existing monsoon prediction models, and it also surpasses general non-clustering based prediction models.
Yeh, Jia-Rong; Lin, Tzu-Yu; Chen, Yun; Sun, Wei-Zen; Abbod, Maysam F; Shieh, Jiann-Shing
The cardiovascular system is known to be nonlinear and nonstationary. Traditional linear algorithms for assessing arterial stiffness and systemic resistance of the cardiac system suffer from nonstationarity problems or are inconvenient in practical applications. In this pilot study, two new assessment methods were developed: the first is an ensemble empirical mode decomposition based reflection index (EEMD-RI), while the second is based on the phase shift between ECG and BP on cardiac oscillation. Both methods utilise the EEMD algorithm, which is suitable for nonlinear and nonstationary systems. These methods were used to investigate the properties of arterial stiffness and systemic resistance of a pig's cardiovascular system via ECG and blood pressure (BP). The experiment simulated a sequence of continuous changes of blood pressure, from a steady condition to high blood pressure obtained by clamping the artery, and the inverse by relaxing the artery. As hypothesized, the arterial stiffness and systemic resistance should vary with the blood pressure due to clamping and relaxing the artery. The results show statistically significant correlations between BP, the EEMD-based RI, and the phase shift between ECG and BP on cardiac oscillation. The two assessment results demonstrate the merits of EEMD for signal analysis.
Full Text Available Over the last few years, a collaborative work between CERFACS, LNHE (EDF R&D), SCHAPI and CEREMA resulted in the implementation of a Data Assimilation (DA) method on top of MASCARET in the framework of real-time forecasting. This prototype was based on a simplified Kalman filter where the description of the background error covariances is prescribed from an off-line climatology and held constant over time. This approach showed promising results on the Adour and Marne catchments, as it improves the forecast skill of the hydraulic model using water level and discharge in-situ observations. An ensemble-based DA algorithm has recently been implemented to improve the modelling of the background error covariance matrix used to distribute the correction to the water level and discharge states when observations are assimilated, from the observation points to the entire state. It was demonstrated that the flow-dependent description of the background error covariances with the EnKF algorithm leads to a more realistic correction of the hydraulic state, with significant impact of the hydraulic network characteristics.
Smith, Valoris Reid
This thesis discusses two ensembles, the study of which was dependent upon the controllable production of cold gas-phase samples using supersonic beams. The experiments on DNA bases and base clusters were carried out in Germany at the Max Born Institute. The experiments anticipating the construction of a molecular beam slower were carried out in the United States at the University of Texas at Austin. Femtosecond pump-probe techniques were employed to study the dynamics and electronic character of DNA bases, pairs and clusters in the gas phase. Experiments on DNA base monomers confirmed the dominance of a particular relaxation pathway, the nπ* state. Competition between this state and another proposed relaxation pathway was demonstrated through observations of the DNA base pairs and base-water clusters, settling a recent controversy. Further, it was determined that the excited state dynamics in base pairs is due to intramolecular processes rather than intermolecular processes. Finally, results from base-water clusters confirm that microsolvation permits comparison with biologically relevant liquid phase experiments and with ab initio calculations, bridging a long-standing gap. A purely mechanical technique that does not rely upon quantum or electronic properties to produce very cold, very slow atoms and molecules would be more generally applicable than current approaches. The approach described here uses supersonic beam methods to produce a very cold beam of particles and a rotating paddle-wheel, or rotor, to slow the cold beam. Initial experiments testing the possibility of elastic scattering from a single crystal surface were conducted and the implications of these experiments are discussed.
He, E.; Chen, Q.; Wang, H.; Liu, X.
As a key step in 3D scene analysis, point cloud classification has gained a great deal of attention in the past few years. Due to the uneven density, noise and missing data in point clouds, automatically classifying a point cloud with high precision is a very challenging task. The point cloud classification process typically includes the extraction of neighborhood-based statistical information and machine learning algorithms. However, the robustness of the neighborhood is limited by the density and curvature of the point cloud, which leads to label noise in the classification results. In this paper, we propose a curvature-based adaptive neighborhood for individual point cloud classification. Our main improvement is the curvature-based adaptive neighborhood method, which can derive an ideal 3D local point neighborhood and enhance the separability of features. The experimental results on the Oakland benchmark dataset show that the proposed method can effectively improve the classification accuracy of point clouds.
Full Text Available Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
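The intuition for why SVM ensembles can beat a single SVM is the classical majority-vote argument: if the base classifiers erred independently, ensemble accuracy would follow a binomial tail. Real base classifiers make correlated errors, so this is an idealized upper bound that bagging and boosting only approximate, but it explains the expected direction of the improvement:

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent classifiers, each
    correct with probability p, votes for the right class (n odd).

    This independence assumption is an idealization: it bounds the
    benefit of voting rather than guaranteeing it.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

single = majority_vote_accuracy(0.7, 1)    # one model: accuracy 0.7
ensemble = majority_vote_accuracy(0.7, 11) # eleven voters do better
```

The same formula shows the limit of voting: at p = 0.5 (no better than chance) the ensemble stays at 0.5 no matter how many members vote.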
Full Text Available Landform classification is a necessary task for various fields of landscape and regional planning, for example landscape evaluation, erosion studies, and hazard prediction. This study proposes an improved object-based classification for Chinese landform types using the factor importance analysis of random forest and the gray-level co-occurrence matrix (GLCM). Based on a 1 km DEM of China, the combination of terrain factors extracted from the DEM is selected by correlation analysis and Sheffield's entropy method. A random forest classification tree is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. Then the GLCM is computed for the knowledge base of classification. The classification result was checked against the 1:4,000,000 Chinese Geomorphological Map as reference. The overall classification accuracy of the proposed method is 5.7% higher than ISODATA unsupervised classification, and 15.7% higher than the traditional object-based classification method.
A proposed data base system for detection, classification and location of fault on electricity company of Ghana electrical distribution system. Isaac Owusu-Nyarko, Mensah-Ananoo Eugine. Keywords: database, classification of fault, power, distribution system, SCADA, ECG.
Full Text Available In the ensemble-based four-dimensional variational assimilation method (SVD-En4DVar), a singular value decomposition (SVD) technique is used to select the leading eigenvectors, and the analysis variables are expressed as an expansion in the orthogonal bases of the eigenvectors. Experiments with a two-dimensional shallow-water equation model and simulated observations show that the truncation error and the rejection of observed signals due to the reduced-dimensional reconstruction of the analysis variable are the major factors that damage the analysis when the ensemble size is not large enough. However, a larger ensemble imposes a daunting computational burden. Experiments with the shallow-water equation model also show that the forecast error covariances remain relatively constant over time. For that reason, we propose an approach that increases the members of the forecast ensemble while reducing the update frequency of the forecast error covariance, in order to increase analysis accuracy and reduce the computational cost. A series of experiments were conducted with the shallow-water equation model to test the efficiency of this approach, and the results indicate that it is promising. Further experiments with the WRF model show that this approach is also suitable for the real atmospheric data assimilation problem, but the update frequency of the forecast error covariances should not be too low.
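The flow-dependent background error covariance at the heart of such ensemble methods is simply the sample covariance of the forecast ensemble. A minimal sketch with plain lists (illustrative only; operational systems add localization, inflation, and work with far larger state vectors):

```python
def ensemble_covariance(members):
    """members: list of forecast state vectors (the ensemble).
    Returns the unbiased sample covariance matrix, the quantity that
    spreads an observation's correction across the model state."""
    n, d = len(members), len(members[0])
    mean = [sum(m[k] for m in members) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for m in members:
        dev = [m[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov

# Two members of a two-variable state; the off-diagonal terms are
# what let an observation of one variable correct the other.
cov = ensemble_covariance([[1.0, 2.0], [3.0, 4.0]])
```

Because this matrix is recomputed from each new forecast ensemble, the correction structure follows the flow, which is exactly the property contrasted above with the static climatological covariances.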
Zubia-Olaskoaga, Felix; Maraví-Poma, Enrique; Urreta-Barallobre, Iratxe; Ramírez-Puerta, María-Rosario; Mourelo-Fariña, Mónica; Marcos-Neira, María-Pilar
To compare the classification performance of the Revised Atlanta Classification, the Determinant-Based Classification, and a new modified Determinant-Based Classification according to observed mortality and morbidity, a prospective multicenter observational study was conducted over a 1-year period in forty-six international ICUs (Epidemiology of Acute Pancreatitis in Intensive Care Medicine study), enrolling patients admitted to an ICU with acute pancreatitis and at least one organ failure. The modified Determinant-Based Classification included four categories: group 1, patients with transient organ failure and without local complications; group 2, patients with transient organ failure and local complications; group 3, patients with persistent organ failure and without local complications; and group 4, patients with persistent organ failure and local complications. A total of 374 patients were included (mortality rate of 28.9%). When the modified Determinant-Based Classification was applied, patients in group 1 presented low mortality (2.26%) and morbidity (5.38%), patients in group 2 presented low mortality (6.67%) and high morbidity (60.71%), patients in group 3 presented high mortality (41.46%) and low morbidity (8.33%), and patients in group 4 presented high mortality (59.09%) and high morbidity (88.89%). The area under the receiver operating characteristic curve of the modified Determinant-Based Classification for mortality was 0.81 (95% CI, 0.77-0.85), with significant differences in comparison to the Revised Atlanta Classification (0.77; 95% CI, 0.73-0.81) and the Determinant-Based Classification (0.77; 95% CI, 0.72-0.81). For morbidity, the area under the curve of the modified Determinant-Based Classification was 0.80 (95% CI, 0.73-0.86), with significant differences in comparison to the Revised Atlanta Classification (0.63; 95% CI, 0.57-0.70), but not to the Determinant-Based Classification (0.81; 95% CI, 0.74-0.88; nonsignificant). The modified Determinant-Based Classification identified four groups with different clinical presentations in patients with acute pancreatitis.
Oza, Nikunj C.
A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.
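The chapter's description of supervised classification — learn a model from labeled training examples, then predict labels for unseen inputs — can be made concrete with a tiny nearest-centroid classifier. This is a hedged illustration, not a method the chapter prescribes; the two-feature "measurement" vectors stand in for sunspot measurements:

```python
def train_nearest_centroid(examples):
    """examples: list of (feature_vector, label) training pairs.
    Learns one centroid (mean vector) per class and returns a predict
    function that assigns the label of the nearest centroid."""
    sums, counts = {}, {}
    for x, y in examples:
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [a + b for a, b in zip(sums.get(y, [0.0] * len(x)), x)]
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        # squared Euclidean distance to each class centroid
        return min(centroids, key=lambda y: sum((a - b) ** 2
                   for a, b in zip(x, centroids[y])))
    return predict

# Train on labeled measurements, then classify an unseen input:
examples = [([0.0, 0.0], "A"), ([0.0, 1.0], "A"),
            ([5.0, 5.0], "B"), ([5.0, 6.0], "B")]
predict = train_nearest_centroid(examples)
```

The `predict` function is the learned model: an approximation of the mapping from measurements to class, applicable to inputs it has never seen.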
Diesing, Markus; Stephens, David
Seabed habitat mapping based on swath acoustic data and ground-truth samples is an emergent and active marine science discipline. Significant progress could be achieved by transferring techniques and approaches that have been successfully developed and employed in fields such as terrestrial land cover mapping. One such promising approach is the multiple classifier system, which aims at improving classification performance by combining the outputs of several classifiers. Here we present results of a multi-model ensemble applied to multibeam acoustic data covering more than 5000 km² of seabed in the North Sea, with the aim of deriving accurate spatial predictions of seabed substrate. A suite of six machine learning classifiers (k-Nearest Neighbour, Support Vector Machine, Classification Tree, Random Forest, Neural Network and Naïve Bayes) was trained with ground-truth sample data classified into seabed substrate classes, and their prediction accuracy was assessed with an independent set of samples. The three and five best-performing models were combined into classifier ensembles. Both ensembles led to increased prediction accuracy as compared to the best-performing single classifier. The improvements were, however, not statistically significant at the 5% level. Although the three-model ensemble did not perform significantly better than its individual component models, the five-model ensemble did perform significantly better than three of its five component models. A classifier ensemble might therefore be an effective strategy to improve classification performance. Another advantage is that the agreement in predicted substrate class between the individual models of the ensemble can be used as a measure of confidence. We propose a simple and spatially explicit measure of confidence that is based on model agreement and prediction accuracy.
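The core idea of the abstract above, combining several trained classifiers by majority vote and using inter-model agreement as a confidence measure, can be sketched as follows (synthetic data stands in for the multibeam acoustic features and substrate classes; the choice of three classifiers and the split are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for acoustic features with substrate-class labels.
X, y = make_classification(n_samples=400, n_features=6, n_classes=3,
                           n_informative=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

models = [KNeighborsClassifier(), SVC(), RandomForestClassifier(random_state=1)]
preds = np.array([m.fit(X_tr, y_tr).predict(X_te) for m in models])

# Majority vote across the ensemble members for each test sample...
vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

# ...and per-sample confidence = fraction of models agreeing with the vote,
# mirroring the paper's model-agreement confidence measure.
agreement = (preds == vote).mean(axis=0)
print(vote[:5], agreement[:5])
```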
Davidson, Rachel A; Nozick, Linda K; Wachtendorf, Tricia; Blanton, Brian; Colle, Brian; Kolar, Randall L; DeYoung, Sarah; Dresback, Kendra M; Yi, Wenqi; Yang, Kun; Leonardo, Nicholas
This article introduces a new integrated scenario-based evacuation (ISE) framework to support hurricane evacuation decision making. It explicitly captures the dynamics, uncertainty, and human-natural system interactions that are fundamental to the challenge of hurricane evacuation, but have not been fully captured in previous formal evacuation models. The hazard is represented with an ensemble of probabilistic scenarios, population behavior with a dynamic decision model, and traffic with a dynamic user equilibrium model. The components are integrated in a multistage stochastic programming model that minimizes risk and travel times to provide a tree of evacuation order recommendations and an evaluation of the risk and travel time performance for that solution. The ISE framework recommendations offer an advance in the state of the art because they: (1) are based on an integrated hazard assessment (designed to ultimately include inland flooding), (2) explicitly balance the sometimes competing objectives of minimizing risk and minimizing travel time, (3) offer a well-hedged solution that is robust under the range of ways the hurricane might evolve, and (4) leverage the substantial value of increasing information (or decreasing degree of uncertainty) over the course of a hurricane event. A case study for Hurricane Isabel (2003) in eastern North Carolina is presented to demonstrate how the framework is applied, the type of results it can provide, and how it compares to available methods of a single scenario deterministic analysis and a two-stage stochastic program. © 2018 Society for Risk Analysis.
Naganathan, Athi N
Engineering the conformational stabilities of proteins through mutations has immense potential in biotechnological applications. It is, however, an inherently challenging problem given the weak noncovalent nature of the stabilizing interactions. In this regard, we present here a robust and fast strategy to engineer protein stabilities through mutations involving charged residues, using a structure-based statistical mechanical model that accounts for the ensemble nature of folding. We validate the method by predicting the absolute changes in stability for 138 experimental mutations from 16 different proteins and enzymes, with a correlation of 0.65 and, importantly, a success rate of 81%. Multiple point mutants are predicted with a higher success rate (90%), which is validated further by comparing mesophile-thermophile protein pairs. In parallel, we devise a methodology to rapidly engineer mutations in silico, which we benchmark against experimental mutations of ubiquitin (correlation of 0.95) and check for feasibility on a larger therapeutic protein, DNase I. We expect the method to be of importance as a first and rapid step to screen for protein mutants with specific stabilities in the biotechnology industry, in the construction of stability maps at the residue level (i.e., hot spots), and as a robust tool to probe for mutations that enhance the stability of protein-based drugs.
Full Text Available The explosive growth and increasing diversity of network traffic on the Internet have brought new and severe challenges to DDoS attack detection. To achieve a higher True Negative Rate (TNR), accuracy, and precision, and to guarantee the robustness, stability, and universality of the detection system, in this paper we propose a DDoS attack detection method based on hybrid heterogeneous multiclassifier ensemble learning and design a heuristic detection algorithm based on Singular Value Decomposition (SVD) to construct our detection system. Experimental results show that our detection method is excellent in TNR, accuracy, and precision; therefore, our algorithm has good detection performance for DDoS attacks. Comparisons with Random Forest, k-Nearest Neighbor (k-NN), and Bagging, the component classifiers when the three algorithms are used alone with and without SVD, show that our model is superior to state-of-the-art attack detection techniques in system generalization ability, detection stability, and overall detection performance.
Lu, Yin; Liu, Lili; Lu, Dong; Cai, Yudong; Zheng, Mingyue; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
Drug-induced liver injury (DILI) is a major cause of drug withdrawal. The chemical properties of the drug, especially its metabolites, play key roles in DILI. Our goal is to construct a QSAR model to predict drug hepatotoxicity based on drug metabolites. 64 hepatotoxic drug metabolites and 3,339 non-hepatotoxic drug metabolites were gathered from the MDL Metabolite Database. Considering the imbalance of the dataset, we randomly split the negative samples and combined each portion with all the positive samples to construct individually balanced datasets for training independent classifiers. We then adopted an ensemble approach to make predictions based on the results of all individual classifiers and applied the minimum Redundancy Maximum Relevance (mRMR) feature selection method to select the molecular descriptors. Eventually, for the drugs in the external test set, a Bayesian inference method was used to predict the hepatotoxicity of a drug based on its metabolites. The model showed an average balanced accuracy of 78.47%, sensitivity of 74.17%, and specificity of 82.77%. Five molecular descriptors characterizing molecular polarity, intramolecular bonding strength, and molecular frontier orbital energy were obtained. When predicting the hepatotoxicity of a drug based on all its metabolites, the sensitivity, specificity and balanced accuracy were 60.38%, 70.00%, and 65.19%, respectively, indicating that this method is useful for identifying the hepatotoxicity of drugs. We developed an in silico model to predict the hepatotoxicity of drug metabolites. Moreover, Bayesian inference was applied to predict the hepatotoxicity of a drug based on its metabolites, which yielded valuable sensitivity and specificity. Copyright© Bentham Science Publishers; For any queries, please email at email@example.com.
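The balanced-subsampling ensemble described above (splitting the majority class into chunks and pairing each chunk with all minority samples) can be sketched as follows; the data, classifier choice, and combination by averaged probabilities are illustrative, not the paper's actual QSAR setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Imbalanced synthetic data: few "positive" samples, many "negative" ones
# (a toy stand-in for 64 hepatotoxic vs 3,339 non-hepatotoxic metabolites).
X_pos = rng.normal(1.0, 1.0, size=(60, 5))
X_neg = rng.normal(-1.0, 1.0, size=(600, 5))

# Split the negatives into chunks the size of the positive set; pair each
# chunk with all positives to train one classifier per balanced subset.
chunks = np.array_split(rng.permutation(X_neg), len(X_neg) // len(X_pos))
ensemble = []
for chunk in chunks:
    X = np.vstack([X_pos, chunk])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(chunk))]
    ensemble.append(LogisticRegression(max_iter=1000).fit(X, y))

# Ensemble prediction: average the individual classifiers' probabilities.
X_new = rng.normal(0.0, 1.0, size=(10, 5))
p = np.mean([clf.predict_proba(X_new)[:, 1] for clf in ensemble], axis=0)
print(p.round(2))
```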
Full Text Available To improve the accuracy of change detection in urban areas using bi-temporal high-resolution remote sensing images, a novel object-based change detection scheme combining multiple features and ensemble learning is proposed in this paper. Image segmentation is conducted to determine the objects in bi-temporal images separately. Subsequently, three kinds of object features, i.e., spectral, shape and texture, are extracted. Using the image differencing process, a difference image is generated and used as the input for nonlinear supervised classifiers, including k-nearest neighbor, support vector machine, extreme learning machine and random forest. Finally, the results of multiple classifiers are integrated using an ensemble rule called weighted voting to generate the final change detection result. Experimental results of two pairs of real high-resolution remote sensing datasets demonstrate that the proposed approach outperforms the traditional methods in terms of overall accuracy and generates change detection maps with a higher number of homogeneous regions in urban areas. Moreover, the influences of segmentation scale and the feature selection strategy on the change detection performance are also analyzed and discussed.
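A weighted-voting combination of heterogeneous classifiers, as used in the change detection scheme above, can be sketched like this (synthetic features stand in for the spectral/shape/texture difference image, and using training accuracy as each voter's weight is an assumption, not necessarily the paper's weighting rule):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic difference-image features for "change"/"no change" objects.
X, y = make_classification(n_samples=300, n_features=8, random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=2)

models = [KNeighborsClassifier(), SVC(), RandomForestClassifier(random_state=2)]
preds, weights = [], []
for m in models:
    m.fit(X_tr, y_tr)
    preds.append(m.predict(X_val))
    weights.append(m.score(X_tr, y_tr))  # illustrative weight per voter

# Weighted vote: add each model's weight to the class it predicts,
# then pick the class with the highest total weight per object.
scores = np.zeros((len(X_val), 2))
for p, w in zip(preds, weights):
    scores[np.arange(len(X_val)), p] += w
final = scores.argmax(axis=1)
print((final == y_val).mean())
```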
Chellasamy, Menaka; Ferre, Ty Paul; Greve, Mogens Humlekrog
operates based on two assumptions: 1) mislabels in each class will be far from their cluster centroid and 2) each crop class based on the available vector data has more correctly labeled samples than mislabeled samples. Three datasets, derived from bitemporal WorldView-2 multispectral imagery, are used...... the multievidence classification approach. The study is implemented with WorldView-2 imagery acquired for a study area in Denmark containing 15 crop classes. The multievidence classification approach with ECRA-based refinement is compared with the classification based on common training sample selection methods...... (manual and random). It is also compared with the winner-takes-all-based classification approach. ECRA achieves an overall classification accuracy of 92.8%, which is higher than existing common approaches....
Liu, Zhun-Ga; Pan, Quan; Mercier, Gregoire; Dezert, Jean
The classification of incomplete patterns is a very challenging task because an object (incomplete pattern) with different possible estimations of its missing values may yield distinct classification results. The uncertainty (ambiguity) of the classification is mainly caused by the lack of information in the missing data. A new prototype-based credal classification (PCC) method is proposed to deal with incomplete patterns, thanks to the belief function framework used classically in the evidential reasoning approach. The class prototypes obtained from training samples are respectively used to estimate the missing values. Typically, in a c-class problem, one has to deal with c prototypes, which yield c estimations of the missing values. The different edited patterns based on each possible estimation are then classified by a standard classifier, and we can get at most c distinct classification results for an incomplete pattern. Because all these distinct classification results are potentially admissible, we propose to combine them all together to obtain the final classification of the incomplete pattern. A new credal combination method is introduced for solving the classification problem, and it is able to characterize the inherent uncertainty due to the possibly conflicting results delivered by different estimations of the missing values. The incomplete patterns that are very difficult to classify in a specific class are reasonably and automatically committed to some proper meta-classes by the PCC method in order to reduce errors. The effectiveness of the PCC method has been tested through four experiments with artificial and real data sets.
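The prototype-based editing step described above, producing one filled-in version of an incomplete pattern per class prototype, can be sketched as follows (toy data; a nearest-prototype rule stands in for the "standard classifier", and the final credal combination with belief functions is not reproduced here):

```python
import numpy as np

# Toy 3-class problem: class prototypes are the means of training samples.
rng = np.random.default_rng(0)
train = {c: rng.normal(c * 2.0, 0.5, size=(30, 3)) for c in range(3)}
prototypes = {c: pts.mean(axis=0) for c, pts in train.items()}

# An incomplete pattern with one missing feature (np.nan).
x = np.array([2.1, np.nan, 1.9])
missing = np.isnan(x)

# One edited pattern per class: fill missing values from that class prototype.
edited = []
for c, proto in prototypes.items():
    xc = x.copy()
    xc[missing] = proto[missing]
    edited.append(xc)

# Each edited pattern is then classified separately, yielding up to c
# (possibly conflicting) results that PCC would combine.
labels = [min(prototypes, key=lambda c: np.linalg.norm(xe - prototypes[c]))
          for xe in edited]
print(labels)
```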
O'Mahony, Michael P.; Smyth, Barry
Many online stores encourage their users to submit product/service reviews in order to guide future purchasing decisions. These reviews are often listed alongside product recommendations but, to date, limited attention has been paid to how best to present these reviews to the end-user. In this paper, we describe a supervised classification approach that is designed to identify and recommend the most helpful product reviews. Using the TripAdvisor service as a case study, we compare the performance of several classification techniques using a range of features derived from hotel reviews. We then describe how these classifiers can be used as the basis for a practical recommender that automatically suggests the most helpful contrasting reviews to end-users. We present an empirical evaluation which shows that our approach achieves a statistically significant improvement over alternative review ranking schemes.
Adelman, Joshua L.; Grabe, Michael
We introduce an extension to the weighted ensemble (WE) path sampling method to restrict sampling to a one-dimensional path through a high-dimensional phase space. Our method, which is based on the finite-temperature string method, permits efficient sampling of both equilibrium and non-equilibrium systems. Sampling obtained from the WE method guides the adaptive refinement of a Voronoi tessellation of order parameter space, whose generating points, upon convergence, coincide with the principal reaction pathway. We demonstrate the application of this method to several simple, two-dimensional models of driven Brownian motion and to the conformational change of the nitrogen regulatory protein C receiver domain using an elastic network model. The simplicity of the two-dimensional models allows us to directly compare the efficiency of the WE method to conventional brute-force simulations and other path sampling algorithms, while the example of protein conformational change demonstrates how the method can be used to efficiently study transitions in the space of many collective variables. PMID:23387566
Full Text Available A major structural inconsistency of the traditional curve number (CN) model is its dependence on an unstable fixed initial abstraction, which normally results in sudden jumps in runoff estimation. Likewise, the lack of a pre-storm soil moisture accounting (PSMA) procedure is another inherent limitation of the model. To circumvent those problems, we used a variable initial abstraction after ensembling the traditional CN model and a French four-parameter model (GR4J) to better quantify direct runoff from ungauged watersheds. To mimic the natural rainfall-runoff transformation at the watershed scale, our new parameterization designates intrinsic parameters and uses a simple structure. It exhibited more accurate and consistent results than earlier methods in evaluating data from 39 forest-dominated watersheds, both for small and large watersheds. In addition, based on different performance evaluation indicators, the runoff reproduction results show that the proposed model produced more consistent results for dry, normal, and wet watershed conditions than the other models used in this study.
Full Text Available Solar energy generated from photovoltaic (PV) systems is one of the most promising types of renewable energy. However, it is highly variable, as it depends on the solar irradiance and other meteorological factors. This variability creates difficulties for the large-scale integration of PV power in the electricity grid and requires accurate forecasting of the electricity generated by PV systems. In this paper we consider 2D-interval forecasts, where the goal is to predict summary statistics for the distribution of the PV power values in a future time interval. 2D-interval forecasts have been recently introduced, and they are more suitable than point forecasts for applications where the predicted variable has high variability. We propose a method called NNE2D that combines variable selection based on mutual information and an ensemble of neural networks to compute 2D-interval forecasts, where the two interval boundaries are expressed in terms of percentiles. NNE2D was evaluated for univariate prediction of Australian solar PV power data over two years. The results show that it is a promising method, outperforming persistence baselines and other methods used for comparison in terms of accuracy and coverage probability.
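An ensemble-of-networks interval forecast can be sketched as below; note this is a simplification that derives percentile boundaries from the spread of bootstrap-trained ensemble members, whereas NNE2D predicts the interval boundaries directly, and the data here is synthetic:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
# Synthetic PV-power-like target: noisy function of 4 input features.
X = rng.uniform(0, 1, size=(300, 4))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.normal(size=300)

# Ensemble of small neural networks trained on bootstrap resamples.
ensemble = []
for seed in range(10):
    idx = rng.integers(0, len(X), len(X))
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                       random_state=seed)
    ensemble.append(net.fit(X[idx], y[idx]))

# Interval forecast for a new input: lower/upper percentiles of the
# ensemble members' predictions (the 10/90 choice is illustrative).
x_new = rng.uniform(0, 1, size=(1, 4))
member_preds = np.array([net.predict(x_new)[0] for net in ensemble])
lo, hi = np.percentile(member_preds, [10, 90])
print(round(lo, 3), round(hi, 3))
```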
Ahmadalipour, Ali; Moradkhani, Hamid
Hydrologic modeling is one of the primary tools utilized for drought monitoring and drought early warning systems. Several sources of uncertainty in hydrologic modeling have been addressed in the literature. However, few studies have assessed the uncertainty of gridded observation datasets from a drought monitoring perspective. This study provides a hydrologic modeling oriented analysis of the gridded observation data uncertainties over the Pacific Northwest (PNW) and its implications on drought assessment. We utilized a recently developed 100-member ensemble-based observed forcing data to simulate hydrologic fluxes at 1/8° spatial resolution using Variable Infiltration Capacity (VIC) model, and compared the results with a deterministic observation. Meteorological and hydrological droughts are studied at multiple timescales over the basin, and seasonal long-term trends and variations of drought extent is investigated for each case. Results reveal large uncertainty of observed datasets at monthly timescale, with systematic differences for temperature records, mainly due to different lapse rates. The uncertainty eventuates in large disparities of drought characteristics. In general, an increasing trend is found for winter drought extent across the PNW. Furthermore, a ∼3% decrease per decade is detected for snow water equivalent (SWE) over the PNW, with the region being more susceptible to SWE variations of the northern Rockies than the western Cascades. The agricultural areas of southern Idaho demonstrate decreasing trend of natural soil moisture as a result of precipitation decline, which implies higher appeal for anthropogenic water storage and irrigation systems.
Novovičová, Jana; Malík, Antonín
Vol. 40, No. 3 (2004), pp. 293-304 ISSN 0023-5954 R&D Projects: GA AV ČR IAA2075302; GA ČR GA102/03/0049; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords: text classification * text categorization * multinomial mixture model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004
Orlando, José Ignacio; Prokofyeva, Elena; Del Fresno, Mariana; Blaschko, Matthew B
Diabetic retinopathy (DR) is one of the leading causes of preventable blindness in the world. Its earliest signs are red lesions, a general term that groups both microaneurysms (MAs) and hemorrhages (HEs). In daily clinical practice, these lesions are manually detected by physicians using fundus photographs. However, this task is tedious and time consuming, and requires an intensive effort due to the small size of the lesions and their lack of contrast. Computer-assisted diagnosis of DR based on red lesion detection is being actively explored due to its improvement effects on both clinicians' consistency and accuracy. Moreover, it provides comprehensive feedback that is easy for physicians to assess. Several methods for detecting red lesions have been proposed in the literature, most of them based on characterizing lesion candidates using hand-crafted features and classifying them into true or false positive detections. Deep learning based approaches, by contrast, are scarce in this domain due to the high expense of annotating the lesions manually. In this paper we propose a novel method for red lesion detection based on combining both deep learned features and domain knowledge. Features learned by a convolutional neural network (CNN) are augmented by incorporating hand-crafted features. This ensemble vector of descriptors is used afterwards to identify true lesion candidates using a Random Forest classifier. We empirically observed that combining both sources of information significantly improves results with respect to using each approach separately. Furthermore, our method reported the highest performance on a per-lesion basis on DIARETDB1 and e-ophtha, and for screening and need for referral on MESSIDOR compared to a second human expert. Results highlight the fact that integrating manually engineered approaches with deep learned features is relevant to improve results when the networks are trained from lesion-level annotated data. An open source implementation of our
Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing
Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
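The stacking variant of the ensemble described above can be sketched with scikit-learn (synthetic features stand in for the wavelet/sketch/texture descriptors of fundus images, and the logistic-regression meta-learner is an assumption, not the paper's exact configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for fundus-image feature vectors (cataract detection,
# two-class task).
X, y = make_classification(n_samples=400, n_features=12, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Base learners (an SVM and a back-propagation-style neural network),
# combined by a stacking meta-learner; the paper also evaluates plain
# majority voting over such base models.
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("bpnn", MLPClassifier(max_iter=1000, random_state=4))],
    final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
score = stack.score(X_te, y_te)
print(score)
```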
Full Text Available Feature selection and description is a key factor in the classification of Earth observation data. In this paper a classification method based on tensor decomposition is proposed. First, multiple features are extracted from the raw LiDAR point cloud, and raster LiDAR images are derived by accumulating features or the “raw” data attributes. Then, the feature rasters of the LiDAR data are stored as a tensor, and tensor decomposition is used to select component features. This tensor representation keeps the initial spatial structure and ensures that the neighborhood is taken into account. Based on a small number of component features, a k-nearest-neighbour classification is applied.
…wavelet transform (WT) energy features by artificial neural network (ANN) and SVM classifiers. The proposed scheme utilizes wavelet-based feature extraction for the artificial neural networks used in the classification. Six different PQ events are considered in ...
Morabito, Marco; Pavlinic, Daniela Z; Crisci, Alfonso; Capecchi, Valerio; Orlandini, Simone; Mekjavic, Igor B
Military and civil defense personnel are often involved in complex activities in a variety of outdoor environments. The choice of appropriate clothing ensembles represents an important strategy to establish the success of a military mission. The main aim of this study was to compare the known clothing insulation of the garment ensembles worn by soldiers during two winter outdoor field trials (hike and guard duty) with the estimated optimal clothing thermal insulations recommended to maintain thermoneutrality, assessed by using two different biometeorological procedures. The overall aim was to assess the applicability of such biometeorological procedures to weather forecast systems, thereby developing a comprehensive biometeorological tool for military operational forecast purposes. Military trials were carried out during winter 2006 in Pokljuka (Slovenia) by Slovene Armed Forces personnel. Gastrointestinal temperature, heart rate and environmental parameters were measured with portable data acquisition systems. The thermal characteristics of the clothing ensembles worn by the soldiers, namely thermal resistance, were determined with a sweating thermal manikin. Results showed that the clothing ensemble worn by the military was appropriate during guard duty but generally inappropriate during the hike. A general under-estimation of the biometeorological forecast model in predicting the optimal clothing insulation value was observed and an additional post-processing calibration might further improve forecast accuracy. This study represents the first step in the development of a comprehensive personalized biometeorological forecast system aimed at improving recommendations regarding the optimal thermal insulation of military garment ensembles for winter activities.
Full Text Available Abstract It is difficult to select the most suitable and effective clustering algorithm and dataset configuration for a given set of gene expression data, because there are a huge number of possible approaches and a huge number of gene expressions. At present many researchers prefer to use hierarchical clustering in its different forms, but this is not always optimal. Cluster ensemble research can solve this type of problem by automatically merging multiple data partitions from a wide range of different clusterings, of any dimensions, to improve both the quality and robustness of the clustering result. However, many existing ensemble approaches use an association matrix to condense sample-cluster co-occurrence statistics, so relations within the ensemble are captured only at a coarse level while relations among clusters are discarded. Recovering these missing associations can greatly expand the capability of such ensemble methodologies for microarray data clustering. We propose a general k-means cluster ensemble approach for clustering general categorical data into a required number of partitions.
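A common way to realize such a k-means cluster ensemble is through a co-association matrix; the sketch below is one standard construction under that assumption, not necessarily the exact approach the abstract proposes:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for gene-expression samples.
X, _ = make_blobs(n_samples=90, centers=3, random_state=5)

# Co-association matrix: entry (i, j) is the fraction of k-means runs in
# which samples i and j fall into the same cluster.
n_runs, n = 20, len(X)
coassoc = np.zeros((n, n))
for seed in range(n_runs):
    labels = KMeans(n_clusters=3, n_init=5, random_state=seed).fit_predict(X)
    coassoc += labels[:, None] == labels[None, :]
coassoc /= n_runs

# Consensus partition: hierarchically cluster the co-association
# "distances" (1 - coassoc) into the required number of partitions.
Z = linkage(squareform(1.0 - coassoc), method="average")
final = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(final)[1:])
```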
Clary, Renee; Wandersee, James
In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands.
Bedo, Marcos Vinicius Naves; Pereira Dos Santos, Davi; Ponciano-Silva, Marcelo; de Azevedo-Marques, Paulo Mazzoncini; Ferreira de Carvalho, André Ponce de León; Traina, Caetano
Content-based medical image retrieval (CBMIR) is a powerful resource to improve differential computer-aided diagnosis. The major problem with CBMIR applications is the semantic gap, a situation in which the system does not follow the users' sense of similarity. This gap can be bridged by the adequate modeling of similarity queries, which ultimately depends on the combination of feature extractor methods and distance functions. In this study, such combinations are referred to as perceptual parameters, as they impact how images are compared. In a CBMIR, the perceptual parameters must be manually set by the users, which imposes a heavy burden on the specialists; otherwise, the system will follow a predefined sense of similarity. This paper presents a novel approach to endow a CBMIR with a proper sense of similarity, in which the system defines the perceptual parameters depending on the query element. The method employs an ensemble strategy, where an extreme learning machine acts as a meta-learner and identifies the most suitable perceptual parameters according to a given query image. These parameters define the search space for the similarity query that retrieves the most similar images. An instance-based learning classifier labels the query image following the query result set. As a proof of concept, we integrated the approach into a mammogram CBMIR. For each query image, the resulting tool provided a complete second opinion, including lesion class, system certainty degree, and set of most similar images. Extensive experiments on a large mammogram dataset showed that our proposal achieved a hit ratio up to 10% higher than the traditional CBMIR approach, without requiring external parameters from the users. Our database-driven solution was also up to 25% faster than traditional content retrieval approaches.
Guo, Wei; Tse, Peter W.
Today, remote machine condition monitoring is popular due to the continuous advancement in wireless communication. Bearings are the most frequently and easily failed components in many rotating machines. To accurately identify the type of bearing fault, large amounts of vibration data need to be collected. However, the volume of transmitted data cannot be too high because the bandwidth of wireless communication is limited. To solve this problem, the data are usually compressed before transmitting to a remote maintenance center. This paper proposes a novel signal compression method that can substantially reduce the amount of data that need to be transmitted without sacrificing the accuracy of fault identification. The proposed signal compression method is based on ensemble empirical mode decomposition (EEMD), which is an effective method for adaptively decomposing the vibration signal into different bands of signal components, termed intrinsic mode functions (IMFs). An optimization method was designed to automatically select appropriate EEMD parameters for the analyzed signal, and in particular to select the appropriate level of the added white noise in the EEMD method. An index termed the relative root-mean-square error was used to evaluate the decomposition performances under different noise levels to find the optimal level. After applying the optimal EEMD method to a vibration signal, the IMF relating to the bearing fault can be extracted from the original vibration signal. Compressing this signal component obtains a much smaller proportion of data samples to be retained for transmission and further reconstruction. The proposed compression method was also compared with the popular wavelet compression method. Experimental results demonstrate that the optimization of EEMD parameters can automatically find appropriate EEMD parameters for the analyzed signals, and the IMF-based compression method provides a higher compression ratio, while retaining the bearing defect
Kilic, Niyazi; Hosgormez, Erkan
Ensemble learning methods are one of the most powerful tools for pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed including different physical parameters, and they were fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting and the random subspace method (RSM) were used. Instance-based learning (IBk) and random forest (RF) classifiers were applied to the six feature set models. The patients were classified into three groups: osteoporosis, osteopenia and control (healthy), using ensemble classifiers. Total classification accuracy and f-measure were also used to evaluate the diagnostic performance of the proposed ensemble classification system. The classification accuracy reached 98.85% with the combination of model 6 (five BMD + five T-score values) using the RSM-RF classifier. The findings of this paper suggest that patients can be warned before a bone fracture occurs, just by examining some physical parameters that can easily be measured without invasive operations.
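The bagging and random-subspace setups described above can be sketched with scikit-learn in place of Weka. This is a hedged illustration, not the paper's pipeline: the osteoporosis data are not available here, so a synthetic three-class dataset stands in for the BMD/T-score feature sets, k-nearest neighbours stands in for IBk, and `BaggingClassifier` with `max_features` implements the random subspace method.

```python
# Sketch of bagging and random-subspace (RSM) ensembles, assuming synthetic
# stand-in data; IBk ~ k-NN, RSM ~ bagging over feature subsets.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 3-class data standing in for the osteoporosis feature set models.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# Bagging: bootstrap resampling of instances around an IBk-like base learner.
bag = BaggingClassifier(KNeighborsClassifier(n_neighbors=5),
                        n_estimators=25, random_state=0)

# RSM: the same wrapper, but sampling feature subsets instead of instances.
rsm = BaggingClassifier(RandomForestClassifier(n_estimators=10, random_state=0),
                        n_estimators=10, bootstrap=False, max_features=0.5,
                        random_state=0)

for name, clf in [("bagging-IBk", bag), ("RSM-RF", rsm)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy {acc:.3f}")
```

On real data, the reported 98.85% would of course depend on the actual BMD and T-score features; the point here is only the two ways of injecting diversity (instance resampling vs. feature subspaces).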
Broekmann, P.; Fluegel, A.; Emnet, C.; Arnold, M.; Roeger-Goepfert, C.; Wagner, A.; Hai, N.T.M.; Mayer, D.
Highlights:
→ Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential transient measurements.
→ These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition.
→ In addition, these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS).
Abstract: Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential transient measurements. These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition. In addition, these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS). While the type-I suppressor selectively forms efficient barriers for copper inter-diffusion on chloride-terminated electrode surfaces, we identified a type-II suppressor that interacts non-selectively with any kind of anions chemisorbed on copper (chloride, sulfate, sulfonate). Type-I suppressors are vital for the superconformal copper growth mode in Damascene processing and show an antagonistic interaction with SPS (Bis-Sodium-Sulfopropyl-Disulfide) which involves the deactivation of this suppressor chemistry. This suppressor deactivation is rationalized in terms of compositional changes in the layer of the chemisorbed anions due to the competition of chloride and MPS (Mercaptopropane Sulfonic Acid) for adsorption sites on the metallic copper surface. MPS is the product of the dissociative SPS adsorption within the preexisting chloride matrix on the copper surface.
The non-selectivity in the adsorption behavior of the type-II suppressor is rationalized in terms of anion/cation pairing effects of the poly-cationic suppressor and the anion-modified copper substrate. Atomic-scale insights into the competitive Cl/MPS adsorption are gained from in situ STM (Scanning Tunneling Microscopy) using single crystalline copper surfaces as model substrates. Type-III suppressors are a third class of suppressors. In the case of type-I and type-II suppressor chemistries, the resulting steady-state deposition conditions are completely independent of the particular succession of additive adsorption. In contrast, a strong dependence of the suppressing capabilities on the sequence of additive adsorption ('first come, first served' principle) is observed for the type-III suppressor. This behavior is explained by a suppressor barrier that impedes not only the copper inter-diffusion but also the transport of other additives (e.g. SPS) to the copper surface.
Stelios K. Mylonas
This paper proposes an object-based segmentation/classification scheme for remotely sensed images, based on a novel variant of the recently proposed Genetic Sequential Image Segmentation (GeneSIS) algorithm. GeneSIS segments the image in an iterative manner, whereby at each iteration a single object is extracted via a genetic-based object extraction algorithm. Contrary to the previous pixel-based GeneSIS, where the candidate objects to be extracted were evaluated through the fuzzy content of their included pixels, in the newly developed region-based GeneSIS algorithm, a watershed-driven fine segmentation map is initially obtained from the original image, which serves as the basis for the forthcoming GeneSIS segmentation. Furthermore, in order to enhance the spatial search capabilities, we introduce a more descriptive encoding scheme in the object extraction algorithm, where the structural search modules are represented by polygonal shapes. Our objectives in the new framework are posed as follows: enhance the flexibility of the algorithm in extracting more flexible object shapes, assure high level classification accuracies, and reduce the execution time of the segmentation, while at the same time preserving all the inherent attributes of the GeneSIS approach. Finally, exploiting the inherent attribute of GeneSIS to produce multiple segmentations, we also propose two segmentation fusion schemes that operate on the ensemble of segmentations generated by GeneSIS. Our approaches are tested on an urban and two agricultural images. The results show that region-based GeneSIS has considerably lower computational demands compared to the pixel-based one. Furthermore, the suggested methods achieve higher classification accuracies and good segmentation maps compared to a series of existing algorithms.
Zhijia Chen; Yuanchang Zhu; Yanqiang Di; Shaochong Feng
In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurately predicting resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy-clustering-based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is const...
We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture-of-experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP-driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP-based ensemble architecture.
Hu, H P; Niu, Z J; Bai, Y P; Tan, X H
Based on gene expression, we have classified 53 colon cancer patients with UICC stage II into two groups: relapse and no relapse. Samples were taken from each patient, and gene information was extracted. From the 53 samples examined, 500 genes were considered informative through analyses by S-Kohonen, BP, and SVM neural networks. The classification accuracy obtained by the S-Kohonen neural network reaches 91%, more accurate than classification by BP and SVM neural networks. The results show that the S-Kohonen neural network is better suited for this classification task and has a certain feasibility and validity as compared with BP and SVM neural networks.
The complaint recognition system plays an important role in ensuring correct classification of hot complaints and improving the service quality of the telecommunications industry. Customer complaints in the telecommunications industry have the particularity that they must be handled within a limited time, which causes errors in the classification of hot complaints. This paper presents a model of intelligent hot-complaint classification based on text mining, which can classify hot complaints into the correct level of the complaint navigation. Examples show that the model can classify complaint texts efficiently.
O'Shea, Timothy James; Roy, Tamoghna; Clancy, T. Charles
We conduct an in-depth study on the performance of deep learning based radio signal classification for radio communications signals. We consider a rigorous baseline method using higher-order moments and strong boosted gradient tree classification, and compare performance between the two approaches across a range of configurations and channel impairments. We consider the effects of carrier frequency offset, symbol rate, and multi-path fading in simulation, and conduct over-the-air measurement of radio classification performance in the lab using software radios, comparing performance and training strategies for both. Finally we conclude with a discussion of remaining problems and design considerations for using such techniques.
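The baseline described above can be sketched as follows. This is a hedged illustration of the general idea only: the exact moment set, modulations and impairments of the paper are not reproduced, so a handful of illustrative higher-order moments of synthetic BPSK/QPSK bursts are fed to a gradient-boosted tree classifier.

```python
# Sketch: higher-order moment features of complex baseband signals, classified
# with boosted trees. Moment set and signal model are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def moments(x):
    """A few low/higher-order moments of a complex signal (illustrative set)."""
    return [np.mean(np.abs(x) ** 2), np.mean(np.abs(x) ** 4),
            np.abs(np.mean(x ** 2)), np.abs(np.mean(x ** 4))]

def sample(mod, n=256, snr_db=10):
    """One noisy burst of BPSK or QPSK symbols."""
    if mod == "bpsk":
        sym = rng.choice([-1.0, 1.0], n).astype(complex)
    else:  # qpsk
        sym = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], n) / np.sqrt(2)
    noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
    return sym + noise * 10 ** (-snr_db / 20)

X, y = [], []
for label, mod in enumerate(["bpsk", "qpsk"]):
    for _ in range(200):
        X.append(moments(sample(mod)))
        y.append(label)

Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y),
                                      random_state=0, stratify=y)
clf = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
print(f"hold-out accuracy: {clf.score(Xte, yte):.2f}")
```

The second-order moment |E[x²]| alone already separates BPSK from QPSK (it is near 1 for BPSK and near 0 for QPSK), which is why even a small moment set works in this toy setting.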
This article presents and discusses definitions of the term “classification” and the related concepts “concept/conceptualization,” “categorization,” “ordering,” “taxonomy” and “typology.” It further presents and discusses theories of classification including the influences of Aristotle… and Wittgenstein. It presents different views on forming classes, including logical division, numerical taxonomy, historical classification, hermeneutical and pragmatic/critical views. Finally, issues related to artificial versus natural classification and taxonomic monism versus taxonomic pluralism are briefly…
Zhang, Tong; Kuo, C.-C. Jay
A hierarchical system for audio classification and retrieval based on audio content analysis is presented in this paper. The system consists of three stages. The audio recordings are first classified and segmented into speech, music, several types of environmental sounds, and silence, based on morphological and statistical analysis of temporal curves of the energy function, the average zero-crossing rate, and the fundamental frequency of audio signals. The first stage is called the coarse-level audio classification and segmentation. Then, environmental sounds are classified into finer classes such as applause, rain, birds' sound, etc., which is called the fine-level audio classification. The second stage is based on time-frequency analysis of audio signals and the use of the hidden Markov model (HMM) for classification. In the third stage, the query-by-example audio retrieval is implemented where similar sounds can be found according to the input sample audio. The way of modeling audio features with the hidden Markov model, the procedures of audio classification and retrieval, and the experimental results are described. It is shown that, with the proposed new system, audio recordings can be automatically segmented and classified into basic types in real time with an accuracy higher than 90%. Examples of audio fine classification and audio retrieval with the proposed HMM-based method are also provided.
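The coarse-level stage rests on two cheap frame features, short-time energy and zero-crossing rate. The sketch below shows how such features already separate silence, tonal content (speech/music) and noise-like environmental sound; the thresholds and the synthetic test signal are illustrative assumptions, not the paper's tuned values or its morphological analysis.

```python
# Sketch: short-time energy and zero-crossing rate with a simple threshold
# rule for coarse audio labelling. Thresholds are illustrative assumptions.
import numpy as np

def frame_features(x, frame=400):
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    energy = np.mean(frames ** 2, axis=1)                       # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

def coarse_label(energy, zcr, e_sil=1e-4, z_tone=0.1):
    out = []
    for e, z in zip(energy, zcr):
        if e < e_sil:
            out.append("silence")
        elif z < z_tone:
            out.append("tonal")       # e.g. voiced speech or music
        else:
            out.append("noise-like")  # e.g. environmental sound
    return out

fs = 8000
t = np.arange(fs) / fs
sig = np.concatenate([np.zeros(fs),                               # 1 s silence
                      np.sin(2 * np.pi * 220 * t),                # 1 s tone
                      np.random.default_rng(0).normal(size=fs)])  # 1 s noise
e, z = frame_features(sig)
labels = coarse_label(e, z)
print(labels[0], labels[len(labels) // 3 + 1], labels[-1])
```

A 220 Hz tone at 8 kHz sampling crosses zero roughly 22 times per 400-sample frame (ZCR ≈ 0.055), well below the noise-like regime (ZCR ≈ 0.5 for white noise), which is what the threshold exploits.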
Choi, Y.; Jeon, S. W.; Lim, C. H.; Ryu, J.
Biodiversity is rapidly declining globally and efforts are needed to mitigate this continually increasing loss of species. Clustering of areas with similar habitats can be used to prioritize protected areas and distribute resources for the conservation of species, selection of representative sample areas for research, and evaluation of impacts due to environmental changes. In this study, Northeast Asia (NEA) was classified into 14 bioclimatic zones using statistical techniques, namely correlation analysis and principal component analysis (PCA), and the iterative self-organizing data analysis technique algorithm (ISODATA). Based on this bioclimatic classification, we predicted shifts of bioclimatic zones due to climate change. The input variables include the current climatic data (1960-1990) and the future climatic data of the HadGEM2-AO model (RCP 4.5 (2050, 2070) and RCP 8.5 (2050, 2070)) provided by WorldClim. Using these data, multi-modeling methods including maximum likelihood classification, random forest, and species distribution modelling were used to project the impact of climate change on the spatial distribution of bioclimatic zones within NEA. The results of the various models were compared and analyzed by overlapping each result. As a result, significant changes in bioclimatic conditions can be expected throughout NEA by the 2050s and 2070s. Overall, the zones moved upward and some zones were predicted to disappear. This analysis provides a basis for understanding the potential impacts of climate change on biodiversity and ecosystems, and could be used to support decision making on climate change adaptation.
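The zoning pipeline above (correlation screening, PCA, iterative clustering) can be sketched as follows. This is a hedged stand-in: KMeans replaces ISODATA (which additionally splits and merges clusters), and synthetic grid-cell data replace the WorldClim variables.

```python
# Sketch of the bioclimatic zoning pipeline: correlation screening, PCA,
# then clustering. KMeans stands in for ISODATA; data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic grid cells x climatic variables (e.g. temperature, precipitation).
X = rng.normal(size=(500, 8))
X[:, 4] = X[:, 0] * 0.95 + rng.normal(scale=0.1, size=500)  # redundant variable

# Correlation screening: drop the later member of each highly correlated pair.
corr = np.corrcoef(X, rowvar=False)
keep = [i for i in range(X.shape[1])
        if not any(abs(corr[i, j]) > 0.9 for j in range(i))]
Xs = StandardScaler().fit_transform(X[:, keep])

# PCA to a few components, then cluster the cells into climatic zones.
Z = PCA(n_components=3).fit_transform(Xs)
zones = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)
print("cells per zone:", np.bincount(zones))
```

Projecting future climate shifts would then amount to running the fitted PCA and cluster assignment on the future-scenario variables and comparing zone maps, as the abstract describes with the HadGEM2-AO scenarios.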
Gao, Yue; Ji, Rongrong; Cui, Peng; Dai, Qionghai; Hua, Gang
Hyperspectral image classification with a limited number of labeled pixels is a challenging task. In this paper, we propose a bilayer graph-based learning framework to address this problem. For graph-based classification, how to establish the neighboring relationship among the pixels from the high dimensional features is the key toward a successful classification. Our graph learning algorithm contains two layers. The first layer constructs a simple graph, where each vertex denotes one pixel and the edge weight encodes the similarity between two pixels. Unsupervised learning is then conducted to estimate the grouping relations among different pixels. These relations are subsequently fed into the second layer to form a hypergraph structure, on top of which semisupervised transductive learning is conducted to obtain the final classification results. Our experiments on three data sets demonstrate the merits of our proposed approach, which compares favorably with the state of the art.
Sheikh, Qamar M; Gatherer, Derek; Reche, Pedro A; Flower, Darren R
Influenza A viral heterogeneity remains a significant threat due to unpredictable antigenic drift in seasonal influenza and antigenic shifts caused by the emergence of novel subtypes. Annual review of multivalent influenza vaccines targets strains of influenza A and B likely to be predominant in future influenza seasons. This does not induce broad, cross protective immunity against emergent subtypes. Better strategies are needed to prevent future pandemics. Cross-protection can be achieved by activating CD8+ and CD4+ T cells against highly conserved regions of the influenza genome. We combine available experimental data with informatics-based immunological predictions to help design vaccines potentially able to induce cross-protective T-cells against multiple influenza subtypes. To exemplify our approach we designed two epitope ensemble vaccines comprising highly conserved and experimentally verified immunogenic influenza A epitopes as putative non-seasonal influenza vaccines; one specifically targets the US population and the other is a universal vaccine. The USA-specific vaccine comprised 6 CD8+ T cell epitopes (GILGFVFTL, FMYSDFHFI, GMDPRMCSL, SVKEKDMTK, FYIQMCTEL, DTVNRTHQY) and 3 CD4+ epitopes (KGILGFVFTLTVPSE, EYIMKGVYINTALLN, ILGFVFTLTVPSERG). The universal vaccine comprised 8 CD8+ epitopes (FMYSDFHFI, GILGFVFTL, ILRGSVAHK, FYIQMCTEL, ILKGKFQTA, YYLEKANKI, VSDGGPNLY, YSHGTGTGY) and the same 3 CD4+ epitopes. Our USA-specific vaccine has a population protection coverage (PPC, the portion of the population potentially responsive to one or more component epitopes of the vaccine) of over 96% and 95% coverage of observed influenza subtypes. The universal vaccine has a PPC value of over 97% and 88% coverage of observed subtypes. http://imed.med.ucm.es/Tools/episopt.html CONTACT: firstname.lastname@example.org. © The Author 2016. Published by Oxford University Press.
…introduced in geophysical fluid dynamics to quantify predictive information content in forecast ensembles (Kleeman 2002; Abramov et al. 2005). Here, we… National Ocean Partnership Program. References: Abramov, R., A. Majda, and R. Kleeman, 2005: Information theory and predictability for low-frequency…
Public school music education in the USA remains wedded to large ensemble performance. Instruction tends to be teacher directed, relies on styles from the Western canon and exhibits little concern for musical interests of students. The idea that a fundamental purpose of education is the creation of a just society is difficult for many music…
Miyoshi, Seiji; Hara, Kazuyuki; Okada, Masato
Ensemble learning of K nonlinear perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to theoretically obtain the generalization error. This paper shows that ensemble generalization error can be calculated by using two order parameters, that is, the similarity between a teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived in the case of general learning rules. The concrete forms of these differential equations are derived analytically in the cases of three well-known rules: Hebbian learning, perceptron learning, and AdaTron (adaptive perceptron) learning. Ensemble generalization errors of these three rules are calculated by using the results determined by solving their differential equations. As a result, these three rules show different characteristics in their affinity for ensemble learning, that is “maintaining variety among students.” Results show that AdaTron learning is superior to the other two rules with respect to that affinity.
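The two order parameters named above can be measured directly in a small simulation. This sketch is an empirical stand-in for the statistical-mechanics analysis, not a reproduction of it: K perceptron students are trained online with the Hebbian rule on labels from a fixed teacher, and the teacher-student overlap R, the student-student overlap q, and the majority-vote generalization error are estimated on fresh inputs. The step counts and learning rate are illustrative assumptions.

```python
# Sketch: online Hebbian learning of K perceptron students against a common
# teacher; order parameters R (teacher-student) and q (student-student)
# measured empirically, plus single vs. ensemble generalization error.
import numpy as np

rng = np.random.default_rng(0)
N, K, T, eta = 500, 3, 2000, 0.1
teacher = rng.normal(size=N)
teacher /= np.linalg.norm(teacher)
students = rng.normal(size=(K, N)) * 0.001   # near-zero initial weights

# Each student sees its own stream of examples labelled by the teacher.
for _ in range(T):
    for k in range(K):
        x = rng.normal(size=N)
        students[k] += (eta / N) * np.sign(teacher @ x) * x   # Hebbian rule

def overlap(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

R = np.mean([overlap(w, teacher) for w in students])  # teacher-student similarity
q = overlap(students[0], students[1])                 # similarity among students

# Generalization error of one student vs. the majority vote of the ensemble.
Xtest = rng.normal(size=(5000, N))
ytrue = np.sign(Xtest @ teacher)
single = np.mean(np.sign(Xtest @ students[0]) != ytrue)
vote = np.mean(np.sign(np.sign(Xtest @ students.T).sum(axis=1)) != ytrue)
print(f"R={R:.2f} q={q:.2f} single-err={single:.3f} ensemble-err={vote:.3f}")
```

Because each student sees an independent example stream, q stays below R; that residual variety among students is exactly what lets the majority vote beat an individual student, in line with the abstract's point about affinity for ensemble learning.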
Liu, Leo; Bak, Claus Leth; Chen, Zhe
With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...
Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni
Automatic classification of text documents has become an important research issue nowadays. Proper classification of text documents requires information retrieval, machine learning and natural language processing (NLP) techniques. Our aim is to focus on important approaches to automatic text classification based on machine learning techniques, viz. supervised, unsupervised and semi-supervised. In this paper we present a review of various text classification approaches under the machine learning paradigm...
Lam, Polo C.-H.; Abagyan, Ruben; Totrov, Maxim
Ligand docking to flexible protein molecules can be efficiently carried out through ensemble docking to multiple protein conformations, either from experimental X-ray structures or from in silico simulations. The success of ensemble docking often requires the careful selection of complementary protein conformations, through docking and scoring of known co-crystallized ligands. False positives, in which a ligand in a wrong pose achieves a better docking score than that of native pose, arise as additional protein conformations are added. In the current study, we developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the use of ligand-based atomic property field (APF) method with receptor structure-based docking. This method helps us to correctly dock 30 out of 36 ligands presented by the D3R docking challenge. For the six mis-docked ligands, the cognate receptor structures prove to be too different from the 40 available experimental Pocketome conformations used for docking and could be identified only by receptor sampling beyond experimentally explored conformational subspace.
Zhang, Tong; Kuo, C.-C. Jay
An on-line audio classification and segmentation system is presented in this research, where audio recordings are classified and segmented into speech, music, several types of environmental sounds and silence based on audio content analysis. This is the first step of our continuing work towards a general content-based audio classification and retrieval system. The extracted audio features include temporal curves of the energy function, the average zero-crossing rate, the fundamental frequency of audio signals, as well as statistical and morphological features of these curves. The classification result is achieved through a threshold-based heuristic procedure. The audio database that we have built, details of feature extraction, classification and segmentation procedures, and experimental results are described. It is shown that, with the proposed new system, audio recordings can be automatically segmented and classified into basic types in real time with an accuracy of over 90 percent. Outlines of further classification of audio into finer types and a query-by-example audio retrieval system on top of the coarse classification are also introduced.
Ozcift, Akin; Gulten, Arif
Improving the accuracy of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Research has shown that a base classifier's performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart disease datasets from the literature. While making experiments, first the feature dimension of the three datasets is reduced using the correlation-based feature selection (CFS) algorithm. Second, classification performances of the 30 machine learning algorithms are calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess performances of the respective classifiers with the same disease data. All the experiments are carried out with a leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved 72.15%, 77.52% and 84.43% average accuracies for the diabetes, heart and Parkinson's datasets, respectively. As for the RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms to design advanced CADx systems. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
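The rotation forest idea can be sketched compactly: each tree sees the data rotated by PCA fitted on random feature subsets, which injects diversity while preserving all feature information. The sketch below is a simplified reading of the algorithm, not Rodriguez et al.'s exact procedure or the paper's implementation (e.g. class-subset bootstrapping before PCA is omitted), and breast cancer data stand in for the disease datasets.

```python
# Sketch of a rotation-forest-style ensemble: per-tree random feature groups,
# PCA per group fitted on a bootstrap sample, tree trained on rotated data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def fit_rotated_tree(Xtr, ytr, n_groups=3):
    # Random feature grouping; PCA per group fitted on a bootstrap sample.
    groups = np.array_split(rng.permutation(Xtr.shape[1]), n_groups)
    boot = Xtr[rng.choice(len(Xtr), len(Xtr))]
    pcas = [PCA().fit(boot[:, g]) for g in groups]

    def rotate(M):
        # Full-rank rotation: all principal components of every group are kept.
        return np.hstack([p.transform(M[:, g]) for p, g in zip(pcas, groups)])

    return DecisionTreeClassifier(random_state=0).fit(rotate(Xtr), ytr), rotate

forest = [fit_rotated_tree(Xtr, ytr) for _ in range(15)]
votes = np.mean([tree.predict(rot(Xte)) for tree, rot in forest], axis=0)
acc = np.mean((votes > 0.5) == yte)
print(f"rotation-forest-style accuracy: {acc:.3f}")
```

Because every group's full set of principal components is retained, each tree still sees all the information in the data; the diversity comes only from the random grouping and the bootstrap used to fit the PCAs.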
Polyakova, A.; Lipinskiy, L.
Some problems are difficult to solve by using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each of which is able to solve the problem by itself, but whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since the approach emphasizes the diversity of its components. A new method of intelligent information technology ensemble design is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural network, support vector machine and decision trees. These algorithms and their ensemble have been tested on face recognition problems. Principal component analysis (PCA) is used for feature selection.
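An ensemble mixing the three model families named above can be sketched with scikit-learn. This is a hedged stand-in for the paper's design: soft probability averaging replaces the fuzzy-logic combination rule, and the digits dataset replaces the face-recognition data, with PCA used for feature reduction as in the abstract.

```python
# Sketch: heterogeneous ensemble of neural network, SVM and decision tree,
# combined by soft voting after PCA feature reduction. Fuzzy combination
# logic is replaced by probability averaging (an assumption of this sketch).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

members = [
    ("ann", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("tree", DecisionTreeClassifier(random_state=0)),
]
model = make_pipeline(StandardScaler(), PCA(n_components=30),
                      VotingClassifier(members, voting="soft"))
model.fit(Xtr, ytr)
print(f"heterogeneous ensemble accuracy: {model.score(Xte, yte):.3f}")
```

Soft voting averages the members' class probabilities, so a confident SVM can outvote an uncertain tree; this is the simplest concrete realization of "combining diverse components" that the abstract argues for.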
Sang, Cheng Wei; Sun, Hong
Polarimetric SAR (PolSAR) image classification is one of the important applications of PolSAR remote sensing. It is a difficult high-dimensional nonlinear mapping problem; sparse representations based on learned overcomplete dictionaries have shown great potential for solving such problems. The overcomplete dictionary plays an important role in PolSAR image classification; however, in complex PolSAR scenes, features shared by different classes weaken the discrimination of the learned dictionary and degrade classification performance. In this paper, we propose a novel overcomplete dictionary learning model to enhance the discrimination of the dictionary. The overcomplete dictionary learned by the proposed model is more discriminative and very suitable for PolSAR classification.
Salvador J. Diaz-Cano
Any robust classification system depends on its purpose and must refer to accepted standards, its strength relying on predictive values and a careful consideration of known factors that can affect its reliability. In this context, a molecular classification of human cancer must refer to the current gold standard (histological classification) and try to improve it with key prognosticators for metastatic potential, staging and grading. Although organ-specific examples have been published based on proteomics, transcriptomics and genomics evaluations, the most popular approach uses gene expression analysis as a direct correlate of cellular differentiation, which represents the key feature of the histological classification. RNA is a labile molecule that varies significantly according to the preservation protocol, its transcription reflects the adaptation of the tumor cells to the microenvironment, it can be passed between cells through mechanisms of intercellular transfer of genetic information (exosomes), and it is exposed to epigenetic modifications. More robust classifications should be based on stable molecules, represented at the genetic level by DNA, to improve reliability, and their analysis must deal with the concept of intratumoral heterogeneity, which is at the origin of tumor progression and is the byproduct of the selection process during the clonal expansion and progression of neoplasms. The simultaneous analysis of multiple DNA targets and next generation sequencing offer the best practical approach for an analytical genomic classification of tumors.
Ben Bouallègue, Zied; Heppelmann, Tobias; Theis, Susanne E.
The new approach, which preserves the dynamical development of the ensemble members, is called dynamic ensemble copula coupling (d-ECC). The ensemble-based empirical copulas, ECC and d-ECC, are applied to wind forecasts from the high-resolution ensemble system COSMO-DE-EPS run operationally at the German…
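The standard ECC step underlying this work can be sketched in a few lines: postprocessed marginal samples are reordered at each time step to follow the rank structure of the raw ensemble, restoring a physically plausible temporal dependence. The d-ECC variant of the paper additionally adjusts this rank template dynamically, which is not reproduced here; the toy "calibrated" samples below are an assumption of the sketch.

```python
# Sketch of standard ensemble copula coupling (ECC): impose the raw
# ensemble's per-time-step rank order on calibrated marginal samples.
import numpy as np

rng = np.random.default_rng(0)
n_members, n_times = 5, 8
raw = np.cumsum(rng.normal(size=(n_members, n_times)), axis=1)  # raw wind ensemble

# Calibrated marginal samples per time step (toy stand-in for postprocessed
# quantiles), sorted ascending along the member axis.
calibrated = np.sort(raw * 1.2 + 0.5 + rng.normal(scale=0.3, size=raw.shape),
                     axis=0)

# ECC: member m receives the k-th smallest calibrated value at time t,
# where k is member m's rank in the raw ensemble at time t.
ranks = raw.argsort(axis=0).argsort(axis=0)
ecc = np.take_along_axis(calibrated, ranks, axis=0)
print("reordered ensemble shape:", ecc.shape)
```

After reordering, the ECC ensemble has exactly the raw ensemble's rank structure at every time step while keeping the calibrated marginal values, which is the property d-ECC then refines over time.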
Demargne, Julie; Javelle, Pierre; Organde, Didier; de Saint Aubin, Céline; Ramos, Maria-Helena
In order to better anticipate flash flood events and provide timely warnings to communities at risk, the French national service in charge of flood forecasting (SCHAPI) is implementing a national flash flood warning system for small-to-medium ungauged basins. Based on a discharge-threshold flood warning method called AIGA (Javelle et al. 2014), the current version of the system runs a simplified hourly distributed hydrologic model with operational radar-gauge QPE grids from Météo-France at a 1-km2 resolution every 15 minutes. This produces real-time peak discharge estimates along the river network, which are subsequently compared to regionalized flood frequency estimates to provide warnings according to the AIGA-estimated return period of the ongoing event. To further extend the effective warning lead time while accounting for hydrometeorological uncertainties, the flash flood warning system is being enhanced to include Météo-France's AROME-NWC high-resolution precipitation nowcasts as time-lagged ensembles and multiple sets of hydrological regionalized parameters. The operational deterministic precipitation forecasts, from the nowcasting version of the AROME convection-permitting model (Auger et al. 2015), were provided at a 2.5-km resolution for a 6-hr forecast horizon for 9 significant rain events from September 2014 to June 2016. The time-lagged approach is a practical choice for accounting for the atmospheric forecast uncertainty when no extensive forecast archive is available for statistical modelling. The evaluation on 781 French basins showed significant improvements in terms of flash flood event detection and effective warning lead-time, compared to warnings from the current AIGA setup (without any future precipitation). We also discuss how to effectively communicate verification information to help determine decision-relevant warning thresholds for flood magnitude and probability. Javelle, P., Demargne, J., Defrance, D., Arnaud, P., 2014. Evaluating
Wu Song; Jiang Qigang
A method for rapid classification and extraction of land-use types in agricultural areas is presented, based on SPOT-6 high-resolution remote sensing data and the good nonlinear classification ability of support vector machines. The results show that SPOT-6 high-resolution remote sensing data can realize land classification efficiently; the overall classification accuracy reached 88.79% and the Kappa coefficient is 0.8632, which means that the classif...
Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco
Imbalanced classification refers to problems with an uneven distribution among classes. In addition, when instances are located in overlapped areas, correctly modeling the problem becomes harder. Current solutions for both issues often focus on the binary case, as multi-class datasets require additional effort to be addressed. In this research, we overcome these problems by combining feature and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules to distinguish among the classes. Selection of instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. To obtain an optimal joint set of features and instances, we embed the search for both in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as the baseline classifier in this wrapper approach. The multi-objective scheme offers a double advantage: the search space becomes broader, and we can provide a set of different solutions with which to build an ensemble of classifiers. This proposal has been contrasted with several state-of-the-art solutions for imbalanced classification, showing excellent results in both binary and multi-class problems.
To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble prediction process. When estimating values of an objective variable using subregression models, only the submodels whose ADs cover a query sample, i.e., for which the sample is inside the model's AD, are used. By constructing multiple submodels, each with a different list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship dataset and a quantitative structure-property relationship dataset, it is confirmed that the ADs can be enlarged and the estimation performance of the regression models is improved compared with traditional methods.
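The selective-prediction idea above can be illustrated with a minimal sketch (not the paper's implementation): here each submodel is a hypothetical 1-nearest-neighbour regressor over an assumed feature subset, and the AD is a simple distance-to-training-data threshold; the actual submodels and AD definition in the study may differ.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Submodel:
    """1-NN regressor over a feature subset, with a distance-based AD."""
    def __init__(self, X, y, feats, ad_radius):
        self.feats, self.ad_radius = feats, ad_radius
        self.X = [[row[i] for i in feats] for row in X]
        self.y = y

    def covers(self, x):
        # The query is inside the AD if it lies close enough to training data.
        q = [x[i] for i in self.feats]
        return min(dist(q, r) for r in self.X) <= self.ad_radius

    def predict(self, x):
        q = [x[i] for i in self.feats]
        return min(zip((dist(q, r) for r in self.X), self.y))[1]

def ad_ensemble_predict(submodels, x):
    """Average only the submodels whose AD covers the query sample."""
    preds = [m.predict(x) for m in submodels if m.covers(x)]
    return sum(preds) / len(preds) if preds else None  # None: outside overall AD
```

Because each submodel uses different explanatory variables, a query rejected by one submodel's AD may still be covered by another, enlarging the overall AD exactly as described above.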
Fsheikh, Ahmed H.
A nonlinear orthogonal matching pursuit (NOMP) for sparse calibration of reservoir models is presented. Sparse calibration is a challenging problem as the unknowns are both the non-zero components of the solution and their associated weights. NOMP is a greedy algorithm that discovers at each iteration the most correlated components of the basis functions with the residual. The discovered basis (aka support) is augmented across the nonlinear iterations. Once the basis functions are selected from the dictionary, the solution is obtained by applying Tikhonov regularization. The proposed algorithm relies on approximate gradient estimation using an iterative stochastic ensemble method (ISEM). ISEM utilizes an ensemble of directional derivatives to efficiently approximate gradients. In the current study, the search space is parameterized using an overcomplete dictionary of basis functions built using the K-SVD algorithm.
Ji, Peng; Wu, Changcheng; Xu, Xiaonong; Song, Aiguo; Li, Huijun
Posture recognition is an important modality for Human-Robot Interaction (HRI). To segment effective postures from an image, we propose an improved region-grow algorithm combined with the Single Gauss Color Model. Experiments show that the improved region-grow algorithm extracts more complete and accurate postures than the traditional Single Gauss Model and region-grow algorithm, while eliminating similar regions from the background. For posture recognition, we propose a CNN ensemble classifier to improve the recognition rate and, to reduce misjudgments during continuous gesture control, a vote filter applied to the sequence of recognition results. The proposed CNN ensemble classifier yields a 96.27% recognition rate, better than that of a single CNN classifier, and the proposed vote filter improves the recognition results and reduces misjudgments during consecutive gesture switches.
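A vote filter over a stream of per-frame labels can be sketched as a sliding-window majority vote; this is a plausible minimal form, and the window size and tie-breaking here are assumptions, not the authors' exact design.

```python
from collections import Counter, deque

def vote_filter(labels, window=5):
    """Majority vote over a sliding window of per-frame recognition
    labels, suppressing isolated misjudgments in a gesture stream."""
    buf = deque(maxlen=window)
    smoothed = []
    for lab in labels:
        buf.append(lab)
        # Counter ties break by insertion order; window size is a tunable.
        smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed
```

A single mislabeled frame inside a run of consistent labels is outvoted by its neighbours, while a genuine gesture switch survives after a short delay of roughly half the window.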
Kikuchi, Ryota; Misaka, Takashi; Obayashi, Shigeru, E-mail: email@example.com [Institute of Fluid Science, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai, Miyagi 980-8577 (Japan)
An integrated method of a proper orthogonal decomposition based reduced-order model (ROM) and data assimilation is proposed for the real-time prediction of an unsteady flow field. In this paper, a particle filter (PF) and an ensemble Kalman filter (EnKF) are compared for data assimilation and the difference in the predicted flow fields is evaluated focusing on the probability density function (PDF) of the model variables. The proposed method is demonstrated using identical twin experiments of an unsteady flow field around a circular cylinder at the Reynolds number of 1000. The PF and EnKF are employed to estimate temporal coefficients of the ROM based on the observed velocity components in the wake of the circular cylinder. The prediction accuracy of ROM-PF is significantly better than that of ROM-EnKF due to the flexibility of PF for representing a PDF compared to EnKF. Furthermore, the proposed method reproduces the unsteady flow field several orders of magnitude faster than the reference numerical simulation based on the Navier–Stokes equations. (paper)
Sørensen, Lauge; Lo, Pechin Chien Pau; Dirksen, Asger
A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment between the branch feature vectors representing those trees. Hereby, localized information in the branches is collectively used in classification and variations in feature values across the tree are taken into account. An approximate anatomical correspondence between matched branches can be achieved by including anatomical features in the branch feature vectors. The proposed approach is applied to classify airway trees in computed tomography images of subjects with and without chronic obstructive pulmonary disease (COPD). Using the wall area percentage (WA%), a common measure of airway abnormality in COPD...
Kim, S.; Riazi, H.; Shin, C.; Seo, D.
Due to the large dimensionality of the state vector and sparsity of observations, the initial conditions (IC) of water quality models are subject to large uncertainties. To reduce the IC uncertainties in operational water quality forecasting, an ensemble data assimilation (DA) procedure for the Hydrologic Simulation Program - Fortran (HSPF) model has been developed and evaluated for the Kumho River Subcatchment of the Nakdong River Basin in Korea. The procedure, referred to herein as MLEF-HSPF, uses the maximum likelihood ensemble filter (MLEF), which combines the strengths of variational assimilation (VAR) and the ensemble Kalman filter (EnKF). The control variables involved in the DA procedure include the bias correction factors for mean areal precipitation and mean areal potential evaporation, the hydrologic state variables, and the water quality state variables such as water temperature, dissolved oxygen (DO), biochemical oxygen demand (BOD), ammonium (NH4), nitrate (NO3), phosphate (PO4) and chlorophyll a (CHL-a). Due to the very large dimensionality of the inverse problem, accurately specifying the parameters for the DA procedure is a challenge. Systematic sensitivity analysis is carried out to identify the optimal parameter settings. To evaluate the robustness of MLEF-HSPF, we use multiple subcatchments of the Nakdong River Basin. In evaluation, we focus on the performance of MLEF-HSPF on prediction of extreme water quality events.
Alexandre M. Ramos
Full Text Available A new classification method of the large-scale circulation characteristics for a specific target area (the NW Iberian Peninsula) is presented, based on the analysis of 90-h backward trajectories arriving in this area, calculated with the 3-D Lagrangian particle dispersion model FLEXPART. A cluster analysis is applied to separate the backward trajectories into up to five representative air streams for each day. Specific measures are then used to characterise the distinct air streams (e.g., curvature of the trajectories, cyclonic or anticyclonic flow, moisture evolution, origin and length of the trajectories). The robustness of the presented method is demonstrated in comparison with the Eulerian Lamb weather type classification. A case study of the 2003 heatwave is discussed in terms of the new Lagrangian circulation and the Lamb weather type classifications. It is shown that the new classification method adds valuable information about the pertinent meteorological conditions, which is missing in an Eulerian approach. The new method is climatologically evaluated for the five-year period from December 1999 to November 2004. The ability of the method to capture the inter-seasonal circulation variability in the target region is shown. Furthermore, the multi-dimensional character of the classification is briefly discussed, in particular with respect to inter-seasonal differences. Finally, the relationship between the new Lagrangian classification and the precipitation in the target area is studied.
Tamilselvan, Prasanna; Wang, Pingfeng
Effective health diagnosis provides multifarious benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of complex engineered systems. This paper presents a novel multi-sensor health diagnosis method using a deep belief network (DBN). The DBN has recently become a popular approach in machine learning for its promising advantages, such as fast inference and the ability to encode richer and higher-order network structures. The DBN employs a hierarchical structure with multiple stacked restricted Boltzmann machines and works through a layer-by-layer successive learning process. The proposed multi-sensor health diagnosis methodology using DBN-based state classification is structured in three consecutive stages: first, defining health states and preprocessing sensory data for DBN training and testing; second, developing DBN-based classification models for the diagnosis of predefined health states; third, validating the DBN classification models with a testing sensory dataset. Health diagnosis using the DBN-based health state classification technique is compared with four existing diagnosis techniques. Benchmark classification problems and two engineering health diagnosis applications, aircraft engine health diagnosis and electric power transformer health diagnosis, are employed to demonstrate the efficacy of the proposed approach.
Cerra, D.; Mueller, R.; Reinartz, P.
This paper presents a new classification methodology for hyperspectral data based on synergetics theory, which describes the spontaneous formation of patterns and structures in a system through self-organization. We introduce a representation for hyperspectral data in which a spectrum can be projected into a space spanned by a set of user-defined prototype vectors belonging to the classes of interest. Each test vector is attracted by a final state associated with a prototype, and can thus be classified. As typical synergetics-based systems have the drawback of a rigid training step, we modify it to allow the selection of user-defined training areas, which are used to weight the prototype vectors through attention parameters and to produce a more accurate classification map through majority voting of independent classifications. Results are comparable to state-of-the-art classification methodologies, both general and specific to hyperspectral data, and, as each classification is based on a single training sample per class, the proposed technique would be particularly effective in tasks where only a small training dataset is available.
Xia, Ping; Yan, Hua; Li, Jing; Sun, Jiande
This paper proposes a single-image super-resolution algorithm based on image patch classification and sparse representation, in which gradient information is used to classify image patches into three different classes so as to reflect the differences between the types of image patches. Compared with other classification algorithms, the gradient-based algorithm is simpler and more effective. Each class is learned to obtain a corresponding sub-dictionary, and a high-resolution image patch can be reconstructed from that dictionary and the sparse representation coefficients of the corresponding class of image patches. Experimental results demonstrate that the proposed algorithm performs better than the other algorithms.
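A gradient-based three-way patch split can be sketched as follows; the three class names and the thresholds `t_low`/`t_high` are illustrative assumptions, not values from the paper.

```python
def gradient_energy(patch):
    """Mean absolute forward-difference gradient of a 2-D patch."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                total += abs(patch[i][j + 1] - patch[i][j]); count += 1
            if i + 1 < h:
                total += abs(patch[i + 1][j] - patch[i][j]); count += 1
    return total / count

def classify_patch(patch, t_low=2.0, t_high=10.0):
    """Route a patch to one of three sub-dictionaries by gradient strength."""
    g = gradient_energy(patch)
    if g < t_low:
        return "smooth"
    return "edge" if g >= t_high else "texture"
```

Each class would then be reconstructed with its own learned sub-dictionary, as the abstract describes.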
Hill, A.; Weiss, C.; Ancell, B. C.
The basic premise of observation targeting is that additional observations, when gathered and assimilated with a numerical weather prediction (NWP) model, will produce a more accurate forecast related to a specific phenomenon. Ensemble-sensitivity analysis (ESA; Ancell and Hakim 2007; Torn and Hakim 2008) is a tool capable of accurately estimating the proper location of targeted observations in areas that have initial model uncertainty and large error growth, as well as predicting the reduction of forecast variance due to the assimilated observation. ESA relates an ensemble of NWP model forecasts, specifically an ensemble of scalar forecast metrics, linearly to earlier model states. A thorough investigation is presented to determine how different factors of the forecast process impact our ability to successfully target new observations for mesoscale convection forecasts. Our primary goals for this work are to determine: (1) whether targeted observations hold more positive impact than non-targeted (i.e. randomly chosen) observations; (2) whether there are lead-time constraints to targeting for convection; (3) how inflation, localization, and the assimilation filter influence impact prediction and realized results; (4) whether there exist differences between targeted observations at the surface versus aloft; and (5) how physics errors and nonlinearity may augment observation impacts. Ten cases of dryline-initiated convection between 2011 and 2013 are simulated within a simplified OSSE framework and presented here. Ensemble simulations are produced from a cycling system that utilizes the Weather Research and Forecasting (WRF) model v3.8.1 within the Data Assimilation Research Testbed (DART). A "truth" (nature) simulation is produced by supplying a 3-km WRF run with GFS analyses and integrating the model forward 90 hours, from the beginning of ensemble initialization through the end of the forecast. Target locations for surface and radiosonde observations are computed 6, 12, and
Wimmer, Georg; Tamaki, Toru; Tischendorf, J J W; Häfner, Michael; Yoshida, Shigeto; Tanaka, Shinji; Uhl, Andreas
In this work, various wavelet-based methods like the discrete wavelet transform, the dual-tree complex wavelet transform, the Gabor wavelet transform, curvelets, contourlets and shearlets are applied for the automated classification of colonic polyps. The methods are tested on 8 HD-endoscopic image databases, where each database is acquired using different imaging modalities (Pentax's i-Scan technology combined with or without staining the mucosa), 2 NBI high-magnification databases and one database with chromoscopy high-magnification images. To evaluate the suitability of the wavelet-based methods with respect to the classification of colonic polyps, the classification performances of 3 wavelet transforms and the more recent curvelets, contourlets and shearlets are compared using a common framework. Wavelet transforms have already often been applied successfully to the classification of colonic polyps, whereas curvelets, contourlets and shearlets have not been used for this purpose so far. We apply different feature extraction techniques to extract the information of the subbands of the wavelet-based methods. Most of the 25 approaches in total have already been published in different texture classification contexts; thus, the aim is also to assess and compare their classification performance using a common framework. Three of the 25 approaches are novel: they extract Weibull features from the subbands of curvelets, contourlets and shearlets. Additionally, 5 state-of-the-art non-wavelet-based methods are applied to our databases so that we can compare their results with those of the wavelet-based methods. It turned out that extracting Weibull distribution parameters from the subband coefficients generally leads to high classification results, especially for the dual-tree complex wavelet transform, the Gabor wavelet transform and the shearlet transform. These three wavelet-based transforms in combination with Weibull features even outperform the state
The detection of emotions is a hot topic in the area of computer vision. Emotions are based on subtle changes in the face that are intuitively detected and interpreted by humans. Detecting these subtle changes, based on mathematical models, is a great challenge in the area of computer vision. In this thesis a new method is proposed to achieve state-of-the-art emotion detection performance. This method is based on facial feature points to monitor subtle changes in the face. Therefore the c...
Fihl, Preben; Moeslund, Thomas B.
This paper deals with classification of human gait types based on the notion that different gait types are in fact different types of locomotion, i.e., running is not simply walking done faster. We present the duty-factor, which is a descriptor based on this notion. The duty-factor is independent of the speed of the human, the camera setup, etc., and hence a robust descriptor for gait classification. The duty-factor is basically a matter of measuring the ground support of the feet with respect to the stride. We estimate this by comparing the incoming silhouettes to a database of silhouettes with known ground support. Silhouettes are extracted using the Codebook method and represented using Shape Contexts. The matching with database silhouettes is done using the Hungarian method. While manually estimated duty-factors show a clear classification, the presented system contains misclassifications due...
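The duty-factor itself reduces to a simple ratio; this sketch (with an assumed 0.5 decision threshold, which follows from running having an aerial phase, not necessarily the paper's trained boundary) shows the computation from per-frame ground-contact flags.

```python
def duty_factor(contact_flags):
    """Fraction of one stride during which the foot is on the ground,
    from per-frame ground-contact booleans for that foot."""
    return sum(1 for c in contact_flags if c) / len(contact_flags)

def classify_gait(df, threshold=0.5):
    """Walking keeps each foot grounded more than half the stride;
    running has an aerial phase, so the duty-factor drops below 0.5."""
    return "walk" if df > threshold else "run"
```

Because both numerator and denominator are measured in frames of the same stride, the ratio is independent of walking speed and frame rate, which is what makes it robust.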
Naghibi, Seyed Amir; Moghaddam, Davood Davoodi; Kalantar, Bahareh; Pradhan, Biswajeet; Kisi, Ozgur
In recent years, the application of ensemble models has increased tremendously in various types of natural hazard assessment, such as landslides and floods. However, the application of this kind of robust model in groundwater potential mapping is relatively new. This study applied four data mining algorithms, including AdaBoost, Bagging, generalized additive model (GAM), and Naive Bayes (NB) models, to map groundwater potential. Then, a novel frequency ratio data mining ensemble model (FREM) was introduced and evaluated. For this purpose, eleven groundwater conditioning factors (GCFs), including altitude, slope aspect, slope angle, plan curvature, stream power index (SPI), river density, distance from rivers, topographic wetness index (TWI), land use, normalized difference vegetation index (NDVI), and lithology, were mapped. A total of 281 well locations with high potential were selected and randomly partitioned into two classes for training the models (70%, or 197 wells) and validating them (30%, or 84 wells). The AdaBoost, Bagging, GAM, and NB algorithms were employed to produce groundwater potential maps (GPMs), which were categorized into potential classes using the natural breaks classification scheme. In the next stage, frequency ratio (FR) values were calculated for the outputs of the four aforementioned models and summed, and finally a GPM was produced using FREM. For validating the models, the area under the receiver operating characteristic (ROC) curve was calculated; for the prediction dataset it was 94.8, 93.5, 92.6, 92.0, and 84.4% for the FREM, Bagging, AdaBoost, GAM, and NB models, respectively. The results indicated that FREM had the best performance among all the models, which could be related to the reduction of overfitting and possible errors. Other models such as AdaBoost, Bagging, GAM, and NB also produced acceptable performance in groundwater modelling. The GPMs produced in the current study may facilitate groundwater exploitation
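The frequency ratio used to weight each model's potential classes is commonly computed as the class's share of observed wells over its share of map area; this sketch uses that standard definition, which the paper may normalize slightly differently.

```python
def frequency_ratio(class_cells, class_wells):
    """FR per potential class: the class's share of observed wells
    divided by its share of map area. FR > 1 marks classes where
    wells occur more often than area alone would predict."""
    total_cells = sum(class_cells.values())
    total_wells = sum(class_wells.values())
    return {c: (class_wells.get(c, 0) / total_wells)
               / (class_cells[c] / total_cells)
            for c in class_cells}
```

In a FREM-style ensemble, each cell's final score would then be the sum of the FR values the individual models assign to it.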
Stucki, G; Kostanjsek, N; Ustün, B; Cieza, A
If we aim towards a comprehensive understanding of human functioning and the development of comprehensive programs to optimize the functioning of individuals and populations, we need to develop suitable measures. The approval of the International Classification of Functioning, Disability and Health (ICF) in 2001 by the 54th World Health Assembly as the first universally shared model and classification of functioning, disability and health therefore marks an important step in the development of measurement instruments and ultimately in our understanding of functioning, disability and health. The acceptance and use of the ICF as a reference framework and classification has been facilitated by its development in a worldwide, comprehensive consensus process and by the increasing evidence regarding its validity. However, the broad acceptance and use of the ICF as a reference framework and classification will also depend on the resolution of conceptual and methodological challenges relevant to the classification and measurement of functioning. This paper therefore first describes how the ICF categories can serve as building blocks for the measurement of functioning, and then the current state of the development of ICF-based practical tools and international standards such as the ICF Core Sets. Finally, it illustrates how to map the world of measures to the ICF and vice versa, and the methodological principles relevant to the transformation of information obtained with a clinical test or a patient-oriented instrument to the ICF, as well as the development of ICF-based clinical and self-reported measurement instruments.
Gu, Chengwei; Wu, Ming; Zhang, Chuang
Sentence classification is one of the significant issues in Natural Language Processing (NLP), and feature extraction is often regarded as its key point. Traditional machine learning approaches, such as the naive Bayes model, cannot take high-level features into consideration. Neural networks for sentence classification can make use of contextual information to achieve better results. In this paper, we focus on classifying Chinese sentences and propose a novel Convolutional Neural Network (CNN) architecture for Chinese sentence classification. In particular, while most previous methods use a softmax classifier for prediction, we embed a linear support vector machine in place of softmax in the deep neural network model, minimizing a margin-based loss to obtain a better result, and we use tanh as the activation function instead of ReLU. The CNN model improves the results of Chinese sentence classification tasks. Experimental results on a Chinese news title database validate the effectiveness of our model.
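A margin-based loss of the kind that replaces softmax cross-entropy can be sketched as a multiclass hinge loss; this is one common formulation and is an assumption about, not a quote of, the paper's exact loss.

```python
def multiclass_hinge_loss(scores, correct, margin=1.0):
    """Margin-based (SVM-style) loss: every wrong class whose score
    comes within `margin` of the correct class's score contributes.
    `scores` are the network's raw class scores for one sentence."""
    s_y = scores[correct]
    return sum(max(0.0, margin + s - s_y)
               for k, s in enumerate(scores) if k != correct)
```

Unlike softmax cross-entropy, this loss is exactly zero once every wrong class trails the correct one by the full margin, so well-separated sentences stop contributing gradients.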
Full Text Available The authors propose to implement conditional non-linear optimal perturbation related to model parameters (CNOP-P) through an ensemble-based approach. The approach was first used in our earlier study and is improved here to be suitable for calculating CNOP-P. Idealised experiments using the Lorenz-63 model are conducted to evaluate the performance of the improved ensemble-based approach. The results show that the maximum prediction error after optimisation is many times larger than the initial-guess prediction error, and is extremely close to, or greater than, the maximum value of the exhaustive attack method (a million random samples). The calculation of CNOP-P by the ensemble-based approach is capable of maintaining a high accuracy over a long prediction time under different constraints and initial conditions. Further, the CNOP-P obtained by the approach is applied to sensitivity analysis of the Lorenz-63 model. The sensitivity analysis indicates that when the prediction time is set to 0.2 time units, the Lorenz-63 model becomes extremely insensitive to one parameter, which leaves the other two parameters to affect the uncertainty of the model. Finally, a series of parameter estimation experiments is performed to verify the sensitivity analysis. It is found that when the three parameters are estimated simultaneously, the insensitive parameter is estimated much worse, but the Lorenz-63 model can still generate a very good simulation thanks to the relatively accurate values of the other two parameters. When only the two sensitive parameters are estimated simultaneously and the insensitive parameter is left non-optimised, the outcome is better than when all three parameters are estimated simultaneously. With the increase of prediction time and observations, however, the model's sensitivity to the insensitive parameter increases accordingly and the insensitive parameter can also be estimated successfully.
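The "exhaustive attack" baseline mentioned above (random sampling of parameter perturbations to maximize prediction error) can be sketched for Lorenz-63 as follows; this is the random-sampling baseline only, not the ensemble-based CNOP-P algorithm itself, and the step size, horizon and constraint radius are illustrative assumptions.

```python
import random

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # default Lorenz-63 parameters

def lorenz63(state, sigma, rho, beta):
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4(state, params, dt, steps):
    """Integrate Lorenz-63 with classical 4th-order Runge-Kutta."""
    for _ in range(steps):
        k1 = lorenz63(state, *params)
        k2 = lorenz63(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *params)
        k3 = lorenz63(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *params)
        k4 = lorenz63(tuple(s + dt * k for s, k in zip(state, k3)), *params)
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def prediction_error(pert, x0=(1.0, 1.0, 1.0), dt=0.01, steps=20):
    """Forecast error induced by a parameter perturbation `pert`,
    relative to the run with the default parameters."""
    base = rk4(x0, (SIGMA, RHO, BETA), dt, steps)
    run = rk4(x0, (SIGMA + pert[0], RHO + pert[1], BETA + pert[2]), dt, steps)
    return sum((a - b) ** 2 for a, b in zip(base, run)) ** 0.5

def random_attack(n, radius, seed=0):
    """Random-sampling search: keep the in-constraint perturbation
    that maximizes the prediction error."""
    rng = random.Random(seed)
    best, best_err = (0.0, 0.0, 0.0), 0.0
    for _ in range(n):
        pert = tuple(rng.uniform(-radius, radius) for _ in range(3))
        err = prediction_error(pert)
        if err > best_err:
            best, best_err = pert, err
    return best, best_err
```

CNOP-P is then the perturbation this search approximates: the in-constraint parameter perturbation with the largest prediction error at the chosen horizon.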
El Gharamti, Mohamad
The ensemble Kalman filter (EnKF) recursively integrates field data into simulation models to obtain a better characterization of the model’s state and parameters. These are generally estimated following a state-parameters joint augmentation strategy. In this study, we introduce a new smoothing-based joint EnKF scheme, in which we introduce a one-step-ahead smoothing of the state before updating the parameters. Numerical experiments are performed with a two-dimensional synthetic subsurface contaminant transport model. The improved performance of the proposed joint EnKF scheme compared to the standard joint EnKF compensates for the modest increase in the computational cost.
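The EnKF analysis step underlying the scheme above can be sketched for a scalar observation; this is the standard stochastic (perturbed-observation) update, not the paper's one-step-ahead smoothing variant, and the observation operator `H` here is an assumed example.

```python
import random

def enkf_update(ensemble, obs, obs_var, H=lambda x: x[0]):
    """One stochastic EnKF analysis step for a scalar observation.
    ensemble: list of state vectors (lists); H maps a state to the
    observed quantity. Each member is pulled toward a perturbed copy
    of the observation by the sample Kalman gain."""
    n, d = len(ensemble), len(ensemble[0])
    hx = [H(x) for x in ensemble]
    hbar = sum(hx) / n
    xbar = [sum(x[i] for x in ensemble) / n for i in range(d)]
    # sample covariance between each state component and the predicted obs
    cov_xh = [sum((x[i] - xbar[i]) * (h - hbar)
                  for x, h in zip(ensemble, hx)) / (n - 1) for i in range(d)]
    var_h = sum((h - hbar) ** 2 for h in hx) / (n - 1)
    gain = [c / (var_h + obs_var) for c in cov_xh]
    return [[xi + g * (obs + random.gauss(0.0, obs_var ** 0.5) - h)
             for xi, g in zip(x, gain)]
            for x, h in zip(ensemble, hx)]
```

In a joint state-parameter setup, the state vector would be augmented with the parameters so that the same gain updates both; the smoothing variant above reorders when the state innovation is applied relative to the parameter update.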
Devi Singh
Jan 7, 2015 ... The present communication is aimed at finding the determinants of molecular marker based classification of rice (Oryza sativa L.) germplasm using the available data from an experiment conducted for the development of molecular fingerprints of diverse varieties of Basmati and non-Basmati rice adapted to.
Hogenboom, Alexander; Frasincar, Flavius; de Jong, Franciska M.G.; Kaymak, Uzay
The exploitation of structural aspects of content is becoming increasingly popular in rule-based polarity classification systems. Such systems typically weight the sentiment conveyed by text segments in accordance with these segments' roles in the structure of a text, as identified by deep
W.H.L.M. Pijls (Wim); R. Potharst (Rob)
In this technical report, two new algorithms based upon frequent patterns are proposed: one is a classification method, and the other is an algorithm for target group selection. In both algorithms, first of all, the collection of frequent patterns in the training set is
Li, N.; Pfeifer, N.; Liu, C.
The common statistical methods for supervised classification usually require a large amount of training data to achieve reasonable results, which is time-consuming and inefficient. This paper proposes a tensor sparse representation classification (SRC) method for airborne LiDAR points. The LiDAR points are represented as tensors to keep their attributes in the spatial space. Only a small amount of training data is then used for dictionary learning, and the sparse tensor is calculated based on the tensor OMP algorithm. The point label is determined by the minimal reconstruction residuals. Experiments are carried out on real LiDAR points, and the results show that objects can be successfully distinguished by this algorithm.
Rodolfo de Figueiredo Dalvi
Full Text Available Introduction: This paper presents a complete approach for the automatic classification of heartbeats to assist experts in the diagnosis of typical arrhythmias, such as right bundle branch block, left bundle branch block, premature ventricular beats, premature atrial beats and paced beats. Methods: A pre-processing step was performed on the electrocardiograms (ECGs) for baseline removal. Next, a QRS complex detection algorithm was implemented to detect the heartbeats, which contain the primary information that is employed in the classification approach. Then, ECG segmentation was performed, by which a set of features based on the RR interval and the beat waveform morphology were extracted from the ECG signal. The size of the feature vector was reduced by principal component analysis. Finally, the reduced feature vector was employed as the input to an artificial neural network. Results: Our approach was tested on the Massachusetts Institute of Technology arrhythmia database. The classification performance on a test set of 18 ECG records of 30 min each achieved an accuracy of 96.97%, a sensitivity of 95.05%, a specificity of 90.88%, a positive predictive value of 95.11%, and a negative predictive value of 92.7%. Conclusion: The proposed approach achieved high accuracy for classifying ECG heartbeats and could be used to assist cardiologists in telecardiology services. The main contribution of our classification strategy is in the feature selection step, which reduced classification complexity without major changes in performance.
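RR-interval features of the kind described can be sketched from detected R-peak times; the specific feature set here (pre-RR, post-RR, and a ratio against a local average of up to ten preceding intervals) is a common illustrative choice, not necessarily the paper's exact vector.

```python
def rr_features(r_peaks, i):
    """RR-interval features for the beat at index i (0 < i < len-1):
    the preceding and following RR intervals plus the ratio of the
    pre-RR to a local average of up to ten preceding intervals."""
    pre_rr = r_peaks[i] - r_peaks[i - 1]
    post_rr = r_peaks[i + 1] - r_peaks[i]
    lo = max(1, i - 9)
    local = [r_peaks[k] - r_peaks[k - 1] for k in range(lo, i + 1)]
    local_avg = sum(local) / len(local)
    return {"pre_rr": pre_rr, "post_rr": post_rr,
            "ratio": pre_rr / local_avg}
```

A premature beat shows up as a pre-RR markedly shorter than the local average (ratio well below 1), which is exactly the kind of timing cue the classifier combines with waveform morphology.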
Jochumsen, Lars Wurtz
The topic of this thesis is target classification of radar tracks from a 2D mechanically scanning coastal surveillance radar. The measurements provided by the radar are position data, and therefore the classification is mainly based on kinematic data deduced from the position. The target classes used in this work are those normal for coastal surveillance, e.g. ships, helicopters, birds, etc. The classifier must be recursive, as not all data of a track are present at any given moment. If all data were available, it would be too late to classify the track, as the track would have been terminated. Therefore, an update of the classification results must be made for each measurement of the target. The data for this work were collected throughout the PhD, both from radars and from other sensors such as GPS.
Gavrilovic, Zoran; Stefanovic, Milutin; Milovanovic, Irina; Cotric, Jelena; Milojevic, Mileta
A complex methodology for torrents and erosion and the associated calculations, the 'Erosion Potential Method', was developed during the second half of the twentieth century in Serbia. One of the modules of this complex method is focused on torrent classification. The module enables the identification of hydrographic, climate and erosion characteristics. The method makes it possible for each torrent, regardless of its magnitude, to be simply and recognizably described by the 'Formula of torrentiality'. The above torrent classification is the basis on which a set of optimisation calculations is developed for the required scope of erosion-control works and measures, the application of which enables the management of significantly larger erosion and torrential regions compared to the previous period. This paper presents the procedure and the method of torrent classification.
Full Text Available Shape is not only an important indicator for assessing the grade of an apple, but also an important factor in its value. In order to improve apple shape classification accuracy, an approach for apple shape sorting based on wavelet moments is proposed. The image is first normalized using its regular moments to obtain scale and translation invariance; rotation-invariant wavelet moment features are then extracted from the scale- and translation-normalized images, and cluster analysis is used to complete the shape classification. This method performs better than traditional approaches such as Fourier descriptors and Zernike moments because wavelet moments provide both a time-domain and a frequency-domain window, which was verified by experiments. With our method, the classification accuracy for normal fruit shape, mild deformity and severe deformity is 86.21%, 85.82% and 90.81%, respectively.
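The moment-based normalization step can be sketched for a binary shape image: raw moments give the centroid (translation) and the area m00 (scale), and the shape is mapped to a canonical frame before rotation-invariant wavelet moments are computed. The target area of 1 is an arbitrary illustrative choice.

```python
def raw_moment(img, p, q):
    """Raw image moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def normalize_shape(img):
    """Translate the shape so its centroid sits at the origin and
    scale it so its area (m00) equals 1, yielding translation and
    scale invariance before rotation-invariant moments are computed."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00   # centroid x = m10 / m00
    cy = raw_moment(img, 0, 1) / m00   # centroid y = m01 / m00
    s = (1.0 / m00) ** 0.5             # scale factor fixing the area
    return sorted(((x - cx) * s, (y - cy) * s)
                  for y, row in enumerate(img)
                  for x, v in enumerate(row) if v)
```

Any translated or uniformly rescaled copy of the same shape maps to the same normalized point set, which is exactly the invariance the classification pipeline relies on.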
Su, Hongjun; Sheng, Yehua; Du, Peijun; Chen, Chen; Liu, Kui
A novel approach using volumetric texture and reduced spectral features is presented for hyperspectral image classification. In this approach, the volumetric textural features were extracted by volumetric gray-level co-occurrence matrices (VGLCM). The spectral features were extracted by minimum estimated abundance covariance (MEAC) and linear prediction (LP)-based band selection, and by a semi-supervised k-means (SKM) clustering method with deletion of the worst cluster (SKMd) band-clustering algorithm. Moreover, four feature combination schemes were designed for hyperspectral image classification using spectral and textural features. It has been proven that the proposed method using VGLCM outperforms the gray-level co-occurrence matrices (GLCM) method, and the experimental results indicate that combining spectral information with volumetric textural features leads to improved classification performance in hyperspectral imagery.
Full Text Available Focused on the issue that conventional remote sensing image classification methods have run into an accuracy bottleneck, a new remote sensing image classification method inspired by deep learning is proposed, based on the Stacked Denoising Autoencoder. First, the deep network model is built from stacked layers of denoising autoencoders. Then, with noised input, an unsupervised greedy layer-wise training algorithm is used to train each layer in turn for more robust expression; characteristics are then refined by supervised learning with a Back Propagation (BP) neural network, and the whole network is fine-tuned by error back propagation. Finally, Gaofen-1 satellite (GF-1) remote sensing data are used for evaluation; the total accuracy and kappa accuracy reach 95.7% and 0.955, respectively, which are higher than those of the Support Vector Machine and the BP neural network. The experimental results show that the proposed method can effectively improve the accuracy of remote sensing image classification.
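The reported total accuracy and kappa accuracy are standard confusion-matrix statistics, and can be computed as below. The 2-class matrix here is made up for illustration, not the paper's GF-1 results:

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes). Kappa corrects
    the observed agreement for agreement expected by chance."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # chance agreement: sum over classes of row-marginal * column-marginal
    expected = sum(
        (sum(confusion[i]) / total) * (sum(r[i] for r in confusion) / total)
        for i in range(len(confusion))
    )
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

conf = [[45, 5], [10, 40]]   # invented 2-class example
acc, kap = accuracy_and_kappa(conf)
```

For this toy matrix the overall accuracy is 0.85 and kappa is 0.7; the paper's 95.7% / 0.955 pair would come from its own (multi-class) confusion matrix.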
Ciompi, Francesco; de Hoop, Bartjan; van Riel, Sarah J; Chung, Kaman; Scholten, Ernst Th; Oudkerk, Matthijs; de Jong, Pim A; Prokop, Mathias; van Ginneken, Bram
In this paper, we tackle the problem of automatic classification of pulmonary peri-fissural nodules (PFNs). The classification problem is formulated as a machine learning approach, where detected nodule candidates are classified as PFNs or non-PFNs. Supervised learning is used, where a classifier is
Teo, Jason; Hou, Chew Lin; Mountstephens, James
Electroencephalogram (EEG)-based emotion classification is rapidly becoming one of the most intensely studied areas of brain-computer interfacing (BCI). The ability to passively yet accurately correlate brainwaves with our immediate emotions opens up truly meaningful and previously unattainable human-computer interactions such as in forensic neuroscience, rehabilitative medicine, affective entertainment and neuro-marketing. One particularly useful yet rarely explored area of EEG-based emotion classification is preference recognition, which is simply the detection of like versus dislike. Within the limited investigations into preference classification, all reported studies were based on musically-induced stimuli except for a single study which used 2D images. The main objective of this study is to apply deep learning, which has been shown to produce state-of-the-art results in diverse hard problems such as computer vision, natural language processing and audio recognition, to 3D object preference classification over a larger group of test subjects. A cohort of 16 users was shown 60 bracelet-like objects as rotating visual stimuli on a computer display while their preferences and EEGs were recorded. After training a variety of machine learning approaches, which included deep neural networks, we then attempted to classify the users' preferences for the 3D visual stimuli based on their EEGs. Here, we show that deep learning outperforms a variety of other machine learning classifiers for this EEG-based preference classification task, particularly on a highly challenging dataset with large inter- and intra-subject variability.
Achieng, K. O.; Zhu, J.
There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climate models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits the data; however, model selection techniques often lead to different conclusions. In this study, ten models are averaged in a Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project runoff and to identify the effect of model uncertainty on future runoff projections. Baseflow separation using a two-parameter recursive digital filter, also called the Eckhardt filter, is used to separate USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well does the ensemble of ten RCM models jointly simulate surface runoff when averaged using BMA, given a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?
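The BMA combination step can be sketched in a few lines: posterior model weights proportional to each model's likelihood against the a priori runoff (a uniform model prior is assumed here), then a weight-averaged projection whose between-model spread expresses model uncertainty. The projections and likelihoods below are invented numbers:

```python
def bma_combine(predictions, likelihoods):
    """Bayesian model averaging sketch: normalize likelihoods into posterior
    model weights (uniform prior assumed), then form the weighted-mean
    projection. The between-model variance term is one common summary of
    structural (model-choice) uncertainty."""
    z = sum(likelihoods)
    weights = [l / z for l in likelihoods]
    mean = sum(w * p for w, p in zip(weights, predictions))
    var_between = sum(w * (p - mean) ** 2 for w, p in zip(weights, predictions))
    return weights, mean, var_between

runoff = [100.0, 120.0, 90.0]   # illustrative surface runoff projections (mm)
lik = [0.2, 0.5, 0.3]           # illustrative likelihoods vs. the a priori runoff
w, mean, var = bma_combine(runoff, lik)
```

The full BMA in the study would fit the weights (e.g. by EM against observed surface runoff) rather than take them as given, but the averaging itself has this form.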
Karakus, P.; Karabork, H.
Classification is the most important method to determine the type of crop contained in a region for agricultural planning. There are two types of classification: the first is pixel-based and the other is object-based. While pixel-based classification methods are based on the information in each pixel, object-based classification is based on objects or image objects formed by combining information from a set of similar pixels. A multispectral image contains a higher degree of spectral resolution than a panchromatic image, while a panchromatic image has a higher spatial resolution than a multispectral image. Pan sharpening is a process of merging high-spatial-resolution panchromatic and high-spectral-resolution multispectral imagery to create a single high-resolution color image. The aim of the study was to compare the potential classification accuracy provided by pan-sharpened imagery. In this study, a SPOT 5 image dated April 2013 was used; the 5 m panchromatic image and the 10 m multispectral image were pan sharpened. Four different classification methods were investigated: maximum likelihood, decision tree and support vector machine at the pixel level, and an object-based classification method. The SPOT 5 pan-sharpened image was used to classify sunflower and corn in a study site located in the Kadirli region of Osmaniye, Turkey. The effects of the pan-sharpened image on classification results were also examined. Accuracy assessment showed that the object-based classification resulted in better overall accuracy values than the others. The results indicate that these classification methods can be used for identifying sunflower and corn and estimating crop areas.
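The abstract does not name the fusion algorithm used; the Brovey transform below is one widely used pan-sharpening scheme and serves only as an illustrative stand-in, with invented pixel values:

```python
def brovey_pansharpen(pan, ms):
    """Brovey-transform pan sharpening sketch: each multispectral band is
    rescaled by the ratio of the panchromatic intensity to the band sum,
    injecting the pan image's spatial detail into the color bands."""
    out = []
    for p, bands in zip(pan, ms):
        s = sum(bands)
        out.append(tuple(p * b / s for b in bands))
    return out

pan = [150.0, 180.0]                            # high-resolution pan pixels
ms = [(30.0, 40.0, 50.0), (60.0, 80.0, 60.0)]   # co-registered RGB pixels
sharp = brovey_pansharpen(pan, ms)
```

Note that band ratios are preserved while total intensity follows the pan channel, which is why Brovey output keeps color relationships but can distort absolute radiometry.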
Full Text Available Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.
Frumuşanu, G.; Afteni, C.; Badea, N.; Epureanu, A.
EU Directive 92/75/EC established for the first time an energy consumption labelling scheme, further implemented by several other directives. As a consequence, many products today (e.g. home appliances, tyres, light bulbs, houses) carry an EU Energy Label when offered for sale or rent. Several energy consumption models of manufacturing equipment have also been developed. This paper proposes an energy-efficiency-based classification of the manufacturing workstation, aiming to characterize its energetic behaviour. The concept of energy efficiency of the manufacturing workstation is defined. On this basis, a classification methodology has been developed. It refers to specific criteria and their evaluation modalities, together with the definition and delimitation of energy efficiency classes. The position of an energy class is defined by the amount of energy needed by the workstation at the middle point of its operating domain, while its extension is determined by the value of the first coefficient from the Taylor series that approximates the dependence between the energy consumption and the chosen parameter of the working regime. The main domain of interest for this classification appears to be the optimization of manufacturing activity planning and programming. A case study regarding the classification of an actual lathe from the energy efficiency point of view, based on two different approaches (analytical and numerical), is also included.
Full Text Available The sparse representation based classifier (SRC) and its kernel version (KSRC) have been employed for hyperspectral image (HSI) classification. However, the state-of-the-art SRC often aims at extended surface objects with linear mixture in smooth scenes and assumes that the number of classes is given. Considering small targets with complex backgrounds, a sparse representation based binary hypothesis (SRBBH) model is established in this paper. In this model, a query pixel is represented in two ways: by the background dictionary and by the union dictionary. The background dictionary is composed of samples selected from the local dual concentric window centered at the query pixel. Thus, for each pixel the classification issue becomes an adaptive multiclass classification problem, where only the number of desired classes is required. Furthermore, the kernel method is employed to improve the interclass separability. In kernel space, the coding vector is obtained by using the kernel-based orthogonal matching pursuit (KOMP) algorithm. Then the query pixel can be labeled by the characteristics of the coding vectors. Instead of directly using the reconstruction residuals, the different impacts the background dictionary and union dictionary have on reconstruction are used for validation and classification. This enhances the discrimination and hence improves the performance.
Full Text Available Abstract Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community but so far have not been applied widely in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high-speed mining. Additionally, they provide comparable accuracy and efficiency to the commonly used Bayesian and support vector machine (SVM) methods, and produce highly interpretable models.
Jeon, Young Jin; Chatellier, Ludovic; David, Laurent
A novel multi-frame particle image velocimetry (PIV) method, able to evaluate a fluid trajectory by means of an ensemble-averaged cross-correlation, is introduced. The method integrates the advantages of the state-of-the-art time-resolved PIV (TR-PIV) methods to further enhance both robustness and dynamic range. The fluid trajectory follows a polynomial model with a prescribed order. A set of polynomial coefficients, which maximizes the ensemble-averaged cross-correlation value across the frames, is regarded as the most appropriate solution. To achieve a convergence of the trajectory in terms of polynomial coefficients, an ensemble-averaged cross-correlation map is constructed by sampling cross-correlation values near the predictor trajectory with respect to an imposed change of each polynomial coefficient. A relation between the given change and the corresponding cross-correlation maps, which could be calculated from the ordinary cross-correlation, is derived. A disagreement between the computational domain and the corresponding physical domain is compensated by introducing the Jacobian matrix based on the image deformation scheme in accordance with the trajectory. An increased cost of the convergence calculation, associated with the nonlinearity of the fluid trajectory, is moderated by means of a V-cycle iteration. To validate the enhancements of the present method, quantitative comparisons with state-of-the-art TR-PIV methods, e.g., the adaptive temporal interval, the multi-frame pyramid correlation and the fluid trajectory correlation, were carried out by using synthetically generated particle image sequences. The performances of the tested methods are discussed in algorithmic terms. A high-rate TR-PIV experiment of a flow over an airfoil demonstrates the effectiveness of the present method. It is shown that the present method is capable of reducing random errors in both velocity and material acceleration while suppressing spurious temporal fluctuations due to measurement noise.
Full Text Available In this paper, considering the characteristics of Chinese text classification, the ICTCLAS (Chinese lexical analysis system of the Chinese Academy of Sciences) is used for document segmentation, stop words are filtered during data cleaning, and the information gain and document frequency feature selection algorithms are applied for document feature selection. On this basis, a text classifier is implemented based on the Naive Bayes algorithm, and experiments and analysis are carried out on the system using the Chinese corpus of Fudan University.
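The classifier stage described can be sketched as a minimal multinomial Naive Bayes with add-one smoothing. The toy pre-tokenized documents below stand in for ICTCLAS-segmented Chinese text, which this sketch omits:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Minimal multinomial Naive Bayes text classifier with add-one
    (Laplace) smoothing over a bag-of-words representation."""

    def fit(self, docs, labels):
        self.vocab = set(w for d in docs for w in d)
        self.class_docs = Counter(labels)            # class priors from doc counts
        self.total_docs = len(docs)
        self.word_counts = defaultdict(Counter)      # per-class word frequencies
        for d, y in zip(docs, labels):
            self.word_counts[y].update(d)
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        for y, ndocs in self.class_docs.items():
            lp = math.log(ndocs / self.total_docs)   # log prior
            total = sum(self.word_counts[y].values())
            for w in doc:                            # log likelihood, smoothed
                lp += math.log((self.word_counts[y][w] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best

docs = [["stock", "market", "rise"], ["match", "goal", "team"],
        ["market", "trade"], ["team", "win"]]
labels = ["finance", "sports", "finance", "sports"]
clf = NaiveBayesText().fit(docs, labels)
```

In the paper's pipeline, feature selection (information gain / document frequency) would shrink the vocabulary before this step.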
Triantafyllou, George N.
An application of an ensemble-based robust filter for data assimilation into an ecosystem model of the Cretan Sea is presented and discussed. The ecosystem model comprises two on-line coupled sub-models: the Princeton Ocean Model (POM) and the European Regional Seas Ecosystem Model (ERSEM). The filtering scheme is based on the Singular Evolutive Interpolated Kalman (SEIK) filter, which is implemented with a time-local H∞ filtering strategy to enhance robustness and performance during periods of strong ecosystem variability. Assimilation experiments in the Cretan Sea indicate that robustness can be achieved in the SEIK filter by introducing an adaptive inflation scheme for the modes of the filter error covariance matrix. Twin experiments are performed to evaluate the performance of the assimilation system and to study the benefits of using robust filtering in an ensemble filtering framework. Pseudo-observations of surface chlorophyll, extracted from a model reference run, were assimilated every two days. Simulation results suggest that the adaptive inflation scheme significantly improves the behavior of the SEIK filter during periods of strong ecosystem variability. © 2012 Elsevier B.V.
Mansoori, Eghbal G; Zolghadri, Mansoor J; Katebi, Seraj D
In this paper, we have proposed a fuzzy rule-based classifier for assigning amino acid sequences to different superfamilies of proteins. While the most popular methods for protein classification rely on sequence alignment, our approach is alignment-free and thus more human-readable. It accounts for the distribution of contiguous patterns of n amino acids (n-grams) in the sequences as features, like other alignment-independent methods. Our approach first extracts a wealth of features from a set of training sequences, then selects only the best of them using a proposed feature ranking method. Thereafter, using these features, a novel steady-state genetic algorithm for extracting fuzzy classification rules from data is used to generate a compact set of interpretable fuzzy rules. The generated rules are simple and human-understandable, so biologists can utilize them for classification purposes, or incorporate their expertise to interpret or even modify them. To evaluate the performance of our fuzzy rule-based classifier, we have compared it with the conventional non-fuzzy C4.5 algorithm, besides some other fuzzy classifiers. This comparative study is conducted by classifying the protein sequences of five superfamily classes downloaded from a public-domain database. The obtained results show that the generated fuzzy rules are more interpretable, with an acceptable improvement in classification accuracy.
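The n-gram feature extraction and a toy ranking step can be sketched as follows. The scoring rule here (absolute count difference between two classes) is an invented stand-in for the paper's proposed ranking method, and the short sequences are fabricated examples:

```python
from collections import Counter

def ngram_counts(seq, n=2):
    """Contiguous n-gram (here bigram) counts of an amino acid sequence:
    the alignment-free features fed to the fuzzy classifier."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def rank_features(class_a, class_b, n=2):
    """Toy feature ranking: score each n-gram by the absolute difference of
    its total count between two classes, highest-scoring first."""
    ca, cb = Counter(), Counter()
    for s in class_a:
        ca.update(ngram_counts(s, n))
    for s in class_b:
        cb.update(ngram_counts(s, n))
    grams = set(ca) | set(cb)
    return sorted(grams, key=lambda g: abs(ca[g] - cb[g]), reverse=True)

# fabricated fragments standing in for two protein superfamilies
globins = ["MVLSPADKT", "MVLSGEDKS"]
kinases = ["GEGTYGVVY", "GEGSFGKVY"]
ranked = rank_features(globins, kinases)
```

The top-ranked n-grams are the ones most unevenly distributed between the classes, i.e. the most discriminative bigram features for rule induction.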
Rao, Mukund; Rao, Suryaprakash; Masser, Ian; Kasturirangan, K.
Information extraction from satellite images is the process of delineating entities in the image which pertain to some feature on the earth; by associating an attribute with these entities, a classification of the image is obtained. Classification is a common technique to extract information from remote sensing data and, by and large, the common classification techniques mainly exploit the spectral characteristics of remote sensing images and attempt to detect patterns in spectral information to classify images. These are based on a per-pixel analysis of the spectral information; 'clustering' or 'grouping' of pixels is done to generate meaningful thematic information. Most classification techniques apply statistical pattern recognition to image spectral vectors to 'label' each pixel with appropriate class information from a set of training information. Segmentation, on the other hand, is not new, but it is still seldom used in image processing of remotely sensed data. Although there has been much development in the segmentation of grey-tone images in this field and in others, like robotic vision, there has been little progress in the segmentation of colour or multi-band imagery. Especially within the last two years, many new segmentation algorithms as well as applications were developed, but not all of them lead to qualitatively convincing results while being robust and operational. One reason is that the segmentation of an image into a given number of regions is a problem with a huge number of possible solutions. Newer algorithms based on a fractal approach could eventually revolutionize image processing of remotely sensed data. The paper looks at applying spatial concepts to image processing, paving the way to algorithmically formulating some more advanced aspects of cognition and inference. In GIS-based spatial analysis, vector-based tools have already been able to support advanced tasks generating new knowledge. By identifying objects (as segmentation results) from
Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil
Ionosphere is an anisotropic, inhomogeneous, time-varying and spatio-temporally dispersive medium whose parameters can almost always be estimated only by indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of the ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to identify due to the extensive scales in period, duration, amplitude and frequency of disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, the identification and classification of regional disturbances poses a challenge. The main aim of this study is to identify the regional effects of global geomagnetic storms and classify them according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, which is one of the major parameters of the ionosphere, will be used to model the regional and local variability that differs from global activity, along with solar and geomagnetic indices. In this work, for the automated classification of regional disturbances, a classification technique based on a robust machine learning technique that has found widespread use, the Support Vector Machine (SVM), is proposed. SVM is a supervised learning model used for classification, with an associated learning algorithm that analyzes the data and recognizes patterns. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher-dimensional feature spaces. The performance of the developed classification technique is demonstrated for the midlatitude ionosphere over Anatolia using TEC estimates generated from the GPS data provided by the Turkish National Permanent GPS Network (TNPGN-Active) for the solar maximum year of 2011. As a result of implementing the developed classification
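The linear core of an SVM can be sketched with Pegasos-style sub-gradient training (hinge loss plus L2 regularization); this is a simplified linear stand-in for the study's SVM, with the bias folded in as a constant feature, and the tiny 2-feature "quiet vs. disturbed" dataset below is invented:

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=500):
    """Pegasos-style sub-gradient descent for a linear SVM.
    Labels must be +1 / -1; each x should include a constant 1.0 bias feature."""
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)                    # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * lam) for wi in w]  # shrink (regularization)
            if margin < 1.0:                          # hinge-loss sub-gradient
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# invented feature vectors (e.g. TEC deviation, index proxy, bias term)
xs = [[0.1, 0.2, 1.0], [0.2, 0.1, 1.0], [2.0, 2.2, 1.0], [2.1, 1.9, 1.0]]
ys = [-1, -1, 1, 1]
w = train_linear_svm(xs, ys)
```

The nonlinear classification mentioned in the abstract would replace the inner products here with a kernel function.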
Liu, Zhao; Zhou, Yan-Lian; Ju, Wei-Min; Gao, Ping
By using an ensemble Kalman filter (EnKF) to assimilate observed soil moisture data, the modified boreal ecosystem productivity simulator (BEPS) model was adopted to simulate the dynamics of soil moisture in winter wheat root zones at Xuzhou Agro-meteorological Station, Jiangsu Province of China, during the growth seasons of 2000-2004. After the assimilation of observed data, the determination coefficient, root mean square error, and average absolute error of simulated soil moisture were in the ranges of 0.626-0.943, 0.018-0.042, and 0.021-0.041, respectively, with the simulation precision improved significantly compared with that before assimilation, indicating the applicability of data assimilation in improving the simulation of soil moisture. The experimental results at a single point showed that the errors in the forcing data and observations, as well as the frequency and soil depth of the assimilation of observed data, all had obvious effects on the simulated soil moisture.
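The EnKF analysis step can be sketched for a scalar soil-moisture state: the Kalman gain is estimated from ensemble statistics, and each forecast member is nudged toward a perturbed observation. The ensemble values and observation below are invented; the real study applies this within the BEPS model state:

```python
import random

def enkf_update(ensemble, obs, obs_err_std):
    """Scalar ensemble Kalman filter analysis step with perturbed
    observations: gain = forecast variance / (forecast variance + obs
    variance), applied member by member."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = var / (var + obs_err_std ** 2)
    return [x + gain * (obs + random.gauss(0.0, obs_err_std) - x)
            for x in ensemble]

random.seed(1)
forecast = [0.18, 0.22, 0.25, 0.20, 0.15]   # modeled volumetric soil moisture
analysis = enkf_update(forecast, obs=0.30, obs_err_std=0.02)
```

Because the forecast spread here is much larger than the observation error, the gain is close to 1 and the analysis ensemble moves most of the way toward the observation.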
Huang, Yong; Wang, Kehong; Zhou, Zhilan; Zhou, Xiaoxiao; Fang, Jimi
The arc of gas metal arc welding (GMAW) contains abundant information about its stability and droplet transition, which can be effectively characterized by extracting the arc electrical signals. In this study, ensemble empirical mode decomposition (EEMD) was used to evaluate the stability of electrical current signals. The welding electrical signals were first decomposed by EEMD, and then transformed to a Hilbert-Huang spectrum and a marginal spectrum. The marginal spectrum is an approximate distribution of amplitude with frequency of signals, and can be described by a marginal index. Analysis of various welding process parameters showed that the marginal index of current signals increased when the welding process was more stable, and vice versa. Thus EEMD combined with the marginal index can effectively uncover the stability and droplet transition of GMAW.
Ding, Dong-Sheng; Zhou, Zhi-Yuan; Shi, Bao-Sen; Guo, Guang-Can
A quantum memory is a key component for quantum networks, which will enable the distribution of quantum information. Its successful development requires storage of single-photon light. Encoding photons with spatial shape through higher-dimensional states significantly increases their information-carrying capability and network capacity. However, constructing such quantum memories is challenging. Here we report the first experimental realization of a true single-photon-carrying orbital angular momentum stored via electromagnetically induced transparency in a cold atomic ensemble. Our experiments show that the non-classical pair correlation between trigger photon and retrieved photon is retained, and the spatial structure of input and retrieved photons exhibits strong similarity. More importantly, we demonstrate that single-photon coherence is preserved during storage. The ability to store spatial structure at the single-photon level opens the possibility for high-dimensional quantum memories.
Fan, Rui; Huang, Zhenyu; Wang, Shaobu; Diao, Ruisheng; Meng, Da
With the growing interest in the application of wind energy, the doubly fed induction generator (DFIG) plays an essential role in the industry nowadays. To deal with the increasing stochastic variations introduced by intermittent wind resources and responsive loads, dynamic state estimation (DSE) is introduced in power systems associated with DFIGs. However, sometimes this dynamic analysis cannot work well because the parameters of DFIGs are not accurate enough. To solve this problem, an ensemble Kalman filter (EnKF) method is proposed for the state estimation and parameter calibration tasks. In this paper, a DFIG is modeled and implemented with the EnKF method. Sensitivity analysis is demonstrated regarding the measurement noise, initial state errors and parameter errors. The results indicate that this EnKF method has a robust performance on the state estimation and parameter calibration of DFIGs.
Hatzopoulos, Stavros Dimitris
Transiently Evoked Otoacoustic Emissions (TEOAE) are signals produced by the cochlea upon stimulation by an acoustic click. Within the context of this dissertation, it was hypothesized that the relationship between the TEOAEs and the functional status of the OHCs provided an opportunity for designing a TEOAE-based clinical procedure that could be used to assess cochlear function. To understand the nature of the TEOAE signals in the time and frequency domains, several different analyses were performed. Using normative input-output (IO) curves, short-time FFT analyses and cochlear computer simulations, it was found that optimization of the hearing loss classification requires the use of a complete 20 ms TEOAE segment. It was also determined that the various 2-D filtering methods (median and averaging filtering masks, LP-FFT) used to enhance the TEOAE S/N offered minimal improvement (less than 6 dB per stimulus level); higher S/N improvements resulted in TEOAE sequences that were over-smoothed. The final classification algorithm was based on a statistical analysis of raw FFT data and, when applied to a sample set of clinically obtained TEOAE recordings (from 56 normal and 66 hearing-loss subjects), correctly identified 94.3% of the normal and 90% of the hearing-loss subjects at the 80 dB SPL stimulus level. To enhance the discrimination between the conductive and sensorineural populations, data from the 68 dB SPL stimulus level were used, which yielded a normal classification of 90.2%, a hearing-loss classification of 87.5% and a conductive-sensorineural classification of 87%. Among the hearing-loss populations, the best discrimination was obtained in the group with otosclerosis and the worst in the group with acute acoustic trauma.
Full Text Available Scientific customer value segmentation (CVS) is the base of efficient customer relationship management; customer credit scoring, fraud detection, and churn prediction all belong to CVS. In real CVS, the customer data usually include many missing values, which may greatly affect the performance of the CVS model. This study proposes a one-step dynamic classifier ensemble model for missing values (ODCEM). On the one hand, ODCEM integrates the preprocessing of missing values and the classification modeling into one step; on the other hand, it utilizes multiple-classifier ensemble technology in constructing the classification models. The empirical results on the credit scoring dataset “German” from UCI and the real customer churn prediction dataset “China churn” show that ODCEM outperforms four commonly used “two-step” models and the ensemble-based model LMF, and can provide better decision support for market managers.
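A classifier ensemble that tolerates missing values can be sketched with majority voting where each base classifier abstains when its feature is absent. This only loosely mirrors ODCEM's one-step design; the base classifiers, thresholds and the credit record below are all invented for illustration:

```python
def vote_ensemble(classifiers, x):
    """Multiple-classifier ensemble by majority vote; a base classifier
    returns None (abstains) when the feature it needs is missing, so no
    separate imputation step is required."""
    votes = [clf(x) for clf in classifiers]
    votes = [v for v in votes if v is not None]
    return max(set(votes), key=votes.count)

# toy base classifiers over a 3-feature credit record; thresholds invented
clf0 = lambda x: None if x[0] is None else ("bad" if x[0] > 0.5 else "good")
clf1 = lambda x: None if x[1] is None else ("bad" if x[1] > 3 else "good")
clf2 = lambda x: None if x[2] is None else ("good" if x[2] > 1000 else "bad")

record = [0.7, None, 500.0]   # second feature missing
decision = vote_ensemble([clf0, clf1, clf2], record)
```

A "dynamic" ensemble in the paper's sense would additionally reweight or reselect the base classifiers per query; abstention-plus-voting is the simplest static version of that idea.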
Nway Nway Kyaw Win
Full Text Available Abstract Power quality disturbances (PQDs) cause serious problems for the reliability, safety and economy of the power system network. In order to improve electric power quality, PQDs must be detected and classified according to the type of transient fault. A software analysis based on the wavelet transform with a multiresolution analysis (MRA) algorithm and feed-forward neural networks (a probabilistic neural network and a multilayer feed-forward neural network) is presented for the automatic classification of eight types of PQ signals: flicker, harmonics, sag, swell, impulse, fluctuation, notch and oscillatory transient. The wavelet family Db4 is chosen in this system to calculate the detailed energy distributions as input features for classification because it performs well in detecting and localizing various types of PQ disturbances. The classifiers identify the disturbance type according to the energy distribution. The results show that the PNN can analyze different power disturbance types efficiently, and that the PNN has better classification accuracy than the MLFF network.
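The detail-energy feature extraction can be sketched with a multi-level wavelet decomposition; the Haar wavelet below is substituted for Db4 (which needs longer filters) to keep the sketch self-contained, and the synthetic 50 Hz signal with an added spike is invented test data:

```python
import math

def haar_dwt(signal):
    """One level of an (unnormalized) Haar DWT: pairwise averages form the
    approximation, pairwise half-differences form the detail."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def detail_energies(signal, levels=3):
    """Per-level detail energy distribution: the MRA feature vector fed to
    the neural classifiers for disturbance recognition."""
    energies = []
    a = list(signal)
    for _ in range(levels):
        a, d = haar_dwt(a)
        energies.append(sum(x * x for x in d))
    return energies

clean = [math.sin(2 * math.pi * 50 * t / 1600) for t in range(64)]
notched = clean[:]
notched[20] += 1.0   # a notch/impulse-like transient
```

A transient disturbance concentrates extra energy in the finest detail level, which is why the energy distribution separates impulse-like events from slow phenomena such as sag or flicker.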
Full Text Available Abstract Background Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. Results We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. Conclusion Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational
Zulvia, Ferani E.; Kuo, R. J.
This paper proposes a classification algorithm based on a support vector machine (SVM) and a gradient evolution (GE) algorithm. The SVM algorithm has been widely used in classification; however, its results are significantly influenced by its parameters. Therefore, this paper aims to propose an improved SVM algorithm that can find the best SVM parameters automatically. The proposed algorithm employs a GE algorithm to determine the SVM parameters: the GE algorithm acts as a global optimizer, finding the best parameters to be used by the SVM. The proposed GE-SVM algorithm is verified on several benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than the other algorithms tested in this paper.
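The metaheuristic-as-global-optimizer idea can be sketched with a stand-in objective. The following is a minimal sketch, assuming a hypothetical toy error surface over (log10 C, log10 gamma) in place of the paper's cross-validated SVM accuracy, and a differential-evolution-style update in place of the exact gradient evolution operators:

```python
import random

def toy_cv_error(params):
    # Hypothetical stand-in for the cross-validated SVM error surface over
    # (log10 C, log10 gamma); the true objective would train an SVM per call.
    c, g = params
    return (c - 1.0) ** 2 + (g + 2.0) ** 2

def evolve(objective, bounds, pop_size=20, iters=100, seed=0):
    """Population-based global search (a differential-evolution-style
    stand-in for the paper's gradient evolution operators)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iters):
        for i in range(pop_size):
            # Build a trial point from three other population members.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [x + 0.8 * (y - z) for x, y, z in zip(a, b, c)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            if objective(trial) < objective(pop[i]):  # greedy selection
                pop[i] = trial
    return min(pop, key=objective)

best = evolve(toy_cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

The tuned parameters would then be handed to the SVM trainer; here `best` simply converges to the minimum of the toy surface.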
Full Text Available Today's content-based video retrieval technologies are still far from meeting human requirements. A fundamental reason is the lack of a content representation that can bridge the gap between visual features and semantic concepts in video. In this paper, we propose a motion pattern descriptor, motion texture, that characterizes motion in a generic way. With this representation, we design a semantic classification scheme to effectively map video clips to semantic categories. Support vector machines (SVMs) are used as the classifiers. In addition, this scheme also significantly improves the performance of motion-based shot retrieval, owing to the comprehensiveness and effectiveness of the motion pattern descriptor and to the semantic classification capability, as shown by experimental evaluations.
Lo, Pechin Chien Pau; Sporring, Jon; Ashraf, Haseem
This paper presents a method for improving airway tree segmentation using vessel orientation information. We use the fact that an airway branch is always accompanied by an artery, with both structures having similar orientations. This work is based on a voxel classification airway segmentation method proposed previously. The probability of a voxel belonging to the airway, from the voxel classification method, is augmented with an orientation similarity measure as a criterion for region growing. The orientation similarity measure of a voxel indicates how similar the orientation of the surroundings of the voxel, estimated based on a tube model, is to that of a neighboring vessel. The proposed method is tested on 20 CT images from different subjects selected randomly from a lung cancer screening study. Lengths of the airway branches obtained with the proposed method are significantly…
ASI Series F, Computer and Systems Sciences, 163:446–456, 1999. [5] D. Hall and J. Llinas. An introduction to multisensor data fusion. Proceedings of… advantages of information fusion based on sparsity models for multimodal classification. Among several sparsity models, tree-structured sparsity provides… algorithm is proposed to solve the optimization problem, which is an efficient tool for feature-level fusion among either homogeneous or heterogeneous
Zhao, Wei; Xiao, Shixiao; Zhang, Baocan; Huang, Xiaojing; You, Rongyi
Electrocardiogram (ECG) signals are susceptible to disturbance by 50 Hz power line interference (PLI) during acquisition and conversion. This paper therefore proposes a novel PLI removal algorithm based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD). Firstly, according to the morphological differences in ECG waveform characteristics, the noisy ECG signal was decomposed into a mutated component, a smooth component and a residual component by MCA. Secondly, the intrinsic mode functions (IMFs) containing PLI were filtered out. The noise suppression rate (NSR) and the signal distortion ratio (SDR) were used to evaluate the de-noising algorithm. Finally, the ECG signals were reconstructed. Experimental comparison showed that the proposed algorithm filtered better than the improved Levkov algorithm: it not only effectively removed the PLI, but also yielded a smaller SDR value.
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet-based features such as multi-resolution decompositions into detail and approximation coefficients, as well as relative wavelet energy, were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high-density (128-channel) EEG dataset was used to validate the proposed method on two classes of recordings: (1) EEG signals recorded during complex cognitive tasks using the Raven's Advanced Progressive Matrices (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as K-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), and naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via the SVM classifier for the approximation coefficients (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for the detail coefficients (D5), derived from the 3.90-7.81 Hz sub-band, were 98.57% and 98.39% for SVM and KNN, respectively. Accuracy rates for the MLP and NB classifiers were comparable, at 97.11-89.63% and 91.60-81.07% for the A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied to a public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performance using machine learning classifiers compared to extant quantitative feature extraction methods. These results suggest that the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
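The relative wavelet energy feature described above can be computed from any multi-level DWT. The following is a minimal sketch using the Haar wavelet on a toy signal (the study used deeper five-level decompositions, A5/D5); since the transform is orthonormal, the per-band energies normalised by the total form a valid feature vector:

```python
def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 ** 0.5
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 ** 0.5
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def relative_wavelet_energy(signal, levels=2):
    """Energy per detail band plus the final approximation band,
    normalised by total energy, as a feature vector."""
    energies, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    total = sum(energies)
    return [e / total for e in energies]

feats = relative_wavelet_energy([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

The resulting vector (one entry per band, summing to 1) is what would then be normalised and fed to FDR/PCA and the classifiers.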
The ensemble square root filter (EnSRF) [1, 2, 3, 4] is a popular method for data assimilation in high dimensional systems (e.g., geophysics models). Essentially the EnSRF is a Monte Carlo implementation of the conventional Kalman filter (KF) [5, 6]. It differs from the KF mainly at the prediction steps, where it is ensembles, rather than the means and covariance matrices, of the system state that are propagated forward. In doing this, the EnSRF is computationally more efficient than the KF, since propagating a covariance matrix forward in high dimensional systems is prohibitively expensive. In addition, the EnSRF is also very convenient to implement. By propagating the ensembles of the system state, the EnSRF can be applied directly to nonlinear systems without any change relative to the assimilation procedures for linear systems. However, by adopting the Monte Carlo method, the EnSRF also incurs certain sampling errors. One way to alleviate this problem is to introduce certain symmetry to the ensembles, which can reduce the sampling errors and spurious modes in the evaluation of the means and covariances of the ensembles. In this contribution, we present two methods to produce symmetric ensembles. One is based on the unscented transform [8, 9], which leads to the unscented Kalman filter (UKF) [8, 9] and its variant, the ensemble unscented Kalman filter (EnUKF). The other is based on Stirling's interpolation formula (SIF), which results in the divided difference filter (DDF). Here we propose a simplified divided difference filter (sDDF) in the context of ensemble filtering. The similarity and difference between the sDDF and the EnUKF will be discussed. Numerical experiments will also be conducted to investigate the performance of the sDDF and the EnUKF, and to compare them to a well-established EnSRF, the ensemble transform Kalman filter (ETKF).
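The benefit of a symmetric ensemble can be illustrated with unscented-transform sigma points, which reproduce a prescribed mean and covariance exactly rather than up to Monte Carlo error. The following is a sketch for the simple case of a diagonal covariance (not the full EnUKF):

```python
def sigma_point_ensemble(mean, var, kappa=0.0):
    """Symmetric sigma-point ensemble (unscented transform) for a state
    with independent components (diagonal covariance `var`)."""
    n = len(mean)
    scale = (n + kappa) ** 0.5
    points = [list(mean)]                 # centre point
    for i in range(n):
        step = scale * var[i] ** 0.5      # symmetric +/- perturbations
        plus, minus = list(mean), list(mean)
        plus[i] += step
        minus[i] -= step
        points.extend([plus, minus])
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2.0 * (n + kappa))
    return points, [w0] + [wi] * (2 * n)

points, w = sigma_point_ensemble([1.0, -2.0], [4.0, 0.25], kappa=1.0)
# Weighted sample moments reproduce the prescribed mean and variance exactly:
emp_mean = [sum(wk * p[i] for wk, p in zip(w, points)) for i in range(2)]
emp_var = [sum(wk * (p[i] - emp_mean[i]) ** 2 for wk, p in zip(w, points))
           for i in range(2)]
```

A random ensemble of the same size would match these moments only approximately; the symmetry is what removes the sampling error.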
Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng
Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
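The combination step of such an ensemble can be sketched as soft voting: average the base models' predicted class probabilities and threshold. The probabilities below are hypothetical, not taken from the study:

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Soft voting: average predicted class-1 probabilities from several
    base models and threshold to get the ensemble label."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    avg = [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]
    return [1 if a >= threshold else 0 for a in avg], avg

# Three hypothetical base models scoring four compounds:
labels, avg = ensemble_predict([
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.6],
    [0.7, 0.3, 0.2, 0.2],
])
```

In the study the base models would be fingerprint-specific classifiers; any set of probability outputs can be combined this way.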
Aalstad, Kristoffer; Westermann, Sebastian; Vikhamar Schuler, Thomas; Boike, Julia; Bertino, Laurent
With its high albedo, low thermal conductivity and large water storing capacity, snow strongly modulates the surface energy and water balance, which makes it a critical factor in mid- to high-latitude and mountain environments. However, estimating the snow water equivalent (SWE) is challenging in remote-sensing applications even at medium spatial resolutions of 1 km. We present an ensemble-based data assimilation framework that estimates the peak subgrid SWE distribution (SSD) at the 1 km scale by assimilating fractional snow-covered area (fSCA) satellite retrievals in a simple snow model forced by downscaled reanalysis data. The basic idea is to relate the timing of the snow cover depletion (accessible from satellite products) to the peak SSD. Peak subgrid SWE is assumed to be lognormally distributed, which can be translated to a modeled time series of fSCA through the snow model. Assimilation of satellite-derived fSCA facilitates the estimation of the peak SSD, while taking into account uncertainties in both the model and the assimilated data sets. As an extension to previous studies, our method makes use of the novel (to snow data assimilation) ensemble smoother with multiple data assimilation (ES-MDA) scheme combined with analytical Gaussian anamorphosis to assimilate time series of Moderate Resolution Imaging Spectroradiometer (MODIS) and Sentinel-2 fSCA retrievals. The scheme is applied to Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway) where field measurements of fSCA and SWE distributions are available. The method is able to successfully recover accurate estimates of peak SSD on most of the occasions considered. Through the ES-MDA assimilation, the root-mean-square error (RMSE) for the fSCA, peak mean SWE and peak subgrid coefficient of variation is improved by around 75, 60 and 20 %, respectively, when compared to the prior, yielding RMSEs of 0.01, 0.09 m water equivalent (w.e.) and 0.13, respectively. The ES-MDA either outperforms or at least
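The ES-MDA update itself is compact. The following is a scalar sketch with a directly observed state (no snow model or anamorphosis), using the standard choice of inflation coefficients alpha = n_assim so that their inverses sum to one; the prior and observation values are illustrative:

```python
import random

def es_mda(ensemble, obs, obs_err_var, n_assim=4, seed=0):
    """ES-MDA for a scalar, directly observed state: the same observation
    is assimilated n_assim times with its error variance inflated by
    alpha = n_assim."""
    rng = random.Random(seed)
    alpha = float(n_assim)
    x = list(ensemble)
    for _ in range(n_assim):
        m = sum(x) / len(x)
        cxx = sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
        gain = cxx / (cxx + alpha * obs_err_var)       # Kalman gain
        # Perturbed-observation update, noise scaled by sqrt(alpha):
        x = [xi + gain * (obs + rng.gauss(0.0, (alpha * obs_err_var) ** 0.5) - xi)
             for xi in x]
    return x

rng0 = random.Random(1)
prior = [rng0.gauss(0.0, 1.0) for _ in range(200)]    # N(0, 1) prior ensemble
posterior = es_mda(prior, obs=2.0, obs_err_var=0.25)
```

For this linear-Gaussian toy case the multiple small updates are equivalent (up to sampling noise) to a single Kalman update, pulling the ensemble mean toward the observation and shrinking its spread.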
Full Text Available Traditionally, tumors are classified by histopathological criteria, i.e., based on their specific morphological appearances. Consequently, current therapeutic decisions in oncology are strongly influenced by histology rather than underlying molecular or genomic aberrations. The increase of information on molecular changes however, enabled by the Human Genome Project and the International Cancer Genome Consortium as well as the manifold advances in molecular biology and high-throughput sequencing techniques, inaugurated the integration of genomic information into disease classification. Furthermore, in some cases it became evident that former classifications needed major revision and adaption. Such adaptations are often required by understanding the pathogenesis of a disease from a specific molecular alteration, using this molecular driver for targeted and highly effective therapies. Altogether, reclassifications should lead to higher information content of the underlying diagnoses, reflecting their molecular pathogenesis and resulting in optimized and individual therapeutic decisions. The objective of this article is to summarize some particularly important examples of genome-based classification approaches and associated therapeutic concepts. In addition to reviewing disease specific markers, we focus on potentially therapeutic or predictive markers and the relevance of molecular diagnostics in disease monitoring.
Hu, G. C.; Zhao, Q. H.
Enormous scientific and technical developments have been carried out over the decades to further improve remote sensing, particularly the polarimetric synthetic aperture radar (PolSAR) technique, so classification methods based on PolSAR images have received much attention from scholars and related departments around the world. The multilook polarimetric G0-Wishart model is a flexible model which describes homogeneous, heterogeneous and extremely heterogeneous regions in the image. Moreover, the polarimetric G0-Wishart distribution does not include the modified Bessel function of the second kind; it is a simple statistical distribution model with fewer parameters. To prove its feasibility, a classification process has been tested on fully polarized synthetic aperture radar (SAR) images using this method. First, multilook polarimetric SAR data processing and speckle filtering are applied to reduce the influence of speckle on the classification result. The image is initially classified into sixteen classes by H/A/α decomposition. The ICM algorithm is then used to classify features based on the G0-Wishart distance. Qualitative and quantitative results show that the proposed method can classify polarimetric SAR data effectively and efficiently.
Full Text Available The common statistical methods for supervised classification usually require a large amount of training data to achieve reasonable results, which is time consuming and inefficient. This paper proposes a tensor sparse representation classification (SRC) method for airborne LiDAR points. The LiDAR points are represented as tensors to keep their attributes in spatial space. Only a small amount of training data is then used for dictionary learning, and the sparse tensor is calculated with a tensor OMP algorithm. The point label is determined by the minimal reconstruction residual. Experiments carried out on real LiDAR points show that objects can be successfully distinguished by this algorithm.
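The greedy atom-selection idea behind OMP can be sketched with plain matching pursuit, a simplified, non-orthogonal cousin of the tensor OMP used here; the toy orthonormal dictionary below is hypothetical:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_iter=2):
    """Greedy sparse coding: repeatedly pick the unit-norm atom most
    correlated with the residual and subtract its contribution."""
    residual = list(signal)
    coeffs = {}
    for _ in range(n_iter):
        best = max(range(len(atoms)),
                   key=lambda j: abs(dot(residual, atoms[j])))
        c = dot(residual, atoms[best])
        coeffs[best] = coeffs.get(best, 0.0) + c
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

# Toy orthonormal dictionary; in SRC, the label is assigned by whichever
# class sub-dictionary yields the smallest reconstruction residual.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, 0.0, 4.0], atoms, n_iter=2)
```

With an orthonormal dictionary, matching pursuit and OMP coincide; for general dictionaries OMP adds a least-squares refit over the selected atoms at each step.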
Saba, Tanzila; Alqahtani, Fatimah Ayidh
Data entry forms are employed in all types of enterprises to collect hundreds of customers' details on a daily basis. The information is filled in manually by the customers, so it is laborious and time consuming to use a human operator to transfer this customer information into computers. Additionally, it is expensive, and human errors might cause serious flaws. The automatic interpretation of scanned forms has facilitated many real applications, from the point of view of speed and accuracy, such as keyword spotting, sorting of postal addresses, script matching and writer identification. This research deals with different strategies to extract customers' information from these scanned forms, and with its interpretation and classification. Accordingly, extracted information is segmented into characters for classification and finally stored as records in databases for further processing. This paper presents a detailed discussion of these semantics-based analysis strategies for forms processing. Finally, new directions are also recommended for future research.
Kothari, Sonal; Phan, John H; Young, Andrew N; Wang, May D
Automatic cancer diagnostic systems based on histological image classification are important for improving therapeutic decisions. Previous studies propose textural and morphological features for such systems. These features capture patterns in histological images that are useful for both cancer grading and subtyping. However, because many of these features lack a clear biological interpretation, pathologists may be reluctant to adopt them for clinical diagnosis. We examine the utility of biologically interpretable shape-based features for the classification of histological renal tumor images. Using Fourier shape descriptors, we extract shape-based features that capture the distribution of stain-enhanced cellular and tissue structures in each image and evaluate these features using a multi-class prediction model. We compare the predictive performance of the shape-based diagnostic model to that of traditional models, i.e., those using textural, morphological and topological features. The shape-based model, with an average accuracy of 77%, outperforms or complements traditional models. We identify the most informative shapes for each renal tumor subtype from the top-selected features. Results suggest that these shapes are not only accurate diagnostic features, but also correlate with known biological characteristics of renal tumors. Shape-based analysis of histological renal tumor images accurately classifies disease subtypes and reveals biologically insightful discriminatory features. This method for shape-based analysis can be extended to other histological datasets to aid pathologists in diagnostic and therapeutic decisions.
Schuld, Maria; Petruccione, Francesco
Quantum machine learning is witnessing an increasing number of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which, similar to Bayesian learning, the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighted according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
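The accuracy-weighted ensemble analysed here has a direct classical analogue, which the following sketch illustrates with hypothetical one-dimensional threshold classifiers (the quantum state preparation and parallel evaluation are, of course, not modelled):

```python
def weighted_ensemble(classifiers, train, test_x):
    """Accuracy-weighted vote: each classifier's +/-1 vote on the test
    point is weighted by its accuracy on the training set."""
    def accuracy(clf):
        return sum(1 for x, y in train if clf(x) == y) / len(train)
    score = sum(accuracy(clf) * clf(test_x) for clf in classifiers)
    return 1 if score >= 0 else -1

# Hypothetical 1-D threshold classifiers with labels in {-1, +1}:
classifiers = [lambda x, t=t: 1 if x > t else -1 for t in (0.0, 0.5, 2.0)]
train = [(-1.0, -1), (0.2, 1), (1.0, 1), (2.5, 1)]
decision = weighted_ensemble(classifiers, train, test_x=1.5)
```

The quantum version evaluates exponentially many such weighted members in superposition; classically, the cost grows linearly with ensemble size.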
Liu, Huanjun; Zhang, Xiaokang; Zhang, Xinle
Soil taxonomy plays an important role in soil utility and management, but China has only a coarse soil map, created from 1980s data. New technology, e.g. spectroscopy, could simplify soil classification. This study tries to classify soils based on the spectral characteristics of topsoil samples. 148 topsoil samples of typical soils, including Black soil, Chernozem, Blown soil and Meadow soil, were collected from the Songnen plain, Northeast China, and their laboratory spectral reflectance in the visible and near-infrared region (400-2500 nm) was processed with a weighted moving average, a resampling technique, and continuum removal. Spectral indices were extracted from the soil spectral characteristics, including the second absorption position of the spectral curve, the area of the first absorption valley, and the slope of the spectral curve at 500-600 nm and 1340-1360 nm. Then K-means clustering and a decision tree were used respectively to build soil classification models. The results indicated that 1) the second absorption positions of Black soil and Chernozem were located at 610 nm and 650 nm respectively; 2) the spectral curve of the Meadow soil is similar to that of its adjacent soil, which could be due to soil erosion; 3) the decision tree model showed higher classification accuracy, with accuracies for Black soil, Chernozem, Blown soil and Meadow soil of 100%, 88%, 97% and 50% respectively; the accuracy for Blown soil could be increased to 100% by adding one more spectral index (the area of the first two valleys) to the model, which showed that the model could be used for soil classification and soil mapping in the near future.
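The clustering step can be sketched with plain k-means on a single spectral index; the toy values below are illustrative, loosely inspired by the 610 nm vs. 650 nm absorption positions reported above:

```python
def kmeans_1d(values, k, iters=20):
    """Plain k-means on scalar features (e.g., one spectral index),
    with centers initialised evenly over the value range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center.
        assign = [min(range(k), key=lambda c: (v - centers[c]) ** 2)
                  for v in values]
        # Update step: move each center to its members' mean.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assign

# Two well-separated groups of toy "second absorption position" values (nm):
centers, assign = kmeans_1d([608, 610, 612, 648, 650, 652], k=2)
```

In the study each sample would carry several indices rather than one; the same loop applies with a multi-dimensional distance.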
Vahie, S.; Zeigler, B.P.; Cho, H. [Univ. of Arizona, Tucson, AZ (United States)
This paper describes the structure of dynamic neuronal ensembles (DNEs). DNEs represent a new paradigm for learning, based on biological neural networks that use variable structures. We present a computational neural element that demonstrates biological neuron functionality such as neurotransmitter feedback, an absolute refractory period and multiple output potentials. More specifically, we develop a network of neural elements that have the ability to dynamically strengthen, weaken, add and remove interconnections. We demonstrate that the DNE is capable of performing dynamic modifications to neuron connections and exhibiting biological neuron functionality. In addition to its applications for learning, the DNE provides an excellent environment for the testing and analysis of biological neural systems. An example of habituation and hyper-sensitization in biological systems, using a neural circuit from a snail, is presented and discussed. This paper provides an insight into the DNE paradigm using models developed and simulated in DEVS.
Seaby, Lauren Paige; Refsgaard, J.C.; Sonnenborg, T.O.
An ensemble of 11 regional climate model (RCM) projections is analysed for Denmark from the perspective of hydrological modelling inputs. Two bias correction approaches are applied: a relatively simple monthly delta change (DC) method and a more complex daily distribution-based scaling (DBS) method. Differences in the strength and direction of climate change signals are compared across models and between bias correction methods, the statistical significance of climate change is tested as it evolves over the 21st century, and the impact of the choice of reference and change period lengths is analysed as it relates to assumptions of stationarity in the current climate and in change signals. Both the DC and DBS methods are able to capture mean monthly and seasonal climate characteristics in temperature (T), precipitation (P), and potential evapotranspiration (ETpot). For P, which is comparatively more variable by day…
Arslan, Evren; Yildiz, Sedat; Albayrak, Yalcin; Koklukaya, Etem
Fibromyalgia syndrome (FMS) is a chronic muscle and skeletal system disease, observed mostly in women, that manifests itself as widespread pain and impairs the individual's quality of life. FMS diagnosis is made based on the American College of Rheumatology (ACR) criteria. Recently, however, the employability and sufficiency of the ACR criteria have been under debate. In this context, several evaluation methods, including clinical evaluation methods, were proposed by researchers. Accordingly, ACR had to update its criteria, announced in 1990, 2010 and 2011. The proposed rule-based fuzzy logic method aims to evaluate FMS from a different angle as well. This method contains a rule base derived from the 1990 ACR criteria and the individual experiences of specialists. The study was conducted using data collected from 60 inpatients and 30 healthy volunteers. Several tests and a physical examination were administered to the participants. The fuzzy logic rule base was structured using the parameters of tender point count, chronic widespread pain period, pain severity, fatigue severity and sleep disturbance level, which were deemed important in FMS diagnosis. It was observed that the fuzzy predictor was generally 95.56% consistent with at least one of the specialists who was not a creator of the fuzzy rule base. Thus, in diagnosis classification, where the severity of FMS was classified as well, consistent findings were obtained from the comparison of the interpretations and experiences of specialists with the fuzzy logic approach. The study proposes a rule base that could eliminate the shortcomings of the 1990 ACR criteria during the FMS evaluation process. Furthermore, the proposed method presents a classification of the severity of the disease, which was not available with the ACR criteria. The study was not limited to disease classification; at the same time, the probability of occurrence and the severity were classified. In addition, those who were not suffering from FMS were
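A rule evaluation of this kind can be sketched with triangular membership functions and min/max (Mamdani-style) inference. The two inputs and all membership breakpoints below are illustrative, not the clinical values used in the study:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fms_severity(tender_points, pain):
    """Toy two-input rule base: min acts as AND within a rule,
    max aggregates across rules."""
    tp_high = tri(tender_points, 8, 18, 28)
    pain_high = tri(pain, 4, 10, 16)
    tp_low = tri(tender_points, -10, 0, 12)
    pain_low = tri(pain, -6, 0, 6)
    severe = min(tp_high, pain_high)   # IF tender high AND pain high
    mild = min(tp_low, pain_low)       # IF tender low AND pain low
    return max(severe, mild), ("severe" if severe >= mild else "mild")

score, label = fms_severity(tender_points=16, pain=9)
```

The study's rule base uses five inputs and many more rules, but each rule reduces to the same membership-then-min/max pattern.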
Full Text Available In living organisms, production of reactive oxygen species (ROS) is counterbalanced by their elimination and/or prevention of formation, which in concert can typically maintain a steady-state (stationary) ROS level. However, this balance may be disturbed and lead to elevated ROS levels and enhanced damage to biomolecules. Since 1985, when H. Sies first introduced the definition of oxidative stress, this area has become one of the hot topics in biology and, to date, many details related to ROS-induced damage to cellular components, ROS-based signaling, cellular responses and adaptation have been disclosed. However, some basal oxidative damage always occurs under unstressed conditions, and in many experimental studies it is difficult to show definitely that oxidative stress is indeed induced by the stressor. Therefore, researchers usually experience substantial difficulties in the correct interpretation of oxidative stress development. For example, in many cases an increase or decrease in the activity of antioxidant and related enzymes is interpreted as evidence of oxidative stress. Careful selection of specific biomarkers (ROS-modified targets) may be very helpful. To avoid these sorts of problems, I propose several classifications of oxidative stress based on its time course and intensity. The time-course classification includes acute and chronic stresses. In the intensity-based classification, I propose to discriminate four zones of function in the relationship between “Dose/concentration of inducer” and the measured “Endpoint”: I – basal oxidative stress zone (BOS); II – low intensity oxidative stress (LOS); III – intermediate intensity oxidative stress (IOS); IV – high intensity oxidative stress (HOS). The proposed classifications may be helpful to describe experimental data where oxidative stress is induced and to systematize it based on its time course and intensity. Perspective directions of investigation in the field include
Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio
The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar gait patterns and to verify whether three-dimensional gait data can distinguish diabetic gait patterns from those of control subjects. The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1 (4.4) years, mean body mass index (BMI) 24.0 (2.8) and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces and their joint and segment (trunk, hip, knee, ankle) angles and moments. Cluster analysis led to the definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathic and non-neuropathic subjects, one including only diabetes patients, and one including controls as well as diabetic and neuropathic subjects. Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls, and of diabetes patients with a long disease duration with a gait as altered as the neuropathic one. © 2010 Diabetes Technology Society.
Rautenhaus, M.; Grams, C. M.; Schäfler, A.; Westermann, R.
We present the application of interactive three-dimensional (3-D) visualization of ensemble weather predictions to forecasting warm conveyor belt situations during aircraft-based atmospheric research campaigns. Motivated by forecast requirements of the T-NAWDEX-Falcon 2012 (THORPEX - North Atlantic Waveguide and Downstream Impact Experiment) campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts (WCBs) has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the European Centre for Medium Range Weather Forecasts (ECMWF) ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and grid spacing of the forecast wind field. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (3 to 7 days before take-off).
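The gridded WCB probability described above reduces, per grid cell, to the fraction of ensemble members whose trajectories meet the WCB criterion in that cell. A minimal sketch (array shapes and the flagging step are illustrative, not Met.3D's implementation):

```python
import numpy as np

def wcb_probability(member_flags):
    """Gridded WCB probability: fraction of ensemble members flagging each cell.

    member_flags[m, k, j, i] is True where member m's Lagrangian trajectories
    satisfy the WCB criterion in grid cell (k, j, i).
    """
    return member_flags.mean(axis=0)

# 50 hypothetical members on a tiny 3 x 4 x 4 grid; 10 members flag one cell
member_flags = np.zeros((50, 3, 4, 4), dtype=bool)
member_flags[:10, 1, 2, 2] = True
p = wcb_probability(member_flags)
```

The visual analysis method in the paper then asks the inverse question: which members contribute to a given probability region, which this per-member boolean field also answers directly.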
Mehlhorn, Alexander T.; Zwingmann, Joern; Hirschmueller, Anja; Suedkamp, Norbert P.; Schmal, Hagen [University of Freiburg Medical Center, Department of Orthopaedic Surgery, Freiburg (Germany)
Avulsion fractures of the fifth metatarsal base (MTB5) are common forefoot injuries. Based on a radiomorphometric analysis reflecting the risk of secondary displacement, a new classification was developed. A cohort of 95 healthy, sportive, young patients (age ≤ 50 years) with avulsion fractures of the MTB5 was included in the study and divided into groups with non-displaced, primary-displaced, and secondary-displaced fractures. Radiomorphometric data obtained using standard oblique and dorso-plantar views were analyzed in association with secondary displacement. Based on this, a classification was developed and checked for reproducibility. Fractures with a longer distance between the lateral edge of the styloid process and the lateral fracture step-off, and fractures with a more medial joint entry of the fracture line at the MTB5, are at higher risk of secondary displacement. Based on these findings, all fractures were divided into three types: type I with a fracture entry in the lateral third; type II in the middle third; and type III in the medial third of the MTB5. Additionally, the three types were subdivided into an A-type with a fracture displacement <2 mm and a B-type with a fracture displacement ≥2 mm. A substantial level of interobserver agreement was found in the assignment of all 95 fractures to the six fracture types (κ = 0.72). The secondary displacement of fractures was confirmed by all examiners in 100% of cases. Radiomorphometric data may identify MTB5 fractures at risk of secondary displacement. Based on this, a reliable classification was developed. (orig.)
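The resulting six-type scheme is simple enough to state as a rule; the sketch below encodes the thirds and the 2 mm displacement threshold exactly as given in the abstract:

```python
def classify_mtb5(entry_third, displacement_mm):
    """MTB5 avulsion-fracture type: I/II/III by the third of the joint surface
    where the fracture line enters (lateral/middle/medial), with subtype
    A (displacement < 2 mm) or B (displacement >= 2 mm)."""
    fracture_type = {'lateral': 'I', 'middle': 'II', 'medial': 'III'}[entry_third]
    subtype = 'A' if displacement_mm < 2 else 'B'
    return fracture_type + subtype
```

For example, a middle-third entry with 2.5 mm displacement is a type IIB fracture under this scheme.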
Sharma, Govind K; Kumar, Anish; Jayakumar, T; Purnachandra Rao, B; Mariyappa, N
A signal processing methodology is proposed in this paper for effective reconstruction of ultrasonic signals in coarse-grained, highly scattering austenitic stainless steel. The proposed methodology comprises Ensemble Empirical Mode Decomposition (EEMD) processing of ultrasonic signals and application of a signal minimisation algorithm to selected Intrinsic Mode Functions (IMFs) obtained by EEMD. The methodology is applied to ultrasonic signals obtained from austenitic stainless steel specimens of different grain sizes, with and without defects. The influence of probe frequency and signal data length on the EEMD decomposition is also investigated. For a particular sampling rate and probe frequency, the same range of IMFs can be used to reconstruct the ultrasonic signal, irrespective of grain size in the 30-210 μm range investigated in this study. The methodology is successfully employed for detection of defects in a 50 mm thick coarse-grained austenitic stainless steel specimen. A signal-to-noise ratio improvement of better than 15 dB is observed for the ultrasonic signal obtained from a 25 mm deep flat-bottom hole in a 200 μm grain size specimen. For ultrasonic signals obtained from defects at different depths, a minimum of 7 dB additional enhancement in SNR is achieved compared with the sum-of-selected-IMFs approach. The application of the minimisation algorithm to the EEMD-processed signal proves effective for adaptive signal reconstruction with improved signal-to-noise ratio. The methodology was further employed for successful imaging of defects in a B-scan. Copyright © 2014. Published by Elsevier B.V.
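The SNR figures quoted above follow the usual dB definition (peak defect-echo amplitude over the RMS of the grain noise), and the reconstruction step is a sum over a selected subset of IMFs. A minimal sketch of both pieces (the IMF selection itself is the paper's minimisation algorithm and is not reproduced here):

```python
import numpy as np

def snr_db(defect_echo, grain_noise):
    """SNR in dB: peak defect-echo amplitude over the RMS of the grain noise."""
    peak = np.max(np.abs(defect_echo))
    noise_rms = np.sqrt(np.mean(np.asarray(grain_noise, float) ** 2))
    return 20 * np.log10(peak / noise_rms)

def reconstruct(imfs, keep):
    """Adaptive denoising step: rebuild the A-scan from selected IMFs only."""
    return np.sum([imfs[i] for i in keep], axis=0)
```

With a unit-amplitude echo over grain noise of RMS 0.1, `snr_db` returns 20 dB; a 15 dB improvement would correspond to reducing the residual noise RMS by a factor of about 5.6.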
Kim, J.; Kim, H. M.; Cho, C. H.; Boo, K. O.
Estimation of the surface CO2 flux is crucial to understanding the mechanisms of surface carbon sources and sinks. In Asia, there are large uptake regions, such as forests in boreal and temperate regions. In this study, to diagnose the surface CO2 flux globally and in Asia, CO2 observations were assimilated in the CarbonTracker system developed by NOAA. CarbonTracker is an inverse modeling system that estimates the surface CO2 flux using an ensemble Kalman filter with atmospheric CO2 measurements as a constraint. First, the capability of CarbonTracker as an analysis tool for estimating the surface CO2 flux in Asia was investigated. In contrast to the NOAA configuration, a nesting domain centered on Asia was used, with additional observations in Asia. In addition, a diagnostic tool to calculate the effect of individual CO2 observations on the estimated surface CO2 flux was developed using the analysis sensitivity to observations and the information content in the CarbonTracker framework. The results showed that CarbonTracker works appropriately for estimating the surface CO2 flux. The nesting domain centered on Asia produces a detailed estimate of the surface CO2 fluxes and exhibited better agreement with the CO2 observations in Asia. Additional observations have a beneficial impact on the estimated surface CO2 flux in Asia and Europe. The analysis sensitivity showed seasonal variations, with greater sensitivities in summer and lower sensitivities in winter. A strong correlation exists between the information content and the optimized surface CO2 flux.
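The flux estimation rests on an ensemble Kalman filter update. The sketch below shows a generic stochastic EnKF analysis step in the spirit of that scheme; the scalar "flux scaling factor" state, ensemble size and error values are illustrative, not CarbonTracker's configuration:

```python
import numpy as np

def enkf_analysis(ens, obs, H, R, seed=0):
    """Stochastic EnKF analysis step.

    ens: (n_state, n_members) prior ensemble; obs: (n_obs,) observations;
    H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) obs-error cov.
    """
    rng = np.random.default_rng(seed)
    n = ens.shape[1]
    X = ens - ens.mean(axis=1, keepdims=True)
    P = X @ X.T / (n - 1)                          # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    # perturbed observations, one draw per ensemble member
    obs_pert = obs[:, None] + rng.multivariate_normal(np.zeros(len(obs)), R, size=n).T
    return ens + K @ (obs_pert - H @ ens)

# hypothetical scalar example: a flux scaling factor (prior mean 1.0)
# is drawn toward a CO2-constrained "observation" of 2.0
prior = 1.0 + 0.5 * np.random.default_rng(1).standard_normal((1, 100))
posterior = enkf_analysis(prior, np.array([2.0]), np.eye(1), 0.01 * np.eye(1))
```

The analysis mean moves most of the way to the observation because the prior spread (0.5) is large relative to the observation error (0.1), and the posterior spread shrinks accordingly.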
Full Text Available Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model can achieve high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experiment results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision trees, random forests, and AdaBoost on all three text-classification datasets; for example, it achieves 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.
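The abstract does not give RO's formula, so the sketch below is only a simple train/test-gap proxy that conveys the idea of quantifying overfitting as a rate rather than a yes/no judgment; the paper's actual definition may differ:

```python
def overfit_rate(train_acc, test_acc):
    """Illustrative overfitting measure: the fraction of training accuracy
    that fails to generalize to held-out data. 0 means no measured gap;
    values near 1 mean almost nothing generalizes."""
    return (train_acc - test_acc) / train_acc

# a model at 99% train / 66% test accuracy overfits far more than one at 90% / 85%
severe = overfit_rate(0.99, 0.66)
mild = overfit_rate(0.90, 0.85)
```

A quantitative measure like this lets one compare models (or boosting rounds) by degree of overfitting instead of raw test accuracy alone.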
Full Text Available Image classification aims to group images into corresponding semantic categories. Due to the difficulties of interclass similarity and intraclass variability, it is a challenging issue in computer vision. In this paper, an unsupervised feature learning approach called convolutional denoising sparse autoencoder (CDSAE) is proposed based on the theory of visual attention mechanisms and deep learning methods. Firstly, a saliency detection method is utilized to obtain training samples for unsupervised feature learning. Next, these samples are sent to the denoising sparse autoencoder (DSAE), followed by a convolutional layer and a local contrast normalization layer. Generally, prior knowledge about a specific task is helpful for its solution. Therefore, a new pooling strategy, spatial pyramid pooling (SPP) fused with a center-bias prior, is introduced into our approach. Experimental results on two common image datasets (STL-10 and CIFAR-10) demonstrate that our approach is effective in image classification. They also demonstrate that none of the three components (local contrast normalization, SPP fused with the center-bias prior, and l2 vector normalization) can be excluded from our proposed approach: they jointly improve image representation and classification performance.
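The SPP component can be illustrated independently of the autoencoder: max-pooling one feature map over progressively finer grids and concatenating the results yields a fixed-length descriptor regardless of input size. A minimal sketch (the grid levels are illustrative, and the center-bias fusion from the paper is omitted):

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over 1x1, 2x2 and 4x4 grids and concatenate
    the pooled values into one fixed-length descriptor."""
    pooled = []
    rows, cols = fmap.shape
    for n in levels:
        # split the map into an n x n grid of (roughly equal) cells
        for r in np.array_split(np.arange(rows), n):
            for c in np.array_split(np.arange(cols), n):
                pooled.append(fmap[np.ix_(r, c)].max())
    return np.array(pooled)

descriptor = spatial_pyramid_pool(np.arange(64.0).reshape(8, 8))
```

For levels (1, 2, 4) the descriptor always has 1 + 4 + 16 = 21 entries, which is what makes SPP useful ahead of a fixed-size classifier.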
Silva, J; Heim, W; Chau, T
We have previously proposed the use of "muscle sounds", or mechanomyography (MMG), as a reliable alternative measure of muscle activity, with the main objective of facilitating the use of more comfortable and functional soft silicone sockets with below-elbow externally powered prostheses. This work describes an integrated strategy in which data and sensor fusion algorithms are combined to provide MMG-based detection, estimation and classification of muscle activity. The proposed strategy represents the first attempt to generate multiple output signals for practical prosthesis control using an MMG multisensor array embedded distally within a soft silicone socket. The multisensor fusion strategy consists of two stages. The first is the detection stage, which determines the presence or absence of muscle contractions in the acquired signals. Upon detection of a contraction, the second stage, classification, specifies the nature of the contraction and determines the corresponding control output. Tests with amputees indicate that, with the simple detection and classification algorithms proposed, MMG is indeed comparable to EMG and may exceed it functionally.
Wu, D; Lu, B; Xue, H D; Lai, Y M; Qian, J M; Yang, H
Objective: To compare the performance of the revision of the Atlanta classification (RAC) and the determinant-based classification (DBC) in acute pancreatitis. Methods: Consecutive patients with acute pancreatitis admitted to a single center from January 2001 to January 2015 were retrospectively analyzed. Patients were classified into mild, moderately severe and severe categories based on RAC, and were simultaneously classified into mild, moderate, severe and critical grades according to DBC. Disease severity and clinical outcomes were compared between subgroups. Receiver operating characteristic (ROC) curves were used to compare the utility of RAC and DBC by calculating the area under the curve (AUC). Results: Among 1120 patients enrolled, organ failure occurred in 343 patients (30.6%) and infected necrosis in 74 patients (6.6%). A total of 63 patients (5.6%) died. Statistically significant differences in disease severity and outcomes were observed between all subgroups in both RAC and DBC. Critical acute pancreatitis (with both persistent organ failure and infected necrosis) had the most severe clinical course and the highest mortality (19/31, 61.3%). DBC had a larger AUC (0.73, 95% CI 0.69-0.78) than RAC (0.68, 95% CI 0.65-0.73) in classifying ICU admissions (P = 0.031), but the two were similar in predicting mortality (P = 0.372) and prolonged ICU stay (P = 0.266). Conclusions: DBC and RAC perform comparably well in categorizing patients with acute pancreatitis regarding disease severity and clinical outcome. DBC is slightly better than RAC in predicting prolonged hospital stay. Persistent organ failure and infected necrosis are risk factors for poor prognosis, and the presence of both is associated with the most dismal outcome.
Torkaman, Atefeh; Charkari, Nasrollah Moghaddam; Aghaeipour, Mahnaz
Hematological malignancies are the types of cancer that affect blood, bone marrow and lymph nodes. As these tissues are naturally connected through the immune system, a disease affecting one of them will often affect the others as well. The hematological malignancies include leukemia, lymphoma and multiple myeloma. Among them, leukemia is a serious malignancy that starts in blood-forming tissues, especially the bone marrow, where blood is made. Research shows that leukemia is one of the common cancers in the world, so an emphasis on diagnostic techniques and the best treatments can provide better prognosis and survival for patients. In this paper, an automatic diagnosis recommender system for classifying leukemia based on a cooperative game is presented. Throughout this research, we analyze flow cytometry data toward the classification of leukemia into eight classes. We work on a real data set covering different types of leukemia, collected at the Iran Blood Transfusion Organization (IBTO). The data set contains 400 samples taken from human leukemic bone marrow. This study uses a cooperative game for classification according to different weights assigned to the markers. The proposed method is versatile, as there are no constraints on what the input or output represent; it can be used to classify any population according to its members' contributions, and so applies equally to other groups of data. The experimental results show an accuracy rate of 93.12% for classification, compared to 90.16% for a decision tree (C4.5). The results demonstrate that the cooperative game approach is very promising for direct classification of leukemia as part of an active medical decision support system for interpretation of flow cytometry readouts. This system could assist clinical hematologists to properly recognize different kinds of leukemia by preparing suggestions, and this could improve the treatment of leukemic
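One standard way marker weights arise in a cooperative-game setting is as Shapley values of a "game" whose value function is the accuracy achieved by a subset of markers. The sketch below computes exact Shapley values by enumerating all join orders; the markers and accuracy numbers are hypothetical, not the IBTO data, and the paper's weighting scheme may differ:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal contribution
    to the value function v over all possible join orders."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(orders) for p, total in phi.items()}

# hypothetical marker "game": v(S) = classification accuracy using marker set S
acc = {frozenset(): 0.0, frozenset('A'): 0.6, frozenset('B'): 0.5,
       frozenset('C'): 0.4, frozenset('AB'): 0.8, frozenset('AC'): 0.7,
       frozenset('BC'): 0.6, frozenset('ABC'): 0.9}
weights = shapley_values('ABC', lambda s: acc[frozenset(s)])
```

By the efficiency property, the weights sum to the accuracy of the full marker set, so they read as each marker's share of the overall classification performance.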
Nichola C Garbett
Full Text Available Plasma thermograms (thermal stability profiles) of blood plasma are being utilized as a new diagnostic approach for clinical assessment. In this study, we investigated the ability of plasma thermograms to classify systemic lupus erythematosus (SLE) patients versus non-SLE controls using a sample of 300 SLE and 300 control subjects from the Lupus Family Registry and Repository. Additionally, we evaluated the heterogeneity of thermograms along age, sex, ethnicity, concurrent health conditions and SLE diagnostic criteria. Thermograms were visualized graphically for important differences between covariates and summarized using various measures. A modified linear discriminant analysis was used to segregate SLE versus control subjects on the basis of the thermograms. Classification accuracy was measured based on multiple training/test splits of the data and compared to classification based on SLE serological markers. Median sensitivity, specificity, and overall accuracy based on classification using plasma thermograms were 86%, 83%, and 84%, compared to 78%, 95%, and 86% based on a combination of five antibody tests. Combining thermogram and serology information improved sensitivity from 78% to 86% and overall accuracy from 86% to 89% relative to serology alone. Predictive accuracy of thermograms for distinguishing SLE and osteoarthritis/rheumatoid arthritis patients was comparable. Both gender and anemia significantly interacted with disease status for plasma thermograms (p<0.001), with greater separation between SLE and control thermograms for females relative to males and for patients with anemia relative to patients without anemia. Plasma thermograms constitute an additional biomarker which may help improve diagnosis of SLE patients, particularly when coupled with standard diagnostic testing. Differences in thermograms according to patient sex, ethnicity, clinical and environmental factors are important considerations for application of
Vogelmann, A. M.; Zhang, D.; Kollias, P.; Endo, S.; Lamer, K.; Gustafson, W. I., Jr.; Romps, D. M.
Continental boundary layer clouds are important to simulations of weather and climate because of their impact on surface budgets and vertical transports of energy and moisture; however, model-parameterized boundary layer clouds do not agree well with observations, in part because small-scale turbulence and convection are not properly represented. To advance parameterization development and evaluation, observational constraints are needed on critical parameters such as cloud-base mass flux and its relationship to cloud cover and the sub-cloud boundary layer structure, including vertical velocity variance and skewness. In this study, these constraints are derived from Doppler lidar observations and ensemble large-eddy simulations (LES) from the U.S. Department of Energy Atmospheric Radiation Measurement (ARM) Facility Southern Great Plains (SGP) site in Oklahoma. The Doppler lidar analysis will extend the single-site, long-term analysis of Lamer and Kollias (2015) and augment this information with the short-term but unique 1-2 year period since five Doppler lidars began operation at the SGP, providing critical information on regional variability. These observations will be compared to the statistics obtained from ensemble, routine LES conducted by the LES ARM Symbiotic Simulation and Observation (LASSO) project (https://www.arm.gov/capabilities/modeling/lasso). An Observation System Simulation Experiment (OSSE) will be presented that uses the LASSO LES fields to determine criteria for which relationships from Doppler lidar observations are adequately sampled to yield convergence. Any systematic differences between the observed and simulated relationships will be examined to understand factors contributing to the differences. Lamer, K., and P. Kollias (2015), Observations of fair-weather cumuli over land: Dynamical factors controlling cloud size and cover, Geophys. Res. Lett., 42, 8693-8701, doi:10.1002/2015GL064534
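The sub-cloud constraints mentioned above, vertical velocity variance and skewness, are simple statistical moments of a Doppler-lidar velocity time series; a minimal sketch:

```python
import numpy as np

def velocity_moments(w):
    """Variance and skewness of a vertical-velocity time series w (m/s).
    Positive skewness indicates narrow, strong updrafts amid broad subsidence."""
    w = np.asarray(w, float)
    anomaly = w - w.mean()
    var = np.mean(anomaly ** 2)
    skew = np.mean(anomaly ** 3) / var ** 1.5
    return var, skew
```

Comparing these moments between the lidar record and each LES ensemble member is one concrete form the observation-model comparison described above can take.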
Aita, Takuyo; Nishigaki, Koichi
To visualize a bird's-eye view of an ensemble of mitochondrial genome sequences for various species, we recently developed a novel method of mapping a biological sequence ensemble into three-dimensional (3D) vector space. First, we represented a biological sequence of a species s by a word-composition vector x(s), where its length |x(s)| represents the sequence length, its unit vector x(s)/|x(s)| represents the relative composition of the K-tuple words through the sequence, and the dimension N = 4^K is the number of all possible words of length K. Second, we mapped the vector x(s) to the 3D position vector y(s), based on the two following simple principles: (1) |y(s)| = |x(s)|, and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space by using K = 7. The mapping was successful because the angles between vectors before and after the mapping highly correlated with each other (correlation coefficients were 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (just like the Milky Way on a celestial globe), and the Fungi and Green plant kingdoms are distributed in a similar arc belt. These two arc belts intersect at their respective middle regions and form a cross structure, just like a jet aircraft fuselage and its wings. This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner. Copyright © 2012 Elsevier Inc. All rights reserved.
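The word-composition vector x(s) can be sketched directly from its definition: count the 4^K possible K-tuples along the sequence and rescale so the vector's Euclidean length equals the sequence length (the 3D embedding step itself, which preserves |x(s)| and the pairwise angles, is not reproduced here):

```python
import numpy as np
from itertools import product

def composition_vector(seq, K):
    """x(s): K-tuple word counts in a 4^K-dimensional vector, rescaled so
    that |x(s)| equals the sequence length, as in the paper's definition."""
    index = {''.join(w): i for i, w in enumerate(product('ACGT', repeat=K))}
    counts = np.zeros(4 ** K)
    for i in range(len(seq) - K + 1):
        counts[index[seq[i:i + K]]] += 1
    return counts * (len(seq) / np.linalg.norm(counts))

x = composition_vector('ACGTACGTACGT', 2)
```

The direction of x then carries the relative K-tuple composition while its magnitude carries genome size, which is why the subsequent 3D mapping constrains both |y(s)| and the inter-vector angles.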
Christ, E.; Webster, P. J.; Collins, G.; Byrd, S.
Recent droughts and the continuing water wars between the states of Georgia, Alabama and Florida have made agricultural producers more aware of the importance of managing their irrigation systems more efficiently. Many southeastern states are beginning to consider laws that will require monitoring and regulation of water used for irrigation. Recently, Georgia suspended issuing irrigation permits in some areas of the southwestern portion of the state to try to limit the amount of water being used in irrigation. However, even in southern Georgia, which receives on average between 23 and 33 inches of rain during the growing season, irrigation can significantly impact crop yields. In fact, studies have shown that when fields do not receive rainfall at the most critical stages in the life of cotton, yields for irrigated fields can be up to twice those of non-irrigated cotton. This motivates this study, which aims to produce a forecast tool that will enable producers to make more efficient irrigation management decisions. We will use the ECMWF (European Centre for Medium-Range Weather Forecasts) VarEPS (Ensemble Prediction System) model precipitation forecasts for the grid points included in the 1° x 1° lat/lon square surrounding the point of interest. We will then apply quantile-to-quantile (q-to-q) bias corrections to the forecasts. Once we have applied the bias corrections, we will use the checkbook method of irrigation scheduling to determine the probability of receiving the required amount of rainfall for each week of the growing season. These forecasts will be used during a field trial conducted at the CM Stripling Irrigation Research Park in Camilla, Georgia. This research will compare differences in yield and water use among the standard checkbook method of irrigation, which uses no precipitation forecast knowledge, the weather.com forecast, a dry land plot, and the ensemble-based forecasts mentioned above.
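A minimal quantile-to-quantile (q-to-q) correction maps a forecast value to the observed climatological value at the same quantile; the climatologies below are synthetic, with the model biased wet by a factor of two, not actual ECMWF or station data:

```python
import numpy as np

def q_to_q(forecast_value, fcst_clim, obs_clim):
    """Map a forecast to the observed value at the same climatological
    quantile: find the forecast's rank in the model climatology, then read
    off the observed climatology at that quantile."""
    q = np.searchsorted(np.sort(fcst_clim), forecast_value) / len(fcst_clim)
    return float(np.quantile(obs_clim, min(q, 1.0)))

obs_clim = np.arange(1.0, 101.0)   # hypothetical observed weekly rainfall (mm)
fcst_clim = 2.0 * obs_clim         # model climatology biased wet by 2x
corrected = q_to_q(100.0, fcst_clim, obs_clim)
```

A raw forecast of 100 mm sits at the model's median, so the correction returns roughly the observed median of 50 mm; applying this to each ensemble member yields a bias-corrected distribution from which weekly rainfall probabilities can be read.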
Hsieh, Sheau-Ling; Hsieh, Sung-Huai; Cheng, Po-Hsun; Chen, Chi-Huang; Hsu, Kai-Ping; Lee, I-Shun; Wang, Zhenyu; Lai, Feipei
In this paper, we classify breast cancer from medical diagnostic data. Information gain has been adopted for feature selection. Neural fuzzy (NF), k-nearest neighbor (KNN) and quadratic classifier (QC) schemes, both as single models and as their associated ensembles, have been developed for classification. In addition, a combined ensemble model of all three schemes has been constructed for further validation. The experimental results indicate that ensemble learning performs better than the individual single models. Moreover, the combined ensemble model achieves the highest classification accuracy for breast cancer among all models.
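At its simplest, combining the NF, KNN and QC outputs for one case is a majority vote; the labels below are hypothetical, and the paper's exact combination rule may be weighted rather than this plain vote:

```python
from collections import Counter

def majority_vote(labels):
    """Combine one sample's predicted labels from several models; for ties,
    Counter.most_common returns the label that was inserted first."""
    return Counter(labels).most_common(1)[0][0]

# hypothetical per-model predictions (NF, KNN, QC) for one case
prediction = majority_vote(['malignant', 'benign', 'malignant'])
```

With three heterogeneous models, the vote corrects a single model's error whenever the other two agree, which is the basic mechanism behind the ensemble's accuracy gain.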
Lynnes, Christopher; Berrick, S.; Gopalan, A.; Hua, X.; Shen, S.; Smith, P.; Yang, K-Y.; Wheeler, K.; Curry, C.
The high volume of Earth Observing System data has proven to be challenging to manage for data centers and users alike. At the Goddard Earth Sciences Distributed Active Archive Center (GES DAAC), about 1 TB of new data are archived each day. Distribution to users is also about 1 TB/day. A substantial portion of this distribution is MODIS calibrated radiance data, which has a wide variety of uses. However, much of the data is not useful for a particular user's needs: for example, ocean color users typically need oceanic pixels that are free of cloud and sun-glint. The GES DAAC is using a simple Bayesian classification scheme to rapidly classify each pixel in the scene in order to support several experimental content-based data services for near-real-time MODIS calibrated radiance products (from Direct Readout stations). Content-based subsetting would allow distribution of, say, only clear pixels to the user if desired. Content-based subscriptions would distribute data to users only when they fit the user's usability criteria in their area of interest within the scene. Content-based cache management would retain more useful data on disk for easy online access. The classification may even be exploited in an automated quality assessment of the geolocation product. Though initially to be demonstrated at the GES DAAC, these techniques have applicability in other resource-limited environments, such as spaceborne data systems.
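A per-pixel Gaussian naive Bayes classifier of the kind described can be sketched in a few lines; the two-band "ocean vs. cloud" training pixels are illustrative toy values, not MODIS radiances or the GES DAAC's actual scheme:

```python
import numpy as np

class PixelNaiveBayes:
    """Gaussian naive Bayes over per-pixel features (e.g. band radiances):
    each class is modeled by independent per-feature normal distributions."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.logprior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log-likelihood of each pixel under each class, summed over features
        ll = -0.5 * (np.log(2 * np.pi * self.var[None]) +
                     (X[:, None] - self.mu[None]) ** 2 / self.var[None]).sum(axis=2)
        return self.classes[np.argmax(ll + self.logprior, axis=1)]

# hypothetical training pixels: [band-1 radiance, band-2 radiance]
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.95]])
y = np.array(['ocean', 'ocean', 'cloud', 'cloud'])
model = PixelNaiveBayes().fit(X, y)
```

Because the per-pixel evaluation is just a handful of arithmetic operations, a classifier like this is cheap enough to run over every pixel of a near-real-time granule, which is what the content-based subsetting and subscription services require.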
Liu, Shuming; Che, Han; Smith, Kate; Chang, Tian
Emergent contamination events have a significant impact on water systems. After contamination is detected, it is important to classify the type of contaminant quickly to support remediation attempts. Conventional methods generally rely either on laboratory-based analysis, which requires a long analysis time, or on multivariable-based geometry and sequence analysis, which is prone to being affected by the contaminant concentration. This paper proposes a new contaminant classification method that discriminates contaminants in real time, independent of the contaminant concentration. The proposed method quantifies the similarities or dissimilarities between sensors' responses to different types of contaminants. The performance of the proposed method was evaluated using data from contaminant injection experiments in a laboratory and compared with a Euclidean distance-based method. The robustness of the proposed method was evaluated using an uncertainty analysis. The results show that the proposed method performed better at identifying the type of contaminant than the Euclidean distance-based method and that it could classify the type of contaminant in minutes without significantly compromising the correct classification rate (CCR).
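One concentration-invariant similarity between multi-sensor response vectors is the cosine of the angle between them: scaling a response by a concentration factor leaves the score unchanged, unlike Euclidean distance. This is an illustrative choice to show the principle, not necessarily the paper's exact measure:

```python
import numpy as np

def response_similarity(r1, r2):
    """Cosine similarity of multi-sensor response vectors; multiplying either
    response by a positive concentration factor leaves the score unchanged."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    return float(r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2)))

# hypothetical fingerprint library: per-contaminant response of 3 sensors
library = {'contaminant_A': [1.0, 0.2, 0.5], 'contaminant_B': [0.1, 1.0, 0.3]}
unknown = [2.0, 0.4, 1.0]   # contaminant A at roughly double concentration
best = max(library, key=lambda name: response_similarity(unknown, library[name]))
```

The unknown sample matches contaminant A despite its different concentration, which is exactly the property a Euclidean-distance comparison of the raw responses lacks.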
Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben
Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning. In this article we propose: (1) a different strategy, which is based on the po...
Brian E. Bunker
Full Text Available Unsupervised classification or clustering of multi-decadal land surface phenology provides a spatio-temporal synopsis of natural and agricultural vegetation response to environmental variability and anthropogenic activities. Notwithstanding the detailed temporal information available in calibrated bi-monthly normalized difference vegetation index (NDVI) and comparable time series, typical pre-classification workflows average a pixel's bi-monthly index within the larger multi-decadal time series. While this process is one practical way to reduce the dimensionality of time series with many hundreds of image epochs, it effectively dampens temporal variation from both intra- and inter-annual observations related to land surface phenology. Through a novel application of object-based segmentation aimed at spatial (not temporal) dimensionality reduction, all 294 image epochs from a Moderate Resolution Imaging Spectroradiometer (MODIS) bi-monthly NDVI time series covering the northern Fertile Crescent were retained (in homogeneous landscape units) as unsupervised classification inputs. Given the inherent challenges of in situ or manual image interpretation of land surface phenology classes, a cluster validation approach based on transformed divergence enabled comparison between traditional and novel techniques. Improved intra-annual contrast was clearly manifest in rain-fed agriculture, and inter-annual trajectories showed increased cluster cohesion, reducing the overall number of classes identified in the Fertile Crescent study area from 24 to 10. Given careful segmentation parameters, this spatial dimensionality reduction technique augments the value of unsupervised learning to generate homogeneous land surface phenology units. By combining recent scalable computational approaches to image segmentation, future work can pursue new global land surface phenology products based on the high temporal resolution signatures of vegetation index time series.
Blyverket, Jostein; Hamer, Paul; Svendby, Tove; Lahoz, William
The Soil Moisture and Ocean Salinity (SMOS) satellite samples soil moisture at a spatial scale of ˜40 km and in the top ˜5 cm of the soil, depending on land cover and soil type. Remote sensing products have limited spatial and temporal coverage, with a revisit time of 3 days close to the Equator for SMOS. These factors make it difficult to monitor the hydrological cycle over, for example, northern areas with strong topography, a fractal coastline, and long periods of snow cover, all of which affect the SMOS soil moisture retrieval. Until now, simple 3-layer force-and-restore models have been used to close the spatial (vertical/horizontal) and temporal gaps of soil moisture from remote sensing platforms. In this study we have implemented the Ensemble Transform Kalman Filter (ETKF) in the Surfex land surface model, and used the ISBA diffusion scheme with 11 vertical layers. In contrast to the rapidly changing surface layer, the slower-changing root-zone soil moisture is important for long-term evapotranspiration and water supply. By combining a land surface model with satellite observations using data assimilation we can provide a better estimate of the root-zone soil moisture at regional scales. The Surfex model runs are done for a European domain, from 1 July 2012 to 1 August 2013. For validation of our model setup, we compare with in situ stations from the International Soil Moisture Network (ISMN) and the Norwegian Water and Energy Authorities (NVE); we also compare against the ESA CCI soil moisture product v02.2, which does not include SMOS soil moisture data. SMOS observations and open-loop model runs are shown to exhibit large biases; these are removed before assimilation by a linear rescaling technique. Information from the satellite is transferred into deeper layers of the model using data assimilation, improving the root-zone product when validated against in situ stations. The improved correlation between the assimilated product and the in situ values
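The linear rescaling used to remove bias before assimilation can be sketched as matching the satellite series' mean and standard deviation to the open-loop model climatology; the series below are synthetic stand-ins.

```python
import statistics

def linear_rescale(sat, model):
    """Rescale satellite soil moisture so its mean/std match the model climatology."""
    m_sat, s_sat = statistics.mean(sat), statistics.pstdev(sat)
    m_mod, s_mod = statistics.mean(model), statistics.pstdev(model)
    return [m_mod + (s_mod / s_sat) * (x - m_sat) for x in sat]

# Synthetic series: biased, over-dispersed satellite retrievals vs. model open loop.
sat   = [0.10, 0.30, 0.20, 0.40, 0.25]
model = [0.22, 0.28, 0.25, 0.31, 0.26]
rescaled = linear_rescale(sat, model)
```

After rescaling, the observation-minus-model innovations are (by construction) bias-free, which is the assumption the Kalman-type update relies on.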
Roberts, E S; Annibale, A; Coolen, A C C
Tailored graph ensembles are a developing bridge between biological networks and statistical mechanics. The aim is to use this concept to generate a suite of rigorous tools that can be used to quantify and compare the topology of cellular signalling networks, such as protein-protein interaction networks and gene regulation networks. We calculate exact and explicit formulae for the leading orders in the system size of the Shannon entropies of random graph ensembles constrained with degree distribution and degree-degree correlation. We also construct an ergodic detailed balance Markov chain with non-trivial acceptance probabilities which converges to a strictly uniform measure and is based on edge swaps that conserve all degrees. The acceptance probabilities can be generalized to define Markov chains that target any alternative desired measure on the space of directed or undirected graphs, in order to generate graphs with more sophisticated topological features.
Full Text Available This paper proposes a new adaptive image quality assessment (AIQA) method, which is based on distortion classification. AIQA contains two parts: distortion classification and image quality assessment. First, we analyze characteristics of the original and distorted images, including the distribution of wavelet coefficients and the ratio of edge energy to inner energy of the differential image block, and divide distorted images into white-noise distortion, JPEG compression distortion, and blur distortion. To evaluate the quality of the first two types of distorted images, we use a pixel-based structural similarity metric and a DCT-based structural similarity metric, respectively. For blurred images, we present a new wavelet-based structural similarity algorithm. According to the experimental results, AIQA takes advantage of the different structural similarity metrics and is able to simulate human visual perception effectively.
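A pixel-based structural similarity computation of the kind used for the first two distortion types can be sketched as follows; this is a global SSIM over a small grayscale block with the standard stabilizing constants, whereas practical implementations slide a local window over the image.

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM between two equal-length grayscale pixel lists (0-255 range).
    c1 = (0.01*255)**2 and c2 = (0.03*255)**2 are the usual constants."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx * mx + my * my + c1) * (vx + vy + c2)))

# A made-up reference block and a slightly noise-corrupted version of it.
ref = [52, 55, 61, 59, 79, 61, 76, 41]
noisy = [v + d for v, d in zip(ref, [3, -4, 2, -1, 5, -3, 1, -2])]
```

SSIM of an image with itself is exactly 1; distortion pulls the score below 1, which is what the quality assessment stage thresholds on.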
Capper, David; Jones, David T.W.; Sill, Martin
Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show...
Wang, Jinliang; Gao, Yan; Wang, Xiaohua; Fu, Lei
Forest texture is an intrinsic characteristic and an important visual feature of a forest ecological system. Full utilization of forest texture will be a great help in increasing the accuracy of forest classification based on remotely sensed data. Taking Shangri-La as a study area, forest classification was carried out based on texture. The results show that: (1) judging from texture abundance, texture boundary, and entropy, as well as visual interpretation, the combination of the grayscale-gradient co-occurrence matrix and wavelet transformation is much better than either method alone for forest texture information extraction; (2) during forest texture information extraction, the size of the suitable texture window determined by the semi-variogram method depends on the forest type (evergreen broadleaf forest is 3×3, deciduous broadleaf forest is 5×5, etc.); (3) when classifying forest based on texture information, the texture factor assembly differs among forests: Variance, Heterogeneity, and Correlation should be selected when the window is between 3×3 and 5×5; Mean, Correlation, and Entropy should be used when the window is in the range of 7×7 to 19×19; and Correlation, Second Moment, and Variance should be used when the window is larger than 21×21.
Vaughn, Brandon K.; Wang, Qui
Many areas in educational and psychological research involve the use of classification statistical analysis. For example, school districts might be interested in attaining variables that provide optimal prediction of school dropouts. In psychology, a researcher might be interested in the classification of a subject into a particular psychological…
Wolski, Piotr; Jack, Christopher; Tadross, Mark; van Aardenne, Lisa; Lennard, Christopher
Self-Organizing Map (SOM) based classifications of synoptic circulation patterns are increasingly being used to interpret large-scale drivers of local climate variability, and as part of statistical downscaling methodologies. These applications rely on a basic premise of synoptic climatology, i.e. that local weather is conditioned by the large-scale circulation. While it is clear that this relationship holds in principle, the implications of its implementation through SOM-based classification, particularly at interannual and longer time scales, are not well recognized. Here we use a SOM to understand the interannual synoptic drivers of climate variability at two locations in the winter and summer rainfall regimes of South Africa. We quantify the portion of variance in seasonal rainfall totals that is explained by year-to-year differences in the synoptic circulation, as schematized by a SOM. We furthermore test how different spatial domain sizes and synoptic variables affect the ability of the SOM to capture the dominant synoptic drivers of interannual rainfall variability. Additionally, we identify systematic synoptic forcing that is not captured by the SOM classification. The results indicate that the frequency of synoptic states, as schematized by a relatively disaggregated SOM (7 × 9) of prognostic atmospheric variables, including specific humidity, air temperature and geostrophic winds, captures only 20-45% of interannual local rainfall variability, and that the residual variance contains a strong systematic component. A multivariate linear regression framework demonstrates that this residual variance can largely be explained using synoptic variables over a particular location, even though those variables are used in the development of the SOM; their influence, however, diminishes with the size of the SOM spatial domain. The influence of the SOM domain size, the choice of SOM atmospheric variables and grid-point explanatory variables on the levels of explained
Full Text Available Neuronal spike trains are used by the nervous system to encode and transmit information. Euclidean distance-based methods (EDBMs) have been applied to quantify the similarity between temporally-discretized spike trains and model responses. In this study, using the same discretization procedure, we developed and applied a joint probability-based method (JPBM) to classify individual spike trains of slowly adapting pulmonary stretch receptors (SARs). The activity of individual SARs was recorded in anaesthetized, paralysed adult male rabbits, which were artificially ventilated at a constant rate and one of three different volumes. Two-thirds of the responses to the 600 stimuli presented at each volume were used to construct three response models (one for each stimulus volume) consisting of a series of time bins, each with spike probabilities. The remaining one-third of the responses were used as test responses to be classified into one of the three model responses. This was done by computing, for a given model, the joint probability of observing the same series of events (spikes or no spikes) dictated by the test response, and determining which of the three probabilities was highest. The JPBM generally produced better classification accuracy than the EDBM, and both performed well above chance. Both methods were similarly affected by variations in discretization parameters, response epoch duration, and two different response alignment strategies. Increasing bin widths increased classification accuracy, which also improved with increased observation time, but primarily during periods of increasing lung inflation. Thus, the JPBM is a simple and effective method for spike train classification.
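The joint-probability classification step can be sketched as follows: each model is a series of per-bin spike probabilities, and a binarized test response is assigned to the model giving it the highest (log) joint probability. The probabilities and responses below are made up for illustration.

```python
import math

def log_joint_prob(response, model, eps=1e-6):
    """Log joint probability of a binary spike train under per-bin spike probs."""
    lp = 0.0
    for spike, p in zip(response, model):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        lp += math.log(p if spike else 1.0 - p)
    return lp

def classify(response, models):
    """Index of the model under which the response is most probable."""
    scores = [log_joint_prob(response, m) for m in models]
    return scores.index(max(scores))

# Three hypothetical volume models (per-bin spike probabilities).
models = [
    [0.9, 0.8, 0.1, 0.1],   # model 0: early spiking
    [0.1, 0.9, 0.9, 0.1],   # model 1: mid spiking
    [0.1, 0.1, 0.8, 0.9],   # model 2: late spiking
]
test_response = [1, 1, 0, 0]   # spikes in the first two bins only
```

Working in log space keeps the product of many per-bin probabilities numerically stable over long response epochs.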
Brochero, Darwin; Hajji, Islem; Pina, Jasson; Plana, Queralt; Sylvain, Jean-Daniel; Vergeynst, Jenna; Anctil, Francois
Theories about generalization error with ensembles are mainly based on the diversity concept, which promotes resorting to many members of different properties to support mutually agreeable decisions. Kuncheva (2004) proposed the Multi Level Diversity Model (MLDM) to promote diversity in model ensembles, combining different data subsets, input subsets, models, and parameters, and including a combiner level in order to optimize the final ensemble. This work tests the hypothesis that ensembles of Neural Network (NN) structures minimise the generalization error. We used the MLDM to evaluate two different scenarios: (i) ensembles from a single NN architecture, and (ii) a super-ensemble built by combining sub-ensembles of many NN architectures. The time series used correspond to the 12 basins of the MOdel Parameter Estimation eXperiment (MOPEX) project that were used by Duan et al. (2006) and Vos (2013) as a benchmark. Six architectures are evaluated: a FeedForward NN (FFNN) trained with the Levenberg-Marquardt algorithm (Hagan et al., 1996), an FFNN trained with SCE (Duan et al., 1993), a Recurrent NN trained with a complex method (Weins et al., 2008), a Dynamic NARX NN (Leontaritis and Billings, 1985), an Echo State Network (ESN), and a leak integrator neuron (L-ESN) (Lukosevicius and Jaeger, 2009). Each architecture separately performs an Input Variable Selection (IVS) according to a forward stepwise selection (Anctil et al., 2009) using mean square error as the objective function. Post-processing by Predictor Stepwise Selection (PSS) of the super-ensemble has been done following the method proposed by Brochero et al. (2011). IVS results showed that lagged stream flow, lagged precipitation, and the Standardized Precipitation Index (SPI) (McKee et al., 1993) were the most relevant variables; they were selected among the first three variables in 66, 45, and 28 of the 72 scenarios, respectively. A relationship between aridity index (Arora, 2002) and NN
Glaab, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M; Krasnogor, Natalio
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.
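The kind of interpretable if-then-else rule model such evolutionary learners produce can be sketched as below; the gene names and thresholds are invented for illustration, not extracted from the study's rule sets.

```python
def classify_sample(expr):
    """Toy rule set over a dict of (hypothetical) normalized gene expressions."""
    if expr["GENE_A"] > 0.8 and expr["GENE_B"] < 0.2:
        return "tumour"
    elif expr["GENE_C"] > 0.5:
        return "tumour"
    else:
        return "normal"

# Two hypothetical samples with made-up expression values.
samples = [
    {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.3},
    {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.1},
]
labels = [classify_sample(s) for s in samples]
```

Unlike a support vector machine's decision function, every prediction here can be traced to a named condition, which is the interpretability benefit the abstract emphasizes.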
Aghazadeh, Babak Seyed; Heris, Hossein Khadivi
In this paper, an efficient fuzzy wavelet packet (WP) based feature extraction method and a fuzzy logic based disorder assessment technique were used to investigate voice signals of patients suffering from unilateral vocal fold paralysis (UVFP). The tenth-order Daubechies mother wavelet (d10) was employed to decompose signals into 5 levels. Next, WP coefficients were used to measure energy and Shannon entropy features at different spectral sub-bands. Subsequently, using the fuzzy c-means method, signals were clustered into 2 classes. The degree of fuzzy membership of pathological and normal signals in their corresponding clusters was taken as a measure to quantify the discrimination ability of the features. A classification accuracy of 100 percent was achieved using an artificial neural network classifier. Finally, the fuzzy c-means clustering method was used as a means of voice pathology assessment. Accordingly, a fuzzy-membership-function-based health index is proposed.
Sinkov, Nikolai A; Sandercock, P Mark L; Harynuk, James J
Detection and identification of ignitable liquids (ILs) in arson debris is a critical part of arson investigations. The challenge of this task is due to the complex and unpredictable chemical nature of arson debris, which also contains pyrolysis products from the fire. ILs, most commonly gasoline, are complex chemical mixtures containing hundreds of compounds that will be consumed or otherwise weathered by the fire to varying extents, depending on factors such as temperature, air flow, the surface on which the IL was placed, etc. While methods such as ASTM E-1618 are effective, data interpretation can be a costly bottleneck in the analytical process for some laboratories. In this study, we address this issue through the application of chemometric tools. Prior to the application of chemometric tools such as PLS-DA and SIMCA, issues of chromatographic alignment and variable selection need to be addressed. Here we use an alignment strategy based on a ladder consisting of perdeuterated n-alkanes. Variable selection and model optimization were automated using a hybrid backward elimination (BE) and forward selection (FS) approach guided by the cluster resolution (CR) metric. In this work, we demonstrate the automated construction, optimization, and application of chemometric tools to casework arson data. The resulting PLS-DA and SIMCA classification models, trained with 165 training set samples, provided classification of 55 validation set samples based on gasoline content with 100% specificity and sensitivity. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun
With the wide application of high-quality CCDs in celestial spectrum imaging and the implementation of many large sky survey programs (e.g., the Sloan Digital Sky Survey (SDSS), the Two-degree-Field Galaxy Redshift Survey (2dF), the Spectroscopic Survey Telescope (SST), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program, and the Large Synoptic Survey Telescope (LSST) program), celestial observational data are pouring in like torrential rain. To utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognize galaxies and quasars from spectra based on the nearest neighbor method. Galaxies and quasars are extragalactic objects; they are far away from Earth, and their spectra are usually contaminated by various noise. Recognizing these two types of spectra is therefore a typical problem in automatic spectra classification. Furthermore, the nearest neighbor method is one of the most typical, classic, and mature algorithms in pattern recognition and data mining, and is often used as a benchmark when developing novel algorithms. It is shown that the recognition ratio of the nearest neighbor (NN) method is comparable to the best results reported in the literature based on more complicated methods, and the superiority of NN is that it does not need to be trained, which is useful for incremental learning and parallel computation in mass spectral data processing. In conclusion, the results of this work are helpful for studying the classification of galaxy and quasar spectra.
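The nearest-neighbor step is simple enough to sketch directly: a test spectrum is assigned the label of its closest training spectrum under Euclidean distance, with no training phase, which is the property highlighted above. The spectra here are short made-up vectors.

```python
import math

def nn_classify(test, train_X, train_y):
    """Assign the label of the Euclidean-nearest training spectrum."""
    dists = [math.dist(test, x) for x in train_X]
    return train_y[dists.index(min(dists))]

# Toy "spectra" (three flux values each) with known labels.
train_X = [[1.0, 0.8, 0.2], [0.9, 0.7, 0.3], [0.1, 0.5, 1.0], [0.2, 0.4, 0.9]]
train_y = ["galaxy", "galaxy", "quasar", "quasar"]
label = nn_classify([0.15, 0.45, 0.95], train_X, train_y)
```

Because the "model" is just the labeled archive itself, new labeled spectra can be appended at any time (incremental learning), and the distance scan parallelizes trivially across the archive.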
Li, Zhenlong; Jin, Xue; Zhao, Xiaohua
This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract features: a piecewise linear representation was used to represent the multivariate time series, a bottom-up algorithm was then employed to segment them, and the slope and time interval of each segment were extracted as features for classification. Third, a support vector machine classifier was used to classify the driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is thus feasible and effective. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.
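The bottom-up piecewise-linear step can be sketched as follows: start from fine segments and repeatedly merge the adjacent pair whose joint least-squares line fit is cheapest, then read the slope of each remaining segment as a feature. This is a generic sketch of the technique, not the authors' code, and it assumes an even-length series.

```python
def fit(xs, ys):
    """Least-squares line fit; returns (slope, sum of squared errors)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    icept = (sy - slope * sx) / n
    sse = sum((y - (icept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, sse

def bottom_up(y, k):
    """Greedily merge adjacent segments until k segments remain."""
    segs = [(i, i + 1) for i in range(0, len(y), 2)]   # initial 2-point segments
    while len(segs) > k:
        costs = []
        for i in range(len(segs) - 1):
            s, e = segs[i][0], segs[i + 1][1]
            costs.append(fit(list(range(s, e + 1)), y[s:e + 1])[1])
        j = costs.index(min(costs))                     # cheapest merge
        segs[j:j + 2] = [(segs[j][0], segs[j + 1][1])]
    return [(s, e, fit(list(range(s, e + 1)), y[s:e + 1])[0]) for s, e in segs]

# A synthetic lateral-position trace: rising then falling linearly.
trace = [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]
segments = bottom_up(trace, 2)   # (start, end, slope) triples
```

The resulting (slope, duration) pairs per segment are exactly the kind of features the abstract feeds to the SVM.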
Wang, Xiang-Yang; Wu, Zhi-Fang; Chen, Liang; Zheng, Hong-Liang; Yang, Hong-Ying
Image segmentation remains an important, but hard-to-solve, problem since it appears to be application dependent, with usually no a priori information available regarding the image structure. In recent years, many image segmentation algorithms have been developed, but they are often very complex and some undesired results occur frequently. In this paper, we propose a pixel classification based color image segmentation using quaternion exponent moments. Firstly, the pixel-level image feature is extracted based on quaternion exponent moments (QEMs), which can effectively capture the image pixel content by considering the correlation between different color channels. Then, the pixel-level image feature is used as input to a twin support vector machines (TSVM) classifier, and the TSVM model is trained by selecting the training samples with Arimoto entropy thresholding. Finally, the color image is segmented with the trained TSVM model. The proposed scheme has the following advantages: (1) the effective QEMs are introduced to describe color image pixel content, considering the correlation between different color channels; (2) the TSVM classifier is utilized, which has lower computation time and higher classification accuracy. Experimental results show that our proposed method has very promising segmentation performance compared with the state-of-the-art segmentation approaches recently proposed in the literature. Copyright © 2015 Elsevier Ltd. All rights reserved.
Wang, Lei; Liu, Zhiwen; Miao, Qiang; Zhang, Xin
A time-frequency analysis method based on ensemble local mean decomposition (ELMD) and the fast kurtogram (FK) is proposed for rotating machinery fault diagnosis. Local mean decomposition (LMD), as an adaptive non-stationary and nonlinear signal processing method, provides the capability to decompose a multicomponent modulated signal into a series of demodulated mono-components. However, mode mixing is a serious drawback. To alleviate this, ELMD, a noise-assisted variant, was developed. Still, environmental noise present in the raw signal remains in the corresponding product function (PF) along with the component of interest. FK performs well for impulse detection when strong environmental noise exists, but it is susceptible to non-Gaussian noise. The proposed method combines the merits of ELMD and FK to detect faults in rotating machinery. First, by applying ELMD the raw signal is decomposed into a set of PFs. Then, the PF that best characterizes the fault information is selected according to the kurtosis index. Finally, the selected PF signal is further filtered by an optimal band-pass filter based on FK to extract the impulse signal. Fault identification can be deduced from the appearance of fault characteristic frequencies in the squared envelope spectrum of the filtered signal. The advantages of ELMD over LMD and EEMD are illustrated in simulation analyses. Furthermore, the efficiency of the proposed method in fault diagnosis for rotating machinery is demonstrated on gearbox and rolling bearing case analyses.
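The kurtosis-based selection of the most fault-indicative PF can be sketched as follows; the two synthetic components stand in for decomposition outputs (a smooth sine vs. an impulsive signal), since impulsive faults drive kurtosis up.

```python
import math

def kurtosis(x):
    """Kurtosis as m4 / m2^2 (equals 3 for a Gaussian, 1.5 for a pure sine)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2)

def select_pf(pfs):
    """Index of the component with the highest kurtosis."""
    ks = [kurtosis(pf) for pf in pfs]
    return ks.index(max(ks))

# Synthetic stand-ins for PFs: a smooth tone and a periodic-impact train.
n = 256
smooth = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
impulsive = [0.0] * n
for t in range(0, n, 64):          # periodic impacts, as from a bearing defect
    impulsive[t] = 1.0
best = select_pf([smooth, impulsive])
```

The selected PF would then be band-pass filtered at the kurtogram's optimal band before computing its squared envelope spectrum.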
Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean models, especially in the framework of parameter estimation. Based on Bayes' rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large-dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without a significant increase in computational requirements. However, only approximate estimates are generally obtained by this approach, due to the restricted Gaussian prior and noise assumptions generally imposed in these methods. This contribution aims at evaluating the effectiveness of an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observing system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of a coastal ocean model, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and
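A stochastic (perturbed-observation) ensemble Kalman update for a single scalar parameter can be sketched as follows; the identity observation operator and all numbers are illustrative stand-ins for the ADCIRC/Manning's-n setting, not the study's configuration.

```python
import random
import statistics

def enkf_update(ens, y, r, rng):
    """One perturbed-observation EnKF analysis step with h(p) = p."""
    pred = list(ens)                       # identity observation operator
    var_p = statistics.pvariance(pred)
    gain = var_p / (var_p + r)             # scalar Kalman gain, in (0, 1)
    # Each member is nudged toward its own perturbed copy of the observation.
    return [p + gain * (y + rng.gauss(0.0, r ** 0.5) - hp)
            for p, hp in zip(ens, pred)]

rng = random.Random(0)
prior = [rng.gauss(10.0, 1.0) for _ in range(100)]   # prior parameter ensemble
posterior = enkf_update(prior, 0.0, 0.25, rng)       # assimilate observation y=0
```

The analysis pulls the ensemble mean toward the observation and shrinks the spread; multiplicative inflation (scaling deviations from the mean before the update) is the usual guard against the over-shrinkage the abstract alludes to.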
Lahmiri, S.; Boukadoum, M.
Accurate forecasting of stock market volatility is an important issue in portfolio risk management. In this paper, an ensemble system for stock market volatility forecasting is presented. It is composed of three different models that hybridize the exponential GARCH (EGARCH) process and an artificial neural network trained with the backpropagation algorithm (BPNN) to forecast stock market volatility under normal, t-Student, and generalized error distribution (GED) assumptions, respectively. The goal is to design an ensemble system in which each single hybrid model is capable of capturing normality, excess skewness, or excess kurtosis in the data, so as to achieve complementarity. The performance of each EGARCH-BPNN model and of the ensemble system is evaluated by the closeness of the volatility forecasts to realized volatility. Based on mean absolute error and mean squared error, the experimental results show that the proposed ensemble model, capturing normality, skewness, and kurtosis in the data, is more accurate than the individual EGARCH-BPNN models in forecasting S&P 500 intra-day volatility based on one- and five-minute horizon data.
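The complementarity idea, combining per-distribution models so the ensemble improves on its members, can be sketched with a simple forecast average and MAE scoring; the forecast and realized-volatility numbers are synthetic.

```python
def mae(forecast, actual):
    """Mean absolute error of a forecast series."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

def ensemble_mean(forecasts):
    """Average the member forecasts at each time step."""
    return [sum(fs) / len(fs) for fs in zip(*forecasts)]

# Synthetic realized volatility and three member forecasts with offsetting errors.
actual = [1.2, 0.8, 1.5, 1.1, 0.9]
members = [
    [1.0, 1.0, 1.3, 1.2, 1.1],   # e.g. a normal-distribution model
    [1.4, 0.6, 1.8, 0.9, 0.7],   # e.g. a t-Student model
    [1.1, 0.9, 1.4, 1.3, 1.0],   # e.g. a GED model
]
ens = ensemble_mean(members)
```

By convexity of the absolute error, the averaged forecast's MAE never exceeds the mean of the members' MAEs; when member errors offset each other, as here, it beats every member.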
Full Text Available The simplest known plant pathogens are the viroids. Because of their non-coding single-stranded circular RNA genome, they depend on both their sequence and their structure for successful infection and replication. In recent years, important progress in the elucidation of their structures was achieved using an adaptation of the selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) protocol in order to probe viroid structures in solution. Previously, SHAPE has been adapted to elucidate the structures of all of the members of the family Avsunviroidae, as well as those of a few members of the family Pospiviroidae. In this study, with the goal of providing an entire compendium of the secondary structures of the various viroid species, a total of thirteen new Pospiviroidae members were probed in solution using the SHAPE protocol. More specifically, the secondary structures of eleven species for which the genus was previously known were initially elucidated. At this point, considering all of the SHAPE-elucidated secondary structures, a classification system for viroids into their respective genera was proposed. On the basis of the structural classification reported here, the probings of both the Grapevine latent viroid and the Dahlia latent viroid provide sound arguments for the determination of their respective genera, which appear to be Apscaviroid and Hostuviroid, respectively. More importantly, this study provides the complete repertoire of the secondary structures, mapped in solution, of all of the accepted viroid species reported thus far. In addition, a classification scheme based on structural hallmarks, an important tool for many biological studies, is proposed.
Li, Xiaohui; Yang, Huiying; Rabinowitz, Yaron S.
PURPOSE To determine in a longitudinal study whether there is a correlation between videokeratography and clinical signs of keratoconus that might be useful to practicing clinicians. SETTING Cornea-Genetic Eye Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA. METHODS Eyes grouped as keratoconus, early keratoconus, keratoconus suspect, or normal based on clinical signs and videokeratography were examined at baseline and followed for 1 to 8 years. Differences in quantitative videokeratography indices and the progression rate were evaluated. The quantitative indices were central keratometry (K), the inferior–superior (I–S) value, and the keratoconus percentage index (KISA). Discriminant analysis was used to estimate the classification rate using the indices. RESULTS There were significant differences at baseline between the normal, keratoconus-suspect, and early keratoconus groups in all indices; the respective means were central K: 44.17 D, 45.13 D, and 45.97 D; I–S: 0.57, 1.20, and 4.44; log(KISA): 2.49, 2.94, and 5.71 (all P…). A proportion of the keratoconus-suspect group progressed to early keratoconus or keratoconus, and 75% in the early keratoconus group progressed to keratoconus. Using all 3 indices and age, 86.9% in the normal group, 75.3% in the early keratoconus group, and 44.6% in the keratoconus-suspect group could be classified, yielding a total classification rate of 68.9%. CONCLUSIONS Cross-sectional and longitudinal data showed significant differences between groups in the 3 indices. Use of this classification scheme might form a basis for detecting subclinical keratoconus. PMID:19683159
Full Text Available On the basis of the cluster validity function based on geometric probability proposed in literature [1, 2], we propose a cluster analysis method based on geometric probability to process large amounts of data in a rectangular area. The basic idea is top-down stepwise refinement: first into categories and then into subcategories. At every clustering level, the cluster validity function based on geometric probability is applied first to determine the clusters and the gathering direction, and then the cluster centers and cluster borders are determined. Through TM remote sensing image classification examples, the method is compared with the supervised and unsupervised classification in ERDAS and with the cluster analysis method based on geometric probability in a two-dimensional square proposed in literature [2]. Results show that the proposed method can significantly improve the classification accuracy.
Tuma, Matthias; Igel, Christian; Mialle, Pierrick
Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross-correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates of the trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-a-vis the current rule-based and task-tailored expert system. To this end, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers. All
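The station-level task described above is a supervised binary classification over PMCC-derived feature vectors, combining a kernel method with hyperparameter search. A rough, self-contained sketch of that pattern (numpy only; kernel ridge regression stands in for the SVMs actually used, and the two-blob data, grid values, and split are invented):

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit(X, y, gamma, lam):
    """Kernel ridge used as a classifier: solve (K + lam*I) alpha = y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma):
    return np.sign(rbf_kernel(X_new, X_train, gamma) @ alpha)

rng = np.random.default_rng(0)
# Toy two-class data standing in for noise-type vs. treaty-relevant arrivals.
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.r_[np.full(40, -1.0), np.full(40, 1.0)]

# Hold-out grid search over the two hyperparameters.
idx = rng.permutation(80)
tr, va = idx[:60], idx[60:]
grid = [(g, lm) for g in (0.01, 0.1, 1.0) for lm in (0.1, 1.0)]
best = max(grid, key=lambda p: (predict(X[tr], fit(X[tr], y[tr], *p), X[va], p[0]) == y[va]).mean())
alpha = fit(X[tr], y[tr], *best)
acc = (predict(X[tr], alpha, X[va], best[0]) == y[va]).mean()
```

A real benchmark would of course tune hyperparameters on a separate validation fold and report accuracy on held-out test data.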
Full Text Available A new class of ensemble filters, called the Diffuse Ensemble Filter (DEnF), is proposed in this paper. The DEnF assumes that the forecast errors orthogonal to the first guess ensemble are uncorrelated with the latter ensemble and have infinite variance. The assumption of infinite variance corresponds to the limit of "complete lack of knowledge" and differs dramatically from the implicit assumption made in most other ensemble filters, which is that the forecast errors orthogonal to the first guess ensemble have vanishing errors. The DEnF is independent of the detailed covariances assumed in the space orthogonal to the ensemble space, and reduces to conventional ensemble square root filters when the ensemble size exceeds the model dimension. The DEnF is well defined only in data-rich regimes and involves the inversion of relatively large matrices, although this barrier might be circumvented by variational methods. Two algorithms for solving the DEnF, namely the Diffuse Ensemble Kalman Filter (DEnKF) and the Diffuse Ensemble Transform Kalman Filter (DETKF), are proposed and found to give comparable results. These filters generally converge to the traditional EnKF and ETKF, respectively, when the ensemble size exceeds the model dimension. Numerical experiments demonstrate that the DEnF eliminates filter collapse, which occurs in ensemble Kalman filters for small ensemble sizes. Also, the use of the DEnF to initialize a conventional square root filter dramatically accelerates the spin-up time for convergence. However, in a perfect model scenario, the DEnF produces larger errors than ensemble square root filters that have covariance localization and inflation. For imperfect forecast models, the DEnF produces smaller errors than the ensemble square root filter with inflation. These experiments suggest that the DEnF has some advantages relative to the ensemble square root filters in the regime of small ensemble size, imperfect model, and copious
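All of the filters above take the stochastic ensemble Kalman analysis step as their reference point. A minimal numpy sketch of a conventional perturbed-observation EnKF update (a baseline EnKF, not the DEnF itself; the state size, ensemble size, and noise levels are invented):

```python
import numpy as np

def enkf_update(X, y_obs, H, R, rng):
    """Stochastic EnKF analysis step.

    X     : (n, m) forecast ensemble, one member per column
    y_obs : (p,)   observation vector
    H     : (p, n) linear observation operator
    R     : (p, p) observation-error covariance
    """
    n, m = X.shape
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    Pf = A @ A.T / (m - 1)                           # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
    # Perturbed observations: one independent draw per ensemble member.
    Y = y_obs[:, None] + rng.multivariate_normal(np.zeros(len(y_obs)), R, size=m).T
    return X + K @ (Y - H @ X)

rng = np.random.default_rng(1)
truth = np.array([1.0, -2.0, 0.5])
H = np.eye(3)[:2]                                    # observe first two components
R = 0.05 * np.eye(2)
X = truth[:, None] + rng.normal(0, 1.0, (3, 50))     # 50-member forecast ensemble
y = H @ truth + rng.multivariate_normal(np.zeros(2), R)
Xa = enkf_update(X, y, H, R, rng)
```

The update shrinks the ensemble spread in the observed components, which is exactly the mechanism that collapses for very small ensembles and motivates the DEnF.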
Botto, Anna; Camporese, Matteo
Hydrological models allow scientists to predict the response of water systems under varying forcing conditions. In particular, many physically-based integrated models were recently developed in order to understand the fundamental hydrological processes occurring at the catchment scale. However, the use of this class of hydrological models is still relatively limited, as their prediction skill heavily depends on reliable parameter estimation, an operation that is never trivial, being normally affected by large uncertainty and requiring huge computational effort. The objective of this work is to test the potential of data assimilation to be used as an inverse modeling procedure for the broad class of integrated hydrological models. To pursue this goal, a Bayesian data assimilation (DA) algorithm based on a Monte Carlo approach, namely the ensemble Kalman filter (EnKF), is combined with the CATchment HYdrology (CATHY) model. In this approach, input variables (atmospheric forcing, soil parameters, initial conditions) are statistically perturbed, providing an ensemble of realizations aimed at taking into account the uncertainty involved in the process. Each realization is propagated forward by the CATHY hydrological model within a parallel R framework, developed to reduce the computational effort. When measurements are available, the EnKF is used to update both the system state and soil parameters. In particular, four different assimilation scenarios are applied to test the capability of the modeling framework: first only pressure head or water content is assimilated; then the combination of both; and finally both pressure head and water content together with the subsurface outflow. To demonstrate the effectiveness of the approach in a real-world scenario, an artificial hillslope was designed and built to provide real measurements for the DA analyses. The experimental facility, located in the Department of Civil, Environmental and Architectural Engineering of the
Jiang, He; Dong, Yao; Xiao, Ling
Highlights: • An ensemble learning system is proposed to forecast global solar radiation. • LASSO is utilized as the feature selection method for the subset models. • GSO is used to select the weight vector aggregating the responses of the subset models. • A simple and efficient algorithm is designed based on a thresholding function. • Theoretical analysis focusing on the error rate is provided. - Abstract: Forecasting of effective solar irradiation has attracted huge interest in recent decades, mainly due to its various applications in grid-connected photovoltaic installations. This paper develops and investigates an ensemble learning based multistage intelligent approach to forecast 5 days of global horizontal radiation at four given locations in India. A two-way interaction model is considered with the purpose of detecting the correlation between the features. The main structure of the novel method is ensemble learning which, based on the divide-and-conquer principle, is applied to enhance the forecasting accuracy and model stability. An efficient feature selection method, LASSO, is performed in the input space, with the regularization parameter selected by cross-validation. A weight vector which best represents the importance of each individual model in the ensemble system is provided by glowworm swarm optimization. The combination of feature selection and parameter selection is helpful in creating the diversity of the ensemble learning. In order to illustrate the validity of the proposed method, the datasets at four different locations in India are split into training and test datasets. The results of the real data experiments demonstrate the efficiency and efficacy of the proposed method compared with its competitors.
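The thresholding function mentioned in the highlights is the soft-thresholding operator that underlies LASSO solvers. A rough numpy sketch of LASSO fitted by iterative soft-thresholding (ISTA), with synthetic regressors standing in for the irradiation features (the penalty value, data sizes, and sparsity pattern are invented):

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: shrink toward zero by t, clip at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """LASSO via iterative soft-thresholding (ISTA).

    Minimizes 0.5/n * ||y - X w||^2 + lam * ||w||_1.
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]                # only 3 informative features
y = X @ w_true + 0.1 * rng.normal(size=n)
w = lasso_ista(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # indices the L1 penalty keeps
```

The zeroed coefficients are exactly the feature-selection effect exploited in the ensemble: each subset model sees only the features that survive the threshold.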
Wang, Huaqing; Li, Ruitong; Tang, Gang; Yuan, Hongfang; Zhao, Qingliang; Cao, Xi
A compound fault signal usually contains multiple characteristic signals and strong confusion noise, which makes it difficult to separate weak fault signals through conventional means such as FFT-based envelope detection, wavelet transform or empirical mode decomposition used individually. In order to improve the compound fault diagnosis of rolling bearings via signal separation, the present paper proposes a new method to identify compound faults from measured mixed signals, based on the ensemble empirical mode decomposition (EEMD) method and the independent component analysis (ICA) technique. With this approach, a vibration signal is first decomposed into intrinsic mode functions (IMFs) by the EEMD method to obtain multichannel signals. Then, according to a cross-correlation criterion, the corresponding IMFs are selected as the input matrix of ICA. Finally, the compound faults can be separated effectively by executing the ICA method, which makes the fault features more easily extracted and more clearly identified. Experimental results validate the effectiveness of the proposed method in compound fault separation, which works not only for the outer race defect, but also for the roller defect and the unbalance fault of the experimental system.
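The cross-correlation criterion used to choose which IMFs feed the ICA stage can be sketched in a few lines of numpy. Here the EEMD output is stubbed with known components; a real pipeline would obtain the IMFs from an EEMD implementation (e.g. the PyEMD package), and the 0.2 threshold is an assumption:

```python
import numpy as np

def select_imfs(signal, imfs, threshold=0.2):
    """Keep the IMFs whose normalized cross-correlation with the raw signal
    exceeds a threshold; the kept ones feed the ICA separation stage."""
    keep = []
    for i, imf in enumerate(imfs):
        c = np.corrcoef(signal, imf)[0, 1]
        if abs(c) >= threshold:
            keep.append(i)
    return keep

# Toy stand-in for an EEMD output: two oscillatory components plus noise.
t = np.linspace(0, 1, 1000, endpoint=False)
comp1 = np.sin(2 * np.pi * 50 * t)        # e.g. an outer-race fault band
comp2 = 0.5 * np.sin(2 * np.pi * 7 * t)   # e.g. an unbalance component
rng = np.random.default_rng(0)
noise = 0.05 * rng.normal(size=t.size)
signal = comp1 + comp2 + noise
imfs = [comp1, comp2, noise]              # pretend these came from EEMD
kept = select_imfs(signal, imfs)
```

On this toy input, the two oscillatory components correlate strongly with the mixture while the noise component falls below the threshold and is discarded.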
Full Text Available In order to guarantee the stable operation of shearers and promote the construction of an automatic coal mining working face, an online cutting pattern recognition method with high accuracy and speed based on Improved Ensemble Empirical Mode Decomposition (IEEMD) and a Probabilistic Neural Network (PNN) is proposed. An industrial microphone is installed on the shearer and the cutting sound is collected as the recognition criterion to overcome the disadvantages of giant size, contact measurement and low identification rate of traditional detectors. To avoid end-point effects and get rid of undesirable intrinsic mode function (IMF) components in the initial signal, IEEMD is conducted on the sound. The end-point continuation based on the practical storage data is performed first to overcome the end-point effect. Next the average correlation coefficient, which is calculated from the correlation of the first IMF with the others, is introduced to select essential IMFs. Then the energy and standard deviation of the remaining IMFs are extracted as features and PNN is applied to classify the cutting patterns. Finally, a simulation example, with an accuracy of 92.67%, and an industrial application prove the efficiency and correctness of the proposed method.
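The PNN used above is essentially a Parzen-window density classifier: each class score is an average of Gaussian kernels centered on that class's training samples, and the test point takes the class of highest density. A minimal numpy sketch (the smoothing parameter sigma and the toy two-dimensional features standing in for the energy/standard-deviation features are invented):

```python
import numpy as np

def pnn_predict(X_train, y_train, X_new, sigma=0.3):
    """Probabilistic Neural Network: Parzen-window class densities with a
    Gaussian kernel; each test point gets the class of highest density."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        d2 = ((X_new[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        scores.append(np.exp(-d2 / (2 * sigma**2)).mean(axis=1))
    return classes[np.argmax(np.column_stack(scores), axis=1)]

rng = np.random.default_rng(0)
# Toy 2-D feature vectors standing in for per-IMF (energy, std) features.
X0 = rng.normal([0, 0], 0.3, (30, 2))   # cutting pattern A
X1 = rng.normal([2, 2], 0.3, (30, 2))   # cutting pattern B
X_train = np.vstack([X0, X1])
y_train = np.r_[np.zeros(30, int), np.ones(30, int)]
X_test = np.array([[0.1, -0.1], [1.9, 2.2]])
pred = pnn_predict(X_train, y_train, X_test)
```

PNNs need no iterative training, which is what makes them attractive for the online recognition setting described above.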
Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.
Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. As non-structured algorithms, we evaluated K-means, Fuzzy K-means and the Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated the Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves on the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453
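Of the non-structured algorithms evaluated above, K-means is the simplest to illustrate. A minimal numpy sketch of unsupervised intensity clustering in that spirit (the 1-D toy intensities and the deterministic sorted-quantile initialization are assumptions of this sketch, not the paper's setup):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's K-means with a deterministic quantile-style init
    (sorted by the first feature) so this toy example is reproducible."""
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest center, then recompute centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
# 1-D toy 'intensities' drawn from three well-separated tissue-like modes.
X = np.concatenate([rng.normal(m, 0.05, 100) for m in (0.2, 0.5, 0.8)])[:, None]
labels, centers = kmeans(X, k=3)
```

The GMM variant replaces the hard assignment with soft responsibilities, and the GHMRF additionally couples neighbouring voxels, which is what "structured" means above.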
Hsieh, Yi-Ta; Chen, Chaur-Tzuhn; Chen, Jan-Chang
In general, considerable human and material resources are required to perform a forest inventory survey. Using remote sensing technologies to reduce forest inventory costs has thus become an important topic in forest inventory-related studies. Leica ADS-40 digital aerial photographs feature advantages such as high spatial resolution, high radiometric resolution, and a wealth of spectral information. As a result, they have been widely used to perform forest inventories. We classified ADS-40 digital aerial photographs according to the complex forest land cover types listed in the Fourth Forest Resource Survey in an effort to establish a classification method for categorizing ADS-40 digital aerial photographs. Subsequently, we classified the images using the knowledge-based classification method in combination with object-based analysis techniques, decision tree classification techniques, classification parameters such as object texture, shape, and spectral characteristics, a class-based classification method, and geographic information system mapping information. Finally, the results were compared with manually interpreted aerial photographs. Images were classified using a hierarchical classification method comprising four classification levels (levels 1 to 4). The classification overall accuracy (OA) of levels 1 to 4 lies within a range of 64.29% to 98.50%. The final result comparisons showed that the proposed classification method achieved an OA of 78.20% and a kappa coefficient of 0.7597. According to the image classification results, classification errors occurred mostly in images of sunlit crowns because the image values for individual trees varied. Such variance was caused by the crown structure and the incident angle of the sun. These errors lowered image classification accuracy and warrant further studies. This study corroborates the high feasibility of mapping complex forest land cover types using ADS-40 digital aerial photographs.
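The two agreement figures reported above, overall accuracy (OA) and the kappa coefficient, both derive from the confusion matrix; kappa additionally discounts the agreement expected by chance from the row and column marginals. A minimal numpy sketch with an invented 3-class example:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, k):
    """k x k matrix M with M[i, j] = count of class-i samples predicted as j."""
    M = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    return M

def overall_accuracy_and_kappa(M):
    """OA = trace/total; kappa corrects OA for chance agreement estimated
    from the row/column marginals."""
    n = M.sum()
    po = np.trace(M) / n                        # observed agreement (OA)
    pe = (M.sum(0) * M.sum(1)).sum() / n**2     # chance agreement
    return po, (po - pe) / (1 - pe)

# Invented reference vs. predicted labels for three land cover classes.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])
M = confusion_matrix(y_true, y_pred, 3)
oa, kappa = overall_accuracy_and_kappa(M)
```

Here OA is 0.8 while kappa is about 0.70, mirroring the gap between the 78.20% OA and 0.7597 kappa reported above: kappa is always the stricter figure when the class distribution is unbalanced.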
Full Text Available Credit scoring methods are widely used for evaluating loan applications in financial and banking institutions. A credit score identifies whether applicant customers belong to a good-risk applicant group or a bad-risk applicant group. These decisions are based on the demographic data of the customers, the customer's overall business with the bank, and the loan payment history of the loan applicants. The advantages of using credit scoring models include reducing the cost of credit analysis, enabling faster credit decisions and diminishing possible risk. Many statistical and machine learning techniques such as logistic regression, support vector machines, neural networks and decision tree algorithms have been used independently and as hybrid credit scoring models. This paper proposes an ensemble-based technique combining seven individual models to increase the classification accuracy. Feature selection has also been used for selecting important attributes for classification. Cross-classification was conducted using three data partitions. The German credit dataset, with 1000 instances and 21 attributes, is used in the present study. The results of the experiments revealed that the ensemble model yielded very good accuracy when compared to the individual models. In all three partitions, the ensemble model was able to classify more than 80% of the loan customers as good creditors correctly. Also, for the 70:30 partition there was a good impact of feature selection on the accuracy of the classifiers. The results were improved for almost all individual models, including the ensemble model.
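The most common way to combine several individual scoring models into an ensemble is a majority vote over their binary good-risk/bad-risk predictions (the abstract does not state the exact combination rule, so treating it as a majority vote is an illustrative assumption). A minimal numpy sketch with seven invented prediction vectors:

```python
import numpy as np

def majority_vote(predictions):
    """Combine binary 0/1 predictions from several models by majority vote.
    predictions: (n_models, n_samples) array; n_models should be odd."""
    votes = predictions.sum(axis=0)
    return (votes * 2 > predictions.shape[0]).astype(int)

# Seven hypothetical scorecards voting on nine loan applicants (1 = good risk).
preds = np.array([
    [1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1, 1, 0],
])
combined = majority_vote(preds)
```

With an odd number of voters there are no ties, and uncorrelated individual errors tend to cancel, which is the core reason an ensemble can beat each of its seven members.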
Full Text Available The problem of classification in incomplete information systems is a hot issue in intelligent information processing. The hypergraph is a new intelligent method for machine learning. However, it is hard to process an incomplete information system with the traditional hypergraph, for two reasons: (1) the hyperedges are generated randomly in the traditional hypergraph model; (2) the existing methods are unsuitable for incomplete information systems because of their missing values. In this paper, we propose a novel classification algorithm for incomplete information systems based on the hypergraph model and rough set theory. First, we initialize the hypergraph. Second, we classify the training set by the neighborhood hypergraph. Third, under the guidance of rough sets, we replace the poor hyperedges. After that, we obtain a good classifier. The proposed approach is tested on 15 data sets from the UCI machine learning repository. Furthermore, it is compared with some existing methods, such as C4.5, SVM, Naive Bayes, and KNN. The experimental results show that the proposed algorithm has better performance in terms of precision, recall, AUC, and F-measure.
Ardha Aryaguna, Prama; Danoedoro, Projo
Full Text Available Advances in remote sensing analysis have followed the development of the underlying technology, especially in sensors and platforms. Many images now offer high spatial and radiometric resolution and therefore carry much more information. Analyses of vegetation objects, such as floristic composition, benefit greatly from these developments. Floristic composition can be interpreted using several methods, including pixel-based classification and object-based classification. A problem with pixel-based methods on high-spatial-resolution imagery is the salt-and-pepper effect that appears in the classification result. The purpose of this research is to compare the effectiveness of pixel-based classification and object-based classification for vegetation composition mapping on the high-resolution WorldView-2 image. The results show that pixel-based classification using a 5×5 majority kernel window gives the highest accuracy among the tested classifications. The highest accuracy is 73.32%, obtained from the WorldView-2 image radiometrically corrected to surface reflectance; however, in terms of accuracy for every individual class, the object-based method is the best among the tested methods. From the effectiveness standpoint, the pixel-based approach is more effective than the object-based approach for vegetation composition mapping in the Tidar forest.
Sewak, Mihir S.; Reddy, Narender P.; Duan, Zhong-Hui
Analysis of gene expression data provides an objective and efficient technique for sub-classification of leukemia. The purpose of the present study was to design committee neural network-based classification systems to subcategorize leukemia gene expression data. In the study, a binary classification system was considered to differentiate acute lymphoblastic leukemia from acute myeloid leukemia. A ternary classification system which classifies leukemia expression data into three subclasses...
Shindo, Kiyo; Shimizu, Akinobu; Kobatake, Hidefumi; Nawano, Shigeru; Shinozaki, Kenji
Organ segmentation learned by a conventional ensemble learning algorithm suffers from unnatural errors because each voxel is classified independently in the segmentation process. This paper proposes a novel ensemble learning algorithm that can take into account the global shape and location of organs. It estimates the shape and location of an organ from a given image by combining an intermediate segmentation result with a statistical shape model. Once the ensemble learning algorithm can no longer improve the segmentation performance in the iterative learning process, it estimates the shape and location by finding an optimal model parameter set with the maximum degree of correspondence between a statistical shape model and the intermediate segmentation result. Novel weak classifiers are generated based on a signed distance from the boundary of the estimated shape and a distance from the barycenter of the intermediate segmentation result. Subsequently it continues the learning process with the novel weak classifiers. This paper presents experimental results where the proposed ensemble learning algorithm generates a segmentation process that can extract a spleen from a 3D CT image more precisely than a conventional one. (author)
S. W. Loke
Full Text Available Context-aware smart things are capable of computational behaviour based on sensing the physical world, inferring context from the sensed data, and acting on the sensed context. A collection of such things can form what we call a thing-ensemble, when they have the ability to communicate with one another (over a short-range network such as Bluetooth, or over the Internet, i.e. the Internet of Things (IoT) concept), sense each other, and when each of them might play certain roles with respect to each other. Each smart thing in a thing-ensemble might have its own context-aware behaviours which, when integrated with other smart things, yield behaviours that are not straightforward to reason with. We present Sigma, a language of operators, inspired by modular logic programming, for specifying and reasoning with combined behaviours among smart things in a thing-ensemble. We show numerous examples of the use of Sigma for describing a range of behaviours over a diverse range of thing-ensembles, from sensor networks to smart digital frames, demonstrating the versatility of our approach. We contend that our operator approach abstracts away low-level communication and protocol details, and allows systems of context-aware things to be designed and built in a compositional and incremental manner.
Full Text Available Reliable climate change scenarios are critical for West Africa, whose economy relies mostly on agriculture, and, in this regard, multimodel ensembles are believed to provide the most robust climate change information. Toward this end, we analyze and intercompare the performance of a set of four regional climate models (RCMs) driven by two global climate models (GCMs), for a total of 4 different GCM-RCM pairs, in simulating present-day and future climate over West Africa. The results show that the individual RCM members, as well as their ensemble employing the same driving fields, exhibit different biases and show mixed results in terms of outperforming the GCM simulation of seasonal temperature and precipitation, indicating a substantial sensitivity of RCMs to regional and local processes. These biases are reduced and GCM simulations improved upon by averaging all four RCM simulations, suggesting that multi-model RCM ensembles based on different driving GCMs help to compensate for systematic errors from both the nested and the driving models. This confirms the importance of the multi-model approach for improving the robustness of climate change projections. Illustrative examples of such an ensemble reveal that the western Sahel undergoes substantial drying in future climate projections, mostly due to a decrease in peak monsoon rainfall.
Full Text Available PURPOSE: Demonstrate the radiological findings of 127 angiomyolipomas (AMLs) and propose a classification based on the radiological evidence of fat. MATERIALS AND METHODS: The imaging findings of 85 consecutive patients with AMLs: isolated (n = 73), multiple without tuberous sclerosis (TS) (n = 4) and multiple with TS (n = 8), were retrospectively reviewed. Eighteen AMLs (14%) presented with hemorrhage. All patients were submitted to a dedicated helical CT or magnetic resonance study. All hemorrhagic and non-hemorrhagic lesions were grouped together since our objective was to analyze the presence of detectable fat. Out of 85 patients, 53 were monitored and 32 were treated surgically due to a large perirenal component (n = 13), hemorrhage (n = 11) or the impossibility of an adequate preoperative characterization (n = 8). There was not a case of renal cell carcinoma (RCC) with a fat component in this group of patients. RESULTS: Based on the presence and amount of detectable fat within the lesion, AMLs were classified into 4 distinct radiological patterns: Pattern-I, predominantly fatty (usually less than 2 cm in diameter and intrarenal): 54%; Pattern-II, partially fatty (intrarenal or exophytic): 29%; Pattern-III, minimally fatty (mostly exophytic and perirenal): 11%; and Pattern-IV, without fat (mostly exophytic and perirenal): 6%. CONCLUSIONS: This proposed classification might be useful to understand the imaging manifestations of AMLs and their differential diagnosis, and to determine when further radiological evaluation would be necessary. Small (< 1.5 cm), pattern-I AMLs tend to be intrarenal, homogeneous and predominantly fatty. As they grow they tend to become partially or completely exophytic and heterogeneous (patterns II and III). The rare pattern-IV AMLs, however, can be small or large, intrarenal or exophytic, but always appear as a homogeneous and hyperdense mass. Since no renal cell carcinoma was found in our series, from an evidence-based practice, all renal mass with detectable
Arellano, A. F., Jr.; Raeder, K.; Anderson, J. L.; Hess, P. G.; Emmons, L. K.; Edwards, D. P.; Pfister, G. G.; Campos, T. L.; Sachse, G. W.
We present a global chemical data assimilation system using a global atmosphere model, the Community Atmosphere Model (CAM3) with simplified chemistry, and the Data Assimilation Research Testbed (DART) assimilation package. DART is a community software facility for assimilation studies using the ensemble Kalman filter approach. Here, we apply the assimilation system to constrain global tropospheric carbon monoxide (CO) by assimilating meteorological observations of temperature and horizontal wind velocity and satellite CO retrievals from the Measurement of Pollution in the Troposphere (MOPITT) satellite instrument. We verify the system performance using independent CO observations taken on board the NSF/NCAR C-130 and NASA DC-8 aircraft during the April 2006 part of the Intercontinental Chemical Transport Experiment (INTEX-B). Our evaluations show that MOPITT data assimilation provides significant improvements in terms of capturing the observed CO variability relative to no MOPITT assimilation (i.e. the correlation improves from 0.62 to 0.71, significant at 99% confidence). The assimilation provides evidence of a median CO loading of about 150 ppbv at 700 hPa over the NE Pacific during April 2006. This is marginally higher than the modeled CO with no MOPITT assimilation (~140 ppbv). Our ensemble-based estimates of model uncertainty also show model overprediction over the source region (i.e. China) and underprediction over the NE Pacific, suggesting model errors that cannot be readily explained by emissions alone. These results have important implications for improving regional chemical forecasts and for inverse modeling of CO sources, and further demonstrate the utility of the assimilation system in comparing non-coincident measurements, e.g. comparing satellite retrievals of CO with in-situ aircraft measurements. The work described above also brought to light several shortcomings of the data assimilation approach for CO profiles. Because of the limited vertical
Wang, Xiaolei; Qin, Li; Cao, Jingli
Since increasing acute pancreatitis (AP) severity is significantly associated with mortality, accurate and rapid determination of severity is crucial for effective clinical management. This study investigated the value of the revised Atlanta classification (RAC) and the determinant-based classification (DBC) systems in stratifying severity of acute pancreatitis. This retrospective observational cohort study included 480 AP patients. Patient demographics and clinical characteristics were recorded. The primary outcome was mortality, and secondary outcomes were admission to intensive care unit (ICU), duration of ICU stay, and duration of hospital stay. Based on the RAC classification, there were 295 patients with mild AP (MAP), 146 patients with moderate-to-severe AP (MSAP), and 39 patients with severe AP (SAP). Based on the DBC classification, there were 389 patients with MAP, 41 patients with MSAP, 32 patients with SAP, and 18 patients with critical AP (CAP). ROC curve analysis showed that the DBC system had a significantly higher accuracy at predicting organ failure compared to the RAC system (p < .001). Multivariate regression analysis showed that age and ICU stay were independent risk factors of mortality. The DBC system had a higher accuracy at predicting organ failure. Age and ICU stay were significantly associated with risk of death in AP patients. A classification of CAP by the DBC system should warrant close attention, and rapid implementation of effective measures to reduce mortality.
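The ROC curve analysis used above to compare the two classification systems reduces to the area under the ROC curve, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the rank-sum identity). A minimal numpy sketch with invented severity scores:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U identity: the probability that a random
    positive scores higher than a random negative; ties count one half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Toy severity scores vs. observed organ failure (1 = organ failure).
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
auc = roc_auc(y, s)
```

An AUC of 0.5 is chance level and 1.0 is perfect ranking; the statement that DBC predicts organ failure with "significantly higher accuracy" corresponds to a significantly larger AUC in a comparison such as this one.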
Full Text Available In order to improve the surface ozone forecast over Beijing and surrounding regions, a data assimilation method integrated into a high-resolution regional air quality model and a regional air quality monitoring network are employed. Several advanced data assimilation strategies based on the ensemble Kalman filter are designed to adjust O3 initial conditions, NOx initial conditions and emissions, and VOCs initial conditions and emissions, separately or jointly, through assimilating ozone observations. As a result, adjusting precursor initial conditions demonstrates potential improvement of the 1-h ozone forecast almost as great as that shown by adjusting precursor emissions. Nevertheless, adjusting either precursor initial conditions or emissions shows deficiency in improving the short-term ozone forecast in suburban areas. Adjusting ozone initial values brings significant improvement to the 1-h ozone forecast, and its limitations lie in the difficulty of improving the 1-h forecast at some urban sites. A simultaneous adjustment of the above five variables is found to be able to reduce these limitations and displays an overall better performance in improving both the 1-h and 24-h ozone forecasts over these areas. The root mean square errors of the 1-h ozone forecast at urban sites and suburban sites decrease by 51% and 58%, respectively, compared with those in the free run. Through these experiments, we found that assimilating local ozone observations is decisive for the ozone forecast over the observational area, while assimilating remote ozone observations could reduce the uncertainty in regionally transported ozone.
Zamani, Reza; Akhond-Ali, Ali-Mohammad; Roozbahani, Abbas; Fattahi, Rouhollah
Water shortage and climate change are the most important issues of sustainable agricultural and water resources development. Given the importance of water availability in crop production, the present study focused on risk assessment of the climate change impact on agricultural water requirements in southwest Iran, under two emission scenarios (A2 and B1) for the future period (2025-2054). A multi-model ensemble framework based on the mean observed temperature-precipitation (MOTP) method and a combined probabilistic approach using the Long Ashton Research Station Weather Generator (LARS-WG) and change factors (CF) were used for downscaling, to manage the uncertainty of the outputs of 14 general circulation models (GCMs). The results showed increasing temperature in all months and irregular changes in precipitation (either increasing or decreasing) in the future period. In addition, the calculated annual net water requirement for all crops affected by climate change indicated an increase between 4 and 10%. Furthermore, the required water demand volume is also expected to increase. The largest and smallest expected increases in water demand volume are about 13 and 5% under the A2 and B1 scenarios, respectively. Considering these results and the limited water resources in the study area, water resources planning is crucial in order to reduce the negative effects of climate change. Therefore, climate change adaptation scenarios related to crop patterns and water consumption should be taken into account.
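The change factor (CF) downscaling mentioned above is commonly implemented as additive deltas for temperature and multiplicative ratios for precipitation, applied per calendar month. A minimal sketch with hypothetical monthly values (not the study's data):

```python
def change_factor_downscale(obs_temp, obs_prec, gcm_base_temp, gcm_fut_temp,
                            gcm_base_prec, gcm_fut_prec):
    """Change factor method, per calendar month:
    temperature: observed + (GCM future - GCM baseline)   [additive]
    precipitation: observed * (GCM future / GCM baseline) [multiplicative]
    """
    fut_temp = [o + (f - b) for o, f, b in zip(obs_temp, gcm_fut_temp, gcm_base_temp)]
    fut_prec = [o * (f / b) for o, f, b in zip(obs_prec, gcm_fut_prec, gcm_base_prec)]
    return fut_temp, fut_prec

# hypothetical January-March monthly means (temperature in C, precipitation in mm)
t, p = change_factor_downscale([10.0, 12.0, 16.0], [50.0, 40.0, 30.0],
                               [9.0, 11.0, 15.0], [11.0, 13.5, 17.0],
                               [48.0, 42.0, 33.0], [36.0, 42.0, 44.0])
```

The additive form for temperature and the ratio form for precipitation preserve the observed climatology while imposing the GCM-simulated change signal.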
Yi, Cai; Lin, Jianhui; Zhang, Weihua; Ding, Jianming
As train loads and travel speeds have increased over time, railway axle bearings have become critical elements that require more efficient non-destructive inspection and fault diagnostics methods. This paper presents a novel and adaptive procedure based on ensemble empirical mode decomposition (EEMD) and the Hilbert marginal spectrum for multi-fault diagnostics of axle bearings. EEMD overcomes the restrictive assumptions about the data, and the computational effort, that limit the application of many signal processing techniques. The outputs of this adaptive approach are the intrinsic mode functions (IMFs), which are treated with the Hilbert transform in order to obtain the Hilbert instantaneous frequency spectrum and the marginal spectrum. However, not all the IMFs obtained by the decomposition should be included in the Hilbert marginal spectrum. The IMF confidence-index algorithm proposed in this paper is fully autonomous, removing the need for selection by an experienced user and allowing the development of on-line tools. The effectiveness of the improvement is proven by the successful diagnosis of an axle bearing with a single fault or multiple composite faults, e.g., an outer ring fault, a cage fault, and a pin roller fault. PMID:25970256
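The Hilbert-spectral step applied to each IMF can be illustrated as follows. This generic sketch uses a pure sine as a stand-in for one IMF and SciPy's analytic-signal routine; the EEMD decomposition itself and the confidence-index selection are omitted:

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                   # sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
imf = np.sin(2 * np.pi * 50.0 * t)            # stand-in for one IMF at 50 Hz

# analytic signal -> instantaneous amplitude and frequency
analytic = hilbert(imf)
amp = np.abs(analytic)
phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(phase) * fs / (2 * np.pi)  # Hz, length N-1

# Hilbert marginal spectrum: total amplitude contributed per frequency bin,
# i.e. the instantaneous amplitude histogrammed over instantaneous frequency
bins = np.arange(0.0, 500.0, 5.0)
marginal, _ = np.histogram(inst_freq, bins=bins, weights=amp[:-1])
peak_freq = bins[np.argmax(marginal)]          # left edge of the dominant bin
```

For a multi-component signal, the same computation is run on each retained IMF and the marginal spectra are summed; bearing fault frequencies then show up as peaks.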
Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane
Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site-specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS) and so-called "member-by-member" approaches (MBM). Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members; however, the parameters of the regression equations are estimated by exploiting the second-order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy. We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO
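The simple benchmark mentioned above, a running bias correction of the ensemble mean, can be sketched as a sequential exponential update. The smoothing factor and data below are hypothetical, not the COSMO configuration:

```python
def run_bias_correction(forecasts, observations, alpha=0.2):
    """Sequentially corrected forecasts.

    The bias estimate is updated only after each observation arrives,
    so every forecast is corrected with the bias known at issue time.
    """
    bias, corrected = 0.0, []
    for f, o in zip(forecasts, observations):
        corrected.append(f - bias)                     # correct first...
        bias = (1 - alpha) * bias + alpha * (f - o)    # ...then learn
    return corrected

# hypothetical ensemble-mean 2-m temperature forecasts with a constant +2 K bias
obs = [15.0, 16.0, 14.5, 15.5, 16.5, 15.0, 14.0, 15.2]
fcst = [o + 2.0 for o in obs]
corr = run_bias_correction(fcst, obs)
```

With a constant bias, the estimate converges geometrically (at rate 1 - alpha), so later corrected forecasts are much closer to the observations than early ones.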
Wang, Huiya; Zhang, Shanwen
Gene expression profiles have great potential for accurate tumor diagnosis: they are expected to enable precise and systematic diagnosis, but they also pose two challenges for machine learning, the curse of dimensionality and the small sample size problem. We propose a manifold learning based dimensionality reduction algorithm named orthogonal local discriminant embedding (O-LDE) and apply it to tumor classification. Compared with classical local discriminant embedding (LDE), O-LDE obtains an orthogonal linear projection matrix by solving an optimization problem. After being projected into a low-dimensional subspace by O-LDE, data points of the same class maintain their intrinsic neighbor relations, whereas neighboring points of different classes are kept far from each other. Experimental results on a public tumor dataset validate the effectiveness and feasibility of the proposed algorithm.
Abid Hasan, Md; Hasan, Md Kamrul; Abdul Mottalib, M
Predicting the class of gene expression profiles helps improve the diagnosis and treatment of diseases. Analysing huge gene expression datasets, otherwise known as microarray data, is complicated by their high dimensionality. Hence, traditional classifiers do not perform well when the number of features far exceeds the number of samples. A good set of features helps classifiers to classify the dataset efficiently. Moreover, a manageable set of features is also desirable for the biologist for further analysis. In this paper, we propose a linear regression-based feature selection method for selecting discriminative features. Our main focus is to classify the dataset more accurately using fewer features than other traditional feature selection methods. Our method has been compared with several other methods, and in almost every case it achieves higher classification accuracy with fewer features than the other popular feature selection methods.
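A linear regression-based feature score of the kind described can be sketched as the per-feature R² of a univariate fit of the class label on each feature, here on synthetic microarray-like data. The authors' exact scoring and selection procedure may differ; this is a generic illustration:

```python
import numpy as np

def r2_scores(X, y):
    """Per-feature R^2 of a univariate linear regression of y on the feature.

    For a single predictor, R^2 equals the squared Pearson correlation,
    so it can be computed in closed form without an explicit fit.
    """
    scores = []
    yc = y - y.mean()
    for j in range(X.shape[1]):
        x = X[:, j] - X[:, j].mean()
        denom = (x @ x) * (yc @ yc)
        scores.append(0.0 if denom == 0 else (x @ yc) ** 2 / denom)
    return np.array(scores)

rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 50)        # two classes, 50 samples each
X = rng.normal(size=(100, 20))       # 20 "genes", mostly noise
X[:, 3] += 3.0 * y                   # gene 3 is strongly class-informative
best = int(np.argmax(r2_scores(X, y)))
```

Ranking features by this score and keeping the top k gives a small, discriminative gene subset for a downstream classifier.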
Deans, Cameron; Griffin, Lewis D.; Marmugi, Luca; Renzoni, Ferruccio
We demonstrate identification of the position, material, orientation, and shape of objects imaged by an 85Rb atomic magnetometer performing electromagnetic induction imaging supported by machine learning. Machine learning maximizes the information extracted from the images created by the magnetometer, demonstrating the use of hidden data. Localization 2.6 times finer than the spatial resolution of the imaging system and successful classification of up to 97% are obtained. This circumvents the need to solve the inverse problem and demonstrates the extension of machine learning to diffusive systems, such as low-frequency electrodynamics in media. Automated collection of task-relevant information from quantum-based electromagnetic imaging will have a relevant impact in fields from biomedicine to security.
Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei
In this paper, a speech enhancement method using noise classification and a Deep Neural Network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames, and a DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstrum coefficients (MFCC), with parameters estimated by an iterative expectation-maximization (EM) algorithm. The noise type is updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under both stationary and non-stationary conditions.
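The noise-type decision can be illustrated with a simplified stand-in: one diagonal Gaussian per noise type, selected by maximum log-likelihood. A full GMM fitted with EM, as in the paper, would replace `fit_gaussian`; all data here are synthetic MFCC-like vectors, not real features:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_gaussian(X):
    """Diagonal Gaussian per class: feature-wise mean and variance."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6   # variance floor for stability

def log_lik(x, mean, var):
    """Log-likelihood of one feature vector under a diagonal Gaussian."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

# hypothetical 13-dimensional MFCC vectors for two noise types
babble = rng.normal(0.0, 1.0, size=(200, 13))
street = rng.normal(3.0, 1.0, size=(200, 13))
models = {"babble": fit_gaussian(babble), "street": fit_gaussian(street)}

def classify(x):
    """Pick the noise type whose model gives the highest log-likelihood."""
    return max(models, key=lambda k: log_lik(x, *models[k]))

pred = classify(np.full(13, 2.9))   # a frame resembling "street" noise
```

In the full system, the chosen label selects which noise-specific DNN is applied to the noisy frames.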
Jang, Junbong; Santamarina, J. Carlos
The 75-μm particle size is used to discriminate between fine and coarse grains. Further analysis of fine grains is typically based on the plasticity chart. Whereas pore-fluid-chemistry-dependent soil response is a salient and distinguishing characteristic of fine grains, pore-fluid chemistry is not addressed in current classification systems. Liquid limits obtained with electrically contrasting pore fluids (deionized water, 2-M NaCl brine, and kerosene) are combined to define the soil “electrical sensitivity.” Liquid limit and electrical sensitivity can be effectively used to classify fine grains according to their fluid-soil response into no-, low-, intermediate-, or high-plasticity fine grains of low, intermediate, or high electrical sensitivity. The proposed methodology benefits from the accumulated experience with liquid limit in the field and addresses the needs of a broader range of geotechnical engineering problems.
Chen, Jian-qing; Cui, Ping-yuan; Cui, Hui-tao
Crater detection and classification are critical elements for planetary mission preparations and landing site selection. This paper presents a methodology for the automated detection and matching of craters on images of planetary surfaces such as the Moon, Mars, and asteroids. Because craters are usually bowl-shaped depressions, they can be represented as circles or circular arcs during the landing phase. Based on the hypothesis that detected crater edges are related to craters in a template by translation, rotation, and scaling, the proposed matching method uses circles to fit crater edges and aligns circular arc edges from the image of the target body with circular features contained in a model. The approach includes edge detection, edge grouping, reference point detection, and geometric circle model matching. Finally, we simulate a planetary surface to test the reasonableness and effectiveness of the proposed method.
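The circle-fitting step can be sketched with the classical Kåsa least-squares fit, which linearizes the circle equation as x² + y² = 2ax + 2by + c. This is a generic method; the paper does not specify its exact fitting procedure:

```python
import numpy as np

def fit_circle(x, y):
    """Kasa least-squares circle fit.

    Writing the circle (x-a)^2 + (y-b)^2 = r^2 as x^2 + y^2 = 2ax + 2by + c
    with c = r^2 - a^2 - b^2 makes the problem linear in (a, b, c).
    """
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    r = np.sqrt(c + a ** 2 + b ** 2)
    return a, b, r

# a partial arc, as a crater rim seen during descent would provide
theta = np.linspace(0.3, 2.0, 40)
x = 5.0 + 3.0 * np.cos(theta)     # true center (5, -2), radius 3
y = -2.0 + 3.0 * np.sin(theta)
cx, cy, r = fit_circle(x, y)
```

Because the formulation is linear, it works on partial arcs and is cheap enough to run on many grouped edge segments before the model-matching stage.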
Oliveira, E J; Oliveira Filho, O S; Santos, V S
We evaluated the genetic variation of cassava accessions based on qualitative (binomial and multicategorical) and quantitative (continuous) traits. We characterized 95 accessions obtained from the Cassava Germplasm Bank of Embrapa Mandioca e Fruticultura, evaluating them for 13 continuous, 10 binary, and 25 multicategorical traits. First, we analyzed the accessions based only on quantitative traits; next, we conducted a joint analysis (qualitative and quantitative traits) based on the Ward-MLM method, which performs clustering in two stages. According to the pseudo-F, pseudo-t2, and maximum likelihood criteria, we identified five and four groups based on the quantitative-trait and joint analyses, respectively. The smaller number of groups identified by joint analysis may be related to the nature of the data; quantitative data are more subject to environmental effects on phenotype expression, which can produce differences even in the absence of genetic ones, thereby contributing to greater apparent differentiation among accessions. For most of the accessions, the maximum probability of classification was >0.90, independent of the trait analyzed, indicating a good fit of the clustering method. Differences in clustering according to the type of data imply that analysis of quantitative and qualitative traits in cassava germplasm may explore different genomic regions. When joint analysis was used, the means and ranges of genetic distances were high, indicating that the Ward-MLM method is very useful for clustering genotypes when there are many phenotypic traits, as in the case of genetic resources and breeding programs.
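The first stage of the Ward-MLM method can be illustrated with plain Ward hierarchical clustering on quantitative traits. The MLM refinement stage and the pseudo-F/pseudo-t² group-number criteria are omitted, and the trait data below are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)

# hypothetical standardized quantitative traits for 20 accessions,
# drawn from two clearly distinct groups of 10
g1 = rng.normal(0.0, 0.3, size=(10, 5))
g2 = rng.normal(4.0, 0.3, size=(10, 5))
X = np.vstack([g1, g2])

# Ward linkage merges the pair of clusters that minimizes the increase
# in total within-cluster variance at each step
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree at 2 groups
```

In practice the number of groups is not fixed in advance; it is chosen by scanning cut levels and evaluating criteria such as pseudo-F, as the paper does.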
Liu, Lingqiao; Wang, Peng; Shen, Chunhua; Wang, Lei; Hengel, Anton van den; Wang, Chao; Shen, Heng Tao
Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) as the generative model for local features. However, the representative power of a GMM can be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes, and the number of prototypes is usually small in FVC. To alleviate this limitation, in this work we break the convention which assumes that a local feature is drawn from one of a few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is composed as a linear combination of multiple key components, and the combination weight is a latent random variable. In doing so, we greatly enhance the representative power of the generative model underlying FVC. To implement our idea, we design two particular generative models following this compositional approach. In our first model, the mean vector is sampled from the subspace spanned by a set of bases and the combination weight is drawn from a Laplace distribution. In our second model, we further assume that a local feature is composed of a discriminative part and a residual part. As a result, a local feature is generated by the linear combination of discriminative part bases and residual part bases. The decomposition of the discriminative and residual parts is achieved via the guidance of a pre-trained supervised coding method. By calculating the gradient vector of the proposed models, we derive two new Fisher vector coding strategies. The first is termed Sparse Coding-based Fisher Vector Coding (SCFVC) and can be used as a substitute for traditional GMM-based FVC. The second is termed Hybrid Sparse Coding-based Fisher vector coding (HSCFVC) since it
Full Text Available Existing material classification schemes were proposed to improve inventory management. However, different materials have different quality-related attributes, especially in the aircraft industry. In order to reduce cost without sacrificing quality, we propose a quality-oriented material classification system considering material quality character, quality cost, and quality influence. The Analytic Hierarchy Process helps to make feature selection and classification decisions. We use the improved Kraljic Portfolio Matrix to establish the three-dimensional classification model. The aircraft materials can be divided into eight types, including general type, key type, risk type, and leveraged type. Aiming to improve the classification accuracy for various materials, the Support Vector Machine (SVM) algorithm is introduced. Finally, we compare the SVM and a BP neural network in this application. The results show that the SVM algorithm is more efficient and accurate, and that the quality-oriented material classification is valuable.
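The Analytic Hierarchy Process step can be sketched as extracting priority weights from a pairwise comparison matrix via power iteration toward the principal eigenvector. The comparison values below (Saaty 1-9 scale, made perfectly consistent for clarity) are hypothetical:

```python
import numpy as np

def ahp_weights(M, iters=100):
    """AHP priority weights: principal eigenvector of the pairwise
    comparison matrix, found by power iteration and normalized to sum 1."""
    w = np.ones(M.shape[0]) / M.shape[0]
    for _ in range(iters):
        w = M @ w
        w /= w.sum()
    return w

# hypothetical pairwise comparisons of three criteria:
# quality character vs quality cost vs quality influence
M = np.array([[1.0,  2.0, 4.0],
              [0.5,  1.0, 2.0],
              [0.25, 0.5, 1.0]])
w = ahp_weights(M)   # expected ratios 4 : 2 : 1
```

For a perfectly consistent matrix the iteration converges immediately; real judgments are inconsistent, and AHP additionally checks a consistency ratio before accepting the weights.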
Saleh, F.; Ramaswamy, V.; Georgas, N.; Blumberg, A. F.; Wang, Y.
Advances in computational resources and modeling techniques are opening the path to effectively integrate existing complex models. In the context of flood prediction, recent extreme events have demonstrated the importance of integrating components of the hydrosystem to better represent the interactions amongst different physical processes and phenomena. As such, there is a pressing need to develop holistic and cross-disciplinary modeling frameworks that effectively integrate existing models and better represent the operative dynamics. This work presents a novel Hydrologic-Hydraulic-Hydrodynamic Ensemble (H3E) flood prediction framework that operationally integrates existing predictive models representing coastal (New York Harbor Observing and Prediction System, NYHOPS), hydrologic (US Army Corps of Engineers Hydrologic Modeling System, HEC-HMS) and hydraulic (2-dimensional River Analysis System, HEC-RAS) components. The state-of-the-art framework is forced with 125 ensemble meteorological inputs from numerical weather prediction models including the Global Ensemble Forecast System, the European Centre for Medium-Range Weather Forecasts (ECMWF), the Canadian Meteorological Centre (CMC), the Short Range Ensemble Forecast (SREF) and the North American Mesoscale Forecast System (NAM). The framework produces, within a 96-hour forecast horizon, on-the-fly Google Earth flood maps that provide critical information for decision makers and emergency preparedness managers. The utility of the framework was demonstrated by retrospectively forecasting an extreme flood event, hurricane Sandy in the Passaic and Hackensack watersheds (New Jersey, USA). Hurricane Sandy caused significant damage to a number of critical facilities in this area including the New Jersey Transit's main storage and maintenance facility. The results of this work demonstrate that ensemble based frameworks provide improved flood predictions and useful information about associated uncertainties, thus
Cisty, Milan; Hlavcova, Kamila
a prerequisite for solving some subsequent task, this bias is propagated to the subsequent modelling or other work. Therefore, for the sake of achieving more general and precise outputs while solving such tasks, the authors of the present paper propose a hybrid approach, which has the potential to obtain improved results. Although the authors continue to recommend the use of the mentioned parametric PSD models in the proposed methodology, the final prediction is made by an ensemble machine learning algorithm based on regression trees, the so-called Random Forest algorithm, which is built on top of the outputs of such models, which serve as ensemble members. An improvement in precision was demonstrated, and it is documented in the paper that the ensemble model worked better than any of its constituents. References: Nemes, A., Wosten, J.H.M., Lilly, A., Voshaar, J.H.O.: Evaluation of different procedures to interpolate particle-size distributions to achieve compatibility within soil databases. Geoderma 90, 187-202 (1999); Hwang, S.: Effect of texture on the performance of soil particle-size distribution models. Geoderma 123, 363-371 (2004); Botula, Y.D., Cornelis, W.M., Baert, G., Mafuka, P., Van Ranst, E.: Particle size distribution models for soils of the humid tropics. J Soils Sediments 13, 686-698 (2013)
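The idea of building an ensemble on top of parametric model outputs can be sketched as stacking: the members' predictions become inputs to a meta-model. The paper uses a Random Forest as the meta-model; a least-squares combiner stands in here to keep the sketch dependency-light, and all "member" predictions are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(size=200)                    # target (e.g. a PSD fraction)
m1 = y + 0.5 + rng.normal(0, 0.4, 200)      # member A: biased parametric model
m2 = 0.7 * y + rng.normal(0, 0.3, 200)      # member B: attenuated parametric model

# stacking: fit intercept + weights over the members' outputs
A = np.column_stack([np.ones_like(y), m1, m2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
stacked = A @ coef

def rmse(p):
    return float(np.sqrt(np.mean((p - y) ** 2)))

errs = (rmse(stacked), rmse(m1), rmse(m2))   # stacked error vs each member
```

Because each member is itself in the span of the combiner's inputs, the stacked in-sample error can never exceed that of any single member; a Random Forest meta-model additionally captures nonlinear interactions between members.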
Beegle, Amy C.
As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…
Amalisana, Birohmatin; Rokhmatullah; Hernina, Revi
The advantage of image classification is that it provides information on the earth's surface, such as land cover and its time-series changes. Nowadays, pixel-based image classification is commonly performed with a variety of algorithms, such as minimum distance, parallelepiped, maximum likelihood, and Mahalanobis distance. Land cover classification can also be obtained with object-based image classification, which uses image segmentation based on parameters such as scale, form, colour, smoothness, and compactness. This research aims to compare land cover classification results and change detection between the parallelepiped pixel-based and the object-based classification methods. The study area is Bogor, observed over a 20-year period from 1996 to 2016. This region is known for urban areas that change continuously due to rapid development, which makes its time-series land cover information particularly interesting.
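Among the pixel-based algorithms named above, minimum distance is the simplest: each pixel is assigned to the class whose spectral mean is nearest. A minimal sketch on a hypothetical 2-band image with two classes (the band values are invented for illustration):

```python
import numpy as np

def min_distance_classify(image, class_means):
    """Assign each pixel to the spectrally nearest class mean.

    image       : (rows, cols, bands) array of pixel spectra
    class_means : (n_classes, bands) array of training-class mean spectra
    returns     : (rows, cols) array of class indices
    """
    # broadcast to (rows, cols, n_classes, bands), then Euclidean distance
    d = np.linalg.norm(image[:, :, None, :] - class_means[None, None], axis=-1)
    return d.argmin(axis=-1)

# hypothetical class means for a 2-band sensor
means = np.array([[40.0, 90.0],    # class 0: vegetation
                  [80.0, 60.0]])   # class 1: built-up
img = np.array([[[42.0, 88.0], [79.0, 62.0]],
                [[38.0, 91.0], [83.0, 58.0]]])
labels = min_distance_classify(img, means)
```

Running the same classifier on images from two dates and differencing the label maps gives a simple pixel-based change detection.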
Jun Wan; Zehua Chen; Yingwu Chen; Zhidong Bai
Classification is an interesting problem in functional data analysis (FDA), because many science and application problems end up as classification problems, such as recognition, prediction, control, decision making, management, etc. Because of the high dimensionality and high correlation of functional data (FD), a key problem is to extract features from FD while preserving its global characteristics, on which classification efficiency and precision heavily depend. In this paper...
Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra; Maulik, Ujjwal
With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized as samples versus genes fashion, are being used for classification of tissue samples into benign and malignant or their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in successful diagnosis of particular cancer types. In this article, we have presented an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. In this regard, a real-coded encoding of the cluster centers is used and cluster compactness and separation are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach to combine the clustering information possessed by the non-dominated solutions through Support Vector Machine (SVM) classifier has been proposed. Final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms for three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed clustering method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results obtained are found to be promising and can possibly have important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.
Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan
Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time-consuming and require experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged trees were the most promising classifier types, with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the last as our preferred classifier for the detection and enumeration of Giardia cysts imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can be a useful tool for water quality monitoring in resource-limited settings.
Barzegar, Rahim; Fijani, Elham; Asghari Moghaddam, Asghar; Tziritis, Evangelos
Accurate prediction of groundwater level (GWL) fluctuations can play an important role in water resources management. The aims of this research are to evaluate the performance of different hybrid wavelet-group method of data handling (WA-GMDH) and wavelet-extreme learning machine (WA-ELM) models, and to combine different wavelet-based models for forecasting the GWL one, two, and three months ahead in the Maragheh-Bonab plain, NW Iran, as a case study. The research used a total of 367 monthly GWL (m) records (Sep 1985-Mar 2016), which were split into two subsets: the first 312 (85% of the total) were used for model development (training) and the remaining 55 (15% of the total) for model evaluation (testing). Stepwise selection was used to choose appropriate lag times as the inputs of the proposed models. Performance criteria such as the coefficient of determination (R2), root mean square error (RMSE) and Nash-Sutcliffe efficiency coefficient (NSC) were used to assess the efficiency of the models. The results indicated that the ELM models outperformed the GMDH models. To construct the hybrid wavelet-based models, the inputs and outputs were decomposed into sub-time series employing different maximal overlap discrete wavelet transform (MODWT) functions, namely Daubechies, Symlet, Haar and Dmeyer of different orders at level two. Subsequently, these sub-time series were fed to the GMDH and ELM models as input datasets to forecast the multi-step-ahead GWL. The wavelet-based models improved the performance of the GMDH and ELM models for multi-step-ahead GWL forecasting. To combine the advantages of different wavelets, a least squares boosting (LSBoost) algorithm was applied. The boosting multi-WA-neural network models provided the best performance for GWL forecasts in comparison with single WA-neural network-based models.
Jonsson, Leif; Borg, Markus; Broman, David; Sandahl, Kristian; Eldh, Sigrid; Runeson, Per
Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learni...
Full Text Available To improve crop model performance for regional crop yield estimates, a new four-dimensional variational algorithm (POD4DVar) merging the Monte Carlo and proper orthogonal decomposition techniques was introduced to develop a data assimilation strategy using the Crop Environment Resource Synthesis (CERES-Wheat) model. Two winter wheat yield estimation procedures were conducted at the field-plot and regional scales to test the feasibility and potential of the POD4DVar-based strategy. Winter wheat yield forecasts for the field plots showed a coefficient of determination (R2) of 0.73, a root mean square error (RMSE) of 319 kg/ha, and a relative error (RE) of 3.49%. An acceptable yield at the regional scale was estimated with an R2 of 0.997, an RMSE of 7346 tons, and an RE of 3.81%. The POD4DVar-based strategy was more accurate and efficient than the EnKF-based strategy. In addition to crop yield, other critical crop variables such as biomass, harvest index, evapotranspiration, and soil organic carbon may also be estimated. The present study thus introduces a promising approach for operationally monitoring regional crop growth and predicting yield. Successful application of this assimilation model at regional scales must focus on uncertainties derived from the crop model, model inputs, the data assimilation algorithm, and the assimilated observations.
Arna van Engelen
Full Text Available Background: Histology sections provide accurate information on atherosclerotic plaque composition, and are used in various applications. To our knowledge, no automated systems for plaque component segmentation in histology sections currently exist. Materials and Methods: We perform pixel-wise classification of fibrous, lipid, and necrotic tissue in Elastica van Gieson-stained histology sections, using features based on color channel intensity and local image texture and structure. We compare an approach where we train on independent data to an approach where we train on one or two sections per specimen in order to segment the remaining sections. We evaluate the results on segmentation accuracy in histology, and we use the obtained histology segmentations to train plaque component classification methods in ex vivo magnetic resonance imaging (MRI) and in vivo MRI and computed tomography (CT). Results: In leave-one-specimen-out experiments on 176 histology slices of 13 plaques, a pixel-wise accuracy of 75.7 ± 6.8% was obtained. This increased to 77.6 ± 6.5% when two manually annotated slices of the specimen to be segmented were used for training. Rank correlations of relative component volumes with manually annotated volumes were high in this situation (P = 0.82-0.98). Using the obtained histology segmentations to train plaque component classification methods in ex vivo MRI and in vivo MRI and CT resulted in similar image segmentations for training on the automated histology segmentations as for training on a fully manual ground truth. The size of the lipid-rich necrotic core was significantly smaller when training on fully automated histology segmentations than when manually annotated histology sections were used. This difference was reduced and not statistically significant when one or two slices per section were manually annotated for histology segmentation. Conclusions: Good histology segmentations can be obtained by automated segmentation
Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of the protein residues that physically bind to ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictions have been based on known structures. However, structural information is largely unavailable in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures
Quantitative analysis of the risk of reservoir real-time operation is a hard task owing to the difficulty of accurately describing inflow uncertainties. Ensemble-based hydrologic forecasts depict not only the marginal distributions of the inflows but also their persistence via scenarios. This motivates us to analyze the reservoir real-time operating risk with ensemble-based hydrologic forecasts as inputs. A method is developed by using the forecast horizon point to divide the future time into two stages, the forecast lead-time and the unpredicted time. The risk within the forecast lead-time is computed by counting the number of forecast scenarios that fail, and the risk in the unpredicted time is estimated using reservoir routing with the design floods and the reservoir water levels at the forecast horizon point. As a result, a two-stage risk analysis method is set up to quantify the entire flood risk, defined as the ratio of the number of scenarios that exceed the critical value to the total number of scenarios. China's Three Gorges Reservoir (TGR) is selected as a case study, where parameter and precipitation uncertainties are implemented to produce ensemble-based hydrologic forecasts. Bayesian inference via Markov Chain Monte Carlo is used to account for the parameter uncertainty. Two reservoir operation schemes, the real operated and the scenario optimization, are evaluated for flood risks and hydropower profits. With the 2010 flood, it is found that improving the hydrologic forecast accuracy does not necessarily decrease the reservoir real-time operation risk, and most risks come from the forecast lead-time. It is therefore valuable to reduce the variance of ensemble-based hydrologic forecasts with less bias for reservoir operational purposes.
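The lead-time stage of the two-stage method reduces to counting scenarios that exceed the critical value. A minimal sketch of that counting step (function names, the toy ensemble, and the critical level are illustrative, not taken from the study):

```python
import numpy as np

def lead_time_risk(forecast_scenarios, critical_level):
    """Risk within the forecast lead-time: the fraction of ensemble
    scenarios whose peak inflow exceeds the critical value."""
    scenarios = np.asarray(forecast_scenarios, dtype=float)  # (n_scenarios, lead_steps)
    exceed = scenarios.max(axis=1) > critical_level          # one flag per scenario
    return exceed.mean()

# Toy ensemble: 4 inflow scenarios over a 3-step lead time
ens = [[100, 120, 110],
       [100, 180, 160],   # exceeds the critical level
       [ 90,  95, 100],
       [100, 200, 190]]   # exceeds the critical level
risk = lead_time_risk(ens, critical_level=150)   # 2 of 4 scenarios exceed
```

The unpredicted-time stage would instead require reservoir routing with design floods, which depends on site-specific rating curves and is not sketched here.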
Jiang, Jing; Lu, Chunming; Peng, Danling; Zhu, Chaozhe; Howell, Peter
Among the non-fluencies seen in speech, some are more typical (MT) of stuttering speakers, whereas others are less typical (LT) and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is which type (LT, MT) whole-word repetitions (WWR) should be placed in. In this study, a sentence completion task was performed by twenty stuttering patients who were scanned using an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT types with WWR put aside. Pattern classification was employed to train a patient-specific single trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated by using test data that were independent of the training data. In a subsequent analysis, the classification model, just established, was used to determine which type the WWR should be placed in. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that made the largest contribution to the separation of the types were: the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT; and the left putamen and right cerebellum which showed the opposite activity pattern. The results also showed that the brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and the LT types, and support the widely-used MT/LT symptom grouping scheme. In addition, WWR play a similar role to the LT, and thus should be placed in the LT type. PMID:22761887
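As a rough illustration of the patient-specific single-trial classification with independent validation data, the sketch below trains a linear classifier on synthetic "trial" feature vectors and scores it on a held-out split. Everything here (feature counts, class separation, the choice of a linear SVM) is a stand-in for the study's real fMRI activity patterns and classifier:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 50
X_mt = rng.normal( 0.8, 1.0, (n_trials, n_voxels))  # synthetic MT trials
X_lt = rng.normal(-0.8, 1.0, (n_trials, n_voxels))  # synthetic LT trials
X = np.vstack([X_mt, X_lt])
y = np.array([1] * n_trials + [0] * n_trials)       # 1 = MT, 0 = LT

# Hold out a validation split that is independent of the training data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LinearSVC().fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                          # accuracy on unseen trials
```

Once such a model is validated, new unlabeled trials (the WWR trials in the study) can be passed to `clf.predict` to see which class they fall into.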
Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
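The three-level data partitioning (discovery for motif finding, training for classifier fitting, validation for assessment) can be sketched as follows; the fractions and function name are illustrative, not taken from the pipeline itself:

```python
import numpy as np

def three_way_split(n, frac_discovery=0.3, frac_training=0.4, seed=0):
    """Partition n sample indices into discovery (motif finding),
    training (classifier fitting), and validation (assessment) sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_disc = int(frac_discovery * n)
    n_train = int(frac_training * n)
    return (idx[:n_disc],
            idx[n_disc:n_disc + n_train],
            idx[n_disc + n_train:])

disc, train, valid = three_way_split(100)
```

The key property is that the motif finder never sees the training or validation samples, so features discovered on `disc` do not leak label information into the downstream accuracy estimate.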
Full Text Available We consider the problem of predicting sensitivity of cancer cell lines to new drugs based on supervised learning on genomic profiles. The genetic and epigenetic characterization of a cell line provides observations on various aspects of regulation including DNA copy number variations, gene expression, DNA methylation and protein abundance. To extract relevant information from the various data types, we applied a random forest based approach to generate sensitivity predictions from each type of data and combined the predictions in a linear regression model to generate the final drug sensitivity prediction. Our approach when applied to the NCI-DREAM drug sensitivity prediction challenge was a top performer among 47 teams and produced high accuracy predictions. Our results show that the incorporation of multiple genomic characterizations lowered the mean and variance of the estimated bootstrap prediction error. We also applied our approach to the Cancer Cell Line Encyclopedia database for sensitivity prediction and the ability to extract the top targets of an anti-cancer drug. The results illustrate the effectiveness of our approach in predicting drug sensitivity from heterogeneous genomic datasets.
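A hedged sketch of the two-stage combination: one random-forest learner per genomic data type, then a linear regression stacking the per-type predictions into the final sensitivity estimate. The data and data-type names are synthetic stand-ins, and for brevity the stage-1 predictions are made in-sample rather than out-of-bag as a careful pipeline would do:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 120
sensitivity = rng.normal(size=n)                        # synthetic drug response
data_types = {                                          # two stand-in genomic views
    "expression":  sensitivity[:, None] + rng.normal(0, 0.5, (n, 10)),
    "methylation": sensitivity[:, None] + rng.normal(0, 0.8, (n, 10)),
}

# Stage 1: one random forest per data type, each producing its own prediction
stage1 = np.column_stack([
    RandomForestRegressor(n_estimators=50, random_state=0)
        .fit(X, sensitivity).predict(X)
    for X in data_types.values()
])

# Stage 2: linear regression combines the per-type predictions
stacker = LinearRegression().fit(stage1, sensitivity)
r2 = stacker.score(stage1, sensitivity)
```

The linear stage lets more informative data types receive larger weights, which is consistent with the abstract's observation that combining characterizations lowered the bootstrap prediction error.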
Chakraborty, Sourav; Mondal, Snehasish; Bhowmick, Sourav; Ma, Jianqiu; Tan, Hongwei; Neogi, Subhadip; Das, Neeladri
Preparation and characterization of two new triptycene-based polytopic Pt(II) organometallic complexes are reported. These complexes have three trans-bromobis(trialkylphosphine)platinum(II) units directly attached to the central triptycene unit. These organoplatinum complexes were converted to the corresponding nitrate salts for subsequent use in self-assembly reactions. Characterization of these organometallic triptycene complexes by multinuclear NMR, FTIR, mass spectrometry and elemental analyses is described. The molecular structure of one of the organoplatinum triptycene tripods was determined by single-crystal X-ray crystallography. The potential utility of these organometallic tritopic acceptors as building blocks in the construction of metallasupramolecular cages containing the triptycene motif is explored. Additionally, for the first time, 3,3'-bipyridine has been used as a flexible donor tecton for self-assembly of discrete and finite metallacages using triptycene-based tritopic organometallic acceptor units. Triptycene motif containing supramolecules were characterized by multinuclear NMR (including (1)H DOSY), mass spectrometry and elemental analyses. Geometry of each supramolecular framework was optimized by employing the PM6 semiempirical molecular orbital method to predict its shape and size.
Jaiswal, Neeru; Kishtawal, C. M.; Bhomia, Swati
The southwest (SW) monsoon season (June, July, August and September) is the major period of rainfall over the Indian region. The present study focuses on the development of a new multi-model ensemble approach based on a similarity criterion (SMME) for the prediction of SW monsoon rainfall in the extended range. This approach is based on the assumption that training with similar types of conditions may provide better forecasts than the sequential training used in conventional MME approaches. In this approach, the training dataset was selected by matching the present-day condition to the archived dataset; the days with the most similar conditions were identified and used for training the model. The coefficients thus generated were used for the rainfall prediction. The precipitation forecasts from four general circulation models (GCMs), viz. European Centre for Medium-Range Weather Forecasts (ECMWF), United Kingdom Meteorological Office (UKMO), National Centre for Environment Prediction (NCEP) and China Meteorological Administration (CMA), have been used for developing the SMME forecasts. The forecasts of 1-5, 6-10 and 11-15 days were generated using the newly developed approach for each pentad of June-September during the years 2008-2013, and the skill of the model was analysed using verification scores, viz. equitable threat score (ETS), mean absolute error (MAE), Pearson's correlation coefficient and the Nash-Sutcliffe model efficiency index. Statistical analysis of the SMME forecasts shows superior forecast skill compared to the conventional MME and the individual models for all the pentads, viz. 1-5, 6-10 and 11-15 days.
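The core of the similarity criterion, selecting the archived days closest to the present-day condition as the training set, can be sketched as below. The use of Euclidean distance, the two-predictor state vectors, and all names are assumptions for illustration:

```python
import numpy as np

def select_similar_days(current_state, archive_states, k=5):
    """Pick the k archived days most similar (Euclidean distance)
    to the present-day condition, to serve as the training set."""
    d = np.linalg.norm(archive_states - current_state, axis=1)
    return np.argsort(d)[:k]

# 10 archived days described by 2 predictors each
archive = np.arange(20, dtype=float).reshape(10, 2)
today = np.array([6.5, 7.5])
train_idx = select_similar_days(today, archive, k=3)
```

Regression coefficients would then be fitted only on the rows `archive[train_idx]`, instead of on the most recent days as in sequential-training MME.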
Qi, Wei; Liu, Junguo; Yang, Hong; Sweetapple, Chris
Global precipitation products are very important datasets in flow simulations, especially in poorly gauged regions. Uncertainties resulting from precipitation products, hydrological models and their combinations vary with time and data magnitude, and undermine their application to flow simulations. However, previous studies have not quantified these uncertainties individually and explicitly. This study developed an ensemble-based dynamic Bayesian averaging approach (e-Bay) for deterministic discharge simulations using multiple global precipitation products and hydrological models. In this approach, the joint probability of precipitation products and hydrological models being correct is quantified based on uncertainties in maximum and mean estimation, posterior probability is quantified as functions of the magnitude and timing of discharges, and the law of total probability is implemented to calculate expected discharges. Six global fine-resolution precipitation products and two hydrological models of different complexities are included in an illustrative application. e-Bay can effectively quantify uncertainties and therefore generate better deterministic discharges than traditional approaches (weighted average methods with equal and varying weights and maximum likelihood approach). The mean Nash-Sutcliffe Efficiency values of e-Bay are up to 0.97 and 0.85 in training and validation periods respectively, which are at least 0.06 and 0.13 higher than traditional approaches. In addition, with increased training data, assessment criteria values of e-Bay show smaller fluctuations than traditional approaches and its performance becomes outstanding. The proposed e-Bay approach bridges the gap between global precipitation products and their pragmatic applications to discharge simulations, and is beneficial to water resources management in ungauged or poorly gauged regions across the world.
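The final step of the e-Bay approach, obtaining the expected discharge via the law of total probability, reduces to a posterior-weighted average over the precipitation-product/model combinations. A minimal sketch with illustrative numbers (the real posterior weights vary with discharge magnitude and timing, which is not modeled here):

```python
import numpy as np

def expected_discharge(member_discharges, posterior_probs):
    """Law of total probability: the expected discharge is the
    posterior-weighted average over product/model combinations."""
    p = np.asarray(posterior_probs, dtype=float)
    p = p / p.sum()                       # normalise the posterior weights
    return float(np.dot(p, member_discharges))

# Three hypothetical product/model combinations and their posteriors
q = expected_discharge([100.0, 140.0, 120.0], [0.2, 0.5, 0.3])
```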
Velazquez, Emmanuel Rios; Aerts, Hugo J. W. L.; Gu, Yuhua; Goldgof, Dmitry B.; De Ruysscher, Dirk; Dekker, Andre; Korn, René; Gillies, Robert J.; Lambin, Philippe
Purpose To assess the clinical relevance of a semiautomatic CT-based ensemble segmentation method, by comparing it to pathology and to CT/PET manual delineations by five independent radiation oncologists in non-small cell lung cancer (NSCLC). Materials and Methods For twenty NSCLC patients (stage Ib – IIIb) the primary tumor was delineated manually on CT/PET scans by five independent radiation oncologists and segmented using a CT based semi-automatic tool. Tumor volume and overlap fractions between manual and semiautomatic-segmented volumes were compared. All measurements were correlated with the maximal diameter on macroscopic examination of the surgical specimen. Imaging data is available on www.cancerdata.org. Results High overlap fractions were observed between the semi-automatically segmented volumes and the intersection (92.5 ± 9.0, mean ± SD) and union (94.2 ± 6.8) of the manual delineations. No statistically significant differences in tumor volume were observed between the semiautomatic segmentation (71.4 ± 83.2 cm3, mean ± SD) and manual delineations (81.9 ± 94.1 cm3; p = 0.57). The maximal tumor diameter of the semiautomatic-segmented tumor correlated strongly with the macroscopic diameter of the primary tumor (r = 0.96). Conclusion Semiautomatic segmentation of the primary tumor on CT demonstrated high agreement with CT/PET manual delineations and strongly correlated with the macroscopic diameter considered the “gold standard”. This method may be used routinely in clinical practice and could be employed as a starting point for treatment planning, target definition in multi-center clinical trials or for high throughput data mining research. This method is particularly suitable for peripherally located tumors. PMID:23157978
Hibbett, David; Abarenkov, Kessy; Kõljalg, Urmas; Öpik, Maarja; Chai, Benli; Cole, James; Wang, Qiong; Crous, Pedro; Robert, Vincent; Helgason, Thorunn; Herr, Joshua R; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, R Henrik; Oono, Ryoko; Schoch, Conrad; Smyth, Christopher; Walker, Donald M; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M
Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable valid publication of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.
durability, availability and rational use of available resources. This would also help in avoiding ... have used different classifiers like MLP-BP-ANN, Pearson correlation, energy value, and SVM and achieved classification ..... best trade-off between classification accuracy and computational time. The RF classifier needs.
In this paper mathematical modeling method has been used in order to simulating of power quality events. In regarding to excellent neural network function in classification and pattern recognition works, multilayer neural network has been used for events classification. The STFT and Discrete Wavelet Transform are used for ...
Abril Valeria Uriarte-Arcia
Full Text Available The ever-increasing data generation confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams during a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management, and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.
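A toy illustration of the sliding-window forgetting mechanism. This is not the Gamma classifier itself (an associative model); a nearest-neighbour rule on one-dimensional inputs stands in for it, and the class and window size are invented for the example:

```python
from collections import deque

class SlidingWindowNN:
    """Toy stream classifier: keeps only the last `window` labelled
    instances (the forgetting mechanism) and predicts by nearest
    neighbour, so old concepts are dropped as the stream drifts."""
    def __init__(self, window=100):
        self.buf = deque(maxlen=window)   # old items fall off automatically

    def learn(self, x, y):
        self.buf.append((x, y))

    def predict(self, x):
        _, label = min(self.buf, key=lambda item: abs(item[0] - x))
        return label

clf = SlidingWindowNN(window=3)
for x, y in [(0.0, "a"), (1.0, "a"), (10.0, "b"), (11.0, "b")]:
    clf.learn(x, y)
# window=3, so (0.0, "a") has been forgotten by now
pred = clf.predict(9.0)
```

Because `deque(maxlen=...)` discards the oldest entries, a distribution shift in the stream only influences predictions for as long as the window retains pre-shift examples.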
Lubis, A. S.; Muis, Z. A.; Hastuty, I. P.; Siregar, I. M.
Factors that must be considered in soil compaction works are the type of soil material, field control, maintenance, and the availability of funds. These problems raised the idea of how to estimate the density of the soil with an implementation system that is proper, fast, and economical. This study aims to estimate the compaction parameters, i.e. the maximum dry unit weight (γdmax) and the optimum water content (wopt), based on soil classification. Thirty samples were each tested for index properties and compaction. All data from the laboratory test results were used to estimate the compaction parameter values using linear regression and the Goswami Model. The soil types were A-4, A-6, and A-7 according to AASHTO, and SC, SC-SM, and CL based on USCS. By linear regression, the estimated maximum dry unit weight is (γdmax*) = 1.862 - 0.005*FINES - 0.003*LL, and the estimated optimum water content is (wopt*) = -0.607 + 0.362*FINES + 0.161*LL. By the Goswami Model (with equation Y = m*LogG + k), the maximum dry unit weight (γdmax*) is estimated with m = -0.376 and k = 2.482, and the optimum water content (wopt*) with m = 21.265 and k = -32.421. For both of these equations a 95% confidence interval was obtained.
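The fitted equations can be applied directly. The sketch below renders the decimal commas of the text as decimal points and assumes base-10 log in the Goswami model (Y = m*LogG + k); the input values are illustrative:

```python
import math

def gamma_dmax_linear(fines, ll):
    """Estimated maximum dry unit weight from the fitted linear regression."""
    return 1.862 - 0.005 * fines - 0.003 * ll

def w_opt_linear(fines, ll):
    """Estimated optimum water content from the fitted linear regression."""
    return -0.607 + 0.362 * fines + 0.161 * ll

def goswami(G, m, k):
    """Goswami model: Y = m * log10(G) + k."""
    return m * math.log10(G) + k

# Illustrative soil: 40% fines, liquid limit 30
g = gamma_dmax_linear(fines=40.0, ll=30.0)   # 1.862 - 0.200 - 0.090
w = w_opt_linear(fines=40.0, ll=30.0)
```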
There is a trend of growing interest and demand for greater access of unmanned aircraft (UA) to the National Airspace System (NAS) as the ongoing development of UA technology has created the potential for significant economic benefits. However, the lack of a comprehensive and efficient UA regulatory framework has constrained the number and kinds of UA operations that can be performed. This report presents initial results of a study aimed at defining a safety-risk-based UA classification as a plausible basis for a regulatory framework for UA operating in the NAS. Much of the study up to this point has been at a conceptual high level. The report includes a survey of contextual topics, analysis of safety risk considerations, and initial recommendations for a risk-based approach to safe UA operations in the NAS. The next phase of the study will develop and leverage deeper clarity and insight into practical engineering and regulatory considerations for ensuring that UA operations have an acceptable level of safety.
Boschetto, Davide; Grisan, Enrico
Chromoendoscopy (CH) is a gastroenterology imaging modality that involves the staining of tissues with methylene blue, which reacts with the internal walls of the gastrointestinal tract, improving the visual contrast in mucosal surfaces and thus enhancing a doctor's ability to screen precancerous lesions or early cancer. This technique helps identify areas that can be targeted for biopsy or treatment and in this work we will focus on gastric cancer detection. Gastric chromoendoscopy for cancer detection has several taxonomies available, one of which classifies CH images into three classes (normal, metaplasia, dysplasia) based on color, shape and regularity of pit patterns. Computer-assisted diagnosis is desirable to help us improve the reliability of the tissue classification and abnormalities detection. However, traditional computer vision methodologies, mainly segmentation, do not translate well to the specific visual characteristics of a gastroenterology imaging scenario. We propose the exploitation of a first unsupervised segmentation via superpixels, which groups pixels into perceptually meaningful atomic regions, used to replace the rigid structure of the pixel grid. For each superpixel, a set of features is extracted and then fed to a random forest based classifier, which computes a model used to predict the class of each superpixel. The average general accuracy of our model is 92.05% in the pixel domain (86.62% in the superpixel domain), while detection accuracies on the normal and abnormal class are respectively 85.71% and 95%. Eventually, the class of the whole image can be predicted through a majority vote over the superpixels' predicted classes.
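The final majority-vote step over the per-superpixel predictions is straightforward; a minimal sketch with invented labels:

```python
from collections import Counter

def image_class_by_majority(superpixel_labels):
    """Whole-image class = majority vote over per-superpixel predictions."""
    return Counter(superpixel_labels).most_common(1)[0][0]

# Hypothetical per-superpixel predictions for one CH image
label = image_class_by_majority(
    ["normal", "metaplasia", "metaplasia", "normal", "metaplasia"])
```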
Full Text Available Motivation. Independent Components Analysis (ICA) maximizes the statistical independence of the representational components of a training gene expression profile (GEP) ensemble, but it cannot distinguish relations between different factors, or different modes, and it is not applicable to high-order GEP data mining. In order to generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high-order GEP. Firstly, we introduce the basic concepts and operations of tensors, and describe the Support Vector Machine (SVM) classifier and Multilinear-ICA. Secondly, the higher-scoring genes of the original high-order GEP are selected using t-statistics and tabulated as tensors. Thirdly, the tensors are processed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes. Results. To show the validity of the proposed method, we apply it to tumor classification using high-order GEP. Though we only use three datasets, the experimental results show that the method is effective and feasible. Through this survey, we hope to gain some insight into the problem of high-order GEP tumor classification, to aid the further development of more effective tumor classification algorithms.
Kim, Ok-Yeon; Chan, Johnny C. L.
This study aims to predict the seasonal TC track density over the South Pacific by combining the Asia-Pacific Economic Cooperation (APEC) Climate Center (APCC) multi-model ensemble (MME) dynamical prediction system with a statistical model. The hybrid dynamical-statistical model is developed for each of the three clusters that represent major groups of TC best tracks in the South Pacific. The cross validation result from the MME hybrid model demonstrates moderate but statistically significant skills to predict TC numbers across all TC clusters, with correlation coefficients of 0.4 to 0.6 between the hindcasts and observations for 1982/1983 to 2008/2009. The prediction skill in the area east of about 170°E is significantly influenced by strong El Niño, whereas the skill in the southwest Pacific region mainly comes from the linear trend of TC number. The prediction skill of TC track density is particularly high in the region where there is climatological high TC track density around the area 160°E-180° and 20°S. Since this area has a mixed response with respect to ENSO, the prediction skill of TC track density is higher in non-ENSO years compared to that in ENSO years. Even though the cross-validation prediction skill is higher in the area east of about 170°E compared to other areas, this region shows less skill for track density based on the categorical verification due to huge influences by strong El Niño years. While prediction skill of the developed methodology varies across the region, it is important that the model demonstrates skill in the area where TC activity is high. Such a result has an important practical implication—improving the accuracy of seasonal forecast and providing communities at risk with advanced information which could assist with preparedness and disaster risk reduction.
Parasuraman, Kamban; Elshorbagy, Amin
Uncertainty analysis is starting to be widely acknowledged as an integral part of hydrological modeling. The conventional treatment of uncertainty analysis in hydrologic modeling is to assume a deterministic model structure, and treat its associated parameters as imperfectly known, thereby neglecting the uncertainty associated with the model structure. In this paper, a modeling framework that can explicitly account for the effect of model structure uncertainty has been proposed. The modeling framework is based on initially generating different realizations of the original data set using a non-parametric bootstrap method, and then exploiting the ability of the self-organizing algorithms, namely genetic programming, to evolve their own model structure for each of the resampled data sets. The resulting ensemble of models is then used to quantify the uncertainty associated with the model structure. The performance of the proposed modeling framework is analyzed with regards to its ability in characterizing the evapotranspiration process at the Southwest Sand Storage facility, located near Ft. McMurray, Alberta. Eddy-covariance-measured actual evapotranspiration is modeled as a function of net radiation, air temperature, ground temperature, relative humidity, and wind speed. Investigating the relation between model complexity, prediction accuracy, and uncertainty, two sets of experiments were carried out by varying the level of mathematical operators that can be used to define the predictand-predictor relationship. While the first set uses just the additive operators, the second set uses both the additive and the multiplicative operators to define the predictand-predictor relationship. The results suggest that increasing the model complexity may lead to better prediction accuracy but at an expense of increasing uncertainty. Compared to the model parameter uncertainty, the relative contribution of model structure uncertainty to the predictive uncertainty of a model is
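The bootstrap-plus-ensemble idea can be illustrated on synthetic data. Here a randomly chosen polynomial degree stands in for the genetic-programming-evolved model structure, so the spread of the ensemble's predictions reflects both resampling and structure choice; all data and settings are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 60)
y = 2.0 * x + rng.normal(0, 0.1, x.size)       # synthetic "observations"

preds = []
for b in range(200):
    idx = rng.integers(0, x.size, x.size)      # non-parametric bootstrap resample
    deg = rng.choice([1, 2, 3])                # stand-in for an evolved structure
    coef = np.polyfit(x[idx], y[idx], deg)     # fit this realisation's model
    preds.append(np.polyval(coef, 0.5))        # predict at x = 0.5

ensemble_mean = float(np.mean(preds))
structure_uncertainty = float(np.std(preds))   # spread across the ensemble
```

The standard deviation across the ensemble is the quantity of interest: unlike parameter uncertainty under a fixed structure, it also captures disagreement between competing model structures.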
Musolino, M.; Meneghini, M.; Scarparo, L.; De Santi, C.; Tahraoui, A.; Geelhaar, L.; Zanoni, E.; Riechert, H.
III-N nanowires (NWs) are an attractive alternative to conventional planar layers as the basis for light-emitting diodes (LEDs). In fact, the NW geometry enables the growth of (In,Ga)N/GaN heterostructures with high indium content and without extended defects regardless of the substrate. Despite these conceptual advantages, the NW-LEDs so far reported often exhibit higher leakage currents and higher turn-on voltages than the planar LEDs. In this work, we investigate the mechanisms responsible for the unusually high leakage currents in (In,Ga)N/GaN LEDs based on self-induced NW ensembles grown by molecular beam epitaxy on Si substrates. The temperature-dependent current-voltage (I-V) characteristics, acquired between 83 and 403 K, reveal that temperatures higher than 240 K may activate a further conduction process, which is not present in the low temperature range. Deep level transient spectroscopy (DLTS) measurements show the presence of electron traps, which are activated in the same temperature interval. A detailed analysis of the DLTS signal reveals the presence of two distinct deep levels with apparent activation energies close to Ec-570 meV and Ec-840 meV, and capture cross sections of about 1.0x10-15 cm2 and 2x10-14 cm2, respectively. These results suggest that the leakage process might be related to trap-assisted tunneling, possibly produced by point defects located in the core and/or on the sidewalls of the NWs.
Boykin, P Oscar; Mor, Tal; Roychowdhury, Vwani; Vatan, Farrokh
In ensemble (or bulk) quantum computation, all computations are performed on an ensemble of computers rather than on a single computer. Measurements of qubits in an individual computer cannot be performed; instead, only expectation values (over the complete ensemble of computers) can be measured. As a result of this limitation on the model of computation, many algorithms cannot be processed directly on such computers and must be modified, as the common strategy of delaying the measurements usually does not resolve this ensemble-measurement problem. Here we present several new strategies for resolving this problem. Based on these strategies, we provide new versions of some of the most important quantum algorithms, versions that are suitable for implementation on ensemble quantum computers, e.g., on liquid NMR quantum computers. These algorithms are Shor's factorization algorithm, Grover's search algorithm (with several marked items), and an algorithm for quantum fault-tolerant computation. The first two algorithms are modified using randomizing and sorting strategies. For the last algorithm, we develop a classical-quantum hybrid strategy for removing measurements. We use it to present a novel quantum fault-tolerant scheme. More explicitly, we present schemes for fault-tolerant, measurement-free implementation of the Toffoli and σ_z^(1/4) gates, as these operations cannot be implemented "bitwise", and their standard fault-tolerant implementations require measurement.
In our previous study, we compared the performance of a number of widely used discrimination methods for classifying ovarian cancer using Matrix Assisted Laser Desorption Ionization (MALDI) mass spectrometry data on serum samples acquired in Reflectron mode. Our results demonstrated good performance with a random forest classifier. In this follow-up study, to improve the molecular classification power of the MALDI platform for ovarian cancer, we expanded the mass range of the MS data by adding data acquired in Linear mode and evaluated the resultant decrease in classification error. A general statistical framework is proposed to obtain unbiased classification error estimates and to analyze the effects of sample size and the number of selected m/z features on classification errors. We also emphasize the importance of combining biological knowledge and statistical analysis to obtain both biologically and statistically sound results. Our study shows improvement in classification accuracy upon expanding the mass range of the analysis. In order to obtain the best classification accuracies possible, we found that a relatively large training sample size is needed to obviate sample variations. For the ovarian MS dataset that is the focus of the current study, our results show that approximately 20-40 m/z features are needed to achieve the best classification accuracy from MALDI-MS analysis of sera. Supplementary information can be found at http://bioinformatics.med.yale.edu/proteomics/BioSupp2.html.
Ensemble clustering (EC) can arise in data assimilation with ensemble square root filters (EnSRFs) using non-linear models: an M-member ensemble splits into a single outlier and a cluster of M−1 members. The stochastic Ensemble Kalman Filter does not present this problem. Modifications to the EnSRFs by periodic resampling of the ensemble through random rotations have been proposed to address it. We introduce a metric to quantify the presence of EC and present evidence to dispel the notion that EC leads to filter failure. Starting from a univariate model, we show that EC is not a permanent but a transient phenomenon; it occurs intermittently in non-linear models. We perform a series of data assimilation experiments using a standard EnSRF and an EnSRF modified by resampling through random rotations. The modified EnSRF alleviates issues associated with EC, but at the cost of the traceability of individual ensemble trajectories, and it cannot use some of the algorithms that enhance the performance of the standard EnSRF. In the non-linear regimes of low-dimensional models, the analysis root mean square error of the standard EnSRF slowly grows with ensemble size if the size is larger than the dimension of the model state. However, we do not observe this problem in a more complex model that uses an ensemble size much smaller than the dimension of the model state, along with inflation and localisation. Overall, we find that transient EC does not handicap the performance of the standard EnSRF.
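A mean- and covariance-preserving random rotation of the ensemble perturbations, the kind of resampling the modified EnSRF relies on, can be sketched as follows. This is an illustrative construction, not the specific scheme used in the study: the rotation is built to leave the ones vector invariant, so the perturbations stay centred on the ensemble mean.

```python
import numpy as np

def resample_by_rotation(ensemble, rng):
    """Randomly rotate ensemble perturbations without changing the
    ensemble mean or the sample covariance.

    ensemble: (n, M) array holding M members of an n-dimensional state.
    """
    n, M = ensemble.shape
    mean = ensemble.mean(axis=1, keepdims=True)
    perts = ensemble - mean
    # Orthonormal basis of the subspace orthogonal to the ones vector.
    ones = np.ones((M, 1)) / np.sqrt(M)
    U = np.linalg.svd(np.eye(M) - ones @ ones.T)[0][:, : M - 1]
    # Random orthogonal (M-1)x(M-1) block via QR of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(M - 1, M - 1)))
    Q = U @ q @ U.T + ones @ ones.T    # orthogonal, and Q @ ones = ones
    return mean + perts @ Q

rng = np.random.default_rng(7)
ens = rng.normal(size=(4, 10))         # 4-dimensional state, 10 members
rotated = resample_by_rotation(ens, rng)
```

Because the rotation acts only within the zero-mean subspace of the members, individual trajectories are shuffled (losing traceability, as noted above) while the filter statistics are untouched.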
The NCMRWF Global Forecast System with ETR (Ensemble Transform with Rescaling) based Global Ensemble Forecast System (GEFS) of resolution T-190L28 is investigated. The experiment is conducted for a period of one week in June 2013 and forecast ...
Brzezinski, Dariusz; Stefanowski, Jerzy
Data stream mining has been receiving increased attention due to its presence in a wide range of applications, such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification algorithms that cope with concept drift have been put forward; however, most of them specialize in one type of change. In this paper, we propose a new data stream classifier, called the Accuracy Updated Ensemble (AUE2), which aims at reacting equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms known from block-based ensembles with the incremental nature of Hoeffding Trees. The proposed algorithm is experimentally compared with 11 state-of-the-art stream methods, including single classifiers, block-based and online ensembles, and hybrid approaches, in different drift scenarios. Out of all the compared algorithms, AUE2 provided the best average classification accuracy while proving to be less memory-consuming than the other ensemble approaches. Experimental results show that AUE2 can be considered suitable for scenarios involving many types of drift as well as for static environments.
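A minimal sketch of the accuracy-based block weighting that AUE2 borrows from block-based ensembles is given below. It is a hypothetical simplification: members are simple class-prototype classifiers rather than Hoeffding Trees, and the weighting rule (newest member at full weight, older members weighted by accuracy on the latest block, weakest member dropped) only loosely mirrors AUE2's actual formulas.

```python
import numpy as np

class AccuracyWeightedEnsemble:
    """Toy block-based ensemble with accuracy-based member weighting."""

    def __init__(self, max_members=5):
        self.max_members = max_members
        self.members = []     # each member: {class_label: prototype vector}
        self.weights = []

    def _train_member(self, X, y):
        return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

    def _predict_member(self, member, X):
        classes = list(member)
        protos = np.array([member[c] for c in classes])
        d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
        return np.array(classes)[d.argmin(axis=1)]

    def update(self, X, y):
        # Re-weight existing members by their accuracy on the newest block.
        self.weights = [max((self._predict_member(m, X) == y).mean(), 1e-3)
                        for m in self.members]
        self.members.append(self._train_member(X, y))
        self.weights.append(1.0)               # newest member: full weight
        if len(self.members) > self.max_members:
            drop = int(np.argmin(self.weights))  # discard the weakest member
            self.members.pop(drop)
            self.weights.pop(drop)

    def predict(self, X):
        votes = {}
        for m, w in zip(self.members, self.weights):
            pred = self._predict_member(m, X)
            for c in np.unique(pred):
                votes.setdefault(c, np.zeros(len(X)))
                votes[c] += w * (pred == c)
        classes = sorted(votes)
        stacked = np.array([votes[c] for c in classes])
        return np.array(classes)[stacked.argmax(axis=0)]

# Demo on a synthetic stream of blocks (no drift, for brevity).
rng = np.random.default_rng(0)

def make_block(shift):
    X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                   rng.normal(shift, 1.0, size=(50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    return X, y

ens = AccuracyWeightedEnsemble(max_members=3)
for _ in range(3):
    Xb, yb = make_block(4.0)
    ens.update(Xb, yb)
Xt, yt = make_block(4.0)
accuracy = (ens.predict(Xt) == yt).mean()
```

The re-weighting on every new block is what lets such ensembles down-weight members trained before a drift, which is the behaviour AUE2 refines with its incremental Hoeffding Tree members.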
Elsawy, Amr S; Eldawlatly, Seif; Taher, Mohamed; Aly, Gamal M
The current trend to use Brain-Computer Interfaces (BCIs) with mobile devices mandates the development of efficient EEG data processing methods. In this paper, we demonstrate the performance of a Principal Component Analysis (PCA) ensemble classifier for P300-based spellers. We recorded EEG data from multiple subjects using the Emotiv neuroheadset in the context of a classical oddball P300 speller paradigm. We compare the performance of the proposed ensemble classifier to the performance of traditional feature extraction and classifier methods. Our results demonstrate the capability of the PCA ensemble classifier to classify P300 data recorded using the Emotiv neuroheadset with an average accuracy of 86.29% on cross-validation data. In addition, offline testing of the recorded data reveals an average classification accuracy of 73.3% that is significantly higher than that achieved using traditional methods. Finally, we demonstrate the effect of the parameters of the P300 speller paradigm on the performance of the method.
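One way to realize a PCA ensemble classifier is sketched below, under assumptions, since the paper's exact member construction is not reproduced: each member applies PCA to a bootstrap sample of feature vectors and classifies by distance to class prototypes in the reduced space, with member votes summed.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_pca(X, k):
    """Top-k principal components of X via SVD."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

def project(X, mu, comps):
    return (X - mu) @ comps.T

def train_ensemble(X, y, n_members=7, k=4):
    """Each member: PCA on a bootstrap sample + class-prototype classifier.
    (Hypothetical member construction, for illustration only.)"""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), len(X))
        mu, comps = fit_pca(X[idx], k)
        F = project(X[idx], mu, comps)
        protos = {c: F[y[idx] == c].mean(axis=0) for c in np.unique(y[idx])}
        members.append((mu, comps, protos))
    return members

def predict(members, X):
    """Sum of member votes; classes assumed to be 0 (non-target) and 1."""
    score = np.zeros(len(X))
    for mu, comps, protos in members:
        F = project(X, mu, comps)
        d0 = np.linalg.norm(F - protos[0], axis=1)
        d1 = np.linalg.norm(F - protos[1], axis=1)
        score += np.sign(d0 - d1)       # +1 is a vote for class 1
    return (score > 0).astype(int)

# Demo on synthetic two-class feature vectors (not real EEG epochs).
X0 = rng.normal(0.0, 1.0, size=(100, 20))
X1 = rng.normal(0.0, 1.0, size=(100, 20))
X1[:, :3] += 3.0                        # class-discriminative dimensions
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
members = train_ensemble(X, y)
accuracy = (predict(members, X) == y).mean()
```

Averaging votes over members trained on different resamples is the standard way such an ensemble gains robustness to the trial-to-trial variability typical of consumer-grade headsets like the Emotiv.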
Saleh, F.; Ramaswamy, V.; Wang, Y.; Georgas, N.; Blumberg, A.; Pullen, J.
Estuarine regions can experience compound impacts from coastal storm surge and riverine flooding. The challenges in forecasting flooding in such areas are multi-faceted due to uncertainties associated with meteorological drivers and interactions between hydrological and coastal processes. The objective of this work is to evaluate how uncertainties from meteorological predictions propagate through an ensemble-based flood prediction framework and translate into uncertainties in simulated inundation extents. A multi-scale framework, consisting of hydrologic, coastal and hydrodynamic models, was used to simulate two extreme flood events at the confluence of the Passaic and Hackensack rivers and Newark Bay. The events were Hurricane Irene (2011), a combination of inland flooding and coastal storm surge, and Hurricane Sandy (2012) where coastal storm surge was the dominant component. The hydrodynamic component of the framework was first forced with measured streamflow and ocean water level data to establish baseline inundation extents with the best available forcing data. The coastal and hydrologic models were then forced with meteorological predictions from 21 ensemble members of the Global Ensemble Forecast System (GEFS) to retrospectively represent potential future conditions up to 96 hours prior to the events. Inundation extents produced by the hydrodynamic model, forced with the 95th percentile of the ensemble-based coastal and hydrologic boundary conditions, were in good agreement with baseline conditions for both events. The USGS reanalysis of Hurricane Sandy inundation extents was encapsulated between the 50th and 95th percentile of the forecasted inundation extents, and that of Hurricane Irene was similar but with caveats associated with data availability and reliability. This work highlights the importance of accounting for meteorological uncertainty to represent a range of possible future inundation extents at high resolution (∼m).
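Deriving percentile boundary conditions from an ensemble reduces to a per-time-step percentile across members. A toy illustration with synthetic hydrographs follows (the member count of 21 matches GEFS; all values are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# 21 ensemble members (as in GEFS), 96-hour synthetic river hydrographs.
n_members, n_hours = 21, 96
base = 200.0 + 50.0 * np.sin(np.linspace(0.0, np.pi, n_hours))   # m^3/s
members = base * rng.lognormal(mean=0.0, sigma=0.3, size=(n_members, 1))

# Percentile boundary conditions, computed per time step across members.
p50 = np.percentile(members, 50, axis=0)   # median forcing
p95 = np.percentile(members, 95, axis=0)   # upper-bound forcing
```

Forcing the hydrodynamic model with the 50th- and 95th-percentile series, as done above for Irene and Sandy, brackets the plausible inundation extents rather than committing to a single deterministic forecast.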
Personal recognition using palm-vein patterns has emerged as a promising alternative for human recognition because of its uniqueness, stability, live-body identification, flexibility, and resistance to spoofing. With the expanding application of palm-vein pattern recognition, the corresponding growth of the database has resulted in long response times. To shorten the response time of identification, this paper proposes a simple and useful classification for palm-vein identification based on principal direction features. In the registration process, the Gaussian-Radon transform is adopted to extract the orientation matrix and then compute the principal direction of a palm-vein image based on the orientation matrix. The database can be classified into six bins based on the value of the principal direction. In the identification process, the principal direction of the test sample is first extracted to ascertain the corresponding bin. One-by-one matching with the training samples is then performed within the bin. To improve recognition efficiency while maintaining good recognition accuracy, the two neighboring bins of the corresponding bin are additionally searched to identify the input palm-vein image. Evaluation experiments were conducted on three different databases, namely, PolyU, CASIA, and the database of this study. Experimental results show that the searching range of one test sample in the PolyU, CASIA, and our database can be reduced to 14.29%, 14.50%, and 14.28% by the proposed method, with retrieval accuracies of 96.67%, 96.00%, and 97.71%, respectively. With 10,000 training samples in the database, the execution time of the identification process by the traditional method is 18.56 s, while that by the proposed approach is 3.16 s. The experimental results confirm that the proposed approach is more efficient than the traditional method, especially for a large database.
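The bin-plus-neighbours search strategy can be sketched as follows. The uniform 6-bin partition of the principal direction and the wrap-around at 180° are assumptions for illustration; the paper's actual bin boundaries are not reproduced.

```python
N_BINS = 6
BIN_WIDTH = 180.0 / N_BINS    # principal directions assumed in [0, 180)

def assign_bin(direction_deg):
    """Map a principal direction to one of the six bins."""
    return int(direction_deg // BIN_WIDTH) % N_BINS

def candidate_bins(bin_idx):
    """The bin itself plus its two neighbours (wrapping at 180 degrees)."""
    return {(bin_idx - 1) % N_BINS, bin_idx, (bin_idx + 1) % N_BINS}

def identify(test_direction, database):
    """database: list of (sample_id, principal_direction_deg) pairs.
    Only samples in the candidate bins are matched one by one."""
    bins = candidate_bins(assign_bin(test_direction))
    return [sid for sid, d in database if assign_bin(d) in bins]

# Demo with a toy database (hypothetical directions).
db = [("s1", 10.0), ("s2", 100.0), ("s3", 170.0), ("s4", 95.0)]
hits = identify(12.0, db)     # searches bins {5, 0, 1} only
```

Restricting one-by-one matching to the candidate bins is what turns the full-database scan into the reduced search range the paper reports.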
Tamás, F. D.
A qualitative identification method to determine the origin (i.e. manufacturing factory) of Spanish clinkers is described. The classification of clinkers produced in different factories can be based on their trace element content. Fifteen clinker samples, collected from 11 Spanish cement factories, were analysed to determine their Mg, Sr, Ba, Mn, Ti, Zr, Zn and V content. An expert system formulated as a binary decision tree was designed based on the collected data. The performance of the obtained classifier was measured by ten-fold cross-validation. The results show that the proposed method yields an easy-to-use expert system that is able to determine the origin of a clinker based on its trace element content.
The present work describes the procedure for the qualitative identification of Spanish clinkers with the aim of determining their origin (factory). This classification of the clinkers is based on their trace element content. Fifteen different clinkers from 11 Spanish cement factories were analysed, determining their Mg, Sr, Ba, Mn, Ti, Zr, Zn and V contents. An expert system based on a binary decision tree was designed from the collected data. The resulting classification was evaluated by ten-fold cross-validation. The results show that the proposed model makes it possible to build, in a straightforward manner, an expert system capable of determining the origin of a clinker from its trace element content.
A method for improving radar-derived quantitative precipitation estimation is proposed. Tropical vertical profiles of reflectivity (VPRs) are first determined from multiple VPRs. Upon identifying a tropical VPR, the event can be further classified as either tropical-stratiform or tropical-convective rainfall by a fuzzy logic (FL) algorithm. Based on the precipitation-type fields, the reflectivity values are converted into rainfall rates using a Z-R relationship. In order to evaluate the performance of this rainfall classification scheme, three experiments were conducted using three months of data and two study cases. In Experiment I, the Weather Surveillance Radar-1988 Doppler (WSR-88D) default Z-R relationship was applied. In Experiment II, the precipitation regime was separated into convective and stratiform rainfall using the FL algorithm, and the corresponding Z-R relationships were used. In Experiment III, the precipitation regime was separated into convective, stratiform, and tropical rainfall, and the corresponding Z-R relationships were applied. The results show that the rainfall rates obtained from all three experiments match closely with the gauge observations, although Experiment II reduced the underestimation observed in Experiment I. Experiment III further reduced this underestimation and generated the most accurate radar estimates of rain rate among the three experiments.
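The Z-R conversion step is a simple power-law inversion. The coefficient pairs below are standard published relationships (the WSR-88D default Z = 300R^1.4 and the Rosenfeld tropical Z = 250R^1.2), which may differ from those tuned in the study; note that the tropical relationship yields a higher rain rate at the same reflectivity, consistent with correcting underestimation.

```python
def rain_rate(z_dbz, a, b):
    """Invert the power law Z = a * R**b for rain rate R in mm/h,
    where Z (mm^6 m^-3) = 10**(dBZ / 10)."""
    z_linear = 10.0 ** (z_dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)

# The same 40 dBZ echo under two published relationships:
r_default = rain_rate(40.0, a=300.0, b=1.4)    # WSR-88D default Z = 300 R^1.4
r_tropical = rain_rate(40.0, a=250.0, b=1.2)   # Rosenfeld tropical Z = 250 R^1.2
```

Switching the (a, b) pair per precipitation-type field, as the FL classification allows, is exactly what distinguishes Experiments II and III from the single-relationship Experiment I.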
Cook, Jack A; Davit, Barbara M; Polli, James E
The Biopharmaceutics Classification System (BCS) is employed to waive in vivo bioequivalence testing (i.e. provide "biowaivers") for new and generic drugs that are BCS class I. Granting biowaivers under systems such as the BCS eliminates unnecessary drug exposures to healthy subjects and provides economic relief, while maintaining the high public health standard for therapeutic equivalence. International scientific consensus suggests class III drugs are also eligible for biowaivers. The objective of this study was to estimate the economic impact of class I BCS-based biowaivers, along with the economic impact of a potential expansion to BCS class III. Methods consider the distribution of drugs across the four BCS classes, numbers of in vivo bioequivalence studies performed from a five year period, and effects of highly variable drugs (HVDs). Results indicate that 26% of all drugs are class I non-HVDs, 7% are class I HVDs, 27% are class III non-HVDs, and 3% are class III HVDs. An estimated 66 to 76 million dollars can be saved each year in clinical study costs if all class I compounds were granted biowaivers. Between 21 and 24 million dollars of this savings is from HVDs. If BCS class III compounds were also granted waivers, an additional direct savings of 62 to 71 million dollars would be realized, with 9 to 10 million dollars coming from HVDs.
Skriver, Henning; Dierking, Wolfgang
Polarimetric SAR images acquired at C- and L-band over sea ice in the Greenland Sea, Baltic Sea, and Beaufort Sea have been analysed with respect to their potential for ice type classification. The polarimetric data were gathered by the Danish EMISAR and the US AIRSAR, which are both airborne systems. A hierarchical classification scheme was chosen for sea ice because our knowledge about magnitudes, variations, and dependences of sea ice signatures can be directly considered. The optimal sequence of