WorldWideScience

Sample records for ensemble classification based

  1. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

    Science.gov (United States)

    Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

    2017-06-01

    Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.

  2. Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis

    Directory of Open Access Journals (Sweden)

    Łukasz Augustyniak

    2015-12-01

    Full Text Available We propose a novel method for counting sentiment orientation that outperforms supervised learning approaches in time and memory complexity and is not statistically significantly different from them in accuracy. Our method consists of a novel approach to generating unigram, bigram and trigram lexicons. The proposed method, called frequentiment, is based on calculating the frequency of features (words in the document and averaging their impact on the sentiment score as opposed to documents that do not contain these features. Afterwards, we use ensemble classification to improve the overall accuracy of the method. What is important is that the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3–5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles with lexicon’s predictions as input and supervised learners applied to 10 Amazon review data sets and provide the first statistical comparison of the sentiment annotation methods that include ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.

  3. Tweet-based Target Market Classification Using Ensemble Method

    Directory of Open Access Journals (Sweden)

    Muhammad Adi Khairul Anshary

    2016-09-01

    Full Text Available Target market classification is aimed at focusing marketing activities on the right targets. Classification of target markets can be done through data mining and by utilizing data from social media, e.g. Twitter. The end result of data mining are learning models that can classify new data. Ensemble methods can improve the accuracy of the models and therefore provide better results. In this study, classification of target markets was conducted on a dataset of 3000 tweets in order to extract features. Classification models were constructed to manipulate the training data using two ensemble methods (bagging and boosting. To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree algorithm for comparison. Three categories of consumer goods (computers, mobile phones and cameras and three categories of sentiments (positive, negative and neutral were classified towards three target-market categories. Machine learning was performed using Weka 3.6.9. The results of the test data showed that the bagging method improved the accuracy of CART with 1.9% (to 85.20%. On the other hand, for sentiment classification, the ensemble methods were not successful in increasing the accuracy of CART. The results of this study may be taken into consideration by companies who approach their customers through social media, especially Twitter.

  4. Ensemble Classification of Data Streams Based on Attribute Reduction and a Sliding Window

    Directory of Open Access Journals (Sweden)

    Yingchun Chen

    2018-04-01

    Full Text Available With the current increasing volume and dimensionality of data, traditional data classification algorithms are unable to satisfy the demands of practical classification applications of data streams. To deal with noise and concept drift in data streams, we propose an ensemble classification algorithm based on attribute reduction and a sliding window in this paper. Using mutual information, an approximate attribute reduction algorithm based on rough sets is used to reduce data dimensionality and increase the diversity of reduced results in the algorithm. A double-threshold concept drift detection method and a three-stage sliding window control strategy are introduced to improve the performance of the algorithm when dealing with both noise and concept drift. The classification precision is further improved by updating the base classifiers and their nonlinear weights. Experiments on synthetic datasets and actual datasets demonstrate the performance of the algorithm in terms of classification precision, memory use, and time efficiency.

  5. Short text sentiment classification based on feature extension and ensemble classifier

    Science.gov (United States)

    Liu, Yang; Zhu, Xie

    2018-05-01

    With the rapid development of Internet social media, excavating the emotional tendencies of the short text information from the Internet, the acquisition of useful information has attracted the attention of researchers. At present, the commonly used can be attributed to the rule-based classification and statistical machine learning classification methods. Although micro-blog sentiment analysis has made good progress, there still exist some shortcomings such as not highly accurate enough and strong dependence from sentiment classification effect. Aiming at the characteristics of Chinese short texts, such as less information, sparse features, and diverse expressions, this paper considers expanding the original text by mining related semantic information from the reviews, forwarding and other related information. First, this paper uses Word2vec to compute word similarity to extend the feature words. And then uses an ensemble classifier composed of SVM, KNN and HMM to analyze the emotion of the short text of micro-blog. The experimental results show that the proposed method can make good use of the comment forwarding information to extend the original features. Compared with the traditional method, the accuracy, recall and F1 value obtained by this method have been improved.

  6. Constructing Support Vector Machine Ensembles for Cancer Classification Based on Proteomic Profiling

    Institute of Scientific and Technical Information of China (English)

    Yong Mao; Xiao-Bo Zhou; Dao-Ying Pi; You-Xian Sun

    2005-01-01

    In this study, we present a constructive algorithm for training cooperative support vector machine ensembles (CSVMEs). CSVME combines ensemble architecture design with cooperative training for individual SVMs in ensembles. Unlike most previous studies on training ensembles, CSVME puts emphasis on both accuracy and collaboration among individual SVMs in an ensemble. A group of SVMs selected on the basis of recursive classifier elimination is used in CSVME, and the number of the individual SVMs selected to construct CSVME is determined by 10-fold cross-validation. This kind of SVME has been tested on two ovarian cancer datasets previously obtained by proteomic mass spectrometry. By combining several individual SVMs, the proposed method achieves better performance than the SVME of all base SVMs.

  7. An Efficient Ensemble Learning Method for Gene Microarray Classification

    Directory of Open Access Journals (Sweden)

    Alireza Osareh

    2013-01-01

    Full Text Available The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  8. Modality-Driven Classification and Visualization of Ensemble Variance

    Energy Technology Data Exchange (ETDEWEB)

    Bensema, Kevin; Gosink, Luke; Obermaier, Harald; Joy, Kenneth I.

    2016-10-01

    Advances in computational power now enable domain scientists to address conceptual and parametric uncertainty by running simulations multiple times in order to sufficiently sample the uncertain input space. While this approach helps address conceptual and parametric uncertainties, the ensemble datasets produced by this technique present a special challenge to visualization researchers as the ensemble dataset records a distribution of possible values for each location in the domain. Contemporary visualization approaches that rely solely on summary statistics (e.g., mean and variance) cannot convey the detailed information encoded in ensemble distributions that are paramount to ensemble analysis; summary statistics provide no information about modality classification and modality persistence. To address this problem, we propose a novel technique that classifies high-variance locations based on the modality of the distribution of ensemble predictions. Additionally, we develop a set of confidence metrics to inform the end-user of the quality of fit between the distribution at a given location and its assigned class. We apply a similar method to time-varying ensembles to illustrate the relationship between peak variance and bimodal or multimodal behavior. These classification schemes enable a deeper understanding of the behavior of the ensemble members by distinguishing between distributions that can be described by a single tendency and distributions which reflect divergent trends in the ensemble.

  9. Ensemble Deep Learning for Biomedical Time Series Classification

    Directory of Open Access Journals (Sweden)

    Lin-peng Jin

    2016-01-01

    Full Text Available Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.

  10. Emotion Recognition of Weblog Sentences Based on an Ensemble Algorithm of Multi-label Classification and Word Emotions

    Science.gov (United States)

    Li, Ji; Ren, Fuji

    Weblogs have greatly changed the communication ways of mankind. Affective analysis of blog posts is found valuable for many applications such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text based on single-label classification can not satisfy higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. An ensemble algorithm RAKEL is used to recognize dominant emotions from the writer's perspective. Our emotion feature using detailed intensity representation for word emotions outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuations, disjunctive connectives, modification relations and negation into features. It achieves 13.51% and 12.49% increases for Micro-averaged F1 and Macro-averaged F1 respectively compared to the traditional lexicon-based feature. Result shows that multiple-dimension emotion representation with grammatical features can efficiently classify sentence emotion in a multi-label problem.

  11. Ensemble Classification of Alzheimer's Disease and Mild Cognitive Impairment Based on Complex Graph Measures from Diffusion Tensor Images

    Science.gov (United States)

    Ebadi, Ashkan; Dalboni da Rocha, Josué L.; Nagaraju, Dushyanth B.; Tovar-Moll, Fernanda; Bramati, Ivanei; Coutinho, Gabriel; Sitaram, Ranganatha; Rashidi, Parisa

    2017-01-01

    The human brain is a complex network of interacting regions. The gray matter regions of brain are interconnected by white matter tracts, together forming one integrative complex network. In this article, we report our investigation about the potential of applying brain connectivity patterns as an aid in diagnosing Alzheimer's disease and Mild Cognitive Impairment (MCI). We performed pattern analysis of graph theoretical measures derived from Diffusion Tensor Imaging (DTI) data representing structural brain networks of 45 subjects, consisting of 15 patients of Alzheimer's disease (AD), 15 patients of MCI, and 15 healthy subjects (CT). We considered pair-wise class combinations of subjects, defining three separate classification tasks, i.e., AD-CT, AD-MCI, and CT-MCI, and used an ensemble classification module to perform the classification tasks. Our ensemble framework with feature selection shows a promising performance with classification accuracy of 83.3% for AD vs. MCI, 80% for AD vs. CT, and 70% for MCI vs. CT. Moreover, our findings suggest that AD can be related to graph measures abnormalities at Brodmann areas in the sensorimotor cortex and piriform cortex. In this way, node redundancy coefficient and load centrality in the primary motor cortex were recognized as good indicators of AD in contrast to MCI. In general, load centrality, betweenness centrality, and closeness centrality were found to be the most relevant network measures, as they were the top identified features at different nodes. The present study can be regarded as a “proof of concept” about a procedure for the classification of MRI markers between AD dementia, MCI, and normal old individuals, due to the small and not well-defined groups of AD and MCI patients. Future studies with larger samples of subjects and more sophisticated patient exclusion criteria are necessary toward the development of a more precise technique for clinical diagnosis. PMID:28293162

  12. Global Optimization Ensemble Model for Classification Methods

    Science.gov (United States)

    Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab

    2014-01-01

    Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382

  13. Global Optimization Ensemble Model for Classification Methods

    Directory of Open Access Journals (Sweden)

    Hina Anwar

    2014-01-01

    Full Text Available Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity.

  14. Classification and localization of early-stage Alzheimer's disease in magnetic resonance images using a patch-based classifier ensemble

    International Nuclear Information System (INIS)

    Simoes, Rita; Slump, Cornelis H.; Cappellen van Walsum, Anne-Marie van

    2014-01-01

    Classification methods have been proposed to detect Alzheimer's disease (AD) using magnetic resonance images. Most rely on features such as the shape/volume of brain structures that need to be defined a priori. In this work, we propose a method that does not require either the segmentation of specific brain regions or the nonlinear alignment to a template. Besides classification, we also analyze which brain regions are discriminative between a group of normal controls and a group of AD patients. We perform 3D texture analysis using Local Binary Patterns computed at local image patches in the whole brain, combined in a classifier ensemble. We evaluate our method in a publicly available database including very mild-to-mild AD subjects and healthy elderly controls. For the subject cohort including only mild AD subjects, the best results are obtained using a combination of large (30 x 30 x 30 and 40 x 40 x 40 voxels) patches. A spatial analysis on the best performing patches shows that these are located in the medial-temporal lobe and in the periventricular regions. When very mild AD subjects are included in the dataset, the small (10 x 10 x 10 voxels) patches perform best, with the most discriminative ones being located near the left hippocampus. We show that our method is able not only to perform accurate classification, but also to localize discriminative brain regions, which are in accordance with the medical literature. This is achieved without the need to segment-specific brain structures and without performing nonlinear registration to a template, indicating that the method may be suitable for a clinical implementation that can help to diagnose AD at an earlier stage.

  15. Acute leukemia classification by ensemble particle swarm model selection.

    Science.gov (United States)

    Escalante, Hugo Jair; Montes-y-Gómez, Manuel; González, Jesús A; Gómez-Gil, Pilar; Altamirano, Leopoldo; Reyes, Carlos A; Reta, Carolina; Rosales, Alejandro

    2012-07-01

    Acute leukemia is a malignant disease that affects a large proportion of the world population. Different types and subtypes of acute leukemia require different treatments. In order to assign the correct treatment, a physician must identify the leukemia type or subtype. Advanced and precise methods are available for identifying leukemia types, but they are very expensive and not available in most hospitals in developing countries. Thus, alternative methods have been proposed. An option explored in this paper is based on the morphological properties of bone marrow images, where features are extracted from medical images and standard machine learning techniques are used to build leukemia type classifiers. This paper studies the use of ensemble particle swarm model selection (EPSMS), which is an automated tool for the selection of classification models, in the context of acute leukemia classification. EPSMS is the application of particle swarm optimization to the exploration of the search space of ensembles that can be formed by heterogeneous classification models in a machine learning toolbox. EPSMS does not require prior domain knowledge and it is able to select highly accurate classification models without user intervention. Furthermore, specific models can be used for different classification tasks. We report experimental results for acute leukemia classification with real data and show that EPSMS outperformed the best results obtained using manually designed classifiers with the same data. The highest performance using EPSMS was of 97.68% for two-type classification problems and of 94.21% for more than two types problems. To the best of our knowledge, these are the best results reported for this data set. Compared with previous studies, these improvements were consistent among different type/subtype classification tasks, different features extracted from images, and different feature extraction regions. The performance improvements were statistically significant

  16. Random ensemble learning for EEG classification.

    Science.gov (United States)

    Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid

    2018-01-01

    Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. A hybrid ensemble learning approach to star-galaxy classification

    Science.gov (United States)

    Kim, Edward J.; Brunner, Robert J.; Carrasco Kind, Matias

    2015-10-01

    There exist a variety of star-galaxy classification techniques, each with their own strengths and weaknesses. In this paper, we present a novel meta-classification framework that combines and fully exploits different techniques to produce a more robust star-galaxy classification. To demonstrate this hybrid, ensemble approach, we combine a purely morphological classifier, a supervised machine learning method based on random forest, an unsupervised machine learning method based on self-organizing maps, and a hierarchical Bayesian template-fitting method. Using data from the CFHTLenS survey (Canada-France-Hawaii Telescope Lensing Survey), we consider different scenarios: when a high-quality training set is available with spectroscopic labels from DEEP2 (Deep Extragalactic Evolutionary Probe Phase 2 ), SDSS (Sloan Digital Sky Survey), VIPERS (VIMOS Public Extragalactic Redshift Survey), and VVDS (VIMOS VLT Deep Survey), and when the demographics of sources in a low-quality training set do not match the demographics of objects in the test data set. We demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method in these scenarios. Thus, strategies that combine the predictions of different classifiers may prove to be optimal in currently ongoing and forthcoming photometric surveys, such as the Dark Energy Survey and the Large Synoptic Survey Telescope.

  18. A Noise-Assisted Data Analysis Method for Automatic EOG-Based Sleep Stage Classification Using Ensemble Learning.

    Science.gov (United States)

    Olesen, Alexander Neergaard; Christensen, Julie A E; Sorensen, Helge B D; Jennum, Poul J

    2016-08-01

    Reducing the number of recording modalities for sleep staging research can benefit both researchers and patients, under the condition that they provide as accurate results as conventional systems. This paper investigates the possibility of exploiting the multisource nature of the electrooculography (EOG) signals by presenting a method for automatic sleep staging using the complete ensemble empirical mode decomposition with adaptive noise algorithm, and a random forest classifier. It achieves a high overall accuracy of 82% and a Cohen's kappa of 0.74 indicating substantial agreement between automatic and manual scoring.

  19. A Noise-Assisted Data Analysis Method for Automatic EOG-Based Sleep Stage Classification Using Ensemble Learning

    DEFF Research Database (Denmark)

    Olesen, Alexander Neergaard; Christensen, Julie Anja Engelhard; Sørensen, Helge Bjarup Dissing

    2016-01-01

    Reducing the number of recording modalities for sleep staging research can benefit both researchers and patients, under the condition that they provide as accurate results as conventional systems. This paper investigates the possibility of exploiting the multisource nature of the electrooculography...... (EOG) signals by presenting a method for automatic sleep staging using the complete ensemble empirical mode decomposition with adaptive noise algorithm, and a random forest classifier. It achieves a high overall accuracy of 82% and a Cohen’s kappa of 0.74 indicating substantial agreement between...

  20. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    Directory of Open Access Journals (Sweden)

    Saadia Zahid

    2015-01-01

    Full Text Available Audio segmentation is a basis for multimedia content analysis which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment sound, and silence. An algorithm is proposed that preserves important audio content and reduces the misclassification rate without using large amount of training data, which handles noise and is suitable for use for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used, bagged support vector machines (SVMs with artificial neural networks (ANNs. Audio stream is classified, firstly, into speech and nonspeech segment by using bagged support vector machines; nonspeech segment is further classified into music and environment sound by using artificial neural networks and lastly, speech segment is classified into silence and pure-speech segments on the basis of rule-based classifier. Minimum data is used for training classifier; ensemble methods are used for minimizing misclassification rate and approximately 98% accurate segments are obtained. A fast and efficient algorithm is designed that can be used with real-time multimedia applications.

  1. A New Feature Ensemble with a Multistage Classification Scheme for Breast Cancer Diagnosis

    Directory of Open Access Journals (Sweden)

    Idil Isikli Esener

    2017-01-01

    Full Text Available A new and effective feature ensemble with a multistage classification is proposed to be implemented in a computer-aided diagnosis (CAD system for breast cancer diagnosis. A publicly available mammogram image dataset collected during the Image Retrieval in Medical Applications (IRMA project is utilized to verify the suggested feature ensemble and multistage classification. In achieving the CAD system, feature extraction is performed on the mammogram region of interest (ROI images which are preprocessed by applying a histogram equalization followed by a nonlocal means filtering. The proposed feature ensemble is formed by concatenating the local configuration pattern-based, statistical, and frequency domain features. The classification process of these features is implemented in three cases: a one-stage study, a two-stage study, and a three-stage study. Eight well-known classifiers are used in all cases of this multistage classification scheme. Additionally, the results of the classifiers that provide the top three performances are combined via a majority voting technique to improve the recognition accuracy on both two- and three-stage studies. A maximum of 85.47%, 88.79%, and 93.52% classification accuracies are attained by the one-, two-, and three-stage studies, respectively. The proposed multistage classification scheme is more effective than the single-stage classification for breast cancer diagnosis.

  2. Evaluating an ensemble classification approach for crop diversityverification in Danish greening subsidy control

    DEFF Research Database (Denmark)

    Chellasamy, Menaka; Ferre, Ty; Greve, Mogens Humlekrog

    2016-01-01

    Beginning in 2015, Danish farmers are obliged to meet specific crop diversification rules based on total land area and number of crops cultivated to be eligible for new greening subsidies. Hence, there is a need for the Danish government to extend their subsidy control system to verify farmers......’ declarations to war-rant greening payments under the new crop diversification rules. Remote Sensing (RS) technology has been used since 1992 to control farmers’ subsidies in Denmark. However, a proper RS-based approach is yet to be finalised to validate new crop diversity requirements designed for assessing...... compliance under the recent subsidy scheme (2014–2020); This study uses an ensemble classification approach(proposed by the authors in previous studies) for validating the crop diversity requirements of the new rules. The approach uses a neural network ensemble classification system with bi-temporal (spring...

  3. HIPPI: highly accurate protein family classification with ensembles of HMMs

    Directory of Open Access Journals (Sweden)

    Nam-phuong Nguyen

    2016-11-01

    Full Text Available Abstract Background Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. Results We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification. HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. Conclusion HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .

  4. Exploring diversity in ensemble classification: Applications in large area land cover mapping

    Science.gov (United States)

    Mellor, Andrew; Boukir, Samia

    2017-07-01

    Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area

  5. The Use of Artificial-Intelligence-Based Ensembles for Intrusion Detection: A Review

    Directory of Open Access Journals (Sweden)

    Gulshan Kumar

    2012-01-01

    Full Text Available In supervised learning-based classification, ensembles have been successfully employed to different application domains. In the literature, many researchers have proposed different ensembles by considering different combination methods, training datasets, base classifiers, and many other factors. Artificial-intelligence-(AI- based techniques play prominent role in development of ensemble for intrusion detection (ID and have many benefits over other techniques. However, there is no comprehensive review of ensembles in general and AI-based ensembles for ID to examine and understand their current research status to solve the ID problem. Here, an updated review of ensembles and their taxonomies has been presented in general. The paper also presents the updated review of various AI-based ensembles for ID (in particular during last decade. The related studies of AI-based ensembles are compared by set of evaluation metrics driven from (1 architecture & approach followed; (2 different methods utilized in different phases of ensemble learning; (3 other measures used to evaluate classification performance of the ensembles. The paper also provides the future directions of the research in this area. The paper will help the better understanding of different directions in which research of ensembles has been done in general and specifically: field of intrusion detection systems (IDSs.

  6. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

    Directory of Open Access Journals (Sweden)

    Ashlock Daniel

    2009-08-01

    Full Text Available Abstract Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.

  7. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering.

    Science.gov (United States)

    Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

    2009-08-22

    Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.

  8. An ensemble classification approach for improved Land use/cover change detection

    Science.gov (United States)

    Chellasamy, M.; Ferré, T. P. A.; Humlekrog Greve, M.; Larsen, R.; Chinnasamy, U.

    2014-11-01

    Change Detection (CD) methods based on post-classification comparison approaches are claimed to provide potentially reliable results. They are considered to be most obvious quantitative method in the analysis of Land Use Land Cover (LULC) changes which provides from - to change information. But, the performance of post-classification comparison approaches highly depends on the accuracy of classification of individual images used for comparison. Hence, we present a classification approach that produce accurate classified results which aids to obtain improved change detection results. Machine learning is a part of broader framework in change detection, where neural networks have drawn much attention. Neural network algorithms adaptively estimate continuous functions from input data without mathematical representation of output dependence on input. A common practice for classification is to use Multi-Layer-Perceptron (MLP) neural network with backpropogation learning algorithm for prediction. To increase the ability of learning and prediction, multiple inputs (spectral, texture, topography, and multi-temporal information) are generally stacked to incorporate diversity of information. On the other hand literatures claims backpropagation algorithm to exhibit weak and unstable learning in use of multiple inputs, while dealing with complex datasets characterized by mixed uncertainty levels. To address the problem of learning complex information, we propose an ensemble classification technique that incorporates multiple inputs for classification unlike traditional stacking of multiple input data. In this paper, we present an Endorsement Theory based ensemble classification that integrates multiple information, in terms of prediction probabilities, to produce final classification results. Three different input datasets are used in this study: spectral, texture and indices, from SPOT-4 multispectral imagery captured on 1998 and 2003. Each SPOT image is classified

  9. Village Building Identification Based on Ensemble Convolutional Neural Networks

    Science.gov (United States)

    Guo, Zhiling; Chen, Qi; Xu, Yongwei; Shibasaki, Ryosuke; Shao, Xiaowei

    2017-01-01

    In this study, we present the Ensemble Convolutional Neural Network (ECNN), an elaborate CNN frame formulated based on ensembling state-of-the-art CNN models, to identify village buildings from open high-resolution remote sensing (HRRS) images. First, to optimize and mine the capability of CNN for village mapping and to ensure compatibility with our classification targets, a few state-of-the-art models were carefully optimized and enhanced based on a series of rigorous analyses and evaluations. Second, rather than directly implementing building identification by using these models, we exploited most of their advantages by ensembling their feature extractor parts into a stronger model called ECNN based on the multiscale feature learning method. Finally, the generated ECNN was applied to a pixel-level classification frame to implement object identification. The proposed method can serve as a viable tool for village building identification with high accuracy and efficiency. The experimental results obtained from the test area in Savannakhet province, Laos, prove that the proposed ECNN model significantly outperforms existing methods, improving overall accuracy from 96.64% to 99.26%, and kappa from 0.57 to 0.86. PMID:29084154

  10. Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles

    Directory of Open Access Journals (Sweden)

    Wong G William

    2008-06-01

    Full Text Available Abstract Background Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data combined with their relatively small sample sizes poses a significant challenge to current data mining methodology where many of the standard methods cannot be applied directly. Here, we propose a novel methodological framework using machine learning method, in which decision tree based classifier ensembles coupled with feature selection methods, is applied to proteomics data generated from premalignant pancreatic cancer. Results This study explores the utility of three different feature selection schemas (Student t test, Wilcoxon rank sum test and genetic algorithm to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected from each method, we compared the prediction performances of a single decision tree algorithm C4.5 with six different decision-tree based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost. We show that ensemble classifiers always outperform single decision tree classifier in having greater accuracies and smaller prediction errors when applied to a pancreatic cancer proteomics dataset. Conclusion In our cross validation framework, classifier ensembles generally have better classification accuracies compared to that of a single decision tree when applied to a pancreatic cancer proteomic dataset, thus suggesting its utility in future proteomics data analysis. Additionally, the use of feature selection

  11. Ensemble support vector machine classification of dementia using structural MRI and mini-mental state examination.

    Science.gov (United States)

    Sørensen, Lauge; Nielsen, Mads

    2018-05-15

    The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Filter Bank Regularized Common Spatial Pattern Ensemble for Small Sample Motor Imagery Classification.

    Science.gov (United States)

    Park, Sang-Hoon; Lee, David; Lee, Sang-Goog

    2018-02-01

    For the last few years, many feature extraction methods have been proposed based on biological signals. Among these, the brain signals have the advantage that they can be obtained, even by people with peripheral nervous system damage. Motor imagery electroencephalograms (EEG) are inexpensive to measure, offer a high temporal resolution, and are intuitive. Therefore, these have received a significant amount of attention in various fields, including signal processing, cognitive science, and medicine. The common spatial pattern (CSP) algorithm is a useful method for feature extraction from motor imagery EEG. However, performance degradation occurs in a small-sample setting (SSS), because the CSP depends on sample-based covariance. Since the active frequency range is different for each subject, it is also inconvenient to set the frequency range to be different every time. In this paper, we propose the feature extraction method based on a filter bank to solve these problems. The proposed method consists of five steps. First, motor imagery EEG is divided by a using filter bank. Second, the regularized CSP (R-CSP) is applied to the divided EEG. Third, we select the features according to mutual information based on the individual feature algorithm. Fourth, parameter sets are selected for the ensemble. Finally, we classify using ensemble based on features. The brain-computer interface competition III data set IVa is used to evaluate the performance of the proposed method. The proposed method improves the mean classification accuracy by 12.34%, 11.57%, 9%, 4.95%, and 4.47% compared with CSP, SR-CSP, R-CSP, filter bank CSP (FBCSP), and SR-FBCSP. Compared with the filter bank R-CSP ( , ), which is a parameter selection version of the proposed method, the classification accuracy is improved by 3.49%. In particular, the proposed method shows a large improvement in performance in the SSS.

  13. MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

    Science.gov (United States)

    Chen, Lei; Kamel, Mohamed S.

    2016-01-01

    In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.

  14. Ensemble of classifiers based network intrusion detection system performance bound

    CSIR Research Space (South Africa)

    Mkuzangwe, Nenekazi NP

    2017-11-01

    Full Text Available This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently researchers rely on implementing the ensemble of classifiers based NIDS before they can determine the performance...

  15. Ensemble based system for whole-slide prostate cancer probability mapping using color texture features.

    LENUS (Irish Health Repository)

    DiFranco, Matthew D

    2011-01-01

    We present a tile-based approach for producing clinically relevant probability maps of prostatic carcinoma in histological sections from radical prostatectomy. Our methodology incorporates ensemble learning for feature selection and classification on expert-annotated images. Random forest feature selection performed over varying training sets provides a subset of generalized CIEL*a*b* co-occurrence texture features, while sample selection strategies with minimal constraints reduce training data requirements to achieve reliable results. Ensembles of classifiers are built using expert-annotated tiles from training images, and scores for the probability of cancer presence are calculated from the responses of each classifier in the ensemble. Spatial filtering of tile-based texture features prior to classification results in increased heat-map coherence as well as AUC values of 95% using ensembles of either random forests or support vector machines. Our approach is designed for adaptation to different imaging modalities, image features, and histological decision domains.

  16. Ensemble candidate classification for the LOTAAS pulsar survey

    Science.gov (United States)

    Tan, C. M.; Lyon, R. J.; Stappers, B. W.; Cooper, S.; Hessels, J. W. T.; Kondratiev, V. I.; Michilli, D.; Sanidas, S.

    2018-03-01

    One of the biggest challenges arising from modern large-scale pulsar surveys is the number of candidates generated. Here, we implemented several improvements to the machine learning (ML) classifier previously used by the LOFAR Tied-Array All-Sky Survey (LOTAAS) to look for new pulsars via filtering the candidates obtained during periodicity searches. To assist the ML algorithm, we have introduced new features which capture the frequency and time evolution of the signal and improved the signal-to-noise calculation accounting for broad profiles. We enhanced the ML classifier by including a third class characterizing RFI instances, allowing candidates arising from RFI to be isolated, reducing the false positive return rate. We also introduced a new training data set used by the ML algorithm that includes a large sample of pulsars misclassified by the previous classifier. Lastly, we developed an ensemble classifier comprised of five different Decision Trees. Taken together these updates improve the pulsar recall rate by 2.5 per cent, while also improving the ability to identify pulsars with wide pulse profiles, often misclassified by the previous classifier. The new ensemble classifier is also able to reduce the percentage of false positive candidates identified from each LOTAAS pointing from 2.5 per cent (˜500 candidates) to 1.1 per cent (˜220 candidates).

  17. Structure D'Ensemble, Multiple Classification, Multiple Seriation and Amount of Irrelevant Information

    Science.gov (United States)

    Hamel, B. Remmo; Van Der Veer, M. A. A.

    1972-01-01

    A significant positive correlation between multiple classification was found, in testing 65 children aged 6 to 8 years, at the stage of concrete operations. This is interpreted as support for the existence of a structure d'ensemble of operational schemes in the period of concrete operations. (Authors)

  18. Ensemble-based Kalman Filters in Strongly Nonlinear Dynamics

    Institute of Scientific and Technical Information of China (English)

    Zhaoxia PU; Joshua HACKER

    2009-01-01

    This study examines the effectiveness of ensemble Kalman filters in data assimilation with the strongly nonlinear dynamics of the Lorenz-63 model, and in particular their use in predicting the regime transition that occurs when the model jumps from one basin of attraction to the other. Four configurations of the ensemble-based Kalman filtering data assimilation techniques, including the ensemble Kalman filter, ensemble adjustment Kalman filter, ensemble square root filter and ensemble transform Kalman filter, are evaluated with their ability in predicting the regime transition (also called phase transition) and also are compared in terms of their sensitivity to both observational and sampling errors. The sensitivity of each ensemble-based filter to the size of the ensemble is also examined.

  19. Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

    Science.gov (United States)

    Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

    2018-03-01

    Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.

  20. A class of energy-based ensembles in Tsallis statistics

    International Nuclear Information System (INIS)

    Chandrashekar, R; Naina Mohammed, S S

    2011-01-01

    A comprehensive investigation is carried out on the class of energy-based ensembles. The eight ensembles are divided into two main classes. In the isothermal class of ensembles the individual members are at the same temperature. A unified framework is evolved to describe the four isothermal ensembles using the currently accepted third constraint formalism. The isothermal–isobaric, grand canonical and generalized ensembles are illustrated through a study of the classical nonrelativistic and extreme relativistic ideal gas models. An exact calculation is possible only in the case of the isothermal–isobaric ensemble. The study of the ideal gas models in the grand canonical and the generalized ensembles has been carried out using a perturbative procedure with the nonextensivity parameter (1 − q) as the expansion parameter. Though all the thermodynamic quantities have been computed up to a particular order in (1 − q) the procedure can be extended up to any arbitrary order in the expansion parameter. In the adiabatic class of ensembles the individual members of the ensemble have the same value of the heat function and a unified formulation to described all four ensembles is given. The nonrelativistic and the extreme relativistic ideal gases are studied in the isoenthalpic–isobaric ensemble, the adiabatic ensemble with number fluctuations and the adiabatic ensemble with number and particle fluctuations

  1. A target recognition method for maritime surveillance radars based on hybrid ensemble selection

    Science.gov (United States)

    Fan, Xueman; Hu, Shengliang; He, Jingbo

    2017-11-01

    In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II for multi-objective optimisation is used to find the Pareto front, i.e. a set of ensembles of classifiers representing different tradeoffs between the classification error and diversity. During the dynamic selection phase, the meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance based on three different aspects, namely, feature space, decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built full polarimetric high resolution range profile data-set. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the selection of diversity measures is studied concurrently.

  2. Polarimetric SAR Image Classification Using Multiple-feature Fusion and Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Sun Xun

    2016-12-01

    Full Text Available In this paper, we propose a supervised classification algorithm for Polarimetric Synthetic Aperture Radar (PolSAR images using multiple-feature fusion and ensemble learning. First, we extract different polarimetric features, including extended polarimetric feature space, Hoekman, Huynen, H/alpha/A, and fourcomponent scattering features of PolSAR images. Next, we randomly select two types of features each time from all feature sets to guarantee the reliability and diversity of later ensembles and use a support vector machine as the basic classifier for predicting classification results. Finally, we concatenate all prediction probabilities of basic classifiers as the final feature representation and employ the random forest method to obtain final classification results. Experimental results at the pixel and region levels show the effectiveness of the proposed algorithm.

  3. Cluster Ensemble-Based Image Segmentation

    Directory of Open Access Journals (Sweden)

    Xiaoru Wang

    2013-07-01

    Full Text Available Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories of images. Second, we exploit the PageRank idea from Internet applications and apply it to the image segmentation task. This can improve the final segmentation results by combining the spatial information of the image and the semantic similarity of regions. Our experiments on four public image databases validate the superiority of our algorithm over conventional single type of feature or multiple types of features-based algorithms, since our algorithm can fuse multiple types of features effectively for better segmentation results. Moreover, our method is also proved to be very competitive in comparison with other state-of-the-art segmentation algorithms.

  4. Online cross-validation-based ensemble learning.

    Science.gov (United States)

    Benkeser, David; Ju, Cheng; Lendle, Sam; van der Laan, Mark

    2018-01-30

    Online estimators update a current estimate with a new incoming batch of data without having to revisit past data thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example where we make streaming predictions of infectious disease incidence using data from a large database. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Sequential ensemble-based optimal design for parameter estimation: SEQUENTIAL ENSEMBLE-BASED OPTIMAL DESIGN

    Energy Technology Data Exchange (ETDEWEB)

    Man, Jun [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China; Zhang, Jiangjiang [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China; Li, Weixuan [Pacific Northwest National Laboratory, Richland Washington USA; Zeng, Lingzao [Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou China; Wu, Laosheng [Department of Environmental Sciences, University of California, Riverside California USA

    2016-10-01

    The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees of freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, larger ensemble size improves the parameter estimation and convergence of optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied in any other hydrological problems.

  6. Pitch Based Sound Classification

    DEFF Research Database (Denmark)

    Nielsen, Andreas Brinch; Hansen, Lars Kai; Kjems, U

    2006-01-01

    A sound classification model is presented that can classify signals into music, noise and speech. The model extracts the pitch of the signal using the harmonic product spectrum. Based on the pitch estimate and a pitch error measure, features are created and used in a probabilistic model with soft......-max output function. Both linear and quadratic inputs are used. The model is trained on 2 hours of sound and tested on publicly available data. A test classification error below 0.05 with 1 s classification windows is achieved. Further more it is shown that linear input performs as well as a quadratic......, and that even though classification gets marginally better, not much is achieved by increasing the window size beyond 1 s....

  7. An Improved Ensemble of Random Vector Functional Link Networks Based on Particle Swarm Optimization with Double Optimization Strategy.

    Science.gov (United States)

    Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang

    2016-01-01

    For ensemble learning, how to select and combine the candidate classifiers are two key issues which influence the performance of the ensemble system dramatically. Random vector functional link networks (RVFL) without direct input-to-output links is one of suitable base-classifiers for ensemble systems because of its fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. As for using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum norm least-square method and then further optimized by ARPSO. Finally, a few redundant RVFL is pruned, and thus the more compact ensemble of RVFL is obtained. Moreover, in this paper, theoretical analysis and justification on how to prune the base classifiers on classification problem is presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single optimization methods. Experiment results on function approximation and classification problems verify that the proposed method could improve its convergence accuracy as well as reduce the complexity of the ensemble system.

  8. Ensemble-Based Data Assimilation in Reservoir Characterization: A Review

    Directory of Open Access Journals (Sweden)

    Seungpil Jung

    2018-02-01

    Full Text Available This paper presents a review of ensemble-based data assimilation for strongly nonlinear problems on the characterization of heterogeneous reservoirs with different production histories. It concentrates on ensemble Kalman filter (EnKF and ensemble smoother (ES as representative frameworks, discusses their pros and cons, and investigates recent progress to overcome their drawbacks. The typical weaknesses of ensemble-based methods are non-Gaussian parameters, improper prior ensembles and finite population size. Three categorized approaches, to mitigate these limitations, are reviewed with recent accomplishments; improvement of Kalman gains, add-on of transformation functions, and independent evaluation of observed data. The data assimilation in heterogeneous reservoirs, applying the improved ensemble methods, is discussed on predicting unknown dynamic data in reservoir characterization.

  9. A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

    Directory of Open Access Journals (Sweden)

    Yongjun Piao

    2015-01-01

    Full Text Available Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality increases rapidly day by day. Such a trend poses various challenges as these methods are not suitable to directly apply to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is considered to divide the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of each classifier are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms other methods.

  10. Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model

    Directory of Open Access Journals (Sweden)

    Guofeng Wang

    2014-11-01

    Full Text Available Tool condition monitoring (TCM plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM, hidden Markov model (HMM and radius basis function (RBF are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.

  11. Granular loess classification based

    International Nuclear Information System (INIS)

    Browzin, B.S.

    1985-01-01

    This paper discusses how loess might be identified by two index properties: the granulometric composition and the dry unit weight. These two indices are necessary but not always sufficient for identification of loess. On the basis of analyses of samples from three continents, it was concluded that the 0.01-0.5-mm fraction deserves the name loessial fraction. Based on the loessial fraction concept, a granulometric classification of loess is proposed. A triangular chart is used to classify loess

  12. Using support vector machine ensembles for target audience classification on Twitter.

    Science.gov (United States)

    Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

    2015-01-01

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.

  13. Using support vector machine ensembles for target audience classification on Twitter.

    Directory of Open Access Journals (Sweden)

    Siaw Ling Lo

    Full Text Available The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA. A Support Vector Machine (SVM ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.

  14. Improved Classification by Non Iterative and Ensemble Classifiers in Motor Fault Diagnosis

    Directory of Open Access Journals (Sweden)

    PANIGRAHY, P. S.

    2018-02-01

    Full Text Available Data driven approach for multi-class fault diagnosis of induction motor using MCSA at steady state condition is a complex pattern classification problem. This investigation has exploited the built-in ensemble process of non-iterative classifiers to resolve the most challenging issues in this area, including bearing and stator fault detection. Non-iterative techniques exhibit with an average 15% of increased fault classification accuracy against their iterative counterparts. Particularly RF has shown outstanding performance even at less number of training samples and noisy feature space because of its distributive feature model. The robustness of the results, backed by the experimental verification shows that the non-iterative individual classifiers like RF is the optimum choice in the area of automatic fault diagnosis of induction motor.

  15. An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages.

    Science.gov (United States)

    Tuarob, Suppawong; Tucker, Conrad S; Salathe, Marcel; Ram, Nilam

    2014-06-01

    The role of social media as a source of timely and massive information has become more apparent since the era of Web 2.0.Multiple studies illustrated the use of information in social media to discover biomedical and health-related knowledge.Most methods proposed in the literature employ traditional document classification techniques that represent a document as a bag of words.These techniques work well when documents are rich in text and conform to standard English; however, they are not optimal for social media data where sparsity and noise are norms.This paper aims to address the limitations posed by the traditional bag-of-word based methods and propose to use heterogeneous features in combination with ensemble machine learning techniques to discover health-related information, which could prove to be useful to multiple biomedical applications, especially those needing to discover health-related knowledge in large scale social media data.Furthermore, the proposed methodology could be generalized to discover different types of information in various kinds of textual data. Social media data is characterized by an abundance of short social-oriented messages that do not conform to standard languages, both grammatically and syntactically.The problem of discovering health-related knowledge in social media data streams is then transformed into a text classification problem, where a text is identified as positive if it is health-related and negative otherwise.We first identify the limitations of the traditional methods which train machines with N-gram word features, then propose to overcome such limitations by utilizing the collaboration of machine learning based classifiers, each of which is trained to learn a semantically different aspect of the data.The parameter analysis for tuning each classifier is also reported. Three data sets are used in this research.The first data set comprises of approximately 5000 hand-labeled tweets, and is used for cross validation of the

  16. A deep learning-based multi-model ensemble method for cancer prediction.

    Science.gov (United States)

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Flood Forecasting Based on TIGGE Precipitation Ensemble Forecast

    Directory of Open Access Journals (Sweden)

    Jinyin Ye

    2016-01-01

    Full Text Available TIGGE (THORPEX International Grand Global Ensemble was a major part of the THORPEX (Observing System Research and Predictability Experiment. It integrates ensemble precipitation products from all the major forecast centers in the world and provides systematic evaluation on the multimodel ensemble prediction system. Development of meteorologic-hydrologic coupled flood forecasting model and early warning model based on the TIGGE precipitation ensemble forecast can provide flood probability forecast, extend the lead time of the flood forecast, and gain more time for decision-makers to make the right decision. In this study, precipitation ensemble forecast products from ECMWF, NCEP, and CMA are used to drive distributed hydrologic model TOPX. We focus on Yi River catchment and aim to build a flood forecast and early warning system. The results show that the meteorologic-hydrologic coupled model can satisfactorily predict the flow-process of four flood events. The predicted occurrence time of peak discharges is close to the observations. However, the magnitude of the peak discharges is significantly different due to various performances of the ensemble prediction systems. The coupled forecasting model can accurately predict occurrence of the peak time and the corresponding risk probability of peak discharge based on the probability distribution of peak time and flood warning, which can provide users a strong theoretical foundation and valuable information as a promising new approach.

  18. Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data

    Directory of Open Access Journals (Sweden)

    Connie Ko

    2016-08-01

    Full Text Available Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible has been gaining popularity in order to minimize this labor-intensive task. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this is not an issue that has received much attention in the literature. Intuitively the choice of training samples having varying quality will affect classification accuracy. In this paper a Diversity Index (DI is proposed that characterizes the diversity of data quality (Qi among selected training samples required for constructing a classification model of tree genera. The training sample is diversified in terms of data quality as opposed to the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and; therefore; has a higher classification accuracy in discriminating the “unknown” class samples from the “known” samples. Our algorithm is implemented within the Random Forests base classifiers with six derived geometric features from LiDAR data. The training sample contains three tree genera (pine; poplar; and maple and the validation samples contains four labels (pine; poplar; maple; and “unknown”. Classification accuracy improved from 72.8%; when training samples were

  19. A user credit assessment model based on clustering ensemble for broadband network new media service supervision

    Science.gov (United States)

    Liu, Fang; Cao, San-xing; Lu, Rui

    2012-04-01

    This paper proposes a user credit assessment model based on clustering ensemble aiming to solve the problem that users illegally spread pirated and pornographic media contents within the user self-service oriented broadband network new media platforms. Its idea is to do the new media user credit assessment by establishing indices system based on user credit behaviors, and the illegal users could be found according to the credit assessment results, thus to curb the bad videos and audios transmitted on the network. The user credit assessment model based on clustering ensemble proposed by this paper which integrates the advantages that swarm intelligence clustering is suitable for user credit behavior analysis and K-means clustering could eliminate the scattered users existed in the result of swarm intelligence clustering, thus to realize all the users' credit classification automatically. The model's effective verification experiments are accomplished which are based on standard credit application dataset in UCI machine learning repository, and the statistical results of a comparative experiment with a single model of swarm intelligence clustering indicates this clustering ensemble model has a stronger creditworthiness distinguishing ability, especially in the aspect of predicting to find user clusters with the best credit and worst credit, which will facilitate the operators to take incentive measures or punitive measures accurately. Besides, compared with the experimental results of Logistic regression based model under the same conditions, this clustering ensemble model is robustness and has better prediction accuracy.

  20. Discrimination between authentic and adulterated liquors by near-infrared spectroscopy and ensemble classification

    Science.gov (United States)

    Chen, Hui; Tan, Chao; Wu, Tong; Wang, Li; Zhu, Wanping

    2014-09-01

    Chinese liquor is one of the famous distilled spirits and counterfeit liquor is becoming a serious problem in the market. Especially, age liquor is facing the crisis of confidence because it is difficult for consumer to identify the marked age, which prompts unscrupulous traders to pose off low-grade liquors as high-grade liquors. An ideal method for authenticity confirmation of liquors should be non-invasive, non-destructive and timely. The combination of near-infrared spectroscopy with chemometrics proves to be a good way to reach these premises. A new strategy is proposed for classification and verification of the adulteration of liquors by using NIR spectroscopy and chemometric classification, i.e., ensemble support vector machines (SVM). Three measures, i.e., accuracy, sensitivity and specificity were used for performance evaluation. The results confirmed that the strategy can serve as a screening tool applied to verify adulteration of the liquor, that is, a prior step used to condition the sample to a deeper analysis only when a positive result for adulteration is obtained by the proposed methodology.

  1. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers

    Directory of Open Access Journals (Sweden)

    Zhibin Xiao

    2017-02-01

    Full Text Available Recognition of transportation modes can be used in different applications including human behavior research, transport management and traffic control. Previous work on transportation mode recognition has often relied on using multiple sensors or matching Geographic Information System (GIS information, which is not possible in many cases. In this paper, an approach based on ensemble learning is proposed to infer hybrid transportation modes using only Global Position System (GPS data. First, in order to distinguish between different transportation modes, we used a statistical method to generate global features and extract several local features from sub-trajectories after trajectory segmentation, before these features were combined in the classification stage. Second, to obtain a better performance, we used tree-based ensemble models (Random Forest, Gradient Boosting Decision Tree, and XGBoost instead of traditional methods (K-Nearest Neighbor, Decision Tree, and Support Vector Machines to classify the different transportation modes. The experiment results on the later have shown the efficacy of our proposed approach. Among them, the XGBoost model produced the best performance with a classification accuracy of 90.77% obtained on the GEOLIFE dataset, and we used a tree-based ensemble method to ensure accurate feature selection to reduce the model complexity.

  2. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases...... the accuracy at the same time. The test example is classified using simpler and smaller model. The training examples in a particular cluster share the common vocabulary. At the time of clustering, we do not take into account the labels of the training examples. After the clusters have been created......, the classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...

  3. Efficient Kernel-Based Ensemble Gaussian Mixture Filtering

    KAUST Repository

    Liu, Bo

    2015-11-11

    We consider the Bayesian filtering problem for data assimilation following the kernel-based ensemble Gaussian-mixture filtering (EnGMF) approach introduced by Anderson and Anderson (1999). In this approach, the posterior distribution of the system state is propagated with the model using the ensemble Monte Carlo method, providing a forecast ensemble that is then used to construct a prior Gaussian-mixture (GM) based on the kernel density estimator. This results in two update steps: a Kalman filter (KF)-like update of the ensemble members and a particle filter (PF)-like update of the weights, followed by a resampling step to start a new forecast cycle. After formulating EnGMF for any observational operator, we analyze the influence of the bandwidth parameter of the kernel function on the covariance of the posterior distribution. We then focus on two aspects: i) the efficient implementation of EnGMF with (relatively) small ensembles, where we propose a new deterministic resampling strategy preserving the first two moments of the posterior GM to limit the sampling error; and ii) the analysis of the effect of the bandwidth parameter on contributions of KF and PF updates and on the weights variance. Numerical results using the Lorenz-96 model are presented to assess the behavior of EnGMF with deterministic resampling, study its sensitivity to different parameters and settings, and evaluate its performance against ensemble KFs. The proposed EnGMF approach with deterministic resampling suggests improved estimates in all tested scenarios, and is shown to require less localization and to be less sensitive to the choice of filtering parameters.

  4. A comparative study of breast cancer diagnosis based on neural network ensemble via improved training algorithms.

    Science.gov (United States)

    Azami, Hamed; Escudero, Javier

    2015-08-01

    Breast cancer is one of the most common types of cancer in women all over the world. Early diagnosis of this kind of cancer can significantly increase the chances of long-term survival. Since diagnosis of breast cancer is a complex problem, neural network (NN) approaches have been used as a promising solution. Considering the low speed of the back-propagation (BP) algorithm to train a feed-forward NN, we consider a number of improved NN trainings for the Wisconsin breast cancer dataset: BP with momentum, BP with adaptive learning rate, BP with adaptive learning rate and momentum, Polak-Ribikre conjugate gradient algorithm (CGA), Fletcher-Reeves CGA, Powell-Beale CGA, scaled CGA, resilient BP (RBP), one-step secant and quasi-Newton methods. An NN ensemble, which is a learning paradigm to combine a number of NN outputs, is used to improve the accuracy of the classification task. Results demonstrate that NN ensemble-based classification methods have better performance than NN-based algorithms. The highest overall average accuracy is 97.68% obtained by NN ensemble trained by RBP for 50%-50% training-test evaluation method.

  5. A New Classification Approach Based on Multiple Classification Rules

    OpenAIRE

    Zhongmei Zhou

    2014-01-01

    A good classifier can correctly predict new data for which the class label is unknown, so it is important to construct a high accuracy classifier. Hence, classification techniques are much useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches. However, the approach also has two major deficiencies. First, it generates a very large number of association classification rules, especially when t...

  6. Visualization and classification of physiological failure modes in ensemble hemorrhage simulation

    Science.gov (United States)

    Zhang, Song; Pruett, William Andrew; Hester, Robert

    2015-01-01

    In an emergency situation such as hemorrhage, doctors need to predict which patients need immediate treatment and care. This task is difficult because of the diverse response to hemorrhage in human population. Ensemble physiological simulations provide a means to sample a diverse range of subjects and may have a better chance of containing the correct solution. However, to reveal the patterns and trends from the ensemble simulation is a challenging task. We have developed a visualization framework for ensemble physiological simulations. The visualization helps users identify trends among ensemble members, classify ensemble member into subpopulations for analysis, and provide prediction to future events by matching a new patient's data to existing ensembles. We demonstrated the effectiveness of the visualization on simulated physiological data. The lessons learned here can be applied to clinically-collected physiological data in the future.

  7. Improving ECG classification accuracy using an ensemble of neural network modules.

    Directory of Open Access Journals (Sweden)

    Mehrdad Javadi

    Full Text Available This paper illustrates the use of a combined neural network model based on Stacked Generalization method for classification of electrocardiogram (ECG beats. In conventional Stacked Generalization method, the combiner learns to map the base classifiers' outputs to the target data. We claim adding the input pattern to the base classifiers' outputs helps the combiner to obtain knowledge about the input space and as the result, performs better on the same task. Experimental results support our claim that the additional knowledge according to the input space, improves the performance of the proposed method which is called Modified Stacked Generalization. In particular, for classification of 14966 ECG beats that were not previously seen during training phase, the Modified Stacked Generalization method reduced the error rate for 12.41% in comparison with the best of ten popular classifier fusion methods including Max, Min, Average, Product, Majority Voting, Borda Count, Decision Templates, Weighted Averaging based on Particle Swarm Optimization and Stacked Generalization.

  8. Human resource recommendation algorithm based on ensemble learning and Spark

    Science.gov (United States)

    Cong, Zihan; Zhang, Xingming; Wang, Haoxiang; Xu, Hongjie

    2017-08-01

    Aiming at the problem of “information overload” in the human resources industry, this paper proposes a human resource recommendation algorithm based on Ensemble Learning. The algorithm considers the characteristics and behaviours of both job seeker and job features in the real business circumstance. Firstly, the algorithm uses two ensemble learning methods-Bagging and Boosting. The outputs from both learning methods are then merged to form user interest model. Based on user interest model, job recommendation can be extracted for users. The algorithm is implemented as a parallelized recommendation system on Spark. A set of experiments have been done and analysed. The proposed algorithm achieves significant improvement in accuracy, recall rate and coverage, compared with recommendation algorithms such as UserCF and ItemCF.

  9. Pattern Recognition of Momentary Mental Workload Based on Multi-Channel Electrophysiological Data and Ensemble Convolutional Neural Networks.

    Science.gov (United States)

    Zhang, Jianhua; Li, Sunan; Wang, Rubin

    2017-01-01

    In this paper, we deal with the Mental Workload (MWL) classification problem based on the measured physiological data. First we discussed the optimal depth (i.e., the number of hidden layers) and parameter optimization algorithms for the Convolutional Neural Networks (CNN). The base CNNs designed were tested according to five classification performance indices, namely Accuracy, Precision, F-measure, G-mean, and required training time. Then we developed an Ensemble Convolutional Neural Network (ECNN) to enhance the accuracy and robustness of the individual CNN model. For the ECNN design, three model aggregation approaches (weighted averaging, majority voting and stacking) were examined and a resampling strategy was used to enhance the diversity of individual CNN models. The results of MWL classification performance comparison indicated that the proposed ECNN framework can effectively improve MWL classification performance and is featured by entirely automatic feature extraction and MWL classification, when compared with traditional machine learning methods.

  10. Harmony Search Based Parameter Ensemble Adaptation for Differential Evolution

    Directory of Open Access Journals (Sweden)

    Rammohan Mallipeddi

    2013-01-01

    Full Text Available In differential evolution (DE algorithm, depending on the characteristics of the problem at hand and the available computational resources, different strategies combined with a different set of parameters may be effective. In addition, a single, well-tuned combination of strategies and parameters may not guarantee optimal performance because different strategies combined with different parameter settings can be appropriate during different stages of the evolution. Therefore, various adaptive/self-adaptive techniques have been proposed to adapt the DE strategies and parameters during the course of evolution. In this paper, we propose a new parameter adaptation technique for DE based on ensemble approach and harmony search algorithm (HS. In the proposed method, an ensemble of parameters is randomly sampled which form the initial harmony memory. The parameter ensemble evolves during the course of the optimization process by HS algorithm. Each parameter combination in the harmony memory is evaluated by testing them on the DE population. The performance of the proposed adaptation method is evaluated using two recently proposed strategies (DE/current-to-pbest/bin and DE/current-to-gr_best/bin as basic DE frameworks. Numerical results demonstrate the effectiveness of the proposed adaptation technique compared to the state-of-the-art DE based algorithms on a set of challenging test problems (CEC 2005.

  11. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

    Science.gov (United States)

    Stanescu, Ana; Caragea, Doina

    2015-01-01

    Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.

  12. IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids.

    Science.gov (United States)

    Ali, Safdar; Majid, Abdul; Khan, Asifullah

    2014-04-01

    Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our

  13. Neural network ensemble based CAD system for focal liver lesions from B-mode ultrasound.

    Science.gov (United States)

    Virmani, Jitendra; Kumar, Vinod; Kalra, Naveen; Khandelwal, Niranjan

    2014-08-01

    A neural network ensemble (NNE) based computer-aided diagnostic (CAD) system to assist radiologists in differential diagnosis between focal liver lesions (FLLs), including (1) typical and atypical cases of Cyst, hemangioma (HEM) and metastatic carcinoma (MET) lesions, (2) small and large hepatocellular carcinoma (HCC) lesions, along with (3) normal (NOR) liver tissue is proposed in the present work. Expert radiologists, visualize the textural characteristics of regions inside and outside the lesions to differentiate between different FLLs, accordingly texture features computed from inside lesion regions of interest (IROIs) and texture ratio features computed from IROIs and surrounding lesion regions of interests (SROIs) are taken as input. Principal component analysis (PCA) is used for reducing the dimensionality of the feature space before classifier design. The first step of classification module consists of a five class PCA-NN based primary classifier which yields probability outputs for five liver image classes. The second step of classification module consists of ten binary PCA-NN based secondary classifiers for NOR/Cyst, NOR/HEM, NOR/HCC, NOR/MET, Cyst/HEM, Cyst/HCC, Cyst/MET, HEM/HCC, HEM/MET and HCC/MET classes. The probability outputs of five class PCA-NN based primary classifier is used to determine the first two most probable classes for a test instance, based on which it is directed to the corresponding binary PCA-NN based secondary classifier for crisp classification between two classes. By including the second step of the classification module, classification accuracy increases from 88.7 % to 95 %. The promising results obtained by the proposed system indicate its usefulness to assist radiologists in differential diagnosis of FLLs.

  14. Ensemble-based forecasting at Horns Rev: Ensemble conversion and kernel dressing

    DEFF Research Database (Denmark)

    Pinson, Pierre; Madsen, Henrik

    . The obtained ensemble forecasts of wind power are then converted into predictive distributions with an original adaptive kernel dressing method. The shape of the kernels is driven by a mean-variance model, the parameters of which are recursively estimated in order to maximize the overall skill of obtained...

  15. Utilising Tree-Based Ensemble Learning for Speaker Segmentation

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Tan, Zheng-Hua; Christensen, Mads Græsbøll

    2014-01-01

    In audio and speech processing, accurate detection of the changing points between multiple speakers in speech segments is an important stage for several applications such as speaker identification and tracking. Bayesian Information Criteria (BIC)-based approaches are the most traditionally used...... for a certain condition, the model becomes biased to the data used for training limiting the model’s generalisation ability. In this paper, we propose a BIC-based tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each...... tree is trained using a sampled version of the speech segment. During the tree construction process, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen and the same process is repeated for the resultant...

  16. Cluster-based analysis of multi-model climate ensembles

    Science.gov (United States)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) has yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ˜ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ˜ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and

  17. Uncertainty visualization in HARDI based on ensembles of ODFs

    KAUST Repository

    Jiao, Fangxiang

    2012-02-01

    In this paper, we propose a new and accurate technique for uncertainty analysis and uncertainty visualization based on fiber orientation distribution function (ODF) glyphs, associated with high angular resolution diffusion imaging (HARDI). Our visualization applies volume rendering techniques to an ensemble of 3D ODF glyphs, which we call SIP functions of diffusion shapes, to capture their variability due to underlying uncertainty. This rendering elucidates the complex heteroscedastic structural variation in these shapes. Furthermore, we quantify the extent of this variation by measuring the fraction of the volume of these shapes, which is consistent across all noise levels, the certain volume ratio. Our uncertainty analysis and visualization framework is then applied to synthetic data, as well as to HARDI human-brain data, to study the impact of various image acquisition parameters and background noise levels on the diffusion shapes. © 2012 IEEE.

  18. Uncertainty visualization in HARDI based on ensembles of ODFs

    KAUST Repository

    Jiao, Fangxiang; Phillips, Jeff M.; Gur, Yaniv; Johnson, Chris R.

    2012-01-01

    In this paper, we propose a new and accurate technique for uncertainty analysis and uncertainty visualization based on fiber orientation distribution function (ODF) glyphs, associated with high angular resolution diffusion imaging (HARDI). Our visualization applies volume rendering techniques to an ensemble of 3D ODF glyphs, which we call SIP functions of diffusion shapes, to capture their variability due to underlying uncertainty. This rendering elucidates the complex heteroscedastic structural variation in these shapes. Furthermore, we quantify the extent of this variation by measuring the fraction of the volume of these shapes, which is consistent across all noise levels, the certain volume ratio. Our uncertainty analysis and visualization framework is then applied to synthetic data, as well as to HARDI human-brain data, to study the impact of various image acquisition parameters and background noise levels on the diffusion shapes. © 2012 IEEE.

  19. Ensemble-based Probabilistic Forecasting at Horns Rev

    DEFF Research Database (Denmark)

    Pinson, Pierre; Madsen, Henrik

    2009-01-01

    forecasting methodology. In a first stage, ensemble forecasts of meteorological variables are converted to power through a suitable power curve model. This modelemploys local polynomial regression, and is adoptively estimated with an orthogonal fitting method. The obtained ensemble forecasts of wind power...

  20. Ensemble-based prediction of RNA secondary structures.

    Science.gov (United States)

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between

  1. An Adjoint-Based Adaptive Ensemble Kalman Filter

    KAUST Repository

    Song, Hajoon

    2013-10-01

    A new hybrid ensemble Kalman filter/four-dimensional variational data assimilation (EnKF/4D-VAR) approach is introduced to mitigate background covariance limitations in the EnKF. The work is based on the adaptive EnKF (AEnKF) method, which bears a strong resemblance to the hybrid EnKF/three-dimensional variational data assimilation (3D-VAR) method. In the AEnKF, the representativeness of the EnKF ensemble is regularly enhanced with new members generated after back projection of the EnKF analysis residuals to state space using a 3D-VAR [or optimal interpolation (OI)] scheme with a preselected background covariance matrix. The idea here is to reformulate the transformation of the residuals as a 4D-VAR problem, constraining the new member with model dynamics and the previous observations. This should provide more information for the estimation of the new member and reduce dependence of the AEnKF on the assumed stationary background covariance matrix. This is done by integrating the analysis residuals backward in time with the adjoint model. Numerical experiments are performed with the Lorenz-96 model under different scenarios to test the new approach and to evaluate its performance with respect to the EnKF and the hybrid EnKF/3D-VAR. The new method leads to the least root-mean-square estimation errors as long as the linear assumption guaranteeing the stability of the adjoint model holds. It is also found to be less sensitive to choices of the assimilation system inputs and parameters.

  2. An Adjoint-Based Adaptive Ensemble Kalman Filter

    KAUST Repository

    Song, Hajoon; Hoteit, Ibrahim; Cornuelle, Bruce D.; Luo, Xiaodong; Subramanian, Aneesh C.

    2013-01-01

    A new hybrid ensemble Kalman filter/four-dimensional variational data assimilation (EnKF/4D-VAR) approach is introduced to mitigate background covariance limitations in the EnKF. The work is based on the adaptive EnKF (AEnKF) method, which bears a strong resemblance to the hybrid EnKF/three-dimensional variational data assimilation (3D-VAR) method. In the AEnKF, the representativeness of the EnKF ensemble is regularly enhanced with new members generated after back projection of the EnKF analysis residuals to state space using a 3D-VAR [or optimal interpolation (OI)] scheme with a preselected background covariance matrix. The idea here is to reformulate the transformation of the residuals as a 4D-VAR problem, constraining the new member with model dynamics and the previous observations. This should provide more information for the estimation of the new member and reduce dependence of the AEnKF on the assumed stationary background covariance matrix. This is done by integrating the analysis residuals backward in time with the adjoint model. Numerical experiments are performed with the Lorenz-96 model under different scenarios to test the new approach and to evaluate its performance with respect to the EnKF and the hybrid EnKF/3D-VAR. The new method leads to the least root-mean-square estimation errors as long as the linear assumption guaranteeing the stability of the adjoint model holds. It is also found to be less sensitive to choices of the assimilation system inputs and parameters.

  3. Ensemble Methods

    Science.gov (United States)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [4,177], Kleinberg in the context of stochastic discrimination theory [112], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [21,70]. Empirical studies showed that both in classification and regression problems, ensembles improve on single learning machines, and moreover large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [10,11,49,188]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost that allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms. However, as explained in Section 26.2 there are deeper reasons to use ensembles of learning machines, motivated by the intrinsic characteristics of the ensemble methods. The main aim of this chapter is to introduce ensemble methods and to provide an overview and a bibliography of the main areas of research, without pretending to be exhaustive or to explain the detailed characteristics of each ensemble method. The paper is organized as follows. In the next section, the main theoretical and practical reasons for combining multiple learners are introduced. Section 26.3 depicts the main taxonomies on ensemble methods proposed in the literature. In Section 26.4 and 26.5, we present an overview of the main supervised ensemble methods reported in the literature, adopting a simple taxonomy, originally proposed in Ref. [201]. Applications of ensemble methods are only marginally considered, but a specific section on some relevant applications of ensemble methods in astronomy and

  4. Ensemble Clustering Classification Applied to Competing SVM and One-Class Classifiers Exemplified by Plant MicroRNAs Data

    Directory of Open Access Journals (Sweden)

    Yousef Malik

    2016-12-01

    Full Text Available The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN. In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.

  5. On evaluation of ensemble precipitation forecasts with observation-based ensembles

    Directory of Open Access Journals (Sweden)

    S. Jaun

    2007-04-01

    Full Text Available Spatial interpolation of precipitation data is uncertain. How important is this uncertainty and how can it be considered in evaluation of high-resolution probabilistic precipitation forecasts? These questions are discussed by experimental evaluation of the COSMO consortium's limited-area ensemble prediction system COSMO-LEPS. The applied performance measure is the often used Brier skill score (BSS. The observational references in the evaluation are (a analyzed rain gauge data by ordinary Kriging and (b ensembles of interpolated rain gauge data by stochastic simulation. This permits the consideration of either a deterministic reference (the event is observed or not with 100% certainty or a probabilistic reference that makes allowance for uncertainties in spatial averaging. The evaluation experiments show that the evaluation uncertainties are substantial even for the large area (41 300 km2 of Switzerland with a mean rain gauge distance as good as 7 km: the one- to three-day precipitation forecasts have skill decreasing with forecast lead time but the one- and two-day forecast performances differ not significantly.

  6. Can-CSC-GBE: Developing Cost-sensitive Classifier with Gentleboost Ensemble for breast cancer classification using protein amino acids and imbalanced data.

    Science.gov (United States)

    Ali, Safdar; Majid, Abdul; Javed, Syed Gibran; Sattar, Mohsin

    2016-06-01

    Early prediction of breast cancer is important for effective treatment and survival. We developed an effective Cost-Sensitive Classifier with GentleBoost Ensemble (Can-CSC-GBE) for the classification of breast cancer using protein amino acid features. In this work, first, discriminant information of the protein sequences related to breast tissue is extracted. Then, the physicochemical properties hydrophobicity and hydrophilicity of amino acids are employed to generate molecule descriptors in different feature spaces. For comparison, we obtained results by combining Cost-Sensitive learning with conventional ensemble of AdaBoostM1 and Bagging. The proposed Can-CSC-GBE system has effectively reduced the misclassification costs and thereby improved the overall classification performance. Our novel approach has highlighted promising results as compared to the state-of-the-art ensemble approaches. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Developing an Ensemble Prediction System based on COSMO-DE

    Science.gov (United States)

    Theis, S.; Gebhardt, C.; Buchhold, M.; Ben Bouallègue, Z.; Ohl, R.; Paulat, M.; Peralta, C.

    2010-09-01

    The numerical weather prediction model COSMO-DE is a configuration of the COSMO model with a horizontal grid size of 2.8 km. It has been running operationally at DWD since 2007, it covers the area of Germany and produces forecasts with a lead time of 0-21 hours. The model COSMO-DE is convection-permitting, which means that it does without a parametrisation of deep convection and simulates deep convection explicitly. One aim is an improved forecast of convective heavy rain events. Convection-permitting models are in operational use at several weather services, but currently not in ensemble mode. It is expected that an ensemble system could reveal the advantages of a convection-permitting model even better. The probabilistic approach is necessary, because the explicit simulation of convective processes for more than a few hours cannot be viewed as a deterministic forecast anymore. This is due to the chaotic behaviour and short life cycle of the processes which are simulated explicitly now. In the framework of the project COSMO-DE-EPS, DWD is developing and implementing an ensemble prediction system (EPS) for the model COSMO-DE. The project COSMO-DE-EPS comprises the generation of ensemble members, as well as the verification and visualization of the ensemble forecasts and also statistical postprocessing. A pre-operational mode of the EPS with 20 ensemble members is foreseen to start in 2010. Operational use is envisaged to start in 2012, after an upgrade to 40 members and inclusion of statistical postprocessing. The presentation introduces the project COSMO-DE-EPS and describes the design of the ensemble as it is planned for the pre-operational mode. In particular, the currently implemented method for the generation of ensemble members will be explained and discussed. The method includes variations of initial conditions, lateral boundary conditions, and model physics. At present, pragmatic methods are applied which resemble the basic ideas of a multi-model approach

  8. Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.

    Science.gov (United States)

    Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc

    2018-01-01

    In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we - a team of visualization scientists and meteorologists-deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.

  9. CT-based injury classification

    International Nuclear Information System (INIS)

    Mirvis, S.E.; Whitley, N.O.; Vainright, J.; Gens, D.

    1988-01-01

    Review of preoperative abdominal CT scans obtained in adults after blunt trauma during a 2.5-year period demonstrated isolated or predominant liver injury in 35 patients and splenic injury in 33 patients. CT-based injury scores, consisting of five levels of hepatic injury and four levels of splenic injury, were correlated with clinical outcome and surgical findings. Hepatic injury grades I-III, present in 33 of 35 patients, were associated with successful nonsurgical management in 27 (82%) or with findings at celiotomy not requiring surgical intervention in four (12%). Higher grades of splenic injury generally required early operative intervention, but eight (36%) of 22 patients with initial grade III or IV injury were managed without surgery, while four (36%) of 11 patients with grade I or II injury required delayed celiotomy and splenectomy (three patients) or emergent rehospitalization (one patient). CT-based injury classification is useful in guiding the nonoperative management of blunt hepatic injury in hemodynamically stable adults but appears to be less reliable in predicting the outcome of blunt splenic injury

  10. Detecting Malware with an Ensemble Method Based on Deep Neural Network

    Directory of Open Access Journals (Sweden)

    Jinpei Yan

    2018-01-01

    Full Text Available Malware detection plays a crucial role in computer security. Recent researches mainly use machine learning based methods heavily relying on domain knowledge for manually extracting malicious features. In this paper, we propose MalNet, a novel malware detection method that learns features automatically from the raw data. Concretely, we first generate a grayscale image from malware file, meanwhile extracting its opcode sequences with the decompilation tool IDA. Then MalNet uses CNN and LSTM networks to learn from grayscale image and opcode sequence, respectively, and takes a stacking ensemble for malware classification. We perform experiments on more than 40,000 samples including 20,650 benign files collected from online software providers and 21,736 malwares provided by Microsoft. The evaluation result shows that MalNet achieves 99.88% validation accuracy for malware detection. In addition, we also take malware family classification experiment on 9 malware families to compare MalNet with other related works, in which MalNet outperforms most of related works with 99.36% detection accuracy and achieves a considerable speed-up on detecting efficiency comparing with two state-of-the-art results on Microsoft malware dataset.

  11. Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble

    Science.gov (United States)

    Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher

    2012-10-01

    Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics was used as feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as useŕs and produceŕs accuracy.

  12. Ensemble method: Community detection based on game theory

    Science.gov (United States)

    Zhang, Xia; Xia, Zhengyou; Xu, Shengwu; Wang, J. D.

    2014-08-01

    Timely and cost-effective analytics over social network has emerged as a key ingredient for success in many businesses and government endeavors. Community detection is an active research area of relevance to analyze online social network. The problem of selecting a particular community detection algorithm is crucial if the aim is to unveil the community structure of a network. The choice of a given methodology could affect the outcome of the experiments because different algorithms have different advantages and depend on tuning specific parameters. In this paper, we propose a community division model based on the notion of game theory, which can combine advantages of previous algorithms effectively to get a better community classification result. By making experiments on some standard dataset, it verifies that our community detection model based on game theory is valid and better.

  13. Polyphony: superposition independent methods for ensemble-based drug discovery.

    Science.gov (United States)

    Pitt, William R; Montalvão, Rinaldo W; Blundell, Tom L

    2014-09-30

    Structure-based drug design is an iterative process, following cycles of structural biology, computer-aided design, synthetic chemistry and bioassay. In favorable circumstances, this process can lead to the structures of hundreds of protein-ligand crystal structures. In addition, molecular dynamics simulations are increasingly being used to further explore the conformational landscape of these complexes. Currently, methods capable of the analysis of ensembles of crystal structures and MD trajectories are limited and usually rely upon least squares superposition of coordinates. Novel methodologies are described for the analysis of multiple structures of a protein. Statistical approaches that rely upon residue equivalence, but not superposition, are developed. Tasks that can be performed include the identification of hinge regions, allosteric conformational changes and transient binding sites. The approaches are tested on crystal structures of CDK2 and other CMGC protein kinases and a simulation of p38α. Known interaction - conformational change relationships are highlighted but also new ones are revealed. A transient but druggable allosteric pocket in CDK2 is predicted to occur under the CMGC insert. Furthermore, an evolutionarily-conserved conformational link from the location of this pocket, via the αEF-αF loop, to phosphorylation sites on the activation loop is discovered. New methodologies are described and validated for the superimposition independent conformational analysis of large collections of structures or simulation snapshots of the same protein. The methodologies are encoded in a Python package called Polyphony, which is released as open source to accompany this paper [http://wrpitt.bitbucket.org/polyphony/].

  14. Development of multimodel ensemble based district level medium ...

    Indian Academy of Sciences (India)

    tively by computing the anomaly correlation coef- ficient between the predicted rainfall and observed rainfall. High resolution (lat./long.) gridded data ..... particularly in the prediction of intensity and mesoscale rainfall features causing inland flooding. During recent years, Ensemble. Prediction System (EPS) has emerged as ...

  15. Efficient Kernel-Based Ensemble Gaussian Mixture Filtering

    KAUST Repository

    Liu, Bo; Ait-El-Fquih, Boujemaa; Hoteit, Ibrahim

    2015-01-01

    (KF)-like update of the ensemble members and a particle filter (PF)-like update of the weights, followed by a resampling step to start a new forecast cycle. After formulating EnGMF for any observational operator, we analyze the influence

  16. Analysis and Classification of Stride Patterns Associated with Children Development Using Gait Signal Dynamics Parameters and Ensemble Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Meihong Wu

    2016-01-01

    Full Text Available Measuring stride variability and dynamics in children is useful for the quantitative study of gait maturation and neuromotor development in childhood and adolescence. In this paper, we computed the sample entropy (SampEn and average stride interval (ASI parameters to quantify the stride series of 50 gender-matched children participants in three age groups. We also normalized the SampEn and ASI values by leg length and body mass for each participant, respectively. Results show that the original and normalized SampEn values consistently decrease over the significance level of the Mann-Whitney U test (p<0.01 in children of 3–14 years old, which indicates the stride irregularity has been significantly ameliorated with the body growth. The original and normalized ASI values are also significantly changing when comparing between any two groups of young (aged 3–5 years, middle (aged 6–8 years, and elder (aged 10–14 years children. Such results suggest that healthy children may better modulate their gait cadence rhythm with the development of their musculoskeletal and neurological systems. In addition, the AdaBoost.M2 and Bagging algorithms were used to effectively distinguish the children’s gait patterns. These ensemble learning algorithms both provided excellent gait classification results in terms of overall accuracy (≥90%, recall (≥0.8, and precision (≥0.8077.

  17. The Fault Diagnosis of Rolling Bearing Based on Ensemble Empirical Mode Decomposition and Random Forest

    OpenAIRE

    Qin, Xiwen; Li, Qiaoling; Dong, Xiaogang; Lv, Siqi

    2017-01-01

    Accurate diagnosis of rolling bearing fault on the normal operation of machinery and equipment has a very important significance. A method combining Ensemble Empirical Mode Decomposition (EEMD) and Random Forest (RF) is proposed. Firstly, the original signal is decomposed into several intrinsic mode functions (IMFs) by EEMD, and the effective IMFs are selected. Then their energy entropy is calculated as the feature. Finally, the classification is performed by RF. In addition, the wavelet meth...

  18. Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models.

    Directory of Open Access Journals (Sweden)

    Nikola Simidjievski

    Full Text Available Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting, significantly improve predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase of the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles with sampling domain-specific knowledge instead of sampling data. We apply the proposed method to and evaluate its performance on a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to the one of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.

  19. Current path in light emitting diodes based on nanowire ensembles

    International Nuclear Information System (INIS)

    Limbach, F; Hauswald, C; Lähnemann, J; Wölz, M; Brandt, O; Trampert, A; Hanke, M; Jahn, U; Calarco, R; Geelhaar, L; Riechert, H

    2012-01-01

    Light emitting diodes (LEDs) have been fabricated using ensembles of free-standing (In, Ga)N/GaN nanowires (NWs) grown on Si substrates in the self-induced growth mode by molecular beam epitaxy. Electron-beam-induced current analysis, cathodoluminescence as well as biased μ-photoluminescence spectroscopy, transmission electron microscopy, and electrical measurements indicate that the electroluminescence of such LEDs is governed by the differences in the individual current densities of the single-NW LEDs operated in parallel, i.e. by the inhomogeneity of the current path in the ensemble LED. In addition, the optoelectronic characterization leads to the conclusion that these NWs exhibit N-polarity and that the (In, Ga)N quantum well states in the NWs are subject to a non-vanishing quantum confined Stark effect. (paper)

  20. Dynamic Metabolic Model Building Based on the Ensemble Modeling Approach

    Energy Technology Data Exchange (ETDEWEB)

    Liao, James C. [Univ. of California, Los Angeles, CA (United States)

    2016-10-01

    Ensemble modeling of kinetic systems addresses the challenges of kinetic model construction, with respect to parameter value selection, and still allows for the rich insights possible from kinetic models. This project aimed to show that constructing, implementing, and analyzing such models is a useful tool for the metabolic engineering toolkit, and that they can result in actionable insights from models. Key concepts are developed and deliverable publications and results are presented.

  1. A preclustering-based ensemble learning technique for acute appendicitis diagnoses.

    Science.gov (United States)

    Lee, Yen-Hsien; Hu, Paul Jen-Hwa; Cheng, Tsang-Hsiang; Huang, Te-Chia; Chuang, Wei-Yao

    2013-06-01

    Acute appendicitis is a common medical condition, whose effective, timely diagnosis can be difficult. A missed diagnosis not only puts the patient in danger but also requires additional resources for corrective treatments. An acute appendicitis diagnosis constitutes a classification problem, for which a further fundamental challenge pertains to the skewed outcome class distribution of instances in the training sample. A preclustering-based ensemble learning (PEL) technique aims to address the associated imbalanced sample learning problems and thereby support the timely, accurate diagnosis of acute appendicitis. The proposed PEL technique employs undersampling to reduce the number of majority-class instances in a training sample, uses preclustering to group similar majority-class instances into multiple groups, and selects from each group representative instances to create more balanced samples. The PEL technique thereby reduces potential information loss from random undersampling. It also takes advantage of ensemble learning to improve performance. We empirically evaluate this proposed technique with 574 clinical cases obtained from a comprehensive tertiary hospital in southern Taiwan, using several prevalent techniques and a salient scoring system as benchmarks. The comparative results show that PEL is more effective and less biased than any benchmarks. The proposed PEL technique seems more sensitive to identifying positive acute appendicitis than the commonly used Alvarado scoring system and exhibits higher specificity in identifying negative acute appendicitis. In addition, the sensitivity and specificity values of PEL appear higher than those of the investigated benchmarks that follow the resampling approach. Our analysis suggests PEL benefits from the more representative majority-class instances in the training sample. According to our overall evaluation results, PEL records the best overall performance, and its area under the curve measure reaches 0.619. The

  2. Ensembl 2017

    OpenAIRE

    Aken, Bronwen L.; Achuthan, Premanand; Akanni, Wasiu; Amode, M. Ridwan; Bernsdorff, Friederike; Bhai, Jyothish; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Gil, Laurent; Gir?n, Carlos Garc?a; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.

    2016-01-01

    Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access ...

  3. Cloud field classification based on textural features

    Science.gov (United States)

    Sengupta, Sailes Kumar

    1989-01-01

    An essential component in global climate research is accurate cloud cover and type determination. Of the two approaches to texture-based classification (statistical and textural), only the former is effective in the classification of natural scenes such as land, ocean, and atmosphere. In the statistical approach that was adopted, parameters characterizing the stochastic properties of the spatial distribution of grey levels in an image are estimated and then used as features for cloud classification. Two types of textural measures were used. One is based on the distribution of the grey level difference vector (GLDV), and the other on a set of textural features derived from the MaxMin cooccurrence matrix (MMCM). The GLDV method looks at the difference D of grey levels at pixels separated by a horizontal distance d and computes several statistics based on this distribution. These are then used as features in subsequent classification. The MaxMin tectural features on the other hand are based on the MMCM, a matrix whose (I,J)th entry give the relative frequency of occurrences of the grey level pair (I,J) that are consecutive and thresholded local extremes separated by a given pixel distance d. Textural measures are then computed based on this matrix in much the same manner as is done in texture computation using the grey level cooccurrence matrix. The database consists of 37 cloud field scenes from LANDSAT imagery using a near IR visible channel. The classification algorithm used is the well known Stepwise Discriminant Analysis. The overall accuracy was estimated by the percentage or correct classifications in each case. It turns out that both types of classifiers, at their best combination of features, and at any given spatial resolution give approximately the same classification accuracy. A neural network based classifier with a feed forward architecture and a back propagation training algorithm is used to increase the classification accuracy, using these two classes

  4. Competitive Learning Neural Network Ensemble Weighted by Predicted Performance

    Science.gov (United States)

    Ye, Qiang

    2010-01-01

    Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic for Neural Network classifiers. Introducing a secondary output unit that receives different…

  5. An Authentication Technique Based on Classification

    Institute of Scientific and Technical Information of China (English)

    李钢; 杨杰

    2004-01-01

    We present a novel watermarking approach based on classification for authentication, in which a watermark is embedded into the host image. When the marked image is modified, the extracted watermark is also different to the original watermark, and different kinds of modification lead to different extracted watermarks. In this paper, different kinds of modification are considered as classes, and we used classification algorithm to recognize the modifications with high probability. Simulation results show that the proposed method is potential and effective.

  6. Evaluation of medium-range ensemble flood forecasting based on calibration strategies and ensemble methods in Lanjiang Basin, Southeast China

    Science.gov (United States)

    Liu, Li; Gao, Chao; Xuan, Weidong; Xu, Yue-Ping

    2017-11-01

    Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprised of an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel programmed ε-NSGA II multi-objective algorithm. According to the solutions by ε-NSGA II, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. Then a simple yet effective modular approach is proposed to combine these daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGA II can provide an objective determination on parameter estimation, and the parallel program permits a more efficient simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single model ensembles and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at three stations can be satisfactory up to ten days, however the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in forecasting of flows. For

  7. EMG finger movement classification based on ANFIS

    Science.gov (United States)

    Caesarendra, W.; Tjahjowidodo, T.; Nico, Y.; Wahyudati, S.; Nurhasanah, L.

    2018-04-01

    An increase number of people suffering from stroke has impact to the rapid development of finger hand exoskeleton to enable an automatic physical therapy. Prior to the development of finger exoskeleton, a research topic yet important i.e. machine learning of finger gestures classification is conducted. This paper presents a study on EMG signal classification of 5 finger gestures as a preliminary study toward the finger exoskeleton design and development in Indonesia. The EMG signals of 5 finger gestures were acquired using Myo EMG sensor. The EMG signal features were extracted and reduced using PCA. The ANFIS based learning is used to classify reduced features of 5 finger gestures. The result shows that the classification of finger gestures is less than the classification of 7 hand gestures.

  8. A convection-allowing ensemble forecast based on the breeding growth mode and associated optimization of precipitation forecast

    Science.gov (United States)

    Li, Xiang; He, Hongrang; Chen, Chaohui; Miao, Ziqing; Bai, Shigang

    2017-10-01

    A convection-allowing ensemble forecast experiment on a squall line was conducted based on the breeding growth mode (BGM). Meanwhile, the probability matched mean (PMM) and neighborhood ensemble probability (NEP) methods were used to optimize the associated precipitation forecast. The ensemble forecast predicted the precipitation tendency accurately, which was closer to the observation than in the control forecast. For heavy rainfall, the precipitation center produced by the ensemble forecast was also better. The Fractions Skill Score (FSS) results indicated that the ensemble mean was skillful in light rainfall, while the PMM produced better probability distribution of precipitation for heavy rainfall. Preliminary results demonstrated that convection-allowing ensemble forecast could improve precipitation forecast skill through providing valuable probability forecasts. It is necessary to employ new methods, such as the PMM and NEP, to generate precipitation probability forecasts. Nonetheless, the lack of spread and the overprediction of precipitation by the ensemble members are still problems that need to be solved.

  9. Inventory classification based on decoupling points

    Directory of Open Access Journals (Sweden)

    Joakim Wikner

    2015-01-01

    Full Text Available The ideal state of continuous one-piece flow may never be achieved. Still the logistics manager can improve the flow by carefully positioning inventory to buffer against variations. Strategies such as lean, postponement, mass customization, and outsourcing all rely on strategic positioning of decoupling points to separate forecast-driven from customer-order-driven flows. Planning and scheduling of the flow are also based on classification of decoupling points as master scheduled or not. A comprehensive classification scheme for these types of decoupling points is introduced. The approach rests on identification of flows as being either demand based or supply based. The demand or supply is then combined with exogenous factors, classified as independent, or endogenous factors, classified as dependent. As a result, eight types of strategic as well as tactical decoupling points are identified resulting in a process-based framework for inventory classification that can be used for flow design.

  10. Multiple kernel boosting framework based on information measure for classification

    International Nuclear Information System (INIS)

    Qi, Chengming; Wang, Yuping; Tian, Wenjie; Wang, Qun

    2016-01-01

    The performance of kernel-based method, such as support vector machine (SVM), is greatly affected by the choice of kernel function. Multiple kernel learning (MKL) is a promising family of machine learning algorithms and has attracted many attentions in recent years. MKL combines multiple sub-kernels to seek better results compared to single kernel learning. In order to improve the efficiency of SVM and MKL, in this paper, the Kullback–Leibler kernel function is derived to develop SVM. The proposed method employs an improved ensemble learning framework, named KLMKB, which applies Adaboost to learning multiple kernel-based classifier. In the experiment for hyperspectral remote sensing image classification, we employ feature selected through Optional Index Factor (OIF) to classify the satellite image. We extensively examine the performance of our approach in comparison to some relevant and state-of-the-art algorithms on a number of benchmark classification data sets and hyperspectral remote sensing image data set. Experimental results show that our method has a stable behavior and a noticeable accuracy for different data set.

  11. A Matrix-Free Posterior Ensemble Kalman Filter Implementation Based on a Modified Cholesky Decomposition

    Directory of Open Access Journals (Sweden)

    Elias D. Nino-Ruiz

    2017-07-01

    Full Text Available In this paper, a matrix-free posterior ensemble Kalman filter implementation based on a modified Cholesky decomposition is proposed. The method works as follows: the precision matrix of the background error distribution is estimated based on a modified Cholesky decomposition. The resulting estimator can be expressed in terms of Cholesky factors which can be updated based on a series of rank-one matrices in order to approximate the precision matrix of the analysis distribution. By using this matrix, the posterior ensemble can be built by either sampling from the posterior distribution or using synthetic observations. Furthermore, the computational effort of the proposed method is linear with regard to the model dimension and the number of observed components from the model domain. Experimental tests are performed making use of the Lorenz-96 model. The results reveal that, the accuracy of the proposed implementation in terms of root-mean-square-error is similar, and in some cases better, to that of a well-known ensemble Kalman filter (EnKF implementation: the local ensemble transform Kalman filter. In addition, the results are comparable to those obtained by the EnKF with large ensemble sizes.

  12. Genetic algorithm based adaptive neural network ensemble and its application in predicting carbon flux

    Science.gov (United States)

    Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.

    2007-01-01

    To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. ?? 2007 IEEE.

  13. Comparative Visualization of Vector Field Ensembles Based on Longest Common Subsequence

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Richen; Guo, Hanqi; Zhang, Jiang; Yuan, Xiaoru

    2016-04-19

    We propose a longest common subsequence (LCS) based approach to compute the distance among vector field ensembles. By measuring how many common blocks the ensemble pathlines passing through, the LCS distance defines the similarity among vector field ensembles by counting the number of sharing domain data blocks. Compared to the traditional methods (e.g. point-wise Euclidean distance or dynamic time warping distance), the proposed approach is robust to outlier, data missing, and sampling rate of pathline timestep. Taking the advantages of smaller and reusable intermediate output, visualization based on the proposed LCS approach revealing temporal trends in the data at low storage cost, and avoiding tracing pathlines repeatedly. Finally, we evaluate our method on both synthetic data and simulation data, which demonstrate the robustness of the proposed approach.

  14. The Fault Diagnosis of Rolling Bearing Based on Ensemble Empirical Mode Decomposition and Random Forest

    Directory of Open Access Journals (Sweden)

    Xiwen Qin

    2017-01-01

    Full Text Available Accurate diagnosis of rolling bearing fault on the normal operation of machinery and equipment has a very important significance. A method combining Ensemble Empirical Mode Decomposition (EEMD and Random Forest (RF is proposed. Firstly, the original signal is decomposed into several intrinsic mode functions (IMFs by EEMD, and the effective IMFs are selected. Then their energy entropy is calculated as the feature. Finally, the classification is performed by RF. In addition, the wavelet method is also used in the proposed process, the same as EEMD. The results of the comparison show that the EEMD method is more accurate than the wavelet method.

  15. R-FCN Object Detection Ensemble based on Object Resolution and Image Quality

    DEFF Research Database (Denmark)

    Rasmussen, Christoffer Bøgelund; Nasrollahi, Kamal; Moeslund, Thomas B.

    2017-01-01

    Object detection can be difficult due to challenges such as variations in objects both inter- and intra-class. Additionally, variations can also be present between images. Based on this, research was conducted into creating an ensemble of Region-based Fully Convolutional Networks (R-FCN) object d...

  16. AN ENSEMBLE TEMPLATE MATCHING AND CONTENT-BASED IMAGE RETRIEVAL SCHEME TOWARDS EARLY STAGE DETECTION OF MELANOMA

    Directory of Open Access Journals (Sweden)

    Spiros Kostopoulos

    2016-12-01

    Full Text Available Malignant melanoma represents the most dangerous type of skin cancer. In this study we present an ensemble classification scheme, employing the mutual information, the cross-correlation and the clustering based on proximity of image features methods, for early stage assessment of melanomas on plain photography images. The proposed scheme performs two main operations. First, it retrieves the most similar, to the unknown case, image samples from an available image database with verified benign moles and malignant melanoma cases. Second, it provides an automated estimation regarding the nature of the unknown image sample based on the majority of the most similar images retrieved from the available database. Clinical material comprised 75 melanoma and 75 benign plain photography images collected from publicly available dermatological atlases. Results showed that the ensemble scheme outperformed all other methods tested in terms of accuracy with 94.9±1.5%, following an external cross-validation evaluation methodology. The proposed scheme may benefit patients by providing a second opinion consultation during the self-skin examination process and the physician by providing a second opinion estimation regarding the nature of suspicious moles that may assist towards decision making especially for ambiguous cases, safeguarding, in this way from potential diagnostic misinterpretations.

  17. A novel computer-aided diagnosis system for breast MRI based on feature selection and ensemble learning.

    Science.gov (United States)

    Lu, Wei; Li, Zhe; Chu, Jinghui

    2017-04-01

    Breast cancer is a common cancer among women. With the development of modern medical science and information technology, medical imaging techniques have an increasingly important role in the early detection and diagnosis of breast cancer. In this paper, we propose an automated computer-aided diagnosis (CADx) framework for magnetic resonance imaging (MRI). The scheme consists of an ensemble of several machine learning-based techniques, including ensemble under-sampling (EUS) for imbalanced data processing, the Relief algorithm for feature selection, the subspace method for providing data diversity, and Adaboost for improving the performance of base classifiers. We extracted morphological, various texture, and Gabor features. To clarify the feature subsets' physical meaning, subspaces are built by combining morphological features with each kind of texture or Gabor feature. We tested our proposal using a manually segmented Region of Interest (ROI) data set, which contains 438 images of malignant tumors and 1898 images of normal tissues or benign tumors. Our proposal achieves an area under the ROC curve (AUC) value of 0.9617, which outperforms most other state-of-the-art breast MRI CADx systems. Compared with other methods, our proposal significantly reduces the false-positive classification rate. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Ensemble-based Regional Climate Prediction: Political Impacts

    Science.gov (United States)

    Miguel, E.; Dykema, J.; Satyanath, S.; Anderson, J. G.

    2008-12-01

    Accurate forecasts of regional climate, including temperature and precipitation, have significant implications for human activities, not just economically but socially. Sub Saharan Africa is a region that has displayed an exceptional propensity for devastating civil wars. Recent research in political economy has revealed a strong statistical relationship between year to year fluctuations in precipitation and civil conflict in this region in the 1980s and 1990s. To investigate how climate change may modify the regional risk of civil conflict in the future requires a probabilistic regional forecast that explicitly accounts for the community's uncertainty in the evolution of rainfall under anthropogenic forcing. We approach the regional climate prediction aspect of this question through the application of a recently demonstrated method called generalized scalar prediction (Leroy et al. 2009), which predicts arbitrary scalar quantities of the climate system. This prediction method can predict change in any variable or linear combination of variables of the climate system averaged over a wide range spatial scales, from regional to hemispheric to global. Generalized scalar prediction utilizes an ensemble of model predictions to represent the community's uncertainty range in climate modeling in combination with a timeseries of any type of observational data that exhibits sensitivity to the scalar of interest. It is not necessary to prioritize models in deriving with the final prediction. We present the results of the application of generalized scalar prediction for regional forecasts of temperature and precipitation and Sub Saharan Africa. We utilize the climate predictions along with the established statistical relationship between year-to-year rainfall variability in Sub Saharan Africa to investigate the potential impact of climate change on civil conflict within that region.

  19. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    Energy Technology Data Exchange (ETDEWEB)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    2014-03-15

    Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R{sup 2}) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R{sup 2} and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest for wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the

  20. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    International Nuclear Information System (INIS)

    Singh, Kunwar P.; Gupta, Shikha

    2014-01-01

    Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R 2 ) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R 2 and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest for wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the

  1. Wang-Landau Reaction Ensemble Method: Simulation of Weak Polyelectrolytes and General Acid-Base Reactions.

    Science.gov (United States)

    Landsgesell, Jonas; Holm, Christian; Smiatek, Jens

    2017-02-14

    We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.

  2. Improving a Deep Learning based RGB-D Object Recognition Model by Ensemble Learning

    DEFF Research Database (Denmark)

    Aakerberg, Andreas; Nasrollahi, Kamal; Heder, Thomas

    2018-01-01

    Augmenting RGB images with depth information is a well-known method to significantly improve the recognition accuracy of object recognition models. Another method to im- prove the performance of visual recognition models is ensemble learning. However, this method has not been widely explored...... in combination with deep convolutional neural network based RGB-D object recognition models. Hence, in this paper, we form different ensembles of complementary deep convolutional neural network models, and show that this can be used to increase the recognition performance beyond existing limits. Experiments...

  3. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  4. Rough set classification based on quantum logic

    Science.gov (United States)

    Hassan, Yasser F.

    2017-11-01

    By combining the advantages of quantum computing and soft computing, the paper shows that rough sets can be used with quantum logic for classification and recognition systems. We suggest the new definition of rough set theory as quantum logic theory. Rough approximations are essential elements in rough set theory, the quantum rough set model for set-valued data directly construct set approximation based on a kind of quantum similarity relation which is presented here. Theoretical analyses demonstrate that the new model for quantum rough sets has new type of decision rule with less redundancy which can be used to give accurate classification using principles of quantum superposition and non-linear quantum relations. To our knowledge, this is the first attempt aiming to define rough sets in representation of a quantum rather than logic or sets. The experiments on data-sets have demonstrated that the proposed model is more accuracy than the traditional rough sets in terms of finding optimal classifications.

  5. An adaptive Gaussian process-based iterative ensemble smoother for data assimilation

    Science.gov (United States)

    Ju, Lei; Zhang, Jiangjiang; Meng, Long; Wu, Laosheng; Zeng, Lingzao

    2018-05-01

    Accurate characterization of subsurface hydraulic conductivity is vital for modeling of subsurface flow and transport. The iterative ensemble smoother (IES) has been proposed to estimate the heterogeneous parameter field. As a Monte Carlo-based method, IES requires a relatively large ensemble size to guarantee its performance. To improve the computational efficiency, we propose an adaptive Gaussian process (GP)-based iterative ensemble smoother (GPIES) in this study. At each iteration, the GP surrogate is adaptively refined by adding a few new base points chosen from the updated parameter realizations. Then the sensitivity information between model parameters and measurements is calculated from a large number of realizations generated by the GP surrogate with virtually no computational cost. Since the original model evaluations are only required for base points, whose number is much smaller than the ensemble size, the computational cost is significantly reduced. The applicability of GPIES in estimating heterogeneous conductivity is evaluated by the saturated and unsaturated flow problems, respectively. Without sacrificing estimation accuracy, GPIES achieves about an order of magnitude of speed-up compared with the standard IES. Although subsurface flow problems are considered in this study, the proposed method can be equally applied to other hydrological models.

  6. An Ensemble Learning Based Framework for Traditional Chinese Medicine Data Analysis with ICD-10 Labels

    Directory of Open Access Journals (Sweden)

    Gang Zhang

    2015-01-01

    Full Text Available Objective. This study aims to establish a model to analyze clinical experience of TCM veteran doctors. We propose an ensemble learning based framework to analyze clinical records with ICD-10 labels information for effective diagnosis and acupoints recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision tree (DT and support vector machine (SVM are trained by bootstrapping the training dataset. The base learners are sorted by accuracy and diversity through nondominated sort (NDS algorithm and combined through a deep ensemble learning strategy. Results. We evaluate the proposed method with comparison to two currently successful methods on a clinical diagnosis dataset with manually labeled ICD-10 information. ICD-10 label annotation and acupoints recommendation are evaluated for three methods. The proposed method achieves an accuracy rate of 88.2%  ±  2.8% measured by zero-one loss for the first evaluation session and 79.6%  ±  3.6% measured by Hamming loss, which are superior to the other two methods. Conclusion. The proposed ensemble model can effectively model the implied knowledge and experience in historic clinical data records. The computational cost of training a set of base learners is relatively low.

  7. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm

    International Nuclear Information System (INIS)

    Yu, Lean; Wang, Shouyang; Lai, Kin Keung

    2008-01-01

    In this study, an empirical mode decomposition (EMD) based neural network ensemble learning paradigm is proposed for world crude oil spot price forecasting. For this purpose, the original crude oil spot price series were first decomposed into a finite, and often small, number of intrinsic mode functions (IMFs). Then a three-layer feed-forward neural network (FNN) model was used to model each of the extracted IMFs, so that the tendencies of these IMFs could be accurately predicted. Finally, the prediction results of all IMFs are combined with an adaptive linear neural network (ALNN), to formulate an ensemble output for the original crude oil price series. For verification and testing, two main crude oil price series, West Texas Intermediate (WTI) crude oil spot price and Brent crude oil spot price, are used to test the effectiveness of the proposed EMD-based neural network ensemble learning methodology. Empirical results obtained demonstrate attractiveness of the proposed EMD-based neural network ensemble learning paradigm. (author)

  8. An Ensemble of Classifiers based Approach for Prediction of Alzheimer's Disease using fMRI Images based on Fusion of Volumetric, Textural and Hemodynamic Features

    Directory of Open Access Journals (Sweden)

    MALIK, F.

    2018-02-01

    Full Text Available Alzheimer's is a neurodegenerative disease caused by the destruction and death of brain neurons resulting in memory loss, impaired thinking ability, and in certain behavioral changes. Alzheimer disease is a major cause of dementia and eventually death all around the world. Early diagnosis of the disease is crucial which can help the victims to maintain their level of independence for comparatively longer time and live a best life possible. For early detection of Alzheimer's disease, we are proposing a novel approach based on fusion of multiple types of features including hemodynamic, volumetric and textural features of the brain. Our approach uses non-invasive fMRI with ensemble of classifiers, for the classification of the normal controls and the Alzheimer patients. For performance evaluation, ten-fold cross validation is used. Individual feature sets and fusion of features have been investigated with ensemble classifiers for successful classification of Alzheimer's patients from normal controls. It is observed that fusion of features resulted in improved results for accuracy, specificity and sensitivity.

  9. Estimating Uncertainty of Point-Cloud Based Single-Tree Segmentation with Ensemble Based Filtering

    Directory of Open Access Journals (Sweden)

    Matthew Parkan

    2018-02-01

    Full Text Available Individual tree crown segmentation from Airborne Laser Scanning data is a nodal problem in forest remote sensing. Focusing on single layered spruce and fir dominated coniferous forests, this article addresses the problem of directly estimating 3D segment shape uncertainty (i.e., without field/reference surveys, using a probabilistic approach. First, a coarse segmentation (marker controlled watershed is applied. Then, the 3D alpha hull and several descriptors are computed for each segment. Based on these descriptors, the alpha hulls are grouped to form ensembles (i.e., groups of similar tree shapes. By examining how frequently regions of a shape occur within an ensemble, it is possible to assign a shape probability to each point within a segment. The shape probability can subsequently be thresholded to obtain improved (filtered tree segments. Results indicate this approach can be used to produce segmentation reliability maps. A comparison to manually segmented tree crowns also indicates that the approach is able to produce more reliable tree shapes than the initial (unfiltered segmentation.

  10. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

    Science.gov (United States)

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won

    2014-08-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and an ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.

  11. Three-dimensional theory of quantum memories based on Λ-type atomic ensembles

    International Nuclear Information System (INIS)

    Zeuthen, Emil; Grodecka-Grad, Anna; Soerensen, Anders S.

    2011-01-01

    We develop a three-dimensional theory for quantum memories based on light storage in ensembles of Λ-type atoms, where two long-lived atomic ground states are employed. We consider light storage in an ensemble of finite spatial extent and we show that within the paraxial approximation the Fresnel number of the atomic ensemble and the optical depth are the only important physical parameters determining the quality of the quantum memory. We analyze the influence of these parameters on the storage of light followed by either forward or backward read-out from the quantum memory. We show that for small Fresnel numbers the forward memory provides higher efficiencies, whereas for large Fresnel numbers the backward memory is advantageous. The optimal light modes to store in the memory are presented together with the corresponding spin waves and outcoming light modes. We show that for high optical depths such Λ-type atomic ensembles allow for highly efficient backward and forward memories even for small Fresnel numbers F(greater-or-similar sign)0.1.

  12. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units

    International Nuclear Information System (INIS)

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios, Martha; Baechler, Sébastien

    2015-01-01

    Purpose: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. Method: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). Results: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Conclusion: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables

  13. Analysis of composition-based metagenomic classification.

    Science.gov (United States)

    Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro

    2012-01-01

    An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found out that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values for n result in sparse frequency vectors that represent differently metagenomic fragments that are in fact similar, also leading to low configuration scores. Regarding the similarity measure, in

  14. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    International Nuclear Information System (INIS)

    Blanco, A; Rodriguez, R; Martinez-Maranon, I

    2014-01-01

    Mackerel is an infravalored fish captured by European fishing vessels. A manner to add value to this specie can be achieved by trying to classify it attending to its sex. Colour measurements were performed on Mackerel females and males (fresh and defrozen) extracted gonads to obtain differences between sexes. Several linear and non linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can been applied to this problem. However, theyare usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kind of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity

  15. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    Science.gov (United States)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an infravalored fish captured by European fishing vessels. A manner to add value to this specie can be achieved by trying to classify it attending to its sex. Colour measurements were performed on Mackerel females and males (fresh and defrozen) extracted gonads to obtain differences between sexes. Several linear and non linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can been applied to this problem. However, theyare usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kind of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.

  16. Mechanism-based drug exposure classification in pharmacoepidemiological studies

    NARCIS (Netherlands)

    Verdel, B.M.

    2010-01-01

    Mechanism-based classification of drug exposure in pharmacoepidemiological studies In pharmacoepidemiology and pharmacovigilance, the relation between drug exposure and clinical outcomes is crucial. Exposure classification in pharmacoepidemiological studies is traditionally based on

  17. Making decisions based on an imperfect ensemble of climate simulators: strategies and future directions

    Science.gov (United States)

    Sanderson, B. M.

    2017-12-01

    The CMIP ensembles represent the most comprehensive source of information available to decision-makers for climate adaptation, yet it is clear that there are fundamental limitations in our ability to treat the ensemble as an unbiased sample of possible future climate trajectories. There is considerable evidence that models are not independent, and increasing complexity and resolution combined with computational constraints prevent a thorough exploration of parametric uncertainty or internal variability. Although more data than ever is available for calibration, the optimization of each model is influenced by institutional priorities, historical precedent and available resources. The resulting ensemble thus represents a miscellany of climate simulators which defy traditional statistical interpretation. Models are in some cases interdependent, but are sufficiently complex that the degree of interdependency is conditional on the application. Configurations have been updated using available observations to some degree, but not in a consistent or easily identifiable fashion. This means that the ensemble cannot be viewed as a true posterior distribution updated by available data, but nor can observational data alone be used to assess individual model likelihood. We assess recent literature for combining projections from an imperfect ensemble of climate simulators. Beginning with our published methodology for addressing model interdependency and skill in the weighting scheme for the 4th US National Climate Assessment, we consider strategies for incorporating process-based constraints on future response, perturbed parameter experiments and multi-model output into an integrated framework. We focus on a number of guiding questions: Is the traditional framework of confidence in projections inferred from model agreement leading to biased or misleading conclusions? Can the benefits of upweighting skillful models be reconciled with the increased risk of truth lying outside the

  18. SQL based cardiovascular ultrasound image classification.

    Science.gov (United States)

    Nandagopalan, S; Suryanarayana, Adiga B; Sudarshan, T S B; Chandrashekar, Dhanalakshmi; Manjunath, C N

    2013-01-01

    This paper proposes a novel method to analyze and classify the cardiovascular ultrasound echocardiographic images using Naïve-Bayesian model via database OLAP-SQL. Efficient data mining algorithms based on tightly-coupled model is used to extract features. Three algorithms are proposed for classification namely Naïve-Bayesian Classifier for Discrete variables (NBCD) with SQL, NBCD with OLAP-SQL, and Naïve-Bayesian Classifier for Continuous variables (NBCC) using OLAP-SQL. The proposed model is trained with 207 patient images containing normal and abnormal categories. Out of the three proposed algorithms, a high classification accuracy of 96.59% was achieved from NBCC which is better than the earlier methods.

  19. AUC-Maximizing Ensembles through Metalearning.

    Science.gov (United States)

    LeDell, Erin; van der Laan, Mark J; Petersen, Maya

    2016-05-01

    Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximining metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, out-perform non-AUC-maximizing metalearning methods, with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree.

  20. A Discrete Wavelet Based Feature Extraction and Hybrid Classification Technique for Microarray Data Analysis

    Directory of Open Access Journals (Sweden)

    Jaison Bennet

    2014-01-01

    Full Text Available Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN, naive Bayes, and support vector machine (SVM. Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT and moving window technique (MWT is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.

  1. AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density.

    Science.gov (United States)

    Zhao, X G; Dai, W; Li, Y; Tian, L

    2011-11-01

    The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, the AUC-based ensemble methods are rather scant, largely due to the fact that the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm identifying optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. We have proposed a novel AUC-based statistical ensemble methods for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function by a convex surrogate loss function, whose minimizer can be efficiently identified. With the established framework, the lasso and other regularization techniques enable feature selections. Extensive simulations have demonstrated the superiority of the new methods to the existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores to differentiate elderly women with low bone mineral density (BMD) and those with normal BMD. The AUCs of the resulting scores in the independent test dataset has been satisfactory. Aiming for directly maximizing AUC, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful under the high-dimensional settings. lutian@stanford.edu. Supplementary data are available at Bioinformatics online.

  2. A Prediction Method of Airport Noise Based on Hybrid Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Tao XU

    2014-05-01

    Full Text Available Using monitoring history data to build and to train a prediction model for airport noise is a normal method in recent years. However, the single model built in different ways has various performances in the storage, efficiency and accuracy. In order to predict the noise accurately in some complex environment around airport, this paper presents a prediction method based on hybrid ensemble learning. The proposed method ensembles three algorithms: artificial neural network as an active learner, nearest neighbor as a passive leaner and nonlinear regression as a synthesized learner. The experimental results show that the three learners can meet forecast demands respectively in on- line, near-line and off-line. And the accuracy of prediction is improved by integrating these three learners’ results.

  3. Decoding of Human Movements Based on Deep Brain Local Field Potentials Using Ensemble Neural Networks

    Directory of Open Access Journals (Sweden)

    Mohammad S. Islam

    2017-01-01

    Full Text Available Decoding neural activities related to voluntary and involuntary movements is fundamental to understanding human brain motor circuits and neuromotor disorders and can lead to the development of neuromotor prosthetic devices for neurorehabilitation. This study explores using recorded deep brain local field potentials (LFPs for robust movement decoding of Parkinson’s disease (PD and Dystonia patients. The LFP data from voluntary movement activities such as left and right hand index finger clicking were recorded from patients who underwent surgeries for implantation of deep brain stimulation electrodes. Movement-related LFP signal features were extracted by computing instantaneous power related to motor response in different neural frequency bands. An innovative neural network ensemble classifier has been proposed and developed for accurate prediction of finger movement and its forthcoming laterality. The ensemble classifier contains three base neural network classifiers, namely, feedforward, radial basis, and probabilistic neural networks. The majority voting rule is used to fuse the decisions of the three base classifiers to generate the final decision of the ensemble classifier. The overall decoding performance reaches a level of agreement (kappa value at about 0.729±0.16 for decoding movement from the resting state and about 0.671±0.14 for decoding left and right visually cued movements.

  4. Ensemble-based data assimilation and optimal sensor placement for scalar source reconstruction

    Science.gov (United States)

    Mons, Vincent; Wang, Qi; Zaki, Tamer

    2017-11-01

    Reconstructing the characteristics of a scalar source from limited remote measurements in a turbulent flow is a problem of great interest for environmental monitoring, and is challenging due to several aspects. Firstly, the numerical estimation of the scalar dispersion in a turbulent flow requires significant computational resources. Secondly, in actual practice, only a limited number of observations are available, which generally makes the corresponding inverse problem ill-posed. Ensemble-based variational data assimilation techniques are adopted to solve the problem of scalar source localization in a turbulent channel flow at Reτ = 180 . This approach combines the components of variational data assimilation and ensemble Kalman filtering, and inherits the robustness from the former and the ease of implementation from the latter. An ensemble-based methodology for optimal sensor placement is also proposed in order to improve the condition of the inverse problem, which enhances the performances of the data assimilation scheme. This work has been partially funded by the Office of Naval Research (Grant N00014-16-1-2542) and by the National Science Foundation (Grant 1461870).

  5. Semi-Supervised Multi-View Ensemble Learning Based On Extracting Cross-View Correlation

    Directory of Open Access Journals (Sweden)

    ZALL, R.

    2016-05-01

    Full Text Available Correlated information between different views incorporate useful for learning in multi view data. Canonical correlation analysis (CCA plays important role to extract these information. However, CCA only extracts the correlated information between paired data and cannot preserve correlated information between within-class samples. In this paper, we propose a two-view semi-supervised learning method called semi-supervised random correlation ensemble base on spectral clustering (SS_RCE. SS_RCE uses a multi-view method based on spectral clustering which takes advantage of discriminative information in multiple views to estimate labeling information of unlabeled samples. In order to enhance discriminative power of CCA features, we incorporate the labeling information of both unlabeled and labeled samples into CCA. Then, we use random correlation between within-class samples from cross view to extract diverse correlated features for training component classifiers. Furthermore, we extend a general model namely SSMV_RCE to construct ensemble method to tackle semi-supervised learning in the presence of multiple views. Finally, we compare the proposed methods with existing multi-view feature extraction methods using multi-view semi-supervised ensembles. Experimental results on various multi-view data sets are presented to demonstrate the effectiveness of the proposed methods.

  6. Object Classification Based on Analysis of Spectral Characteristics of Seismic Signal Envelopes

    Science.gov (United States)

    Morozov, Yu. V.; Spektor, A. A.

    2017-11-01

    A method for classifying moving objects having a seismic effect on the ground surface is proposed which is based on statistical analysis of the envelopes of received signals. The values of the components of the amplitude spectrum of the envelopes obtained applying Hilbert and Fourier transforms are used as classification criteria. Examples illustrating the statistical properties of spectra and the operation of the seismic classifier are given for an ensemble of objects of four classes (person, group of people, large animal, vehicle). It is shown that the computational procedures for processing seismic signals are quite simple and can therefore be used in real-time systems with modest requirements for computational resources.

  7. An approach for classification of hydrogeological systems at the regional scale based on groundwater hydrographs

    Science.gov (United States)

    Haaf, Ezra; Barthel, Roland

    2016-04-01

    When assessing hydrogeological conditions at the regional scale, the analyst is often confronted with uncertainty of structures, inputs and processes while having to base inference on scarce and patchy data. Haaf and Barthel (2015) proposed a concept for handling this predicament by developing a groundwater systems classification framework, where information is transferred from similar, but well-explored and better understood to poorly described systems. The concept is based on the central hypothesis that similar systems react similarly to the same inputs and vice versa. It is conceptually related to PUB (Prediction in ungauged basins) where organization of systems and processes by quantitative methods is intended and used to improve understanding and prediction. Furthermore, using the framework it is expected that regional conceptual and numerical models can be checked or enriched by ensemble generated data from neighborhood-based estimators. In a first step, groundwater hydrographs from a large dataset in Southern Germany are compared in an effort to identify structural similarity in groundwater dynamics. A number of approaches to group hydrographs, mostly based on a similarity measure - which have previously only been used in local-scale studies, can be found in the literature. These are tested alongside different global feature extraction techniques. The resulting classifications are then compared to a visual "expert assessment"-based classification which serves as a reference. A ranking of the classification methods is carried out and differences shown. Selected groups from the classifications are related to geological descriptors. Here we present the most promising results from a comparison of classifications based on series correlation, different series distances and series features, such as the coefficients of the discrete Fourier transform and the intrinsic mode functions of empirical mode decomposition. Additionally, we show examples of classes

  8. Coastal aquifer management under parameter uncertainty: Ensemble surrogate modeling based simulation-optimization

    Science.gov (United States)

    Janardhanan, S.; Datta, B.

    2011-12-01

    Surrogate models are widely used to develop computationally efficient simulation-optimization models to solve complex groundwater management problems. Artificial intelligence based models are most often used for this purpose where they are trained using predictor-predictand data obtained from a numerical simulation model. Most often this is implemented with the assumption that the parameters and boundary conditions used in the numerical simulation model are perfectly known. However, in most practical situations these values are uncertain. Under these circumstances the application of such approximation surrogates becomes limited. In our study we develop a surrogate model based coupled simulation optimization methodology for determining optimal pumping strategies for coastal aquifers considering parameter uncertainty. An ensemble surrogate modeling approach is used along with multiple realization optimization. The methodology is used to solve a multi-objective coastal aquifer management problem considering two conflicting objectives. Hydraulic conductivity and the aquifer recharge are considered as uncertain values. Three dimensional coupled flow and transport simulation model FEMWATER is used to simulate the aquifer responses for a number of scenarios corresponding to Latin hypercube samples of pumping and uncertain parameters to generate input-output patterns for training the surrogate models. Non-parametric bootstrap sampling of this original data set is used to generate multiple data sets which belong to different regions in the multi-dimensional decision and parameter space. These data sets are used to train and test multiple surrogate models based on genetic programming. The ensemble of surrogate models is then linked to a multi-objective genetic algorithm to solve the pumping optimization problem. Two conflicting objectives, viz, maximizing total pumping from beneficial wells and minimizing the total pumping from barrier wells for hydraulic control of

  9. Research on Classification of Chinese Text Data Based on SVM

    Science.gov (United States)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.

  10. The role of ensemble-based statistics in variational assimilation of cloud-affected observations from infrared imagers

    Science.gov (United States)

    Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris

    2017-04-01

    Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and operational variational data assimilation system, to provide the basis understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions, and allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the

  11. Contextual segment-based classification of airborne laser scanner data

    NARCIS (Netherlands)

    Vosselman, George; Coenen, Maximilian; Rottensteiner, Franz

    2017-01-01

    Classification of point clouds is needed as a first step in the extraction of various types of geo-information from point clouds. We present a new approach to contextual classification of segmented airborne laser scanning data. Potential advantages of segment-based classification are easily offset

  12. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    Science.gov (United States)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  13. The Drag-based Ensemble Model (DBEM) for Coronal Mass Ejection Propagation

    Science.gov (United States)

    Dumbović, Mateja; Čalogović, Jaša; Vršnak, Bojan; Temmer, Manuela; Mays, M. Leila; Veronig, Astrid; Piantschitsch, Isabell

    2018-02-01

    The drag-based model for heliospheric propagation of coronal mass ejections (CMEs) is a widely used analytical model that can predict CME arrival time and speed at a given heliospheric location. It is based on the assumption that the propagation of CMEs in interplanetary space is solely under the influence of magnetohydrodynamical drag, where CME propagation is determined based on CME initial properties as well as the properties of the ambient solar wind. We present an upgraded version, the drag-based ensemble model (DBEM), that covers ensemble modeling to produce a distribution of possible ICME arrival times and speeds. Multiple runs using uncertainty ranges for the input values can be performed in almost real-time, within a few minutes. This allows us to define the most likely ICME arrival times and speeds, quantify prediction uncertainties, and determine forecast confidence. The performance of the DBEM is evaluated and compared to that of ensemble WSA-ENLIL+Cone model (ENLIL) using the same sample of events. It is found that the mean error is ME = ‑9.7 hr, mean absolute error MAE = 14.3 hr, and root mean square error RMSE = 16.7 hr, which is somewhat higher than, but comparable to ENLIL errors (ME = ‑6.1 hr, MAE = 12.8 hr and RMSE = 14.4 hr). Overall, DBEM and ENLIL show a similar performance. Furthermore, we find that in both models fast CMEs are predicted to arrive earlier than observed, most likely owing to the physical limitations of models, but possibly also related to an overestimation of the CME initial speed for fast CMEs.

  14. An ensemble based nonlinear orthogonal matching pursuit algorithm for sparse history matching of reservoir models

    KAUST Repository

    Fsheikh, Ahmed H.; Wheeler, Mary Fanett; Hoteit, Ibrahim

    2013-01-01

    the dictionary, the solution is obtained by applying Tikhonov regularization. The proposed algorithm relies on approximate gradient estimation using an iterative stochastic ensemble method (ISEM). ISEM utilizes an ensemble of directional derivatives

  15. Skill prediction of local weather forecasts based on the ECMWF ensemble

    Directory of Open Access Journals (Sweden)

    C. Ziehmann

    2001-01-01

    Full Text Available Ensemble Prediction has become an essential part of numerical weather forecasting. In this paper we investigate the ability of ensemble forecasts to provide an a priori estimate of the expected forecast skill. Several quantities derived from the local ensemble distribution are investigated for a two year data set of European Centre for Medium-Range Weather Forecasts (ECMWF temperature and wind speed ensemble forecasts at 30 German stations. The results indicate that the population of the ensemble mode provides useful information for the uncertainty in temperature forecasts. The ensemble entropy is a similar good measure. This is not true for the spread if it is simply calculated as the variance of the ensemble members with respect to the ensemble mean. The number of clusters in the C regions is almost unrelated to the local skill. For wind forecasts, the results are less promising.

  16. Distributed HUC-based modeling with SUMMA for ensemble streamflow forecasting over large regional domains.

    Science.gov (United States)

    Saharia, M.; Wood, A.; Clark, M. P.; Bennett, A.; Nijssen, B.; Clark, E.; Newman, A. J.

    2017-12-01

    Most operational streamflow forecasting systems rely on a forecaster-in-the-loop approach in which some parts of the forecast workflow require an experienced human forecaster. But this approach faces challenges surrounding process reproducibility, hindcasting capability, and extension to large domains. The operational hydrologic community is increasingly moving towards `over-the-loop' (completely automated) large-domain simulations yet recent developments indicate a widespread lack of community knowledge about the strengths and weaknesses of such systems for forecasting. A realistic representation of land surface hydrologic processes is a critical element for improving forecasts, but often comes at the substantial cost of forecast system agility and efficiency. While popular grid-based models support the distributed representation of land surface processes, intermediate-scale Hydrologic Unit Code (HUC)-based modeling could provide a more efficient and process-aligned spatial discretization, reducing the need for tradeoffs between model complexity and critical forecasting requirements such as ensemble methods and comprehensive model calibration. The National Center for Atmospheric Research is collaborating with the University of Washington, the Bureau of Reclamation and the USACE to implement, assess, and demonstrate real-time, over-the-loop distributed streamflow forecasting for several large western US river basins and regions. In this presentation, we present early results from short to medium range hydrologic and streamflow forecasts for the Pacific Northwest (PNW). We employ a real-time 1/16th degree daily ensemble model forcings as well as downscaled Global Ensemble Forecasting System (GEFS) meteorological forecasts. These datasets drive an intermediate-scale configuration of the Structure for Unifying Multiple Modeling Alternatives (SUMMA) model, which represents the PNW using over 11,700 HUCs. The system produces not only streamflow forecasts (using the Mizu

  17. Ensemble-based flash-flood modelling: Taking into account hydrodynamic parameters and initial soil moisture uncertainties

    Science.gov (United States)

    Edouard, Simon; Vincendon, Béatrice; Ducrocq, Véronique

    2018-05-01

    Intense precipitation events in the Mediterranean often lead to devastating flash floods (FF). FF modelling is affected by several kinds of uncertainties and Hydrological Ensemble Prediction Systems (HEPS) are designed to take those uncertainties into account. The major source of uncertainty comes from rainfall forcing and convective-scale meteorological ensemble prediction systems can manage it for forecasting purpose. But other sources are related to the hydrological modelling part of the HEPS. This study focuses on the uncertainties arising from the hydrological model parameters and initial soil moisture with aim to design an ensemble-based version of an hydrological model dedicated to Mediterranean fast responding rivers simulations, the ISBA-TOP coupled system. The first step consists in identifying the parameters that have the strongest influence on FF simulations by assuming perfect precipitation. A sensitivity study is carried out first using a synthetic framework and then for several real events and several catchments. Perturbation methods varying the most sensitive parameters as well as initial soil moisture allow designing an ensemble-based version of ISBA-TOP. The first results of this system on some real events are presented. The direct perspective of this work will be to drive this ensemble-based version with the members of a convective-scale meteorological ensemble prediction system to design a complete HEPS for FF forecasting.

  18. A new strategy for snow-cover mapping using remote sensing data and ensemble based systems techniques

    Science.gov (United States)

    Roberge, S.; Chokmani, K.; De Sève, D.

    2012-04-01

    The snow cover plays an important role in the hydrological cycle of Quebec (Eastern Canada). Consequently, evaluating its spatial extent interests the authorities responsible for the management of water resources, especially hydropower companies. The main objective of this study is the development of a snow-cover mapping strategy using remote sensing data and ensemble based systems techniques. Planned to be tested in a near real-time operational mode, this snow-cover mapping strategy has the advantage to provide the probability of a pixel to be snow covered and its uncertainty. Ensemble systems are made of two key components. First, a method is needed to build an ensemble of classifiers that is diverse as much as possible. Second, an approach is required to combine the outputs of individual classifiers that make up the ensemble in such a way that correct decisions are amplified, and incorrect ones are cancelled out. In this study, we demonstrate the potential of ensemble systems to snow-cover mapping using remote sensing data. The chosen classifier is a sequential thresholds algorithm using NOAA-AVHRR data adapted to conditions over Eastern Canada. Its special feature is the use of a combination of six sequential thresholds varying according to the day in the winter season. Two versions of the snow-cover mapping algorithm have been developed: one is specific for autumn (from October 1st to December 31st) and the other for spring (from March 16th to May 31st). In order to build the ensemble based system, different versions of the algorithm are created by varying randomly its parameters. One hundred of the versions are included in the ensemble. The probability of a pixel to be snow, no-snow or cloud covered corresponds to the amount of votes the pixel has been classified as such by all classifiers. The overall performance of ensemble based mapping is compared to the overall performance of the chosen classifier, and also with ground observations at meteorological

  19. Density Based Support Vector Machines for Classification

    OpenAIRE

    Zahra Nazari; Dongshik Kang

    2015-01-01

    Support Vector Machines (SVM) is the most successful algorithm for classification problems. SVM learns the decision boundary from two classes (for Binary Classification) of training points. However, sometimes there are some less meaningful samples amongst training points, which are corrupted by noises or misplaced in wrong side, called outliers. These outliers are affecting on margin and classification performance, and machine should better to discard them. SVM as a popular and widely used cl...

  20. Ensemble empirical mode decomposition based fluorescence spectral noise reduction for low concentration PAHs

    Science.gov (United States)

    Wang, Shu-tao; Yang, Xue-ying; Kong, De-ming; Wang, Yu-tian

    2017-11-01

    A new noise reduction method based on ensemble empirical mode decomposition (EEMD) is proposed to improve the detection effect for fluorescence spectra. Polycyclic aromatic hydrocarbons (PAHs) pollutants, as a kind of important current environmental pollution source, are highly oncogenic. Using the fluorescence spectroscopy method, the PAHs pollutants can be detected. However, instrument will produce noise in the experiment. Weak fluorescent signals can be affected by noise, so we propose a way to denoise and improve the detection effect. Firstly, we use fluorescence spectrometer to detect PAHs to obtain fluorescence spectra. Subsequently, noises are reduced by EEMD algorithm. Finally, the experiment results show the proposed method is feasible.

  1. Ensemble-based evaluation of extreme water levels for the eastern Baltic Sea

    Science.gov (United States)

    Eelsalu, Maris; Soomere, Tarmo

    2016-04-01

    The risks and damages associated with coastal flooding that are naturally associated with an increase in the magnitude of extreme storm surges are one of the largest concerns of countries with extensive low-lying nearshore areas. The relevant risks are even more contrast for semi-enclosed water bodies such as the Baltic Sea where subtidal (weekly-scale) variations in the water volume of the sea substantially contribute to the water level and lead to large spreading of projections of future extreme water levels. We explore the options for using large ensembles of projections to more reliably evaluate return periods of extreme water levels. Single projections of the ensemble are constructed by means of fitting several sets of block maxima with various extreme value distributions. The ensemble is based on two simulated data sets produced in the Swedish Meteorological and Hydrological Institute. A hindcast by the Rossby Centre Ocean model is sampled with a resolution of 6 h and a similar hindcast by the circulation model NEMO with a resolution of 1 h. As the annual maxima of water levels in the Baltic Sea are not always uncorrelated, we employ maxima for calendar years and for stormy seasons. As the shape parameter of the Generalised Extreme Value distribution changes its sign and substantially varies in magnitude along the eastern coast of the Baltic Sea, the use of a single distribution for the entire coast is inappropriate. The ensemble involves projections based on the Generalised Extreme Value, Gumbel and Weibull distributions. The parameters of these distributions are evaluated using three different ways: maximum likelihood method and method of moments based on both biased and unbiased estimates. The total number of projections in the ensemble is 40. As some of the resulting estimates contain limited additional information, the members of pairs of projections that are highly correlated are assigned weights 0.6. A comparison of the ensemble-based projection of

  2. Classification of research reactors and discussion of thinking of safety regulation based on the classification

    International Nuclear Information System (INIS)

    Song Chenxiu; Zhu Lixin

    2013-01-01

    Research reactors have different characteristics in the fields of reactor type, use, power level, design principle, operation model and safety performance, etc, and also have significant discrepancy in the aspect of nuclear safety regulation. This paper introduces classification of research reactors and discusses thinking of safety regulation based on the classification of research reactors. (authors)

  3. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

    Science.gov (United States)

    Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn

    2018-04-11

    The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed as an alternative of deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been proved that the deep forest model has competitive or even better performance than deep neural networks in some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data. In this paper, we propose a deep learning model, so-called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. The BCDForest distinguishes from the standard deep forest model with the following two main contributions: First, a named multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes

  4. An ensemble prediction approach to weekly Dengue cases forecasting based on climatic and terrain conditions

    Directory of Open Access Journals (Sweden)

    Sougata Deb

    2017-11-01

    Full Text Available Introduction: Dengue fever has been one of the most concerning endemic diseases of recent times. Every year, 50-100 million people get infected by the dengue virus across the world. Historically, it has been most prevalent in Southeast Asia and the Pacific Islands. In recent years, frequent dengue epidemics have started occurring in Latin America as well. This study focused on assessing the impact of different short and long-term lagged climatic predictors on dengue cases. Additionally, it assessed the impact of building an ensemble model using multiple time series and regression models, in improving prediction accuracy. Materials and Methods: Experimental data were based on two Latin American cities, viz. San Juan (Puerto Rico and Iquitos (Peru. Due to weather and geographic differences, San Juan recorded higher dengue incidences than Iquitos. Using lagged cross-correlations, this study confirmed the impact of temperature and vegetation on the number of dengue cases for both cities, though in varied degrees and time lags. An ensemble of multiple predictive models using an elaborate set of derived predictors was built and validated. Results: The proposed ensemble prediction achieved a mean absolute error of 21.55, 4.26 points lower than the 25.81 obtained by a standard negative binomial model. Changes in climatic conditions and urbanization were found to be strong predictors as established empirically in other researches. Some of the predictors were new and informative, which have not been explored in any other relevant studies yet. Discussion and Conclusions: Two original contributions were made in this research. Firstly, a focused and extensive feature engineering aligned with the mosquito lifecycle. Secondly, a novel covariate pattern-matching based prediction approach using past time series trend of the predictor variables. Increased accuracy of the proposed model over the benchmark model proved the appropriateness of the analytical approach

  5. Hybrid nanomembrane-based capacitors for the determination of the dielectric constant of semiconducting molecular ensembles

    Science.gov (United States)

    Petrini, Paula A.; Silva, Ricardo M. L.; de Oliveira, Rafael F.; Merces, Leandro; Bof Bufon, Carlos C.

    2018-06-01

    Considerable advances in the field of molecular electronics have been achieved over the recent years. One persistent challenge, however, is the exploitation of the electronic properties of molecules fully integrated into devices. Typically, the molecular electronic properties are investigated using sophisticated techniques incompatible with a practical device technology, such as the scanning tunneling microscopy. The incorporation of molecular materials in devices is not a trivial task as the typical dimensions of electrical contacts are much larger than the molecular ones. To tackle this issue, we report on hybrid capacitors using mechanically-compliant nanomembranes to encapsulate ultrathin molecular ensembles for the investigation of molecular dielectric properties. As the prototype material, copper (II) phthalocyanine (CuPc) has been chosen as information on its dielectric constant (k CuPc) at the molecular scale is missing. Here, hybrid nanomembrane-based capacitors containing metallic nanomembranes, insulating Al2O3 layers, and the CuPc molecular ensembles have been fabricated and evaluated. The Al2O3 is used to prevent short circuits through the capacitor plates as the molecular layer is considerably thin (electrical measurements of devices with molecular layers of different thicknesses, the CuPc dielectric constant has been reliably determined (k CuPc = 4.5 ± 0.5). These values suggest a mild contribution of the molecular orientation on the CuPc dielectric properties. The reported nanomembrane-based capacitor is a viable strategy for the dielectric characterization of ultrathin molecular ensembles integrated into a practical, real device technology.

  6. Hybrid nanomembrane-based capacitors for the determination of the dielectric constant of semiconducting molecular ensembles.

    Science.gov (United States)

    Petrini, Paula Andreia; Lopes da Silva, Ricardo Magno; de Oliveira, Rafael Furlan; Merces, Leandro; Bufon, Carlos César Bof

    2018-04-06

    Considerable advances in the field of molecular electronics have been achieved over the recent years. One persistent challenge, however, is the exploitation of the electronic properties of molecules fully integrated into devices. Typically, the molecular electronic properties are investigated using sophisticated techniques incompatible with a practical device technology, such as the scanning tunneling microscope (STM). The incorporation of molecular materials in devices is not a trivial task since the typical dimensions of electrical contacts are much larger than the molecular ones. To tackle this issue, we report on hybrid capacitors using mechanically-compliant nanomembranes to encapsulate ultrathin molecular ensembles for the investigation of molecular dielectric properties. As the prototype material, copper (II) phthalocyanine (CuPc) has been chosen as information on its dielectric constant (kCuPc) at the molecular scale is missing. Here, hybrid nanomembrane-based capacitors containing metallic nanomembranes, insulating Al2O3 layers, and the CuPc molecular ensemble have been fabricated and evaluated. The Al2O3 is used to prevent short circuits through the capacitor plates as the molecular layer is considerably thin (< 30 nm). From the electrical measurements of devices with molecular layers of different thicknesses, the CuPc dielectric constant has been reliably determined (kCuPc = 4.5 ± 0.5). These values suggest a mild contribution of molecular orientation in the CuPc dielectric properties. The reported nanomembrane-based capacitor is a viable strategy for the dielectric characterization of ultrathin molecular ensembles integrated into a practical, real device technology. © 2018 IOP Publishing Ltd.

  7. Sentiment classification technology based on Markov logic networks

    Science.gov (United States)

    He, Hui; Li, Zhigang; Yao, Chongchong; Zhang, Weizhe

    2016-07-01

    With diverse online media emerging, there is a growing concern of sentiment classification problem. At present, text sentiment classification mainly utilizes supervised machine learning methods, which feature certain domain dependency. On the basis of Markov logic networks (MLNs), this study proposed a cross-domain multi-task text sentiment classification method rooted in transfer learning. Through many-to-one knowledge transfer, labeled text sentiment classification, knowledge was successfully transferred into other domains, and the precision of the sentiment classification analysis in the text tendency domain was improved. The experimental results revealed the following: (1) the model based on a MLN demonstrated higher precision than the single individual learning plan model. (2) Multi-task transfer learning based on Markov logical networks could acquire more knowledge than self-domain learning. The cross-domain text sentiment classification model could significantly improve the precision and efficiency of text sentiment classification.

  8. Comparing writing style feature-based classification methods for estimating user reputations in social media.

    Science.gov (United States)

    Suh, Jong Hwan

    2016-01-01

    In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the qualities of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of the state-of-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners-C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)-and four Random Subspace (RS) ensemble methods based on the four base learners. When South Korea's Web forum, Daum Agora, was selected as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM combining RS and SVM gives the best accuracy for classification if the test bed poster reputations are segmented strictly into Good and Bad classes by portfolio approach. Pairwise t tests on accuracy confirm two expectations coming from the literature reviews: first, the feature set adding content-specific features outperform the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways on defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.

  9. Dissimilarity-based classification of anatomical tree structures

    DEFF Research Database (Denmark)

    Sørensen, Lauge; Lo, Pechin Chien Pau; Dirksen, Asger

    2011-01-01

    A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment betw...

  10. Dissimilarity-based classification of anatomical tree structures

    DEFF Research Database (Denmark)

    Sørensen, Lauge Emil Borch Laurs; Lo, Pechin Chien Pau; Dirksen, Asger

    2011-01-01

    A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment...

  11. Neural Network Ensembles

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Salamon, Peter

    1990-01-01

    We propose several means for improving the performance an training of neural networks for classification. We use crossvalidation as a tool for optimizing network parameters and architecture. We show further that the remaining generalization error can be reduced by invoking ensembles of similar...... networks....

  12. An Ensemble Approach to Knowledge-Based Intensity-Modulated Radiation Therapy Planning

    Directory of Open Access Journals (Sweden)

    Jiahan Zhang

    2018-03-01

    Full Text Available Knowledge-based planning (KBP utilizes experienced planners’ knowledge embedded in prior plans to estimate optimal achievable dose volume histogram (DVH of new cases. In the regression-based KBP framework, previously planned patients’ anatomical features and DVHs are extracted, and prior knowledge is summarized as the regression coefficients that transform features to organ-at-risk DVH predictions. In our study, we find that in different settings, different regression methods work better. To improve the robustness of KBP models, we propose an ensemble method that combines the strengths of various linear regression models, including stepwise, lasso, elastic net, and ridge regression. In the ensemble approach, we first obtain individual model prediction metadata using in-training-set leave-one-out cross validation. A constrained optimization is subsequently performed to decide individual model weights. The metadata is also used to filter out impactful training set outliers. We evaluate our method on a fresh set of retrospectively retrieved anonymized prostate intensity-modulated radiation therapy (IMRT cases and head and neck IMRT cases. The proposed approach is more robust against small training set size, wrongly labeled cases, and dosimetric inferior plans, compared with other individual models. In summary, we believe the improved robustness makes the proposed method more suitable for clinical settings than individual models.

  13. Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.

    Science.gov (United States)

    Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw

    2018-07-01

    Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Extending Correlation Filter-Based Visual Tracking by Tree-Structured Ensemble and Spatial Windowing.

    Science.gov (United States)

    Gundogdu, Erhan; Ozkan, Huseyin; Alatan, A Aydin

    2017-11-01

    Correlation filters have been successfully used in visual tracking due to their modeling power and computational efficiency. However, the state-of-the-art correlation filter-based (CFB) tracking algorithms tend to quickly discard the previous poses of the target, since they consider only a single filter in their models. On the contrary, our approach is to register multiple CFB trackers for previous poses and exploit the registered knowledge when an appearance change occurs. To this end, we propose a novel tracking algorithm [of complexity O(D) ] based on a large ensemble of CFB trackers. The ensemble [of size O(2 D ) ] is organized over a binary tree (depth D ), and learns the target appearance subspaces such that each constituent tracker becomes an expert of a certain appearance. During tracking, the proposed algorithm combines only the appearance-aware relevant experts to produce boosted tracking decisions. Additionally, we propose a versatile spatial windowing technique to enhance the individual expert trackers. For this purpose, spatial windows are learned for target objects as well as the correlation filters and then the windowed regions are processed for more robust correlations. In our extensive experiments on benchmark datasets, we achieve a substantial performance increase by using the proposed tracking algorithm together with the spatial windowing.

  15. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    Science.gov (United States)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.

  16. Risk-based classification system of nanomaterials

    International Nuclear Information System (INIS)

    Tervonen, Tommi; Linkov, Igor; Figueira, Jose Rui; Steevens, Jeffery; Chappell, Mark; Merad, Myriam

    2009-01-01

    Various stakeholders are increasingly interested in the potential toxicity and other risks associated with nanomaterials throughout the different stages of a product's life cycle (e.g., development, production, use, disposal). Risk assessment methods and tools developed and applied to chemical and biological materials may not be readily adaptable for nanomaterials because of the current uncertainty in identifying the relevant physico-chemical and biological properties that adequately describe the materials. Such uncertainty is further driven by the substantial variations in the properties of the original material due to variable manufacturing processes employed in nanomaterial production. To guide scientists and engineers in nanomaterial research and application as well as to promote the safe handling and use of these materials, we propose a decision support system for classifying nanomaterials into different risk categories. The classification system is based on a set of performance metrics that measure both the toxicity and physico-chemical characteristics of the original materials, as well as the expected environmental impacts through the product life cycle. Stochastic multicriteria acceptability analysis (SMAA-TRI), a formal decision analysis method, was used as the foundation for this task. This method allowed us to cluster various nanomaterials in different ecological risk categories based on our current knowledge of nanomaterial physico-chemical characteristics, variation in produced material, and best professional judgments. SMAA-TRI uses Monte Carlo simulations to explore all feasible values for weights, criteria measurements, and other model parameters to assess the robustness of nanomaterial grouping for risk management purposes.

  17. Structure-Based Algorithms for Microvessel Classification

    KAUST Repository

    Smith, Amy F.

    2015-02-01

    © 2014 The Authors. Microcirculation published by John Wiley & Sons Ltd. Objective: Recent developments in high-resolution imaging techniques have enabled digital reconstruction of three-dimensional sections of microvascular networks down to the capillary scale. To better interpret these large data sets, our goal is to distinguish branching trees of arterioles and venules from capillaries. Methods: Two novel algorithms are presented for classifying vessels in microvascular anatomical data sets without requiring flow information. The algorithms are compared with a classification based on observed flow directions (considered the gold standard), and with an existing resistance-based method that relies only on structural data. Results: The first algorithm, developed for networks with one arteriolar and one venular tree, performs well in identifying arterioles and venules and is robust to parameter changes, but incorrectly labels a significant number of capillaries as arterioles or venules. The second algorithm, developed for networks with multiple inlets and outlets, correctly identifies more arterioles and venules, but is more sensitive to parameter changes. Conclusions: The algorithms presented here can be used to classify microvessels in large microvascular data sets lacking flow information. This provides a basis for analyzing the distinct geometrical properties and modelling the functional behavior of arterioles, capillaries, and venules.

  18. Risk-based classification system of nanomaterials

    Energy Technology Data Exchange (ETDEWEB)

    Tervonen, Tommi, E-mail: t.p.tervonen@rug.n [University of Groningen, Faculty of Economics and Business (Netherlands); Linkov, Igor, E-mail: igor.linkov@usace.army.mi [US Army Research and Development Center (United States); Figueira, Jose Rui, E-mail: figueira@ist.utl.p [Technical University of Lisbon, CEG-IST, Centre for Management Studies, Instituto Superior Tecnico (Portugal); Steevens, Jeffery, E-mail: jeffery.a.steevens@usace.army.mil; Chappell, Mark, E-mail: mark.a.chappell@usace.army.mi [US Army Research and Development Center (United States); Merad, Myriam, E-mail: myriam.merad@ineris.f [INERIS BP 2, Societal Management of Risks Unit/Accidental Risks Division (France)

    2009-05-15

    Various stakeholders are increasingly interested in the potential toxicity and other risks associated with nanomaterials throughout the different stages of a product's life cycle (e.g., development, production, use, disposal). Risk assessment methods and tools developed and applied to chemical and biological materials may not be readily adaptable for nanomaterials because of the current uncertainty in identifying the relevant physico-chemical and biological properties that adequately describe the materials. Such uncertainty is further driven by the substantial variations in the properties of the original material due to variable manufacturing processes employed in nanomaterial production. To guide scientists and engineers in nanomaterial research and application as well as to promote the safe handling and use of these materials, we propose a decision support system for classifying nanomaterials into different risk categories. The classification system is based on a set of performance metrics that measure both the toxicity and physico-chemical characteristics of the original materials, as well as the expected environmental impacts through the product life cycle. Stochastic multicriteria acceptability analysis (SMAA-TRI), a formal decision analysis method, was used as the foundation for this task. This method allowed us to cluster various nanomaterials in different ecological risk categories based on our current knowledge of nanomaterial physico-chemical characteristics, variation in produced material, and best professional judgments. SMAA-TRI uses Monte Carlo simulations to explore all feasible values for weights, criteria measurements, and other model parameters to assess the robustness of nanomaterial grouping for risk management purposes.

  19. An Ensemble Based Evolutionary Approach to the Class Imbalance Problem with Applications in CBIR

    Directory of Open Access Journals (Sweden)

    Aun Irtaza

    2018-03-01

    Full Text Available In order to lower the dependence on textual annotations for image searches, the content based image retrieval (CBIR has become a popular topic in computer vision. A wide range of CBIR applications consider classification techniques, such as artificial neural networks (ANN, support vector machines (SVM, etc. to understand the query image content to retrieve relevant output. However, in multi-class search environments, the retrieval results are far from optimal due to overlapping semantics amongst subjects of various classes. The classification through multiple classifiers generate better results, but as the number of negative examples increases due to highly correlated semantic classes, classification bias occurs towards the negative class, hence, the combination of the classifiers become even more unstable particularly in one-against-all classification scenarios. In order to resolve this issue, a genetic algorithm (GA based classifier comity learning (GCCL method is presented in this paper to generate stable classifiers by combining ANN with SVMs through asymmetric and symmetric bagging. The proposed approach resolves the classification disagreement amongst different classifiers and also resolves the class imbalance problem in CBIR. Once the stable classifiers are generated, the query image is presented to the trained model to understand the underlying semantic content of the query image for association with the precise semantic class. Afterwards, the feature similarity is computed within the obtained class to generate the semantic response of the system. The experiments reveal that the proposed method outperforms various state-of-the-art methods and significantly improves the image retrieval performance.

  20. KNN BASED CLASSIFICATION OF DIGITAL MODULATED SIGNALS

    Directory of Open Access Journals (Sweden)

    Sajjad Ahmed Ghauri

    2016-11-01

    Full Text Available Demodulation process without the knowledge of modulation scheme requires Automatic Modulation Classification (AMC. When receiver has limited information about received signal then AMC become essential process. AMC finds important place in the field many civil and military fields such as modern electronic warfare, interfering source recognition, frequency management, link adaptation etc. In this paper we explore the use of K-nearest neighbor (KNN for modulation classification with different distance measurement methods. Five modulation schemes are used for classification purpose which is Binary Phase Shift Keying (BPSK, Quadrature Phase Shift Keying (QPSK, Quadrature Amplitude Modulation (QAM, 16-QAM and 64-QAM. Higher order cummulants (HOC are used as an input feature set to the classifier. Simulation results shows that proposed classification method provides better results for the considered modulation formats.

  1. Ensemble Classifiers for Predicting HIV-1 Resistance from Three Rule-Based Genotypic Resistance Interpretation Systems.

    Science.gov (United States)

    Raposo, Letícia M; Nobre, Flavio F

    2017-08-30

    Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms were developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, resulting in difficulties for clinical decisions about which treatment to use. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to develop a classifier with a single resistance profile: stacked generalization, a simple plurality vote scheme and the selection of the interpretation system with the best performance. The strategies were compared with the Friedman's test and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performances for the selected antiretrovirals. For some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making since they provide a single resistance profile from the most commonly used resistance interpretation systems.

  2. Integrating Globality and Locality for Robust Representation Based Classification

    Directory of Open Access Journals (Sweden)

    Zheng Zhang

    2014-01-01

    Full Text Available The representation based classification method (RBCM has shown huge potential for face recognition since it first emerged. Linear regression classification (LRC method and collaborative representation classification (CRC method are two well-known RBCMs. LRC and CRC exploit training samples of each class and all the training samples to represent the testing sample, respectively, and subsequently conduct classification on the basis of the representation residual. LRC method can be viewed as a “locality representation” method because it just uses the training samples of each class to represent the testing sample and it cannot embody the effectiveness of the “globality representation.” On the contrary, it seems that CRC method cannot own the benefit of locality of the general RBCM. Thus we propose to integrate CRC and LRC to perform more robust representation based classification. The experimental results on benchmark face databases substantially demonstrate that the proposed method achieves high classification accuracy.

  3. Intelligent and robust prediction of short term wind power using genetic programming based ensemble of neural networks

    International Nuclear Information System (INIS)

    Zameer, Aneela; Arshad, Junaid; Khan, Asifullah; Raja, Muhammad Asif Zahoor

    2017-01-01

    Highlights: • Genetic programming based ensemble of neural networks is employed for short term wind power prediction. • Proposed predictor shows resilience against abrupt changes in weather. • Genetic programming evolves nonlinear mapping between meteorological measures and wind-power. • Proposed approach gives mathematical expressions of wind power to its independent variables. • Proposed model shows relatively accurate and steady wind-power prediction performance. - Abstract: The inherent instability of wind power production leads to critical problems for smooth power generation from wind turbines, which then requires an accurate forecast of wind power. In this study, an effective short term wind power prediction methodology is presented, which uses an intelligent ensemble regressor that comprises Artificial Neural Networks and Genetic Programming. In contrast to existing series based combination of wind power predictors, whereby the error or variation in the leading predictor is propagated down the stream to the next predictors, the proposed intelligent ensemble predictor avoids this shortcoming by introducing Genetical Programming based semi-stochastic combination of neural networks. It is observed that the decision of the individual base regressors may vary due to the frequent and inherent fluctuations in the atmospheric conditions and thus meteorological properties. The novelty of the reported work lies in creating ensemble to generate an intelligent, collective and robust decision space and thereby avoiding large errors due to the sensitivity of the individual wind predictors. The proposed ensemble based regressor, Genetic Programming based ensemble of Artificial Neural Networks, has been implemented and tested on data taken from five different wind farms located in Europe. Obtained numerical results of the proposed model in terms of various error measures are compared with the recent artificial intelligence based strategies to demonstrate the

  4. An ensemble training scheme for machine-learning classification of Hyperion satellite imagery with independent hyperspectral libraries

    Science.gov (United States)

    Friedel, Michael; Buscema, Massimo

    2016-04-01

    A training scheme is proposed for the real-time classification of soil and vegetation (landscape) components in EO-1 Hyperion hyperspectral images. First, an auto-contractive map is used to compute connectivity of reflectance values for spectral bands (N=200) from independent laboratory spectral library components. Second, a minimum spanning tree is used to identify optimal grouping of training components from connectivity values. Third, the reflectance values for optimal landscape component signatures are sorted. Fourth, empirical distribution functions (EDF) are computed for each landscape component. Fifth, the Monte-Carlo technique is used to generate realizations (N=30) for each landscape EDF. The correspondence of component realizations to original signatures validates the stochastic procedure. Presentation of the realizations to the self-organizing map (SOM) is done using three different map sizes: 14x10, 28x20, and 40 x 30. In each case, the SOM training proceeds first with a rough phase (20 iterations using a Gaussian neighborhood with an initial and final radius of 11 units and 3 units) and then fine phase (400 iterations using a Gaussian neighborhood with an initial and final radius of 3 units and 1 unit). The initial and final learning rates of 0.5 and 0.05 decay linearly down to 10-5, and the Gaussian neighborhood function decreases exponentially (decay rate of 10-3 iteration-1) providing reasonable convergence. Following training of the three networks, each corresponding SOM is used to independently classify the original spectral library signatures. In comparing the different SOM networks, the 28x20 map size is chosen for independent reproducibility and processing speed. The corresponding universal distance matrix reveals separation of the seven component classes for this map size thereby supporting it use as a Hyperion classifier.

  5. Comparison of ensemble post-processing approaches, based on empirical and dynamical error modelisation of rainfall-runoff model forecasts

    Science.gov (United States)

    Chardon, J.; Mathevet, T.; Le Lay, M.; Gailhard, J.

    2012-04-01

    In the context of a national energy company (EDF : Electricité de France), hydro-meteorological forecasts are necessary to ensure safety and security of installations, meet environmental standards and improve water ressources management and decision making. Hydrological ensemble forecasts allow a better representation of meteorological and hydrological forecasts uncertainties and improve human expertise of hydrological forecasts, which is essential to synthesize available informations, coming from different meteorological and hydrological models and human experience. An operational hydrological ensemble forecasting chain has been developed at EDF since 2008 and is being used since 2010 on more than 30 watersheds in France. This ensemble forecasting chain is characterized ensemble pre-processing (rainfall and temperature) and post-processing (streamflow), where a large human expertise is solicited. The aim of this paper is to compare 2 hydrological ensemble post-processing methods developed at EDF in order improve ensemble forecasts reliability (similar to Monatanari &Brath, 2004; Schaefli et al., 2007). The aim of the post-processing methods is to dress hydrological ensemble forecasts with hydrological model uncertainties, based on perfect forecasts. The first method (called empirical approach) is based on a statistical modelisation of empirical error of perfect forecasts, by streamflow sub-samples of quantile class and lead-time. The second method (called dynamical approach) is based on streamflow sub-samples of quantile class and streamflow variation, and lead-time. On a set of 20 watersheds used for operational forecasts, results show that both approaches are necessary to ensure a good post-processing of hydrological ensemble, allowing a good improvement of reliability, skill and sharpness of ensemble forecasts. The comparison of the empirical and dynamical approaches shows the limits of the empirical approach which is not able to take into account hydrological

  6. Symmetric minimally entangled typical thermal states, grand-canonical ensembles, and the influence of the collapse bases

    Science.gov (United States)

    Binder, Moritz; Barthel, Thomas

    Based on DMRG, strongly correlated quantum many-body systems at finite temperatures can be simulated by sampling over a certain class of pure matrix product states (MPS) called minimally entangled typical thermal states (METTS). Here, we show how symmetries of the system can be exploited to considerably reduce computation costs in the METTS algorithm. While this is straightforward for the canonical ensemble, we introduce a modification of the algorithm to efficiently simulate the grand-canonical ensemble under utilization of symmetries. In addition, we construct novel symmetry-conserving collapse bases for the transitions in the Markov chain of METTS that improve the speed of convergence of the algorithm by reducing autocorrelations.

  7. Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients.

    Science.gov (United States)

    Chen, Jinying; Yu, Hong

    2017-04-01

    Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care. The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word co-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines. FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers (P<0.001). FIT also outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. FIT can automatically identify EHR terms important to patients. It may help develop future interventions

  8. Preliminary Research on Grassland Fine-classification Based on MODIS

    International Nuclear Information System (INIS)

    Hu, Z W; Zhang, S; Yu, X Y; Wang, X S

    2014-01-01

    Grassland ecosystem is important for climatic regulation, maintaining the soil and water. Research on the grassland monitoring method could provide effective reference for grassland resource investigation. In this study, we used the vegetation index method for grassland classification. There are several types of climate in China. Therefore, we need to use China's Main Climate Zone Maps and divide the study region into four climate zones. Based on grassland classification system of the first nation-wide grass resource survey in China, we established a new grassland classification system which is only suitable for this research. We used MODIS images as the basic data resources, and use the expert classifier method to perform grassland classification. Based on the 1:1,000,000 Grassland Resource Map of China, we obtained the basic distribution of all the grassland types and selected 20 samples evenly distributed in each type, then used NDVI/EVI product to summarize different spectral features of different grassland types. Finally, we introduced other classification auxiliary data, such as elevation, accumulate temperature (AT), humidity index (HI) and rainfall. China's nation-wide grassland classification map is resulted by merging the grassland in different climate zone. The overall classification accuracy is 60.4%. The result indicated that expert classifier is proper for national wide grassland classification, but the classification accuracy need to be improved

  9. Observation-based Quantitative Uncertainty Estimation for Realtime Tsunami Inundation Forecast using ABIC and Ensemble Simulation

    Science.gov (United States)

    Takagawa, T.

    2016-12-01

    An ensemble forecasting scheme for tsunami inundation is presented. The scheme consists of three elemental methods. The first is a hierarchical Bayesian inversion using Akaike's Bayesian Information Criterion (ABIC). The second is Montecarlo sampling from a probability density function of multidimensional normal distribution. The third is ensamble analysis of tsunami inundation simulations with multiple tsunami sources. Simulation based validation of the model was conducted. A tsunami scenario of M9.1 Nankai earthquake was chosen as a target of validation. Tsunami inundation around Nagoya Port was estimated by using synthetic tsunami waveforms at offshore GPS buoys. The error of estimation of tsunami inundation area was about 10% even if we used only ten minutes observation data. The estimation accuracy of waveforms on/off land and spatial distribution of maximum tsunami inundation depth is demonstrated.

  10. Neural network ensemble based supplier evaluation model in line with nuclear safety conditions

    International Nuclear Information System (INIS)

    Wang Yonggang; Chang Baosheng

    2006-01-01

    Nuclear safety is the most critical target for nuclear power plant operation. Besides the rigid operation procedures established, evaluation of suppliers working with plants can be another important aspects. Selection and evaluation of suppliers can be classified with qualitative analysis and quantitative management. The indicators involved are coupled with each other in a very complicated manner, therefore the relevant data show the strong characteristic of non-linearity. The article is based on the research and analysis of the real conditions of the Daya Bay nuclear power plant operation management. Through study and analysis of the information home and abroad, and with reference to the neural network ensemble technology, the supplier evaluation system and model are established as illustrated within the paper, thus to heighten objectivity of the supplier selection. (authors)

  11. A proposed data base system for detection, classification and ...

    African Journals Online (AJOL)

    A proposed data base system for detection, classification and location of fault on electricity company of Ghana electrical distribution system. Isaac Owusu-Nyarko, Mensah-Ananoo Eugine. Abstract. No Abstract. Keywords: database, classification of fault, power, distribution system, SCADA, ECG. Full Text: EMAIL FULL TEXT ...

  12. AN OBJECT-BASED METHOD FOR CHINESE LANDFORM TYPES CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    H. Ding

    2016-06-01

    Full Text Available Landform classification is a necessary task for various fields of landscape and regional planning, for example for landscape evaluation, erosion studies, hazard prediction, et al. This study proposes an improved object-based classification for Chinese landform types using the factor importance analysis of random forest and the gray-level co-occurrence matrix (GLCM. In this research, based on 1km DEM of China, the combination of the terrain factors extracted from DEM are selected by correlation analysis and Sheffield's entropy method. Random forest classification tree is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. Then the GLCM is conducted for the knowledge base of classification. The classification result was checked by using the 1:4,000,000 Chinese Geomorphological Map as reference. And the overall classification accuracy of the proposed method is 5.7% higher than ISODATA unsupervised classification, and 15.7% higher than the traditional object-based classification method.

  13. 3-D visualization of ensemble weather forecasts - Part 2: Forecasting warm conveyor belt situations for aircraft-based field campaigns

    Science.gov (United States)

    Rautenhaus, M.; Grams, C. M.; Schäfler, A.; Westermann, R.

    2015-02-01

    We present the application of interactive 3-D visualization of ensemble weather predictions to forecasting warm conveyor belt situations during aircraft-based atmospheric research campaigns. Motivated by forecast requirements of the T-NAWDEX-Falcon 2012 campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the ECMWF ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and forecast wind field resolution. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (three to seven days before take-off).

  14. Constructing Better Classifier Ensemble Based on Weighted Accuracy and Diversity Measure

    Directory of Open Access Journals (Sweden)

    Xiaodong Zeng

    2014-01-01

    Full Text Available A weighted accuracy and diversity (WAD method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases.

  15. Evaluation of LDA Ensembles Classifiers for Brain Computer Interface

    International Nuclear Information System (INIS)

    Arjona, Cristian; Pentácolo, José; Gareis, Iván; Atum, Yanina; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo

    2011-01-01

    The Brain Computer Interface (BCI) translates brain activity into computer commands. To increase the performance of the BCI, to decode the user intentions it is necessary to get better the feature extraction and classification techniques. In this article the performance of a three linear discriminant analysis (LDA) classifiers ensemble is studied. The system based on ensemble can theoretically achieved better classification results than the individual counterpart, regarding individual classifier generation algorithm and the procedures for combine their outputs. Classic algorithms based on ensembles such as bagging and boosting are discussed here. For the application on BCI, it was concluded that the generated results using ER and AUC as performance index do not give enough information to establish which configuration is better.

  16. NYYD Ensemble

    Index Scriptorium Estoniae

    2002-01-01

    NYYD Ensemble'i duost Traksmann - Lukk E.-S. Tüüri teosega "Symbiosis", mis on salvestatud ka hiljuti ilmunud NYYD Ensemble'i CDle. 2. märtsil Rakvere Teatri väikeses saalis ja 3. märtsil Rotermanni Soolalaos, kavas Tüür, Kaumann, Berio, Reich, Yun, Hauta-aho, Buckinx

  17. Finger language recognition based on ensemble artificial neural network learning using armband EMG sensors.

    Science.gov (United States)

    Kim, Seongjung; Kim, Jongman; Ahn, Soonjae; Kim, Youngho

    2018-04-18

    Deaf people use sign or finger languages for communication, but these methods of communication are very specialized. For this reason, the deaf can suffer from social inequalities and financial losses due to their communication restrictions. In this study, we developed a finger language recognition algorithm based on an ensemble artificial neural network (E-ANN) using an armband system with 8-channel electromyography (EMG) sensors. The developed algorithm was composed of signal acquisition, filtering, segmentation, feature extraction and an E-ANN based classifier that was evaluated with the Korean finger language (14 consonants, 17 vowels and 7 numbers) in 17 subjects. E-ANN was categorized according to the number of classifiers (1 to 10) and size of training data (50 to 1500). The accuracy of the E-ANN-based classifier was obtained by 5-fold cross validation and compared with an artificial neural network (ANN)-based classifier. As the number of classifiers (1 to 8) and size of training data (50 to 300) increased, the average accuracy of the E-ANN-based classifier increased and the standard deviation decreased. The optimal E-ANN was composed with eight classifiers and 300 size of training data, and the accuracy of the E-ANN was significantly higher than that of the general ANN.

  18. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks

    Directory of Open Access Journals (Sweden)

    Cuicui Zhang

    2014-12-01

    Full Text Available Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phrase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the “small sample size” (SSS problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still open two questions: (1 how to define diverse base classifiers from the small data; (2 how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0–1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.

  19. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks.

    Science.gov (United States)

    Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

    2014-12-08

    Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phrase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still open two questions: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.

  20. Application of NARR-based NLDAS Ensemble Simulations to Continental-Scale Drought Monitoring

    Science.gov (United States)

    Alonge, C. J.; Cosgrove, B. A.

    2008-05-01

    Government estimates indicate that droughts cause billions of dollars of damage to agricultural interests each year. More effective identification of droughts would directly benefit decision makers, and would allow for the more efficient allocation of resources that might mitigate the event. Land data assimilation systems, with their high quality representations of soil moisture, present an ideal platform for drought monitoring, and offer many advantages over traditional modeling systems. The recently released North American Regional Reanalysis (NARR) covers the NLDAS domain and provides all fields necessary to force the NLDAS for 27 years. This presents an ideal opportunity to combine NARR and NLDAS resources into an effective real-time drought monitor. Toward this end, our project seeks to validate and explore the NARR's suitability as a base for drought monitoring applications - both in terms of data set length and accuracy. Along the same lines, the project will examine the impact of the use of different (longer) LDAS model climatologies on drought monitoring, and will explore the advantages of ensemble simulations versus single model simulations in drought monitoring activities. We also plan to produce a NARR- and observation-based high quality 27 year, 1/8th degree, 3-hourly, land surface and meteorological forcing data sets. An investigation of the best way to force an LDAS-type system will also be made, with traditional NLDAS and NLDASE forcing options explored. This presentation will focus on an overview of the drought monitoring project, and will include a summary of recent progress. Developments include the generation of forcing data sets, ensemble LSM output, and production of model-based drought indices over the entire NLDAS domain. Project forcing files use 32km NARR model output as a data backbone, and include observed precipitation (blended CPC gauge, PRISM gauge, Stage II, HPD, and CMORPH) and a GOES-based bias correction of downward solar

  1. A Classification-based Review Recommender

    Science.gov (United States)

    O'Mahony, Michael P.; Smyth, Barry

    Many online stores encourage their users to submit product/service reviews in order to guide future purchasing decisions. These reviews are often listed alongside product recommendations but, to date, limited attention has been paid as to how best to present these reviews to the end-user. In this paper, we describe a supervised classification approach that is designed to identify and recommend the most helpful product reviews. Using the TripAdvisor service as a case study, we compare the performance of several classification techniques using a range of features derived from hotel reviews. We then describe how these classifiers can be used as the basis for a practical recommender that automatically suggests the mosthelpful contrasting reviews to end-users. We present an empirical evaluation which shows that our approach achieves a statistically significant improvement over alternative review ranking schemes.

  2. Assessment of robustness and significance of climate change signals for an ensemble of distribution-based scaled climate projections

    DEFF Research Database (Denmark)

    Seaby, Lauren Paige; Refsgaard, J.C.; Sonnenborg, T.O.

    2013-01-01

    An ensemble of 11 regional climate model (RCM) projections are analysed for Denmark from a hydrological modelling inputs perspective. Two bias correction approaches are applied: a relatively simple monthly delta change (DC) method and a more complex daily distribution-based scaling (DBS) method...

  3. Text document classification based on mixture models

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Malík, Antonín

    2004-01-01

    Roč. 40, č. 3 (2004), s. 293-304 ISSN 0023-5954 R&D Projects: GA AV ČR IAA2075302; GA ČR GA102/03/0049; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords : text classification * text categorization * multinomial mixture model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004

  4. Classification

    Science.gov (United States)

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnacus (1707-1778), who is…

  5. TENSOR MODELING BASED FOR AIRBORNE LiDAR DATA CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    N. Li

    2016-06-01

    Full Text Available Feature selection and description is a key factor in classification of Earth observation data. In this paper a classification method based on tensor decomposition is proposed. First, multiple features are extracted from raw LiDAR point cloud, and raster LiDAR images are derived by accumulating features or the “raw” data attributes. Then, the feature rasters of LiDAR data are stored as a tensor, and tensor decomposition is used to select component features. This tensor representation could keep the initial spatial structure and insure the consideration of the neighborhood. Based on a small number of component features a k nearest neighborhood classification is applied.

  6. Uncertainty analysis of neural network based flood forecasting models: An ensemble based approach for constructing prediction interval

    Science.gov (United States)

    Kasiviswanathan, K.; Sudheer, K.

    2013-05-01

    Artificial neural network (ANN) based hydrologic models have gained lot of attention among water resources engineers and scientists, owing to their potential for accurate prediction of flood flows as compared to conceptual or physics based hydrologic models. The ANN approximates the non-linear functional relationship between the complex hydrologic variables in arriving at the river flow forecast values. Despite a large number of applications, there is still some criticism that ANN's point prediction lacks in reliability since the uncertainty of predictions are not quantified, and it limits its use in practical applications. A major concern in application of traditional uncertainty analysis techniques on neural network framework is its parallel computing architecture with large degrees of freedom, which makes the uncertainty assessment a challenging task. Very limited studies have considered assessment of predictive uncertainty of ANN based hydrologic models. In this study, a novel method is proposed that help construct the prediction interval of ANN flood forecasting model during calibration itself. The method is designed to have two stages of optimization during calibration: at stage 1, the ANN model is trained with genetic algorithm (GA) to obtain optimal set of weights and biases vector, and during stage 2, the optimal variability of ANN parameters (obtained in stage 1) is identified so as to create an ensemble of predictions. During the 2nd stage, the optimization is performed with multiple objectives, (i) minimum residual variance for the ensemble mean, (ii) maximum measured data points to fall within the estimated prediction interval and (iii) minimum width of prediction interval. The method is illustrated using a real world case study of an Indian basin. The method was able to produce an ensemble that has an average prediction interval width of 23.03 m3/s, with 97.17% of the total validation data points (measured) lying within the interval. The derived

  7. Ensemble Manifold Rank Preserving for Acceleration-Based Human Activity Recognition.

    Science.gov (United States)

    Tao, Dapeng; Jin, Lianwen; Yuan, Yuan; Xue, Yang

    2016-06-01

    With the rapid development of mobile devices and pervasive computing technologies, acceleration-based human activity recognition, a difficult yet essential problem in mobile apps, has received intensive attention recently. Different acceleration signals for representing different activities or even a same activity have different attributes, which causes troubles in normalizing the signals. We thus cannot directly compare these signals with each other, because they are embedded in a nonmetric space. Therefore, we present a nonmetric scheme that retains discriminative and robust frequency domain information by developing a novel ensemble manifold rank preserving (EMRP) algorithm. EMRP simultaneously considers three aspects: 1) it encodes the local geometry using the ranking order information of intraclass samples distributed on local patches; 2) it keeps the discriminative information by maximizing the margin between samples of different classes; and 3) it finds the optimal linear combination of the alignment matrices to approximate the intrinsic manifold lied in the data. Experiments are conducted on the South China University of Technology naturalistic 3-D acceleration-based activity dataset and the naturalistic mobile-devices based human activity dataset to demonstrate the robustness and effectiveness of the new nonmetric scheme for acceleration-based human activity recognition.

  8. Iris Image Classification Based on Hierarchical Visual Codebook.

    Science.gov (United States)

    Zhenan Sun; Hui Zhang; Tieniu Tan; Jianyu Wang

    2014-06-01

    Iris recognition as a reliable method for personal identification has been well-studied with the objective to assign the class label of each iris image to a unique subject. In contrast, iris image classification aims to classify an iris image to an application specific category, e.g., iris liveness detection (classification of genuine and fake iris images), race classification (e.g., classification of iris images of Asian and non-Asian subjects), coarse-to-fine iris identification (classification of all iris images in the central database into multiple categories). This paper proposes a general framework for iris image classification based on texture analysis. A novel texture pattern representation method called Hierarchical Visual Codebook (HVC) is proposed to encode the texture primitives of iris images. The proposed HVC method is an integration of two existing Bag-of-Words models, namely Vocabulary Tree (VT), and Locality-constrained Linear Coding (LLC). The HVC adopts a coarse-to-fine visual coding strategy and takes advantages of both VT and LLC for accurate and sparse representation of iris texture. Extensive experimental results demonstrate that the proposed iris image classification method achieves state-of-the-art performance for iris liveness detection, race classification, and coarse-to-fine iris identification. A comprehensive fake iris image database simulating four types of iris spoof attacks is developed as the benchmark for research of iris liveness detection.

  9. A multi-model ensemble approach to seabed mapping

    Science.gov (United States)

    Diesing, Markus; Stephens, David

    2015-06-01

    Seabed habitat mapping based on swath acoustic data and ground-truth samples is an emergent and active marine science discipline. Significant progress could be achieved by transferring techniques and approaches that have been successfully developed and employed in such fields as terrestrial land cover mapping. One such promising approach is the multiple classifier system, which aims at improving classification performance by combining the outputs of several classifiers. Here we present results of a multi-model ensemble applied to multibeam acoustic data covering more than 5000 km2 of seabed in the North Sea with the aim to derive accurate spatial predictions of seabed substrate. A suite of six machine learning classifiers (k-Nearest Neighbour, Support Vector Machine, Classification Tree, Random Forest, Neural Network and Naïve Bayes) was trained with ground-truth sample data classified into seabed substrate classes and their prediction accuracy was assessed with an independent set of samples. The three and five best performing models were combined to classifier ensembles. Both ensembles led to increased prediction accuracy as compared to the best performing single classifier. The improvements were however not statistically significant at the 5% level. Although the three-model ensemble did not perform significantly better than its individual component models, we noticed that the five-model ensemble did perform significantly better than three of the five component models. A classifier ensemble might therefore be an effective strategy to improve classification performance. Another advantage is the fact that the agreement in predicted substrate class between the individual models of the ensemble could be used as a measure of confidence. We propose a simple and spatially explicit measure of confidence that is based on model agreement and prediction accuracy.

  10. SVM and SVM Ensembles in Breast Cancer Prediction.

    Science.gov (United States)

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  11. SVM and SVM Ensembles in Breast Cancer Prediction.

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    Full Text Available Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  12. Predictor-Year Subspace Clustering Based Ensemble Prediction of Indian Summer Monsoon

    Directory of Open Access Journals (Sweden)

    Moumita Saha

    2016-01-01

    Full Text Available Forecasting the Indian summer monsoon is a challenging task due to its complex and nonlinear behavior. A large number of global climatic variables with varying interaction patterns over years influence monsoon. Various statistical and neural prediction models have been proposed for forecasting monsoon, but many of them fail to capture variability over years. The skill of predictor variables of monsoon also evolves over time. In this article, we propose a joint-clustering of monsoon years and predictors for understanding and predicting the monsoon. This is achieved by subspace clustering algorithm. It groups the years based on prevailing global climatic condition using statistical clustering technique and subsequently for each such group it identifies significant climatic predictor variables which assist in better prediction. Prediction model is designed to frame individual cluster using random forest of regression tree. Prediction of aggregate and regional monsoon is attempted. Mean absolute error of 5.2% is obtained for forecasting aggregate Indian summer monsoon. Errors in predicting the regional monsoons are also comparable in comparison to the high variation of regional precipitation. Proposed joint-clustering based ensemble model is observed to be superior to existing monsoon prediction models and it also surpasses general nonclustering based prediction models.

  13. Investigating properties of the cardiovascular system using innovative analysis algorithms based on ensemble empirical mode decomposition.

    Science.gov (United States)

    Yeh, Jia-Rong; Lin, Tzu-Yu; Chen, Yun; Sun, Wei-Zen; Abbod, Maysam F; Shieh, Jiann-Shing

    2012-01-01

    Cardiovascular system is known to be nonlinear and nonstationary. Traditional linear assessments algorithms of arterial stiffness and systemic resistance of cardiac system accompany the problem of nonstationary or inconvenience in practical applications. In this pilot study, two new assessment methods were developed: the first is ensemble empirical mode decomposition based reflection index (EEMD-RI) while the second is based on the phase shift between ECG and BP on cardiac oscillation. Both methods utilise the EEMD algorithm which is suitable for nonlinear and nonstationary systems. These methods were used to investigate the properties of arterial stiffness and systemic resistance for a pig's cardiovascular system via ECG and blood pressure (BP). This experiment simulated a sequence of continuous changes of blood pressure arising from steady condition to high blood pressure by clamping the artery and an inverse by relaxing the artery. As a hypothesis, the arterial stiffness and systemic resistance should vary with the blood pressure due to clamping and relaxing the artery. The results show statistically significant correlations between BP, EEMD-based RI, and the phase shift between ECG and BP on cardiac oscillation. The two assessments results demonstrate the merits of the EEMD for signal analysis.

  14. Identifying climate analogues for precipitation extremes for Denmark based on RCM simulations from the ENSEMBLES database.

    Science.gov (United States)

    Arnbjerg-Nielsen, K; Funder, S G; Madsen, H

    2015-01-01

    Climate analogues, also denoted Space-For-Time, may be used to identify regions where the present climatic conditions resemble conditions of a past or future state of another location or region based on robust climate variable statistics in combination with projections of how these statistics change over time. The study focuses on assessing climate analogues for Denmark based on current climate data set (E-OBS) observations as well as the ENSEMBLES database of future climates with the aim of projecting future precipitation extremes. The local present precipitation extremes are assessed by means of intensity-duration-frequency curves for urban drainage design for the relevant locations being France, the Netherlands, Belgium, Germany, the United Kingdom, and Denmark. Based on this approach projected increases of extreme precipitation by 2100 of 9 and 21% are expected for 2 and 10 year return periods, respectively. The results should be interpreted with caution as the best region to represent future conditions for Denmark is the coastal areas of Northern France, for which only little information is available with respect to present precipitation extremes.

  15. Classification

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2017-01-01

    This article presents and discusses definitions of the term “classification” and the related concepts “Concept/conceptualization,”“categorization,” “ordering,” “taxonomy” and “typology.” It further presents and discusses theories of classification including the influences of Aristotle...... and Wittgenstein. It presents different views on forming classes, including logical division, numerical taxonomy, historical classification, hermeneutical and pragmatic/critical views. Finally, issues related to artificial versus natural classification and taxonomic monism versus taxonomic pluralism are briefly...

  16. A study on reducing update frequency of the forecast samples in the ensemble-based 4DVar data assimilation method

    Energy Technology Data Exchange (ETDEWEB)

    Shao, Aimei; Xu, Daosheng [Lanzhou Univ. (China). Key Lab. of Arid Climatic Changing and Reducing Disaster of Gansu Province; Chinese Academy of Meteorological Sciences, Beijing (China). State Key Lab. of Severe Weather; Qiu, Xiaobin [Lanzhou Univ. (China). Key Lab. of Arid Climatic Changing and Reducing Disaster of Gansu Province; Tianjin Institute of Meteorological Science (China); Qiu, Chongjian [Lanzhou Univ. (China). Key Lab. of Arid Climatic Changing and Reducing Disaster of Gansu Province

    2013-02-15

    In the ensemble-based four dimensional variational assimilation method (SVD-En4DVar), a singular value decomposition (SVD) technique is used to select the leading eigenvectors and the analysis variables are expressed as the orthogonal bases expansion of the eigenvectors. The experiments with a two-dimensional shallow-water equation model and simulated observations show that the truncation error and rejection of observed signals due to the reduced-dimensional reconstruction of the analysis variable are the major factors that damage the analysis when the ensemble size is not large enough. However, a larger-sized ensemble is daunting computational burden. Experiments with a shallow-water equation model also show that the forecast error covariances remain relatively constant over time. For that reason, we propose an approach that increases the members of the forecast ensemble while reducing the update frequency of the forecast error covariance in order to increase analysis accuracy and to reduce the computational cost. A series of experiments were conducted with the shallow-water equation model to test the efficiency of this approach. The experimental results indicate that this approach is promising. Further experiments with the WRF model show that this approach is also suitable for the real atmospheric data assimilation problem, but the update frequency of the forecast error covariances should not be too low. (orig.)

  17. Ebolavirus Classification Based on Natural Vectors

    Science.gov (United States)

    Zheng, Hui; Yin, Changchuan; Hoang, Tung; He, Rong Lucy; Yang, Jie

    2015-01-01

    According to the WHO, ebolaviruses have resulted in 8818 human deaths in West Africa as of January 2015. To better understand the evolutionary relationship of the ebolaviruses and infer virulence from the relationship, we applied the alignment-free natural vector method to classify the newest ebolaviruses. The dataset includes three new Guinea viruses as well as 99 viruses from Sierra Leone. For the viruses of the family of Filoviridae, both genus label classification and species label classification achieve an accuracy rate of 100%. We represented the relationships among Filoviridae viruses by Unweighted Pair Group Method with Arithmetic Mean (UPGMA) phylogenetic trees and found that the filoviruses can be separated well by three genera. We performed the phylogenetic analysis on the relationship among different species of Ebolavirus by their coding-complete genomes and seven viral protein genes (glycoprotein [GP], nucleoprotein [NP], VP24, VP30, VP35, VP40, and RNA polymerase [L]). The topology of the phylogenetic tree by the viral protein VP24 shows consistency with the variations of virulence of ebolaviruses. The result suggests that VP24 be a pharmaceutical target for treating or preventing ebolaviruses. PMID:25803489

  18. An Integrated Scenario Ensemble-Based Framework for Hurricane Evacuation Modeling: Part 1-Decision Support System.

    Science.gov (United States)

    Davidson, Rachel A; Nozick, Linda K; Wachtendorf, Tricia; Blanton, Brian; Colle, Brian; Kolar, Randall L; DeYoung, Sarah; Dresback, Kendra M; Yi, Wenqi; Yang, Kun; Leonardo, Nicholas

    2018-03-30

    This article introduces a new integrated scenario-based evacuation (ISE) framework to support hurricane evacuation decision making. It explicitly captures the dynamics, uncertainty, and human-natural system interactions that are fundamental to the challenge of hurricane evacuation, but have not been fully captured in previous formal evacuation models. The hazard is represented with an ensemble of probabilistic scenarios, population behavior with a dynamic decision model, and traffic with a dynamic user equilibrium model. The components are integrated in a multistage stochastic programming model that minimizes risk and travel times to provide a tree of evacuation order recommendations and an evaluation of the risk and travel time performance for that solution. The ISE framework recommendations offer an advance in the state of the art because they: (1) are based on an integrated hazard assessment (designed to ultimately include inland flooding), (2) explicitly balance the sometimes competing objectives of minimizing risk and minimizing travel time, (3) offer a well-hedged solution that is robust under the range of ways the hurricane might evolve, and (4) leverage the substantial value of increasing information (or decreasing degree of uncertainty) over the course of a hurricane event. A case study for Hurricane Isabel (2003) in eastern North Carolina is presented to demonstrate how the framework is applied, the type of results it can provide, and how it compares to available methods of a single scenario deterministic analysis and a two-stage stochastic program. © 2018 Society for Risk Analysis.

  19. A DDoS Attack Detection Method Based on Hybrid Heterogeneous Multiclassifier Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Bin Jia

    2017-01-01

    Full Text Available The explosive growth of network traffic and its multitype on Internet have brought new and severe challenges to DDoS attack detection. To get the higher True Negative Rate (TNR, accuracy, and precision and to guarantee the robustness, stability, and universality of detection system, in this paper, we propose a DDoS attack detection method based on hybrid heterogeneous multiclassifier ensemble learning and design a heuristic detection algorithm based on Singular Value Decomposition (SVD to construct our detection system. Experimental results show that our detection method is excellent in TNR, accuracy, and precision. Therefore, our algorithm has good detective performance for DDoS attack. Through the comparisons with Random Forest, k-Nearest Neighbor (k-NN, and Bagging comprising the component classifiers when the three algorithms are used alone by SVD and by un-SVD, it is shown that our model is superior to the state-of-the-art attack detection techniques in system generalization ability, detection stability, and overall detection performance.

  20. Determination of knock characteristics in spark ignition engines: an approach based on ensemble empirical mode decomposition

    International Nuclear Information System (INIS)

    Li, Ning; Liang, Caiping; Yang, Jianguo; Zhou, Rui

    2016-01-01

    Knock is one of the major constraints to improve the performance and thermal efficiency of spark ignition (SI) engines. It can also result in severe permanent engine damage under certain operating conditions. Based on the ensemble empirical mode decomposition (EEMD), this paper proposes a new approach to determine the knock characteristics in SI engines. By adding a uniformly distributed and finite white Gaussian noise, the EEMD can preserve signal continuity in different scales and therefore alleviates the mode-mixing problem occurring in the classic empirical mode decomposition (EMD). The feasibilities of applying the EEMD to detect the knock signatures of a test SI engine via the pressure signal measured from combustion chamber and the vibration signal measured from cylinder head are investigated. Experimental results show that the EEMD-based method is able to detect the knock signatures from both the pressure signal and vibration signal, even in initial stage of knock. Finally, by comparing the application results with those obtained by short-time Fourier transform (STFT), Wigner–Ville distribution (WVD) and discrete wavelet transform (DWT), the superiority of the EEMD method in determining knock characteristics is demonstrated. (paper)

  1. A rapid, ensemble and free energy based method for engineering protein stabilities.

    Science.gov (United States)

    Naganathan, Athi N

    2013-05-02

    Engineering the conformational stabilities of proteins through mutations has immense potential in biotechnological applications. It is, however, an inherently challenging problem given the weak noncovalent nature of the stabilizing interactions. In this regard, we present here a robust and fast strategy to engineer protein stabilities through mutations involving charged residues using a structure-based statistical mechanical model that accounts for the ensemble nature of folding. We validate the method by predicting the absolute changes in stability for 138 experimental mutations from 16 different proteins and enzymes with a correlation of 0.65 and importantly with a success rate of 81%. Multiple point mutants are predicted with a higher success rate (90%) that is validated further by comparing meosphile-thermophile protein pairs. In parallel, we devise a methodology to rapidly engineer mutations in silico which we benchmark against experimental mutations of ubiquitin (correlation of 0.95) and check for its feasibility on a larger therapeutic protein DNase I. We expect the method to be of importance as a first and rapid step to screen for protein mutants with specific stability in the biotechnology industry, in the construction of stability maps at the residue level (i.e., hot spots), and as a robust tool to probe for mutations that enhance the stability of protein-based drugs.

  2. Hot complaint intelligent classification based on text mining

    Directory of Open Access Journals (Sweden)

    XIA Haifeng

    2013-10-01

    Full Text Available The complaint recognizer system plays an important role in making sure the correct classification of the hot complaint,improving the service quantity of telecommunications industry.The customers’ complaint in telecommunications industry has its special particularity which should be done in limited time,which cause the error in classification of hot complaint.The paper presents a model of complaint hot intelligent classification based on text mining,which can classify the hot complaint in the correct level of the complaint navigation.The examples show that the model can be efficient to classify the text of the complaint.

  3. Radar Target Classification using Recursive Knowledge-Based Methods

    DEFF Research Database (Denmark)

    Jochumsen, Lars Wurtz

    The topic of this thesis is target classification of radar tracks from a 2D mechanically scanning coastal surveillance radar. The measurements provided by the radar are position data and therefore the classification is mainly based on kinematic data, which is deduced from the position. The target...... been terminated. Therefore, an update of the classification results must be made for each measurement of the target. The data for this work are collected throughout the PhD and are both collected from radars and other sensors such as GPS....

  4. Key-phrase based classification of public health web pages.

    Science.gov (United States)

    Dolamic, Ljiljana; Boyer, Célia

    2013-01-01

    This paper describes and evaluates the public health web pages classification model based on key phrase extraction and matching. Easily extendible both in terms of new classes as well as the new language this method proves to be a good solution for text classification faced with the total lack of training data. To evaluate the proposed solution we have used a small collection of public health related web pages created by a double blind manual classification. Our experiments have shown that by choosing the adequate threshold value the desired value for either precision or recall can be achieved.

  5. A Region-Based GeneSIS Segmentation Algorithm for the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    Stelios K. Mylonas

    2015-03-01

    Full Text Available This paper proposes an object-based segmentation/classification scheme for remotely sensed images, based on a novel variant of the recently proposed Genetic Sequential Image Segmentation (GeneSIS algorithm. GeneSIS segments the image in an iterative manner, whereby at each iteration a single object is extracted via a genetic-based object extraction algorithm. Contrary to the previous pixel-based GeneSIS where the candidate objects to be extracted were evaluated through the fuzzy content of their included pixels, in the newly developed region-based GeneSIS algorithm, a watershed-driven fine segmentation map is initially obtained from the original image, which serves as the basis for the forthcoming GeneSIS segmentation. Furthermore, in order to enhance the spatial search capabilities, we introduce a more descriptive encoding scheme in the object extraction algorithm, where the structural search modules are represented by polygonal shapes. Our objectives in the new framework are posed as follows: enhance the flexibility of the algorithm in extracting more flexible object shapes, assure high level classification accuracies, and reduce the execution time of the segmentation, while at the same time preserving all the inherent attributes of the GeneSIS approach. Finally, exploiting the inherent attribute of GeneSIS to produce multiple segmentations, we also propose two segmentation fusion schemes that operate on the ensemble of segmentations generated by GeneSIS. Our approaches are tested on an urban and two agricultural images. The results show that region-based GeneSIS has considerably lower computational demands compared to the pixel-based one. Furthermore, the suggested methods achieve higher classification accuracies and good segmentation maps compared to a series of existing algorithms.

  6. Ensembl 2004.

    Science.gov (United States)

    Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

    2004-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.

  7. Ensemble Sampling

    OpenAIRE

    Lu, Xiuyuan; Van Roy, Benjamin

    2017-01-01

    Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands on the range of applica...

  8. Object-Based Change Detection in Urban Areas from High Spatial Resolution Images Based on Multiple Features and Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Xin Wang

    2018-02-01

    Full Text Available To improve the accuracy of change detection in urban areas using bi-temporal high-resolution remote sensing images, a novel object-based change detection scheme combining multiple features and ensemble learning is proposed in this paper. Image segmentation is conducted to determine the objects in bi-temporal images separately. Subsequently, three kinds of object features, i.e., spectral, shape and texture, are extracted. Using the image differencing process, a difference image is generated and used as the input for nonlinear supervised classifiers, including k-nearest neighbor, support vector machine, extreme learning machine and random forest. Finally, the results of multiple classifiers are integrated using an ensemble rule called weighted voting to generate the final change detection result. Experimental results of two pairs of real high-resolution remote sensing datasets demonstrate that the proposed approach outperforms the traditional methods in terms of overall accuracy and generates change detection maps with a higher number of homogeneous regions in urban areas. Moreover, the influences of segmentation scale and the feature selection strategy on the change detection performance are also analyzed and discussed.

  9. Development of web-based services for an ensemble flood forecasting and risk assessment system

    Science.gov (United States)

    Yaw Manful, Desmond; He, Yi; Cloke, Hannah; Pappenberger, Florian; Li, Zhijia; Wetterhall, Fredrik; Huang, Yingchun; Hu, Yuzhong

    2010-05-01

    Flooding is a wide spread and devastating natural disaster worldwide. Floods that took place in the last decade in China were ranked the worst amongst recorded floods worldwide in terms of the number of human fatalities and economic losses (Munich Re-Insurance). Rapid economic development and population expansion into low lying flood plains has worsened the situation. Current conventional flood prediction systems in China are neither suited to the perceptible climate variability nor the rapid pace of urbanization sweeping the country. Flood prediction, from short-term (a few hours) to medium-term (a few days), needs to be revisited and adapted to changing socio-economic and hydro-climatic realities. The latest technology requires implementation of multiple numerical weather prediction systems. The availability of twelve global ensemble weather prediction systems through the ‘THORPEX Interactive Grand Global Ensemble' (TIGGE) offers a good opportunity for an effective state-of-the-art early forecasting system. A prototype of a Novel Flood Early Warning System (NEWS) using the TIGGE database is tested in the Huai River basin in east-central China. It is the first early flood warning system in China that uses the massive TIGGE database cascaded with river catchment models, the Xinanjiang hydrologic model and a 1-D hydraulic model, to predict river discharge and flood inundation. The NEWS algorithm is also designed to provide web-based services to a broad spectrum of end-users. The latter presents challenges as both databases and proprietary codes reside in different locations and converge at dissimilar times. NEWS will thus make use of a ready-to-run grid system that makes distributed computing and data resources available in a seamless and secure way. An ability to run or function on different operating systems and provide an interface or front that is accessible to broad spectrum of end-users is additional requirement. The aim is to achieve robust interoperability

  10. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  11. Predicting Hepatotoxicity of Drug Metabolites Via an Ensemble Approach Based on Support Vector Machine

    Science.gov (United States)

    Lu, Yin; Liu, Lili; Lu, Dong; Cai, Yudong; Zheng, Mingyue; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2017-11-20

    Drug-induced liver injury (DILI) is a major cause of drug withdrawal. The chemical properties of the drug, especially drug metabolites, play key roles in DILI. Our goal is to construct a QSAR model to predict drug hepatotoxicity based on drug metabolites. 64 hepatotoxic drug metabolites and 3,339 non-hepatotoxic drug metabolites were gathered from MDL Metabolite Database. Considering the imbalance of the dataset, we randomly split the negative samples and combined each portion with all the positive samples to construct individually balanced datasets for constructing independent classifiers. Then, we adopted an ensemble approach to make prediction based on the results of all individual classifiers and applied the minimum Redundancy Maximum Relevance (mRMR) feature selection method to select the molecular descriptors. Eventually, for the drugs in the external test set, a Bayesian inference method was used to predict the hepatotoxicity of a drug based on its metabolites. The model showed the average balanced accuracy=78.47%, sensitivity =74.17%, and specificity=82.77%. Five molecular descriptors characterizing molecular polarity, intramolecular bonding strength, and molecular frontier orbital energy were obtained. When predicting the hepatotoxicity of a drug based on all its metabolites, the sensitivity, specificity and balanced accuracy were 60.38%, 70.00%, and 65.19%, respectively, indicating that this method is useful for identifying the hepatotoxicity of drugs. We developed an in silico model to predict hepatotoxicity of drug metabolites. Moreover, Bayesian inference was applied to predict the hepatotoxicity of a drug based on its metabolites which brought out valuable high sensitivity and specificity. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. Exploiting ensemble learning for automatic cataract detection and grading.

    Science.gov (United States)

    Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing

    2016-02-01

    Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  13. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains

    Directory of Open Access Journals (Sweden)

    Eils Roland

    2006-06-01

    Full Text Available Abstract Background The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. Results A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. Conclusion This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.

  14. Neural Network Ensemble Based Approach for 2D-Interval Prediction of Solar Photovoltaic Power

    Directory of Open Access Journals (Sweden)

    Mashud Rana

    2016-10-01

    Full Text Available Solar energy generated from PhotoVoltaic (PV systems is one of the most promising types of renewable energy. However, it is highly variable as it depends on the solar irradiance and other meteorological factors. This variability creates difficulties for the large-scale integration of PV power in the electricity grid and requires accurate forecasting of the electricity generated by PV systems. In this paper we consider 2D-interval forecasts, where the goal is to predict summary statistics for the distribution of the PV power values in a future time interval. 2D-interval forecasts have been recently introduced, and they are more suitable than point forecasts for applications where the predicted variable has a high variability. We propose a method called NNE2D that combines variable selection based on mutual information and an ensemble of neural networks, to compute 2D-interval forecasts, where the two interval boundaries are expressed in terms of percentiles. NNE2D was evaluated for univariate prediction of Australian solar PV power data for two years. The results show that it is a promising method, outperforming persistence baselines and other methods used for comparison in terms of accuracy and coverage probability.

  15. A CN-Based Ensembled Hydrological Model for Enhanced Watershed Runoff Prediction

    Directory of Open Access Journals (Sweden)

    Muhammad Ajmal

    2016-01-01

    Full Text Available A major structural inconsistency of the traditional curve number (CN model is its dependence on an unstable fixed initial abstraction, which normally results in sudden jumps in runoff estimation. Likewise, the lack of pre-storm soil moisture accounting (PSMA procedure is another inherent limitation of the model. To circumvent those problems, we used a variable initial abstraction after ensembling the traditional CN model and a French four-parameter (GR4J model to better quantify direct runoff from ungauged watersheds. To mimic the natural rainfall-runoff transformation at the watershed scale, our new parameterization designates intrinsic parameters and uses a simple structure. It exhibited more accurate and consistent results than earlier methods in evaluating data from 39 forest-dominated watersheds, both for small and large watersheds. In addition, based on different performance evaluation indicators, the runoff reproduction results show that the proposed model produced more consistent results for dry, normal, and wet watershed conditions than the other models used in this study.

  16. Analyzing the uncertainty of ensemble-based gridded observations in land surface simulations and drought assessment

    Science.gov (United States)

    Ahmadalipour, Ali; Moradkhani, Hamid

    2017-12-01

    Hydrologic modeling is one of the primary tools utilized for drought monitoring and drought early warning systems. Several sources of uncertainty in hydrologic modeling have been addressed in the literature. However, few studies have assessed the uncertainty of gridded observation datasets from a drought monitoring perspective. This study provides a hydrologic modeling oriented analysis of the gridded observation data uncertainties over the Pacific Northwest (PNW) and its implications on drought assessment. We utilized a recently developed 100-member ensemble-based observed forcing data to simulate hydrologic fluxes at 1/8° spatial resolution using Variable Infiltration Capacity (VIC) model, and compared the results with a deterministic observation. Meteorological and hydrological droughts are studied at multiple timescales over the basin, and seasonal long-term trends and variations of drought extent is investigated for each case. Results reveal large uncertainty of observed datasets at monthly timescale, with systematic differences for temperature records, mainly due to different lapse rates. The uncertainty eventuates in large disparities of drought characteristics. In general, an increasing trend is found for winter drought extent across the PNW. Furthermore, a ∼3% decrease per decade is detected for snow water equivalent (SWE) over the PNW, with the region being more susceptible to SWE variations of the northern Rockies than the western Cascades. The agricultural areas of southern Idaho demonstrate decreasing trend of natural soil moisture as a result of precipitation decline, which implies higher appeal for anthropogenic water storage and irrigation systems.

  17. An ensemble deep learning based approach for red lesion detection in fundus images.

    Science.gov (United States)

    Orlando, José Ignacio; Prokofyeva, Elena; Del Fresno, Mariana; Blaschko, Matthew B

    2018-01-01

    Diabetic retinopathy (DR) is one of the leading causes of preventable blindness in the world. Its earliest sign are red lesions, a general term that groups both microaneurysms (MAs) and hemorrhages (HEs). In daily clinical practice, these lesions are manually detected by physicians using fundus photographs. However, this task is tedious and time consuming, and requires an intensive effort due to the small size of the lesions and their lack of contrast. Computer-assisted diagnosis of DR based on red lesion detection is being actively explored due to its improvement effects both in clinicians consistency and accuracy. Moreover, it provides comprehensive feedback that is easy to assess by the physicians. Several methods for detecting red lesions have been proposed in the literature, most of them based on characterizing lesion candidates using hand crafted features, and classifying them into true or false positive detections. Deep learning based approaches, by contrast, are scarce in this domain due to the high expense of annotating the lesions manually. In this paper we propose a novel method for red lesion detection based on combining both deep learned and domain knowledge. Features learned by a convolutional neural network (CNN) are augmented by incorporating hand crafted features. Such ensemble vector of descriptors is used afterwards to identify true lesion candidates using a Random Forest classifier. We empirically observed that combining both sources of information significantly improve results with respect to using each approach separately. Furthermore, our method reported the highest performance on a per-lesion basis on DIARETDB1 and e-ophtha, and for screening and need for referral on MESSIDOR compared to a second human expert. Results highlight the fact that integrating manually engineered approaches with deep learned features is relevant to improve results when the networks are trained from lesion-level annotated data. An open source implementation of our

  18. Classification of suppressor additives based on synergistic and antagonistic ensemble effects

    Energy Technology Data Exchange (ETDEWEB)

    Broekmann, P., E-mail: peter.broekmann@iac.unibe.ch [BASF SE, Global Business Unit Electronic Materials, 67056 Ludwigshafen (Germany); Department of Chemistry and Biochemistry, University of Bern, Bern (Switzerland); Fluegel, A.; Emnet, C.; Arnold, M.; Roeger-Goepfert, C.; Wagner, A. [BASF SE, Global Business Unit Electronic Materials, 67056 Ludwigshafen (Germany); Hai, N.T.M. [Department of Chemistry and Biochemistry, University of Bern, Bern (Switzerland); Mayer, D. [BASF SE, Global Business Unit Electronic Materials, 67056 Ludwigshafen (Germany)

    2011-05-01

    Highlights: > Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential transient measurements. > These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition. > In addition these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS). - Abstract: Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential transient measurements. These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition. In addition these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS). While the type-I suppressor selectively forms efficient barriers for copper inter-diffusion on chloride-terminated electrode surfaces we identified a type-II suppressor that interacts non-selectively with any kind of anions chemisorbed on copper (chloride, sulfate, sulfonate). Type-I suppressors are vital for the superconformal copper growth mode in Damascene processing and show an antagonistic interaction with SPS (Bis-Sodium-Sulfopropyl-Disulfide) which involves the deactivation of this suppressor chemistry. This suppressor deactivation is rationalized in terms of compositional changes in the layer of the chemisorbed anions due to the competition of chloride and MPS (Mercaptopropane Sulfonic Acid) for adsorption sites on the metallic copper surface. MPS is the product of the dissociative SPS adsorption within the preexisting chloride matrix on the copper surface. The non-selectivity in the adsorption behavior of the type-II suppressor is rationalized in terms of anion/cation pairing effects of the poly-cationic suppressor and the anion-modified copper substrate. Atomic-scale insights into the competitive Cl/MPS adsorption are gained from in situ STM (Scanning Tunneling Microscopy) using single crystalline copper surfaces as model substrates. Type-III suppressors are a third class of suppressors. In case of type-I and type-II suppressor chemistries the resulting steady-state deposition conditions are completely independent on the particular succession of additive adsorption. In contrast to that a strong dependence of the suppressing capabilities on the sequence of additive adsorption ('first comes, first serves' principle) is observed for the type-III suppressor. This behavior is explained by a suppressor barrier that impedes not only the copper inter-diffusion but also the transport of other additives (e.g. SPS) to the copper surface.

  19. An Integrated Pruning Criterion for Ensemble Learning Based on Classification Accuracy and Diversity

    DEFF Research Database (Denmark)

    Fu, Bin; Wang, Zhihai; Pan, Rong

    2013-01-01

    be further considered while designing a pruning criterion is presented, and then an effective definition of diversity is proposed. The experimental results have validated that the given pruning criterion could single out the subset of classifiers that show better performance in the process of hill...

  20. A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    P.Balaji

    2015-01-01

    Full Text Available Abstract It is difficult from possibilities to select a most suitable effective way of clustering algorithm and its dataset for a defined set of gene expression data because we have a huge number of ways and huge number of gene expressions. At present many researchers are preferring to use hierarchical clustering in different forms this is no more totally optimal. Cluster ensemble research can solve this type of problem by automatically merging multiple data partitions from a wide range of different clusterings of any dimensions to improve both the quality and robustness of the clustering result. But we have many existing ensemble approaches using an association matrix to condense sample-cluster and co-occurrence statistics and relations within the ensemble are encapsulated only at raw level while the existing among clusters are totally discriminated. Finding these missing associations can greatly expand the capability of those ensemble methodologies for microarray data clustering. We propose general K-means cluster ensemble approach for the clustering of general categorical data into required number of partitions.

  1. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Abstract Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent stochastic context-free grammar (SCFG that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples, where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones, then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  2. Polarimetric SAR image classification based on discriminative dictionary learning model

    Science.gov (United States)

    Sang, Cheng Wei; Sun, Hong

    2018-03-01

    Polarimetric SAR (PolSAR) image classification is one of the important applications of PolSAR remote sensing. It is a difficult high-dimension nonlinear mapping problem, the sparse representations based on learning overcomplete dictionary have shown great potential to solve such problem. The overcomplete dictionary plays an important role in PolSAR image classification, however for PolSAR image complex scenes, features shared by different classes will weaken the discrimination of learned dictionary, so as to degrade classification performance. In this paper, we propose a novel overcomplete dictionary learning model to enhance the discrimination of dictionary. The learned overcomplete dictionary by the proposed model is more discriminative and very suitable for PolSAR classification.

  3. Semantic Document Image Classification Based on Valuable Text Pattern

    Directory of Open Access Journals (Sweden)

    Hossein Pourghassem

    2011-01-01

    Full Text Available Knowledge extraction from detected document image is a complex problem in the field of information technology. This problem becomes more intricate when we know, a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analysis the document image. In this algorithm, using a two-stage segmentation approach, regions of the image are detected, and then classified to document and non-document (pure region regions in the hierarchical classification. In this paper, a novel valuable definition is proposed to classify document image in to valuable or invaluable categories. The proposed algorithm is evaluated on a database consisting of the document and non-document image that provide from Internet. Experimental results show the efficiency of the proposed algorithm in the semantic document image classification. The proposed algorithm provides accuracy rate of 98.8% for valuable and invaluable document image classification problem.

  4. Video based object representation and classification using multiple covariance matrices.

    Science.gov (United States)

    Zhang, Yurong; Liu, Quan

    2017-01-01

    Video based object recognition and classification has been widely studied in computer vision and image processing area. One main issue of this task is to develop an effective representation for video. This problem can generally be formulated as image set representation. In this paper, we present a new method called Multiple Covariance Discriminative Learning (MCDL) for image set representation and classification problem. The core idea of MCDL is to represent an image set using multiple covariance matrices with each covariance matrix representing one cluster of images. Firstly, we use the Nonnegative Matrix Factorization (NMF) method to do image clustering within each image set, and then adopt Covariance Discriminative Learning on each cluster (subset) of images. At last, we adopt KLDA and nearest neighborhood classification method for image set classification. Promising experimental results on several datasets show the effectiveness of our MCDL method.

  5. Pathological Bases for a Robust Application of Cancer Molecular Classification

    Directory of Open Access Journals (Sweden)

    Salvador J. Diaz-Cano

    2015-04-01

    Full Text Available Any robust classification system depends on its purpose and must refer to accepted standards, its strength relying on predictive values and a careful consideration of known factors that can affect its reliability. In this context, a molecular classification of human cancer must refer to the current gold standard (histological classification and try to improve it with key prognosticators for metastatic potential, staging and grading. Although organ-specific examples have been published based on proteomics, transcriptomics and genomics evaluations, the most popular approach uses gene expression analysis as a direct correlate of cellular differentiation, which represents the key feature of the histological classification. RNA is a labile molecule that varies significantly according with the preservation protocol, its transcription reflect the adaptation of the tumor cells to the microenvironment, it can be passed through mechanisms of intercellular transference of genetic information (exosomes, and it is exposed to epigenetic modifications. More robust classifications should be based on stable molecules, at the genetic level represented by DNA to improve reliability, and its analysis must deal with the concept of intratumoral heterogeneity, which is at the origin of tumor progression and is the byproduct of the selection process during the clonal expansion and progression of neoplasms. The simultaneous analysis of multiple DNA targets and next generation sequencing offer the best practical approach for an analytical genomic classification of tumors.

  6. Assessing an ensemble Kalman filter inference of Manning’s n coefficient of an idealized tidal inlet against a polynomial chaos-based MCMC

    KAUST Repository

    Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim

    2017-01-01

    an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model

  7. Classification of BCI Users Based on Cognition

    Directory of Open Access Journals (Sweden)

    N. Firat Ozkan

    2018-01-01

    Full Text Available Brain-Computer Interfaces (BCI are systems originally developed to assist paralyzed patients allowing for commands to the computer with brain activities. This study aims to examine cognitive state with an objective, easy-to-use, and easy-to-interpret method utilizing Brain-Computer Interface systems. Seventy healthy participants completed six tasks using a Brain-Computer Interface system and participants’ pupil dilation, blink rate, and Galvanic Skin Response (GSR data were collected simultaneously. Participants filled Nasa-TLX forms following each task and task performances of participants were also measured. Cognitive state clusters were created from the data collected using the K-means method. Taking these clusters and task performances into account, the general cognitive state of each participant was classified as low risk or high risk. Logistic Regression, Decision Tree, and Neural Networks were also used to classify the same data in order to measure the consistency of this classification with other techniques and the method provided a consistency between 87.1% and 100% with other techniques.

  8. The Study of Land Use Classification Based on SPOT6 High Resolution Data

    OpenAIRE

    Wu Song; Jiang Qigang

    2016-01-01

    A method is carried out to quick classification extract of the type of land use in agricultural areas, which is based on the spot6 high resolution remote sensing classification data and used of the good nonlinear classification ability of support vector machine. The results show that the spot6 high resolution remote sensing classification data can realize land classification efficiently, the overall classification accuracy reached 88.79% and Kappa factor is 0.8632 which means that the classif...

  9. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

    Directory of Open Access Journals (Sweden)

    Yoonsik Shim

    2016-10-01

    Full Text Available We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP. The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.

  10. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

    Science.gov (United States)

    Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil

    2016-10-01

    We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.

  11. A novel signal compression method based on optimal ensemble empirical mode decomposition for bearing vibration signals

    Science.gov (United States)

    Guo, Wei; Tse, Peter W.

    2013-01-01

    Today, remote machine condition monitoring is popular due to the continuous advancement in wireless communication. Bearing is the most frequently and easily failed component in many rotating machines. To accurately identify the type of bearing fault, large amounts of vibration data need to be collected. However, the volume of transmitted data cannot be too high because the bandwidth of wireless communication is limited. To solve this problem, the data are usually compressed before transmitting to a remote maintenance center. This paper proposes a novel signal compression method that can substantially reduce the amount of data that need to be transmitted without sacrificing the accuracy of fault identification. The proposed signal compression method is based on ensemble empirical mode decomposition (EEMD), which is an effective method for adaptively decomposing the vibration signal into different bands of signal components, termed intrinsic mode functions (IMFs). An optimization method was designed to automatically select appropriate EEMD parameters for the analyzed signal, and in particular to select the appropriate level of the added white noise in the EEMD method. An index termed the relative root-mean-square error was used to evaluate the decomposition performances under different noise levels to find the optimal level. After applying the optimal EEMD method to a vibration signal, the IMF relating to the bearing fault can be extracted from the original vibration signal. Compressing this signal component obtains a much smaller proportion of data samples to be retained for transmission and further reconstruction. The proposed compression method were also compared with the popular wavelet compression method. Experimental results demonstrate that the optimization of EEMD parameters can automatically find appropriate EEMD parameters for the analyzed signals, and the IMF-based compression method provides a higher compression ratio, while retaining the bearing defect

  12. A new circulation type classification based upon Lagrangian air trajectories

    Directory of Open Access Journals (Sweden)

    Alexandre M. Ramos

    2014-10-01

    Full Text Available A new classification method of the large-scale circulation characteristic for a specific target area (NW Iberian Peninsula is presented, based on the analysis of 90-h backward trajectories arriving in this area calculated with the 3-D Lagrangian particle dispersion model FLEXPART. A cluster analysis is applied to separate the backward trajectories in up to five representative air streams for each day. Specific measures are then used to characterise the distinct air streams (e.g., curvature of the trajectories, cyclonic or anticyclonic flow, moisture evolution, origin and length of the trajectories. The robustness of the presented method is demonstrated in comparison with the Eulerian Lamb weather type classification.A case study of the 2003 heatwave is discussed in terms of the new Lagrangian circulation and the Lamb weather type classifications. It is shown that the new classification method adds valuable information about the pertinent meteorological conditions, which are missing in an Eulerian approach. The new method is climatologically evaluated for the five-year time period from December 1999 to November 2004. The ability of the method to capture the inter-seasonal circulation variability in the target region is shown. Furthermore, the multi-dimensional character of the classification is shortly discussed, in particular with respect to inter-seasonal differences. Finally, the relationship between the new Lagrangian classification and the precipitation in the target area is studied.

  13. Ensembles-based predictions of climate change impacts on bioclimatic zones in Northeast Asia

    Science.gov (United States)

    Choi, Y.; Jeon, S. W.; Lim, C. H.; Ryu, J.

    2017-12-01

    Biodiversity is rapidly declining globally and efforts are needed to mitigate this continually increasing loss of species. Clustering of areas with similar habitats can be used to prioritize protected areas and distribute resources for the conservation of species, selection of representative sample areas for research, and evaluation of impacts due to environmental changes. In this study, Northeast Asia (NEA) was classified into 14 bioclimatic zones using statistical techniques, which are correlation analysis and principal component analysis (PCA), and the iterative self-organizing data analysis technique algorithm (ISODATA). Based on these bioclimatic classification, we predicted shift of bioclimatic zones due to climate change. The input variables include the current climatic data (1960-1990) and the future climatic data of the HadGEM2-AO model (RCP 4.5(2050, 2070) and 8.5(2050, 2070)) provided by WorldClim. Using these data, multi-modeling methods including maximum likelihood classification, random forest, and species distribution modelling have been used to project the impact of climate change on the spatial distribution of bioclimatic zones within NEA. The results of various models were compared and analyzed by overlapping each result. As the result, significant changes in bioclimatic conditions can be expected throughout the NEA by 2050s and 2070s. The overall zones moved upward and some zones were predicted to disappear. This analysis provides the basis for understanding potential impacts of climate change on biodiversity and ecosystem. Also, this could be used more effectively to support decision making on climate change adaptation.

  14. Atmospheric circulation classification comparison based on wildfires in Portugal

    Science.gov (United States)

    Pereira, M. G.; Trigo, R. M.

    2009-04-01

    Atmospheric circulation classifications are not a simple description of atmospheric states but a tool to understand and interpret the atmospheric processes and to model the relation between atmospheric circulation and surface climate and other related variables (Radan Huth et al., 2008). Classifications were initially developed with weather forecasting purposes, however with the progress in computer processing capability, new and more robust objective methods were developed and applied to large datasets prompting atmospheric circulation classification methods to one of the most important fields in synoptic and statistical climatology. Classification studies have been extensively used in climate change studies (e.g. reconstructed past climates, recent observed changes and future climates), in bioclimatological research (e.g. relating human mortality to climatic factors) and in a wide variety of synoptic climatological applications (e.g. comparison between datasets, air pollution, snow avalanches, wine quality, fish captures and forest fires). Likewise, atmospheric circulation classifications are important for the study of the role of weather in wildfire occurrence in Portugal because the daily synoptic variability is the most important driver of local weather conditions (Pereira et al., 2005). In particular, the objective classification scheme developed by Trigo and DaCamara (2000) to classify the atmospheric circulation affecting Portugal have proved to be quite useful in discriminating the occurrence and development of wildfires as well as the distribution over Portugal of surface climatic variables with impact in wildfire activity such as maximum and minimum temperature and precipitation. This work aims to present: (i) an overview the existing circulation classification for the Iberian Peninsula, and (ii) the results of a comparison study between these atmospheric circulation classifications based on its relation with wildfires and relevant meteorological

  15. An Efficient Local Correlation Matrix Decomposition Approach for the Localization Implementation of Ensemble-Based Assimilation Methods

    Science.gov (United States)

    Zhang, Hongqin; Tian, Xiangjun

    2018-04-01

    Ensemble-based data assimilation methods often use the so-called localization scheme to improve the representation of the ensemble background error covariance (Be). Extensive research has been undertaken to reduce the computational cost of these methods by using the localized ensemble samples to localize Be by means of a direct decomposition of the local correlation matrix C. However, the computational costs of the direct decomposition of the local correlation matrix C are still extremely high due to its high dimension. In this paper, we propose an efficient local correlation matrix decomposition approach based on the concept of alternating directions. This approach is intended to avoid direct decomposition of the correlation matrix. Instead, we first decompose the correlation matrix into 1-D correlation matrices in the three coordinate directions, then construct their empirical orthogonal function decomposition at low resolution. This procedure is followed by the 1-D spline interpolation process to transform the above decompositions to the high-resolution grid. Finally, an efficient correlation matrix decomposition is achieved by computing the very similar Kronecker product. We conducted a series of comparison experiments to illustrate the validity and accuracy of the proposed local correlation matrix decomposition approach. The effectiveness of the proposed correlation matrix decomposition approach and its efficient localization implementation of the nonlinear least-squares four-dimensional variational assimilation are further demonstrated by several groups of numerical experiments based on the Advanced Research Weather Research and Forecasting model.

  16. Failure diagnosis using deep belief learning based health state classification

    International Nuclear Information System (INIS)

    Tamilselvan, Prasanna; Wang, Pingfeng

    2013-01-01

    Effective health diagnosis provides multifarious benefits such as improved safety, improved reliability and reduced costs for operation and maintenance of complex engineered systems. This paper presents a novel multi-sensor health diagnosis method using deep belief network (DBN). DBN has recently become a popular approach in machine learning for its promised advantages such as fast inference and the ability to encode richer and higher order network structures. The DBN employs a hierarchical structure with multiple stacked restricted Boltzmann machines and works through a layer by layer successive learning process. The proposed multi-sensor health diagnosis methodology using DBN based state classification can be structured in three consecutive stages: first, defining health states and preprocessing sensory data for DBN training and testing; second, developing DBN based classification models for diagnosis of predefined health states; third, validating DBN classification models with testing sensory dataset. Health diagnosis using DBN based health state classification technique is compared with four existing diagnosis techniques. Benchmark classification problems and two engineering health diagnosis applications: aircraft engine health diagnosis and electric power transformer health diagnosis are employed to demonstrate the efficacy of the proposed approach

  17. Analysis of ensemble learning using simple perceptrons based on online learning theory

    Science.gov (United States)

    Miyoshi, Seiji; Hara, Kazuyuki; Okada, Masato

    2005-03-01

    Ensemble learning of K nonlinear perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to theoretically obtain the generalization error. This paper shows that ensemble generalization error can be calculated by using two order parameters, that is, the similarity between a teacher and a student, and the similarity among students. The differential equations that describe the dynamical behaviors of these order parameters are derived in the case of general learning rules. The concrete forms of these differential equations are derived analytically in the cases of three well-known rules: Hebbian learning, perceptron learning, and AdaTron (adaptive perceptron) learning. Ensemble generalization errors of these three rules are calculated by using the results determined by solving their differential equations. As a result, these three rules show different characteristics in their affinity for ensemble learning, that is “maintaining variety among students.” Results show that AdaTron learning is superior to the other two rules with respect to that affinity.

  18. Forecasting European cold waves based on subsampling strategies of CMIP5 and Euro-CORDEX ensembles

    Science.gov (United States)

    Cordero-Llana, Laura; Braconnot, Pascale; Vautard, Robert; Vrac, Mathieu; Jezequel, Aglae

    2016-04-01

    Forecasting future extreme events under the present changing climate represents a difficult task. Currently there are a large number of ensembles of simulations for climate projections that take in account different models and scenarios. However, there is a need for reducing the size of the ensemble to make the interpretation of these simulations more manageable for impact studies or climate risk assessment. This can be achieved by developing subsampling strategies to identify a limited number of simulations that best represent the ensemble. In this study, cold waves are chosen to test different approaches for subsampling available simulations. The definition of cold waves depends on the criteria used, but they are generally defined using a minimum temperature threshold, the duration of the cold spell as well as their geographical extend. These climate indicators are not universal, highlighting the difficulty of directly comparing different studies. As part of the of the CLIPC European project, we use daily surface temperature data obtained from CMIP5 outputs as well as Euro-CORDEX simulations to predict future cold waves events in Europe. From these simulations a clustering method is applied to minimise the number of ensembles required. Furthermore, we analyse the different uncertainties that arise from the different model characteristics and definitions of climate indicators. Finally, we will test if the same subsampling strategy can be used for different climate indicators. This will facilitate the use of the subsampling results for a wide number of impact assessment studies.

  19. Point Based Emotion Classification Using SVM

    OpenAIRE

    Swinkels, Wout

    2016-01-01

    The detection of emotions is a hot topic in the area of computer vision. Emotions are based on subtle changes in the face that are intuitively detected and interpreted by humans. Detecting these subtle changes, based on mathematical models, is a great challenge in the area of computer vision. In this thesis a new method is proposed to achieve state-of-the-art emotion detection performance. This method is based on facial feature points to monitor subtle changes in the face. Therefore the c...

  20. A Theoretical Analysis of Why Hybrid Ensembles Work

    Directory of Open Access Journals (Sweden)

    Kuo-Wei Hsu

    2017-01-01

    Full Text Available Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles.

  1. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    Science.gov (United States)

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999

  2. A Hierarchical Method for Transient Stability Prediction of Power Systems Using the Confidence of a SVM-Based Ensemble Classifier

    Directory of Open Access Journals (Sweden)

    Yanzhen Zhou

    2016-09-01

    Full Text Available Machine learning techniques have been widely used in transient stability prediction of power systems. When using the post-fault dynamic responses, it is difficult to draw a definite conclusion about how long the duration of response data used should be in order to balance the accuracy and speed. Besides, previous studies have the problem of lacking consideration for the confidence level. To solve these problems, a hierarchical method for transient stability prediction based on the confidence of ensemble classifier using multiple support vector machines (SVMs is proposed. Firstly, multiple datasets are generated by bootstrap sampling, then features are randomly picked up to compress the datasets. Secondly, the confidence indices are defined and multiple SVMs are built based on these generated datasets. By synthesizing the probabilistic outputs of multiple SVMs, the prediction results and confidence of the ensemble classifier will be obtained. Finally, different ensemble classifiers with different response times are built to construct different layers of the proposed hierarchical scheme. The simulation results show that the proposed hierarchical method can balance the accuracy and rapidity of the transient stability prediction. Moreover, the hierarchical method can reduce the misjudgments of unstable instances and cooperate with the time domain simulation to insure the security and stability of power systems.

  3. Conservative strategy-based ensemble surrogate model for optimal groundwater remediation design at DNAPLs-contaminated sites

    Science.gov (United States)

    Ouyang, Qi; Lu, Wenxi; Lin, Jin; Deng, Wenbing; Cheng, Weiguo

    2017-08-01

    The surrogate-based simulation-optimization techniques are frequently used for optimal groundwater remediation design. When this technique is used, surrogate errors caused by surrogate-modeling uncertainty may lead to generation of infeasible designs. In this paper, a conservative strategy that pushes the optimal design into the feasible region was used to address surrogate-modeling uncertainty. In addition, chance-constrained programming (CCP) was adopted to compare with the conservative strategy in addressing this uncertainty. Three methods, multi-gene genetic programming (MGGP), Kriging (KRG) and support vector regression (SVR), were used to construct surrogate models for a time-consuming multi-phase flow model. To improve the performance of the surrogate model, ensemble surrogates were constructed based on combinations of different stand-alone surrogate models. The results show that: (1) the surrogate-modeling uncertainty was successfully addressed by the conservative strategy, which means that this method is promising for addressing surrogate-modeling uncertainty. (2) The ensemble surrogate model that combines MGGP with KRG showed the most favorable performance, which indicates that this ensemble surrogate can utilize both stand-alone surrogate models to improve the performance of the surrogate model.

  4. ICF-based classification and measurement of functioning.

    Science.gov (United States)

    Stucki, G; Kostanjsek, N; Ustün, B; Cieza, A

    2008-09-01

    If we aim towards a comprehensive understanding of human functioning and the development of comprehensive programs to optimize functioning of individuals and populations we need to develop suitable measures. The approval of the International Classification, Disability and Health (ICF) in 2001 by the 54th World Health Assembly as the first universally shared model and classification of functioning, disability and health marks, therefore an important step in the development of measurement instruments and ultimately for our understanding of functioning, disability and health. The acceptance and use of the ICF as a reference framework and classification has been facilitated by its development in a worldwide, comprehensive consensus process and the increasing evidence regarding its validity. However, the broad acceptance and use of the ICF as a reference framework and classification will also depend on the resolution of conceptual and methodological challenges relevant for the classification and measurement of functioning. This paper therefore describes first how the ICF categories can serve as building blocks for the measurement of functioning and then the current state of the development of ICF based practical tools and international standards such as the ICF Core Sets. Finally it illustrates how to map the world of measures to the ICF and vice versa and the methodological principles relevant for the transformation of information obtained with a clinical test or a patient-oriented instrument to the ICF as well as the development of ICF-based clinical and self-reported measurement instruments.

  5. Chinese Sentence Classification Based on Convolutional Neural Network

    Science.gov (United States)

    Gu, Chengwei; Wu, Ming; Zhang, Chuang

    2017-10-01

    Sentence classification is one of the significant issues in Natural Language Processing (NLP). Feature extraction is often regarded as the key point for natural language processing. Traditional ways based on machine learning can not take high level features into consideration, such as Naive Bayesian Model. The neural network for sentence classification can make use of contextual information to achieve greater results in sentence classification tasks. In this paper, we focus on classifying Chinese sentences. And the most important is that we post a novel architecture of Convolutional Neural Network (CNN) to apply on Chinese sentence classification. In particular, most of the previous methods often use softmax classifier for prediction, we embed a linear support vector machine to substitute softmax in the deep neural network model, minimizing a margin-based loss to get a better result. And we use tanh as an activation function, instead of ReLU. The CNN model improve the result of Chinese sentence classification tasks. Experimental results on the Chinese news title database validate the effectiveness of our model.

  6. A study of fuzzy logic ensemble system performance on face recognition problem

    Science.gov (United States)

    Polyakova, A.; Lipinskiy, L.

    2017-02-01

    Some problems are difficult to solve by using a single intelligent information technology (IIT). The ensemble of the various data mining (DM) techniques is a set of models which are able to solve the problem by itself, but the combination of which allows increasing the efficiency of the system as a whole. Using the IIT ensembles can improve the reliability and efficiency of the final decision, since it emphasizes on the diversity of its components. The new method of the intellectual informational technology ensemble design is considered in this paper. It is based on the fuzzy logic and is designed to solve the classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural network, support vector machine and decision trees. These algorithms and their ensemble have been tested by solving the face recognition problems. Principal components analysis (PCA) is used for feature selection.

  7. The Development of Target-Specific Pose Filter Ensembles To Boost Ligand Enrichment for Structure-Based Virtual Screening.

    Science.gov (United States)

    Xia, Jie; Hsieh, Jui-Hua; Hu, Huabin; Wu, Song; Wang, Xiang Simon

    2017-06-26

    Structure-based virtual screening (SBVS) has become an indispensable technique for hit identification at the early stage of drug discovery. However, the accuracy of current scoring functions is not high enough to confer success to every target and thus remains to be improved. Previously, we had developed binary pose filters (PFs) using knowledge derived from the protein-ligand interface of a single X-ray structure of a specific target. This novel approach had been validated as an effective way to improve ligand enrichment. Continuing from it, in the present work we attempted to incorporate knowledge collected from diverse protein-ligand interfaces of multiple crystal structures of the same target to build PF ensembles (PFEs). Toward this end, we first constructed a comprehensive data set to meet the requirements of ensemble modeling and validation. This set contains 10 diverse targets, 118 well-prepared X-ray structures of protein-ligand complexes, and large benchmarking actives/decoys sets. Notably, we designed a unique workflow of two-layer classifiers based on the concept of ensemble learning and applied it to the construction of PFEs for all of the targets. Through extensive benchmarking studies, we demonstrated that (1) coupling PFE with Chemgauss4 significantly improves the early enrichment of Chemgauss4 itself and (2) PFEs show greater consistency in boosting early enrichment and larger overall enrichment than our prior PFs. In addition, we analyzed the pairwise topological similarities among cognate ligands used to construct PFEs and found that it is the higher chemical diversity of the cognate ligands that leads to the improved performance of PFEs. Taken together, the results so far prove that the incorporation of knowledge from diverse protein-ligand interfaces by ensemble modeling is able to enhance the screening competence of SBVS scoring functions.

  8. NIM: A Node Influence Based Method for Cancer Classification

    Directory of Open Access Journals (Sweden)

    Yiwen Wang

    2014-01-01

    Full Text Available The classification of different cancer types owns great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it is able to classify different kinds of cancers using DNA microarray. Our main idea is to confront the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model we proposed, this paper presents a novel high accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples, the second is to compute the node influence of training samples, the third is to obtain the similarity between every test sample and each class using weighted sum of node influence and similarity matrix, and the last is to classify each test sample based on its similarity between every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. experimental results showed that our node influence based method (NIM is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.

  9. Towards Data-Driven Simulations of Wildfire Spread using Ensemble-based Data Assimilation

    Science.gov (United States)

    Rochoux, M. C.; Bart, J.; Ricci, S. M.; Cuenot, B.; Trouvé, A.; Duchaine, F.; Morel, T.

    2012-12-01

    Real-time predictions of a propagating wildfire remain a challenging task because the problem involves both multi-physics and multi-scales. The propagation speed of wildfires, also called the rate of spread (ROS), is indeed determined by complex interactions between pyrolysis, combustion and flow dynamics, atmospheric dynamics occurring at vegetation, topographical and meteorological scales. Current operational fire spread models are mainly based on a semi-empirical parameterization of the ROS in terms of vegetation, topographical and meteorological properties. For the fire spread simulation to be predictive and compatible with operational applications, the uncertainty on the ROS model should be reduced. As recent progress made in remote sensing technology provides new ways to monitor the fire front position, a promising approach to overcome the difficulties found in wildfire spread simulations is to integrate fire modeling and fire sensing technologies using data assimilation (DA). For this purpose we have developed a prototype data-driven wildfire spread simulator in order to provide optimal estimates of poorly known model parameters [*]. The data-driven simulation capability is adapted for more realistic wildfire spread : it considers a regional-scale fire spread model that is informed by observations of the fire front location. An Ensemble Kalman Filter algorithm (EnKF) based on a parallel computing platform (OpenPALM) was implemented in order to perform a multi-parameter sequential estimation where wind magnitude and direction are in addition to vegetation properties (see attached figure). The EnKF algorithm shows its good ability to track a small-scale grassland fire experiment and ensures a good accounting for the sensitivity of the simulation outcomes to the control parameters. As a conclusion, it was shown that data assimilation is a promising approach to more accurately forecast time-varying wildfire spread conditions as new airborne-like observations of

  10. Some improved classification-based ridge parameter of Hoerl and ...

    African Journals Online (AJOL)

    Some improved classification-based ridge parameter of Hoerl and Kennard estimation techniques. ... This assumption is often violated and Ridge Regression estimator introduced by [2]has been identified to be more efficient than ordinary least square (OLS) in handling it. However, it requires a ridge parameter, K, of which ...

  11. Classification and Target Group Selection Based Upon Frequent Patterns

    NARCIS (Netherlands)

    W.H.L.M. Pijls (Wim); R. Potharst (Rob)

    2000-01-01

    textabstractIn this technical report , two new algorithms based upon frequent patterns are proposed. One algorithm is a classification method. The other one is an algorithm for target group selection. In both algorithms, first of all, the collection of frequent patterns in the training set is

  12. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    Science.gov (United States)

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.

  13. Torrent classification - Base of rational management of erosive regions

    International Nuclear Information System (INIS)

    Gavrilovic, Zoran; Stefanovic, Milutin; Milovanovic, Irina; Cotric, Jelena; Milojevic, Mileta

    2008-01-01

    A complex methodology for torrents and erosion and the associated calculations was developed during the second half of the twentieth century in Serbia. It was the 'Erosion Potential Method'. One of the modules of that complex method was focused on torrent classification. The module enables the identification of hydro graphic, climate and erosion characteristics. The method makes it possible for each torrent, regardless of its magnitude, to be simply and recognizably described by the 'Formula of torrentially'. The above torrent classification is the base on which a set of optimisation calculations is developed for the required scope of erosion-control works and measures, the application of which enables the management of significantly larger erosion and torrential regions compared to the previous period. This paper will present the procedure and the method of torrent classification.

  14. Classification of scintigrams on the base of an automatic analysis

    International Nuclear Information System (INIS)

    Vidyukov, V.I.; Kasatkin, Yu.N.; Kal'nitskaya, E.F.; Mironov, S.P.; Rotenberg, E.M.

    1980-01-01

    The stages of drawing a discriminative system based on self-education for an automatic analysis of scintigrams have been considered. The results of the classification of 240 scintigrams of the liver into ''normal'', ''diffuse lesions'', ''focal lesions'' have been evaluated by medical experts and computer. The accuracy of the computerized classification was 91.7%, that of the experts-85%. The automatic analysis methods of scintigrams of the liver have been realized using the specialized MDS system of data processing. The quality of the discriminative system has been assessed on 125 scintigrams. The accuracy of the classification is equal to 89.6%. The employment of the self-education; methods permitted one to single out two subclasses depending on the severity of diffuse lesions

  15. Hyperspectral image classification based on local binary patterns and PCANet

    Science.gov (United States)

    Yang, Huizhen; Gao, Feng; Dong, Junyu; Yang, Yang

    2018-04-01

    Hyperspectral image classification has been well acknowledged as one of the challenging tasks of hyperspectral data processing. In this paper, we propose a novel hyperspectral image classification framework based on local binary pattern (LBP) features and PCANet. In the proposed method, linear prediction error (LPE) is first employed to select a subset of informative bands, and LBP is utilized to extract texture features. Then, spectral and texture features are stacked into a high dimensional vectors. Next, the extracted features of a specified position are transformed to a 2-D image. The obtained images of all pixels are fed into PCANet for classification. Experimental results on real hyperspectral dataset demonstrate the effectiveness of the proposed method.

  16. Remote Sensing Image Classification Based on Stacked Denoising Autoencoder

    Directory of Open Access Journals (Sweden)

    Peng Liang

    2017-12-01

    Full Text Available Focused on the issue that conventional remote sensing image classification methods have run into the bottlenecks in accuracy, a new remote sensing image classification method inspired by deep learning is proposed, which is based on Stacked Denoising Autoencoder. First, the deep network model is built through the stacked layers of Denoising Autoencoder. Then, with noised input, the unsupervised Greedy layer-wise training algorithm is used to train each layer in turn for more robust expressing, characteristics are obtained in supervised learning by Back Propagation (BP neural network, and the whole network is optimized by error back propagation. Finally, Gaofen-1 satellite (GF-1 remote sensing data are used for evaluation, and the total accuracy and kappa accuracy reach 95.7% and 0.955, respectively, which are higher than that of the Support Vector Machine and Back Propagation neural network. The experiment results show that the proposed method can effectively improve the accuracy of remote sensing image classification.

  17. Torrent classification - Base of rational management of erosive regions

    Energy Technology Data Exchange (ETDEWEB)

    Gavrilovic, Zoran; Stefanovic, Milutin; Milovanovic, Irina; Cotric, Jelena; Milojevic, Mileta [Institute for the Development of Water Resources ' Jaroslav Cerni' , 11226 Beograd (Pinosava), Jaroslava Cernog 80 (Serbia)], E-mail: gavrilovicz@sbb.rs

    2008-11-01

    A complex methodology for torrents and erosion and the associated calculations was developed during the second half of the twentieth century in Serbia. It was the 'Erosion Potential Method'. One of the modules of that complex method was focused on torrent classification. The module enables the identification of hydro graphic, climate and erosion characteristics. The method makes it possible for each torrent, regardless of its magnitude, to be simply and recognizably described by the 'Formula of torrentially'. The above torrent classification is the base on which a set of optimisation calculations is developed for the required scope of erosion-control works and measures, the application of which enables the management of significantly larger erosion and torrential regions compared to the previous period. This paper will present the procedure and the method of torrent classification.

  18. Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels.

    Science.gov (United States)

    Kaneko, Hiromasa

    2018-02-26

    To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.

  19. An ensemble based nonlinear orthogonal matching pursuit algorithm for sparse history matching of reservoir models

    KAUST Repository

    Fsheikh, Ahmed H.

    2013-01-01

    A nonlinear orthogonal matching pursuit (NOMP) for sparse calibration of reservoir models is presented. Sparse calibration is a challenging problem as the unknowns are both the non-zero components of the solution and their associated weights. NOMP is a greedy algorithm that discovers at each iteration the most correlated components of the basis functions with the residual. The discovered basis (aka support) is augmented across the nonlinear iterations. Once the basis functions are selected from the dictionary, the solution is obtained by applying Tikhonov regularization. The proposed algorithm relies on approximate gradient estimation using an iterative stochastic ensemble method (ISEM). ISEM utilizes an ensemble of directional derivatives to efficiently approximate gradients. In the current study, the search space is parameterized using an overcomplete dictionary of basis functions built using the K-SVD algorithm.

  20. Deep learning for EEG-Based preference classification

    Science.gov (United States)

    Teo, Jason; Hou, Chew Lin; Mountstephens, James

    2017-10-01

    Electroencephalogram (EEG)-based emotion classification is rapidly becoming one of the most intensely studied areas of brain-computer interfacing (BCI). The ability to passively identify yet accurately correlate brainwaves with our immediate emotions opens up truly meaningful and previously unattainable human-computer interactions such as in forensic neuroscience, rehabilitative medicine, affective entertainment and neuro-marketing. One particularly useful yet rarely explored areas of EEG-based emotion classification is preference recognition [1], which is simply the detection of like versus dislike. Within the limited investigations into preference classification, all reported studies were based on musically-induced stimuli except for a single study which used 2D images. The main objective of this study is to apply deep learning, which has been shown to produce state-of-the-art results in diverse hard problems such as in computer vision, natural language processing and audio recognition, to 3D object preference classification over a larger group of test subjects. A cohort of 16 users was shown 60 bracelet-like objects as rotating visual stimuli on a computer display while their preferences and EEGs were recorded. After training a variety of machine learning approaches which included deep neural networks, we then attempted to classify the users' preferences for the 3D visual stimuli based on their EEGs. Here, we show that that deep learning outperforms a variety of other machine learning classifiers for this EEG-based preference classification task particularly in a highly challenging dataset with large inter- and intra-subject variability.

  1. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

    Full Text Available Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains as one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second based on an Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper including an Application Specific Instruction Set Processor (ASIP implementation and two pure Register-Transfer Level (RTL implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on an Xeon processor.

  2. Sparse Representation Based Binary Hypothesis Model for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Yidong Tang

    2016-01-01

    Full Text Available The sparse representation based classifier (SRC and its kernel version (KSRC have been employed for hyperspectral image (HSI classification. However, the state-of-the-art SRC often aims at extended surface objects with linear mixture in smooth scene and assumes that the number of classes is given. Considering the small target with complex background, a sparse representation based binary hypothesis (SRBBH model is established in this paper. In this model, a query pixel is represented in two ways, which are, respectively, by background dictionary and by union dictionary. The background dictionary is composed of samples selected from the local dual concentric window centered at the query pixel. Thus, for each pixel the classification issue becomes an adaptive multiclass classification problem, where only the number of desired classes is required. Furthermore, the kernel method is employed to improve the interclass separability. In kernel space, the coding vector is obtained by using kernel-based orthogonal matching pursuit (KOMP algorithm. Then the query pixel can be labeled by the characteristics of the coding vectors. Instead of directly using the reconstruction residuals, the different impacts the background dictionary and union dictionary have on reconstruction are used for validation and classification. It enhances the discrimination and hence improves the performance.

  3. Energy-efficiency based classification of the manufacturing workstation

    Science.gov (United States)

    Frumuşanu, G.; Afteni, C.; Badea, N.; Epureanu, A.

    2017-08-01

    EU Directive 92/75/EC established for the first time an energy consumption labelling scheme, further implemented by several other directives. As consequence, nowadays many products (e.g. home appliances, tyres, light bulbs, houses) have an EU Energy Label when offered for sale or rent. Several energy consumption models of manufacturing equipments have been also developed. This paper proposes an energy efficiency - based classification of the manufacturing workstation, aiming to characterize its energetic behaviour. The concept of energy efficiency of the manufacturing workstation is defined. On this base, a classification methodology has been developed. It refers to specific criteria and their evaluation modalities, together to the definition & delimitation of energy efficiency classes. The energy class position is defined after the amount of energy needed by the workstation in the middle point of its operating domain, while its extension is determined by the value of the first coefficient from the Taylor series that approximates the dependence between the energy consume and the chosen parameter of the working regime. The main domain of interest for this classification looks to be the optimization of the manufacturing activities planning and programming. A case-study regarding an actual lathe classification from energy efficiency point of view, based on two different approaches (analytical and numerical) is also included.

  4. Knowledge-based approach to video content classification

    Science.gov (United States)

    Chen, Yu; Wong, Edward K.

    2001-01-01

    A framework for video content classification using a knowledge-based approach is herein proposed. This approach is motivated by the fact that videos are rich in semantic contents, which can best be interpreted and analyzed by human experts. We demonstrate the concept by implementing a prototype video classification system using the rule-based programming language CLIPS 6.05. Knowledge for video classification is encoded as a set of rules in the rule base. The left-hand-sides of rules contain high level and low level features, while the right-hand-sides of rules contain intermediate results or conclusions. Our current implementation includes features computed from motion, color, and text extracted from video frames. Our current rule set allows us to classify input video into one of five classes: news, weather, reporting, commercial, basketball and football. We use MYCIN's inexact reasoning method for combining evidences, and to handle the uncertainties in the features and in the classification results. We obtained good results in a preliminary experiment, and it demonstrated the validity of the proposed approach.

  5. Bayesian outcome-based strategy classification.

    Science.gov (United States)

    Lee, Michael D

    2016-03-01

    Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014) recently developed a method for making inferences about the decision processes people use in multi-attribute forced choice tasks. Their paper makes a number of worthwhile theoretical and methodological contributions. Theoretically, they provide an insightful psychological motivation for a probabilistic extension of the widely-used "weighted additive" (WADD) model, and show how this model, as well as other important models like "take-the-best" (TTB), can and should be expressed in terms of meaningful priors. Methodologically, they develop an inference approach based on the Minimum Description Length (MDL) principles that balances both the goodness-of-fit and complexity of the decision models they consider. This paper aims to preserve these useful contributions, but provide a complementary Bayesian approach with some theoretical and methodological advantages. We develop a simple graphical model, implemented in JAGS, that allows for fully Bayesian inferences about which models people use to make decisions. To demonstrate the Bayesian approach, we apply it to the models and data considered by Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014), showing how a prior predictive analysis of the models, and posterior inferences about which models people use and the parameter settings at which they use them, can contribute to our understanding of human decision making.

  6. [Computer aided diagnosis model for lung tumor based on ensemble convolutional neural network].

    Science.gov (United States)

    Wang, Yuanyuan; Zhou, Tao; Lu, Huiling; Wu, Cuiying; Yang, Pengfei

    2017-08-01

    The convolutional neural network (CNN) could be used on computer-aided diagnosis of lung tumor with positron emission tomography (PET)/computed tomography (CT), which can provide accurate quantitative analysis to compensate for visual inertia and defects in gray-scale sensitivity, and help doctors diagnose accurately. Firstly, parameter migration method is used to build three CNNs (CT-CNN, PET-CNN, and PET/CT-CNN) for lung tumor recognition in CT, PET, and PET/CT image, respectively. Then, we aimed at CT-CNN to obtain the appropriate model parameters for CNN training through analysis the influence of model parameters such as epochs, batchsize and image scale on recognition rate and training time. Finally, three single CNNs are used to construct ensemble CNN, and then lung tumor PET/CT recognition was completed through relative majority vote method and the performance between ensemble CNN and single CNN was compared. The experiment results show that the ensemble CNN is better than single CNN on computer-aided diagnosis of lung tumor.

  7. 'Lazy' quantum ensembles

    International Nuclear Information System (INIS)

    Parfionov, George; Zapatrin, Roman

    2006-01-01

    We compare different strategies aimed to prepare an ensemble with a given density matrix ρ. Preparing the ensemble of eigenstates of ρ with appropriate probabilities can be treated as 'generous' strategy: it provides maximal accessible information about the state. Another extremity is the so-called 'Scrooge' ensemble, which is mostly stingy in sharing the information. We introduce 'lazy' ensembles which require minimal effort to prepare the density matrix by selecting pure states with respect to completely random choice. We consider two parties, Alice and Bob, playing a kind of game. Bob wishes to guess which pure state is prepared by Alice. His null hypothesis, based on the lack of any information about Alice's intention, is that Alice prepares any pure state with equal probability. Then, the average quantum state measured by Bob turns out to be ρ, and he has to make a new hypothesis about Alice's intention solely based on the information that the observed density matrix is ρ. The arising 'lazy' ensemble is shown to be the alternative hypothesis which minimizes type I error

  8. Optical beam classification using deep learning: a comparison with rule- and feature-based classification

    Science.gov (United States)

    Alom, Md. Zahangir; Awwal, Abdul A. S.; Lowe-Webb, Roger; Taha, Tarek M.

    2017-08-01

    Vector Machine (SVM). The experimental results show around 96% classification accuracy using CNN; the CNN approach also provides comparable recognition results compared to the present feature-based off-normal detection. The feature-based solution was developed to capture the expertise of a human expert in classifying the images. The misclassified results are further studied to explain the differences and discover any discrepancies or inconsistencies in current classification.

  9. A Chinese text classification system based on Naive Bayes algorithm

    Directory of Open Access Journals (Sweden)

    Cui Wei

    2016-01-01

    Full Text Available In this paper, aiming at the characteristics of Chinese text classification, using the ICTCLAS(Chinese lexical analysis system of Chinese academy of sciences for document segmentation, and for data cleaning and filtering the Stop words, using the information gain and document frequency feature selection algorithm to document feature selection. Based on this, based on the Naive Bayesian algorithm implemented text classifier , and use Chinese corpus of Fudan University has carried on the experiment and analysis on the system.

  10. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box

    NARCIS (Netherlands)

    Ciompi, Francesco; de Hoop, Bartjan; van Riel, Sarah J.; Chung, Kaman; Scholten, Ernst Th.; Oudkerk, Matthijs; de Jong, Pim A.; Prokop, Mathias; van Ginneken, Bram

    2015-01-01

    In this paper, we tackle the problem of automatic classification of pulmonary peri-fissural nodules (PFNs). The classification problem is formulated as a machine learning approach, where detected nodule candidates are classified as PFNs or non-PFNs. Supervised learning is used, where a classifier is

  11. A non-destructive surface burn detection method for ferrous metals based on acoustic emission and ensemble empirical mode decomposition: from laser simulation to grinding process

    International Nuclear Information System (INIS)

    Yang, Zhensheng; Wu, Haixi; Yu, Zhonghua; Huang, Youfang

    2014-01-01

    Grinding is usually done in the final finishing of a component. As a result, the surface quality of finished products, e.g., surface roughness, hardness and residual stress, are affected by the grinding procedure. However, the lack of methods for monitoring of grinding makes it difficult to control the quality of the process. This paper focuses on the monitoring approaches for the surface burn phenomenon in grinding. A non-destructive burn detection method based on acoustic emission (AE) and ensemble empirical mode decomposition (EEMD) was proposed for this purpose. To precisely extract the AE features caused by phase transformation during burn formation, artificial burn was produced to mimic grinding burn by means of laser irradiation, since laser-induced burn involves less mechanical and electrical noise. The burn formation process was monitored by an AE sensor. The frequency band ranging from 150 to 400 kHz was believed to be related to surface burn formation in the laser irradiation process. The burn-sensitive frequency band was further used to instruct feature extraction during the grinding process based on EEMD. Linear classification results evidenced a distinct margin between samples with and without surface burn. This work provides a practical means for grinding burn detection. (paper)

  12. A comparative assessment of GIS-based data mining models and a novel ensemble model in groundwater well potential mapping

    Science.gov (United States)

    Naghibi, Seyed Amir; Moghaddam, Davood Davoodi; Kalantar, Bahareh; Pradhan, Biswajeet; Kisi, Ozgur

    2017-05-01

    In recent years, application of ensemble models has been increased tremendously in various types of natural hazard assessment such as landslides and floods. However, application of this kind of robust models in groundwater potential mapping is relatively new. This study applied four data mining algorithms including AdaBoost, Bagging, generalized additive model (GAM), and Naive Bayes (NB) models to map groundwater potential. Then, a novel frequency ratio data mining ensemble model (FREM) was introduced and evaluated. For this purpose, eleven groundwater conditioning factors (GCFs), including altitude, slope aspect, slope angle, plan curvature, stream power index (SPI), river density, distance from rivers, topographic wetness index (TWI), land use, normalized difference vegetation index (NDVI), and lithology were mapped. About 281 well locations with high potential were selected. Wells were randomly partitioned into two classes for training the models (70% or 197) and validating them (30% or 84). AdaBoost, Bagging, GAM, and NB algorithms were employed to get groundwater potential maps (GPMs). The GPMs were categorized into potential classes using natural break method of classification scheme. In the next stage, frequency ratio (FR) value was calculated for the output of the four aforementioned models and were summed, and finally a GPM was produced using FREM. For validating the models, area under receiver operating characteristics (ROC) curve was calculated. The ROC curve for prediction dataset was 94.8, 93.5, 92.6, 92.0, and 84.4% for FREM, Bagging, AdaBoost, GAM, and NB models, respectively. The results indicated that FREM had the best performance among all the models. The better performance of the FREM model could be related to reduction of over fitting and possible errors. Other models such as AdaBoost, Bagging, GAM, and NB also produced acceptable performance in groundwater modelling. The GPMs produced in the current study may facilitate groundwater exploitation

  13. Group-Based Active Learning of Classification Models.

    Science.gov (United States)

    Luo, Zhipeng; Hauskrecht, Milos

    2017-05-01

    Learning of classification models from real-world data often requires additional human expert effort to annotate the data. However, this process can be rather costly and finding ways of reducing the human annotation effort is critical for this task. The objective of this paper is to develop and study new ways of providing human feedback for efficient learning of classification models by labeling groups of examples. Briefly, unlike traditional active learning methods that seek feedback on individual examples, we develop a new group-based active learning framework that solicits label information on groups of multiple examples. In order to describe groups in a user-friendly way, conjunctive patterns are used to compactly represent groups. Our empirical study on 12 UCI data sets demonstrates the advantages and superiority of our approach over both classic instance-based active learning work, as well as existing group-based active-learning methods.

  14. Investigating energy-based pool structure selection in the structure ensemble modeling with experimental distance constraints: The example from a multidomain protein Pub1.

    Science.gov (United States)

    Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan

    2018-05-01

    The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.

  15. Classification of Hearing Loss Disorders Using Teoae-Based Descriptors

    Science.gov (United States)

    Hatzopoulos, Stavros Dimitris

    Transiently Evoked Otoacoustic Emissions (TEOAE) are signals produced by the cochlea upon stimulation by an acoustic click. Within the context of this dissertation, it was hypothesized that the relationship between the TEOAEs and the functional status of the OHCs provided an opportunity for designing a TEOAE-based clinical procedure that could be used to assess cochlear function. To understand the nature of the TEOAE signals in the time and the frequency domain several different analyses were performed. Using normative Input-Output (IO) curves, short-time FFT analyses and cochlear computer simulations, it was found that for optimization of the hearing loss classification it is necessary to use a complete 20 ms TEOAE segment. It was also determined that various 2-D filtering methods (median and averaging filtering masks, LP-FFT) used to enhance of the TEOAE S/N offered minimal improvement (less than 6 dB per stimulus level). Higher S/N improvements resulted in TEOAE sequences that were over-smoothed. The final classification algorithm was based on a statistical analysis of raw FFT data and when applied to a sample set of clinically obtained TEOAE recordings (from 56 normal and 66 hearing-loss subjects) correctly identified 94.3% of the normal and 90% of the hearing loss subjects, at the 80 dB SPL stimulus level. To enhance the discrimination between the conductive and the sensorineural populations, data from the 68 dB SPL stimulus level were used, which yielded a normal classification of 90.2%, a hearing loss classification of 87.5% and a conductive-sensorineural classification of 87%. Among the hearing-loss populations the best discrimination was obtained in the group of otosclerosis and the worst in the group of acute acoustic trauma.

  16. Comparison Of Power Quality Disturbances Classification Based On Neural Network

    Directory of Open Access Journals (Sweden)

    Nway Nway Kyaw Win

    2015-07-01

    Full Text Available Abstract Power quality disturbances PQDs result serious problems in the reliability safety and economy of power system network. In order to improve electric power quality events the detection and classification of PQDs must be made type of transient fault. Software analysis of wavelet transform with multiresolution analysis MRA algorithm and feed forward neural network probabilistic and multilayer feed forward neural network based methodology for automatic classification of eight types of PQ signals flicker harmonics sag swell impulse fluctuation notch and oscillatory will be presented. The wavelet family Db4 is chosen in this system to calculate the values of detailed energy distributions as input features for classification because it can perform well in detecting and localizing various types of PQ disturbances. This technique classifies the types of PQDs problem sevents.The classifiers classify and identify the disturbance type according to the energy distribution. The results show that the PNN can analyze different power disturbance types efficiently. Therefore it can be seen that PNN has better classification accuracy than MLFF.

  17. HBC-Evo: predicting human breast cancer by exploiting amino acid sequence-based feature spaces and evolutionary ensemble system.

    Science.gov (United States)

    Majid, Abdul; Ali, Safdar

    2015-01-01

    We developed genetic programming (GP)-based evolutionary ensemble system for the early diagnosis, prognosis and prediction of human breast cancer. This system has effectively exploited the diversity in feature and decision spaces. First, individual learners are trained in different feature spaces using physicochemical properties of protein amino acids. Their predictions are then stacked to develop the best solution during GP evolution process. Finally, results for HBC-Evo system are obtained with optimal threshold, which is computed using particle swarm optimization. Our novel approach has demonstrated promising results compared to state of the art approaches.

  18. A One-Step-Ahead Smoothing-Based Joint Ensemble Kalman Filter for State-Parameter Estimation of Hydrological Models

    KAUST Repository

    El Gharamti, Mohamad

    2015-11-26

    The ensemble Kalman filter (EnKF) recursively integrates field data into simulation models to obtain a better characterization of the model’s state and parameters. These are generally estimated following a state-parameters joint augmentation strategy. In this study, we introduce a new smoothing-based joint EnKF scheme, in which we introduce a one-step-ahead smoothing of the state before updating the parameters. Numerical experiments are performed with a two-dimensional synthetic subsurface contaminant transport model. The improved performance of the proposed joint EnKF scheme compared to the standard joint EnKF compensates for the modest increase in the computational cost.

  19. A generalized polynomial chaos based ensemble Kalman filter with high accuracy

    International Nuclear Information System (INIS)

    Li Jia; Xiu Dongbin

    2009-01-01

    As one of the most adopted sequential data assimilation methods in many areas, especially those involving complex nonlinear dynamics, the ensemble Kalman filter (EnKF) has been under extensive investigation regarding its properties and efficiency. Compared to other variants of the Kalman filter (KF), EnKF is straightforward to implement, as it employs random ensembles to represent solution states. This, however, introduces sampling errors that affect the accuracy of EnKF in a negative manner. Though sampling errors can be easily reduced by using a large number of samples, in practice this is undesirable as each ensemble member is a solution of the system of state equations and can be time consuming to compute for large-scale problems. In this paper we present an efficient EnKF implementation via generalized polynomial chaos (gPC) expansion. The key ingredients of the proposed approach involve (1) solving the system of stochastic state equations via the gPC methodology to gain efficiency; and (2) sampling the gPC approximation of the stochastic solution with an arbitrarily large number of samples, at virtually no additional computational cost, to drastically reduce the sampling errors. The resulting algorithm thus achieves a high accuracy at reduced computational cost, compared to the classical implementations of EnKF. Numerical examples are provided to verify the convergence property and accuracy improvement of the new algorithm. We also prove that for linear systems with Gaussian noise, the first-order gPC Kalman filter method is equivalent to the exact Kalman filter.

  20. Structure-based classification and ontology in chemistry

    Directory of Open Access Journals (Sweden)

    Hastings Janna

    2012-04-01

    Full Text Available Abstract Background Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures, while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. Results We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. Conclusion Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational

  1. Dynamic Security Assessment of Western Danish Power System Based on Ensemble Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Bak, Claus Leth; Chen, Zhe

    2014-01-01

    With the increasing penetration of renewable energy resources and other forms of dispersed generation, more and more uncertainties will be brought to the dynamic security assessment (DSA) of power systems. This paper proposes an approach that uses ensemble decision trees (EDT) for online DSA. Fed...... with online wide-area measurement data, it is capable of not only predicting the security states of current operating conditions (OC) with high accuracy, but also indicating the confidence of the security states 1 minute ahead of the real time by an outlier identification method. The results of EDT together...

  2. Gradient Evolution-based Support Vector Machine Algorithm for Classification

    Science.gov (United States)

    Zulvia, Ferani E.; Kuo, R. J.

    2018-03-01

    This paper proposes a classification algorithm based on a support vector machine (SVM) and gradient evolution (GE) algorithms. SVM algorithm has been widely used in classification. However, its result is significantly influenced by the parameters. Therefore, this paper aims to propose an improvement of SVM algorithm which can find the best SVMs’ parameters automatically. The proposed algorithm employs a GE algorithm to automatically determine the SVMs’ parameters. The GE algorithm takes a role as a global optimizer in finding the best parameter which will be used by SVM algorithm. The proposed GE-SVM algorithm is verified using some benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than other algorithms tested in this paper.

  3. A strategy learning model for autonomous agents based on classification

    Directory of Open Access Journals (Sweden)

    Śnieżyński Bartłomiej

    2015-09-01

    Full Text Available In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster that reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking the learning process

  4. Vessel-guided airway segmentation based on voxel classification

    DEFF Research Database (Denmark)

    Lo, Pechin Chien Pau; Sporring, Jon; Ashraf, Haseem

    2008-01-01

    This paper presents a method for improving airway tree segmentation using vessel orientation information. We use the fact that an airway branch is always accompanied by an artery, with both structures having similar orientations. This work is based on a  voxel classification airway segmentation...... method proposed previously. The probability of a voxel belonging to the airway, from the voxel classification method, is augmented with an orientation similarity measure as a criterion for region growing. The orientation similarity measure of a voxel indicates how similar is the orientation...... of the surroundings of a voxel, estimated based on a tube model, is to that of a neighboring vessel. The proposed method is tested on 20 CT images from different subjects selected randomly from a lung cancer screening study. Length of the airway branches from the results of the proposed method are significantly...

  5. A Sieving ANN for Emotion-Based Movie Clip Classification

    Science.gov (United States)

    Watanapa, Saowaluk C.; Thipakorn, Bundit; Charoenkitkarn, Nipon

    Effective classification and analysis of semantic contents are very important for the content-based indexing and retrieval of video database. Our research attempts to classify movie clips into three groups of commonly elicited emotions, namely excitement, joy and sadness, based on a set of abstract-level semantic features extracted from the film sequence. In particular, these features consist of six visual and audio measures grounded on the artistic film theories. A unique sieving-structured neural network is proposed to be the classifying model due to its robustness. The performance of the proposed model is tested with 101 movie clips excerpted from 24 award-winning and well-known Hollywood feature films. The experimental result of 97.8% correct classification rate, measured against the collected human-judges, indicates the great potential of using abstract-level semantic features as an engineered tool for the application of video-content retrieval/indexing.

  6. A comparison of resampling schemes for estimating model observer performance with small ensembles

    Science.gov (United States)

    Elshahaby, Fatma E. A.; Jha, Abhinav K.; Ghaly, Michael; Frey, Eric C.

    2017-09-01

    In objective assessment of image quality, an ensemble of images is used to compute the 1st and 2nd order statistics of the data. Often, only a finite number of images is available, leading to the issue of statistical variability in numerical observer performance. Resampling-based strategies can help overcome this issue. In this paper, we compared different combinations of resampling schemes (the leave-one-out (LOO) and the half-train/half-test (HT/HT)) and model observers (the conventional channelized Hotelling observer (CHO), channelized linear discriminant (CLD) and channelized quadratic discriminant). Observer performance was quantified by the area under the ROC curve (AUC). For a binary classification task and for each observer, the AUC value for an ensemble size of 2000 samples per class served as a gold standard for that observer. Results indicated that each observer yielded a different performance depending on the ensemble size and the resampling scheme. For a small ensemble size, the combination [CHO, HT/HT] had more accurate rankings than the combination [CHO, LOO]. Using the LOO scheme, the CLD and CHO had similar performance for large ensembles. However, the CLD outperformed the CHO and gave more accurate rankings for smaller ensembles. As the ensemble size decreased, the performance of the [CHO, LOO] combination seriously deteriorated as opposed to the [CLD, LOO] combination. Thus, it might be desirable to use the CLD with the LOO scheme when smaller ensemble size is available.

  7. Land Cover and Land Use Classification with TWOPAC: towards Automated Processing for Pixel- and Object-Based Image Classification

    Directory of Open Access Journals (Sweden)

    Stefan Dech

    2012-09-01

    Full Text Available We present a novel and innovative automated processing environment for the derivation of land cover (LC and land use (LU information. This processing framework named TWOPAC (TWinned Object and Pixel based Automated classification Chain enables the standardized, independent, user-friendly, and comparable derivation of LC and LU information, with minimized manual classification labor. TWOPAC allows classification of multi-spectral and multi-temporal remote sensing imagery from different sensor types. TWOPAC enables not only pixel-based classification, but also allows classification based on object-based characteristics. Classification is based on a Decision Tree approach (DT for which the well-known C5.0 code has been implemented, which builds decision trees based on the concept of information entropy. TWOPAC enables automatic generation of the decision tree classifier based on a C5.0-retrieved ascii-file, as well as fully automatic validation of the classification output via sample based accuracy assessment.Envisaging the automated generation of standardized land cover products, as well as area-wide classification of large amounts of data in preferably a short processing time, standardized interfaces for process control, Web Processing Services (WPS, as introduced by the Open Geospatial Consortium (OGC, are utilized. TWOPAC’s functionality to process geospatial raster or vector data via web resources (server, network enables TWOPAC’s usability independent of any commercial client or desktop software and allows for large scale data processing on servers. Furthermore, the components of TWOPAC were built-up using open source code components and are implemented as a plug-in for Quantum GIS software for easy handling of the classification process from the user’s perspective.

  8. A new gammagraphic and functional-based classification for hyperthyroidism

    International Nuclear Information System (INIS)

    Sanchez, J.; Lamata, F.; Cerdan, R.; Agilella, V.; Gastaminza, R.; Abusada, R.; Gonzales, M.; Martinez, M.

    2000-01-01

    The absence of an universal classification for hyperthyroidism's (HT), give rise to inadequate interpretation of series and trials, and prevents decision making. We offer a tentative classification based on gammagraphic and functional findings. Clinical records from patients who underwent thyroidectomy in our Department since 1967 to 1997 were reviewed. Those with functional measurements of hyperthyroidism were considered. All were managed according to the same preestablished guidelines. HT was the surgical indication in 694 (27,1%) of the 2559 thyroidectomy. Based on gammagraphic studies, we classified HTs in: parenchymatous increased-uptake, which could be diffuse, diffuse with cold nodules or diffuse with at least one nodule, and nodular increased-uptake (Autonomous Functioning Thyroid Nodes-AFTN), divided into solitary AFTN or toxic adenoma and multiple AFTN o toxic multi-nodular goiter. This gammagraphic-based classification in useful and has high sensitivity to detect these nodules assessing their activity, allowing us to make therapeutic decision making and, in some cases, to choose surgical technique. (authors)

  9. An Ensemble-Based Training Data Refinement for Automatic Crop Discrimination Using WorldView-2 Imagery

    DEFF Research Database (Denmark)

    Chellasamy, Menaka; Ferre, Ty Paul; Greve, Mogens Humlekrog

    2015-01-01

    This paper presents a new approach for refining and selecting training data for satellite imagery-based crop discrimination. The goal of this approach is to automate the pixel-based “multievidence crop classification approach,” proposed by the authors in their previous research. The present study...

  10. Climatic Models Ensemble-based Mid-21st Century Runoff Projections: A Bayesian Framework

    Science.gov (United States)

    Achieng, K. O.; Zhu, J.

    2017-12-01

    There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climatic models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits data. However, model selection techniques often lead to different conclusions. In this study, ten models are averaged in Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project and identify effect of model uncertainty on future runoff projections. Baseflow separation - a two-digital filter which is also called Eckhardt filter - is used to separate USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff, in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well do ten RCM models ensemble jointly simulate surface runoff by averaging over all the models using BMA, given a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?

  11. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting

    Science.gov (United States)

    Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu

    2016-06-01

    To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.

  12. Assessing a robust ensemble-based Kalman filter for efficient ecosystem data assimilation of the Cretan Sea

    KAUST Repository

    Triantafyllou, George N.; Hoteit, Ibrahim; Luo, Xiaodong; Tsiaras, Kostas P.; Petihakis, George

    2013-01-01

    An application of an ensemble-based robust filter for data assimilation into an ecosystem model of the Cretan Sea is presented and discussed. The ecosystem model comprises two on-line coupled sub-models: the Princeton Ocean Model (POM) and the European Regional Seas Ecosystem Model (ERSEM). The filtering scheme is based on the Singular Evolutive Interpolated Kalman (SEIK) filter which is implemented with a time-local H∞ filtering strategy to enhance robustness and performances during periods of strong ecosystem variability. Assimilation experiments in the Cretan Sea indicate that robustness can be achieved in the SEIK filter by introducing an adaptive inflation scheme of the modes of the filter error covariance matrix. Twin-experiments are performed to evaluate the performance of the assimilation system and to study the benefits of using robust filtering in an ensemble filtering framework. Pseudo-observations of surface chlorophyll, extracted from a model reference run, were assimilated every two days. Simulation results suggest that the adaptive inflation scheme significantly improves the behavior of the SEIK filter during periods of strong ecosystem variability. © 2012 Elsevier B.V.

  13. Assessing a robust ensemble-based Kalman filter for efficient ecosystem data assimilation of the Cretan Sea

    KAUST Repository

    Triantafyllou, George N.

    2013-09-01

    An application of an ensemble-based robust filter for data assimilation into an ecosystem model of the Cretan Sea is presented and discussed. The ecosystem model comprises two on-line coupled sub-models: the Princeton Ocean Model (POM) and the European Regional Seas Ecosystem Model (ERSEM). The filtering scheme is based on the Singular Evolutive Interpolated Kalman (SEIK) filter which is implemented with a time-local H∞ filtering strategy to enhance robustness and performances during periods of strong ecosystem variability. Assimilation experiments in the Cretan Sea indicate that robustness can be achieved in the SEIK filter by introducing an adaptive inflation scheme of the modes of the filter error covariance matrix. Twin-experiments are performed to evaluate the performance of the assimilation system and to study the benefits of using robust filtering in an ensemble filtering framework. Pseudo-observations of surface chlorophyll, extracted from a model reference run, were assimilated every two days. Simulation results suggest that the adaptive inflation scheme significantly improves the behavior of the SEIK filter during periods of strong ecosystem variability. © 2012 Elsevier B.V.

  14. Changing Histopathological Diagnostics by Genome-Based Tumor Classification

    Directory of Open Access Journals (Sweden)

    Michael Kloth

    2014-05-01

    Full Text Available Traditionally, tumors are classified by histopathological criteria, i.e., based on their specific morphological appearances. Consequently, current therapeutic decisions in oncology are strongly influenced by histology rather than underlying molecular or genomic aberrations. The increase of information on molecular changes however, enabled by the Human Genome Project and the International Cancer Genome Consortium as well as the manifold advances in molecular biology and high-throughput sequencing techniques, inaugurated the integration of genomic information into disease classification. Furthermore, in some cases it became evident that former classifications needed major revision and adaption. Such adaptations are often required by understanding the pathogenesis of a disease from a specific molecular alteration, using this molecular driver for targeted and highly effective therapies. Altogether, reclassifications should lead to higher information content of the underlying diagnoses, reflecting their molecular pathogenesis and resulting in optimized and individual therapeutic decisions. The objective of this article is to summarize some particularly important examples of genome-based classification approaches and associated therapeutic concepts. In addition to reviewing disease specific markers, we focus on potentially therapeutic or predictive markers and the relevance of molecular diagnostics in disease monitoring.

  15. G0-WISHART Distribution Based Classification from Polarimetric SAR Images

    Science.gov (United States)

    Hu, G. C.; Zhao, Q. H.

    2017-09-01

    Enormous scientific and technical developments have been carried out to further improve the remote sensing for decades, particularly Polarimetric Synthetic Aperture Radar(PolSAR) technique, so classification method based on PolSAR images has getted much more attention from scholars and related department around the world. The multilook polarmetric G0-Wishart model is a more flexible model which describe homogeneous, heterogeneous and extremely heterogeneous regions in the image. Moreover, the polarmetric G0-Wishart distribution dose not include the modified Bessel function of the second kind. It is a kind of simple statistical distribution model with less parameter. To prove its feasibility, a process of classification has been tested with the full-polarized Synthetic Aperture Radar (SAR) image by the method. First, apply multilook polarimetric SAR data process and speckle filter to reduce speckle influence for classification result. Initially classify the image into sixteen classes by H/A/α decomposition. Using the ICM algorithm to classify feature based on the G0-Wshart distance. Qualitative and quantitative results show that the proposed method can classify polaimetric SAR data effectively and efficiently.

  16. Histological image classification using biologically interpretable shape-based features

    International Nuclear Information System (INIS)

    Kothari, Sonal; Phan, John H; Young, Andrew N; Wang, May D

    2013-01-01

    Automatic cancer diagnostic systems based on histological image classification are important for improving therapeutic decisions. Previous studies propose textural and morphological features for such systems. These features capture patterns in histological images that are useful for both cancer grading and subtyping. However, because many of these features lack a clear biological interpretation, pathologists may be reluctant to adopt these features for clinical diagnosis. We examine the utility of biologically interpretable shape-based features for classification of histological renal tumor images. Using Fourier shape descriptors, we extract shape-based features that capture the distribution of stain-enhanced cellular and tissue structures in each image and evaluate these features using a multi-class prediction model. We compare the predictive performance of the shape-based diagnostic model to that of traditional models, i.e., using textural, morphological and topological features. The shape-based model, with an average accuracy of 77%, outperforms or complements traditional models. We identify the most informative shapes for each renal tumor subtype from the top-selected features. Results suggest that these shapes are not only accurate diagnostic features, but also correlate with known biological characteristics of renal tumors. Shape-based analysis of histological renal tumor images accurately classifies disease subtypes and reveals biologically insightful discriminatory features. This method for shape-based analysis can be extended to other histological datasets to aid pathologists in diagnostic and therapeutic decisions

  17. Design and implementation based on the classification protection vulnerability scanning system

    International Nuclear Information System (INIS)

    Wang Chao; Lu Zhigang; Liu Baoxu

    2010-01-01

    With the application and spread of the classification protection, Network Security Vulnerability Scanning should consider the efficiency and the function expansion. It proposes a kind of a system vulnerability from classification protection, and elaborates the design and implementation of a vulnerability scanning system based on vulnerability classification plug-in technology and oriented classification protection. According to the experiment, the application of classification protection has good adaptability and salability with the system, and it also approves the efficiency of scanning. (authors)

  18. Soil classification basing on the spectral characteristics of topsoil samples

    Science.gov (United States)

    Liu, Huanjun; Zhang, Xiaokang; Zhang, Xinle

    2016-04-01

    Soil taxonomy plays an important role in soil utility and management, but China has only course soil map created based on 1980s data. New technology, e.g. spectroscopy, could simplify soil classification. The study try to classify soils basing on the spectral characteristics of topsoil samples. 148 topsoil samples of typical soils, including Black soil, Chernozem, Blown soil and Meadow soil, were collected from Songnen plain, Northeast China, and the room spectral reflectance in the visible and near infrared region (400-2500 nm) were processed with weighted moving average, resampling technique, and continuum removal. Spectral indices were extracted from soil spectral characteristics, including the second absorption positions of spectral curve, the first absorption vale's area, and slope of spectral curve at 500-600 nm and 1340-1360 nm. Then K-means clustering and decision tree were used respectively to build soil classification model. The results indicated that 1) the second absorption positions of Black soil and Chernozem were located at 610 nm and 650 nm respectively; 2) the spectral curve of the meadow is similar to its adjacent soil, which could be due to soil erosion; 3) decision tree model showed higher classification accuracy, and accuracy of Black soil, Chernozem, Blown soil and Meadow are 100%, 88%, 97%, 50% respectively, and the accuracy of Blown soil could be increased to 100% by adding one more spectral index (the first two vole's area) to the model, which showed that the model could be used for soil classification and soil map in near future.

  19. One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with Missing Values

    Directory of Open Access Journals (Sweden)

    Jin Xiao

    2014-01-01

    Full Text Available Scientific customer value segmentation (CVS is the base of efficient customer relationship management, and customer credit scoring, fraud detection, and churn prediction all belong to CVS. In real CVS, the customer data usually include lots of missing values, which may affect the performance of CVS model greatly. This study proposes a one-step dynamic classifier ensemble model for missing values (ODCEM model. On the one hand, ODCEM integrates the preprocess of missing values and the classification modeling into one step; on the other hand, it utilizes multiple classifiers ensemble technology in constructing the classification models. The empirical results in credit scoring dataset “German” from UCI and the real customer churn prediction dataset “China churn” show that the ODCEM outperforms four commonly used “two-step” models and the ensemble based model LMF and can provide better decision support for market managers.

  20. [Simulation of cropland soil moisture based on an ensemble Kalman filter].

    Science.gov (United States)

    Liu, Zhao; Zhou, Yan-Lian; Ju, Wei-Min; Gao, Ping

    2011-11-01

    By using an ensemble Kalman filter (EnKF) to assimilate the observed soil moisture data, the modified boreal ecosystem productivity simulator (BEPS) model was adopted to simulate the dynamics of soil moisture in winter wheat root zones at Xuzhou Agro-meteorological Station, Jiangsu Province of China during the growth seasons in 2000-2004. After the assimilation of observed data, the determination coefficient, root mean square error, and average absolute error of simulated soil moisture were in the ranges of 0.626-0.943, 0.018-0.042, and 0.021-0.041, respectively, with the simulation precision improved significantly, as compared with that before assimilation, indicating the applicability of data assimilation in improving the simulation of soil moisture. The experimental results at single point showed that the errors in the forcing data and observations and the frequency and soil depth of the assimilation of observed data all had obvious effects on the simulated soil moisture.

  1. Training set extension for SVM ensemble in P300-speller with familiar face paradigm.

    Science.gov (United States)

    Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou

    2018-03-27

    P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.

  2. Radiographic classification for fractures of the fifth metatarsal base

    International Nuclear Information System (INIS)

    Mehlhorn, Alexander T.; Zwingmann, Joern; Hirschmueller, Anja; Suedkamp, Norbert P.; Schmal, Hagen

    2014-01-01

    Avulsion fractures of the fifth metatarsal base (MTB5) are common fore foot injuries. Based on a radiomorphometric analysis reflecting the risk for a secondary displacement, a new classification was developed. A cohort of 95 healthy, sportive, and young patients (age ≤ 50 years) with avulsion fractures of the MTB5 was included in the study and divided into groups with non-displaced, primary-displaced, and secondary-displaced fractures. Radiomorphometric data obtained using standard oblique and dorso-plantar views were analyzed in association with secondary displacement. Based on this, a classification was developed and checked for reproducibility. Fractures with a longer distance between the lateral edge of the styloid process and the lateral fracture step-off and fractures with a more medial joint entry of the fracture line at the MTB5 are at higher risk to displace secondarily. Based on these findings, all fractures were divided into three types: type I with a fracture entry in the lateral third; type II in the middle third; and type III in the medial third of the MTB5. Additionally, the three types were subdivided into an A-type with a fracture displacement <2 mm and a B-type with a fracture displacement ≥ 2 mm. A substantial level of interobserver agreement was found in the assignment of all 95 fractures to the six fracture types (κ = 0.72). The secondary displacement of fractures was confirmed by all examiners in 100 %. Radiomorphometric data may identify fractures at risk for secondary displacement of the MTB5. Based on this, a reliable classification was developed. (orig.)

  3. Radiographic classification for fractures of the fifth metatarsal base

    Energy Technology Data Exchange (ETDEWEB)

    Mehlhorn, Alexander T.; Zwingmann, Joern; Hirschmueller, Anja; Suedkamp, Norbert P.; Schmal, Hagen [University of Freiburg Medical Center, Department of Orthopaedic Surgery, Freiburg (Germany)

    2014-04-15

    Avulsion fractures of the fifth metatarsal base (MTB5) are common fore foot injuries. Based on a radiomorphometric analysis reflecting the risk for a secondary displacement, a new classification was developed. A cohort of 95 healthy, sportive, and young patients (age ≤ 50 years) with avulsion fractures of the MTB5 was included in the study and divided into groups with non-displaced, primary-displaced, and secondary-displaced fractures. Radiomorphometric data obtained using standard oblique and dorso-plantar views were analyzed in association with secondary displacement. Based on this, a classification was developed and checked for reproducibility. Fractures with a longer distance between the lateral edge of the styloid process and the lateral fracture step-off and fractures with a more medial joint entry of the fracture line at the MTB5 are at higher risk to displace secondarily. Based on these findings, all fractures were divided into three types: type I with a fracture entry in the lateral third; type II in the middle third; and type III in the medial third of the MTB5. Additionally, the three types were subdivided into an A-type with a fracture displacement <2 mm and a B-type with a fracture displacement ≥ 2 mm. A substantial level of interobserver agreement was found in the assignment of all 95 fractures to the six fracture types (κ = 0.72). The secondary displacement of fractures was confirmed by all examiners in 100 %. Radiomorphometric data may identify fractures at risk for secondary displacement of the MTB5. Based on this, a reliable classification was developed. (orig.)

  4. Risk Classification and Risk-based Safety and Mission Assurance

    Science.gov (United States)

    Leitner, Jesse A.

    2014-01-01

    Recent activities to revamp and emphasize the need to streamline processes and activities for Class D missions across the agency have led to various interpretations of Class D, including the lumping of a variety of low-cost projects into Class D. Sometimes terms such as Class D minus are used. In this presentation, mission risk classifications will be traced to official requirements and definitions as a measure to ensure that projects and programs align with the guidance and requirements that are commensurate for their defined risk posture. As part of this, the full suite of risk classifications, formal and informal will be defined, followed by an introduction to the new GPR 8705.4 that is currently under review.GPR 8705.4 lays out guidance for the mission success activities performed at the Classes A-D for NPR 7120.5 projects as well as for projects not under NPR 7120.5. Furthermore, the trends in stepping from Class A into higher risk posture classifications will be discussed. The talk will conclude with a discussion about risk-based safety and mission assuranceat GSFC.

  5. Overfitting Reduction of Text Classification Based on AdaBELM

    Directory of Open Access Journals (Sweden)

    Xiaoyue Feng

    2017-07-01

    Full Text Available Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM, suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting referred to as the rate of overfitting (RO and a novel model, named AdaBELM, to reduce the overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model can achieve high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experiment results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. Therefore, the proposed model has a good generalizability.

  6. Image Classification Based on Convolutional Denoising Sparse Autoencoder

    Directory of Open Access Journals (Sweden)

    Shuangshuang Chen

    2017-01-01

    Full Text Available Image classification aims to group images into corresponding semantic categories. Due to the difficulties of interclass similarity and intraclass variability, it is a challenging issue in computer vision. In this paper, an unsupervised feature learning approach called convolutional denoising sparse autoencoder (CDSAE is proposed based on the theory of visual attention mechanism and deep learning methods. Firstly, saliency detection method is utilized to get training samples for unsupervised feature learning. Next, these samples are sent to the denoising sparse autoencoder (DSAE, followed by convolutional layer and local contrast normalization layer. Generally, prior in a specific task is helpful for the task solution. Therefore, a new pooling strategy—spatial pyramid pooling (SPP fused with center-bias prior—is introduced into our approach. Experimental results on the common two image datasets (STL-10 and CIFAR-10 demonstrate that our approach is effective in image classification. They also demonstrate that none of these three components: local contrast normalization, SPP fused with center-prior, and l2 vector normalization can be excluded from our proposed approach. They jointly improve image representation and classification performance.

  7. Tongue Images Classification Based on Constrained High Dispersal Network

    Directory of Open Access Journals (Sweden)

    Dan Meng

    2017-01-01

    Full Text Available Computer aided tongue diagnosis has a great potential to play important roles in traditional Chinese medicine (TCM. However, the majority of the existing tongue image analyses and classification methods are based on the low-level features, which may not provide a holistic view of the tongue. Inspired by deep convolutional neural network (CNN, we propose a novel feature extraction framework called constrained high dispersal neural networks (CHDNet to extract unbiased features and reduce human labor for tongue diagnosis in TCM. Previous CNN models have mostly focused on learning convolutional filters and adapting weights between them, but these models have two major issues: redundancy and insufficient capability in handling unbalanced sample distribution. We introduce high dispersal and local response normalization operation to address the issue of redundancy. We also add multiscale feature analysis to avoid the problem of sensitivity to deformation. Our proposed CHDNet learns high-level features and provides more classification information during training time, which may result in higher accuracy when predicting testing samples. We tested the proposed method on a set of 267 gastritis patients and a control group of 48 healthy volunteers. Test results show that CHDNet is a promising method in tongue image classification for the TCM study.

  8. An Approach for Leukemia Classification Based on Cooperative Game Theory

    Directory of Open Access Journals (Sweden)

    Atefeh Torkaman

    2011-01-01

    Full Text Available Hematological malignancies are the types of cancer that affect blood, bone marrow and lymph nodes. As these tissues are naturally connected through the immune system, a disease affecting one of them will often affect the others as well. The hematological malignancies include; Leukemia, Lymphoma, Multiple myeloma. Among them, leukemia is a serious malignancy that starts in blood tissues especially the bone marrow, where the blood is made. Researches show, leukemia is one of the common cancers in the world. So, the emphasis on diagnostic techniques and best treatments would be able to provide better prognosis and survival for patients. In this paper, an automatic diagnosis recommender system for classifying leukemia based on cooperative game is presented. Through out this research, we analyze the flow cytometry data toward the classification of leukemia into eight classes. We work on real data set from different types of leukemia that have been collected at Iran Blood Transfusion Organization (IBTO. Generally, the data set contains 400 samples taken from human leukemic bone marrow. This study deals with cooperative game used for classification according to different weights assigned to the markers. The proposed method is versatile as there are no constraints to what the input or output represent. This means that it can be used to classify a population according to their contributions. In other words, it applies equally to other groups of data. The experimental results show the accuracy rate of 93.12%, for classification and compared to decision tree (C4.5 with (90.16% in accuracy. The result demonstrates that cooperative game is very promising to be used directly for classification of leukemia as a part of Active Medical decision support system for interpretation of flow cytometry readout. This system could assist clinical hematologists to properly recognize different kinds of leukemia by preparing suggestions and this could improve the treatment

  9. An approach for leukemia classification based on cooperative game theory.

    Science.gov (United States)

    Torkaman, Atefeh; Charkari, Nasrollah Moghaddam; Aghaeipour, Mahnaz

    2011-01-01

    Hematological malignancies are the types of cancer that affect blood, bone marrow and lymph nodes. As these tissues are naturally connected through the immune system, a disease affecting one of them will often affect the others as well. The hematological malignancies include; Leukemia, Lymphoma, Multiple myeloma. Among them, leukemia is a serious malignancy that starts in blood tissues especially the bone marrow, where the blood is made. Researches show, leukemia is one of the common cancers in the world. So, the emphasis on diagnostic techniques and best treatments would be able to provide better prognosis and survival for patients. In this paper, an automatic diagnosis recommender system for classifying leukemia based on cooperative game is presented. Through out this research, we analyze the flow cytometry data toward the classification of leukemia into eight classes. We work on real data set from different types of leukemia that have been collected at Iran Blood Transfusion Organization (IBTO). Generally, the data set contains 400 samples taken from human leukemic bone marrow. This study deals with cooperative game used for classification according to different weights assigned to the markers. The proposed method is versatile as there are no constraints to what the input or output represent. This means that it can be used to classify a population according to their contributions. In other words, it applies equally to other groups of data. The experimental results show the accuracy rate of 93.12%, for classification and compared to decision tree (C4.5) with (90.16%) in accuracy. The result demonstrates that cooperative game is very promising to be used directly for classification of leukemia as a part of Active Medical decision support system for interpretation of flow cytometry readout. This system could assist clinical hematologists to properly recognize different kinds of leukemia by preparing suggestions and this could improve the treatment of leukemic

  10. Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

    DEFF Research Database (Denmark)

    Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben

    Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning for classification. In this article we propose: (1) a different strategy, which is based on the po......Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning for classification. In this article we propose: (1) a different strategy, which is based...

  11. Contaminant classification using cosine distances based on multiple conventional sensors.

    Science.gov (United States)

    Liu, Shuming; Che, Han; Smith, Kate; Chang, Tian

    2015-02-01

    Emergent contamination events have a significant impact on water systems. After contamination detection, it is important to classify the type of contaminant quickly to provide support for remediation attempts. Conventional methods generally either rely on laboratory-based analysis, which requires a long analysis time, or on multivariable-based geometry analysis and sequence analysis, which is prone to being affected by the contaminant concentration. This paper proposes a new contaminant classification method, which discriminates contaminants in a real time manner independent of the contaminant concentration. The proposed method quantifies the similarities or dissimilarities between sensors' responses to different types of contaminants. The performance of the proposed method was evaluated using data from contaminant injection experiments in a laboratory and compared with a Euclidean distance-based method. The robustness of the proposed method was evaluated using an uncertainty analysis. The results show that the proposed method performed better in identifying the type of contaminant than the Euclidean distance based method and that it could classify the type of contaminant in minutes without significantly compromising the correct classification rate (CCR).

  12. Application of Bayesian Classification to Content-Based Data Management

    Science.gov (United States)

    Lynnes, Christopher; Berrick, S.; Gopalan, A.; Hua, X.; Shen, S.; Smith, P.; Yang, K-Y.; Wheeler, K.; Curry, C.

    2004-01-01

    The high volume of Earth Observing System data has proven to be challenging to manage for data centers and users alike. At the Goddard Earth Sciences Distributed Active Archive Center (GES DAAC), about 1 TB of new data are archived each day. Distribution to users is also about 1 TB/day. A substantial portion of this distribution is MODIS calibrated radiance data, which has a wide variety of uses. However, much of the data is not useful for a particular user's needs: for example, ocean color users typically need oceanic pixels that are free of cloud and sun-glint. The GES DAAC is using a simple Bayesian classification scheme to rapidly classify each pixel in the scene in order to support several experimental content-based data services for near-real-time MODIS calibrated radiance products (from Direct Readout stations). Content-based subsetting would allow distribution of, say, only clear pixels to the user if desired. Content-based subscriptions would distribute data to users only when they fit the user's usability criteria in their area of interest within the scene. Content-based cache management would retain more useful data on disk for easy online access. The classification may even be exploited in an automated quality assessment of the geolocation product. Though initially to be demonstrated at the GES DAAC, these techniques have applicability in other resource-limited environments, such as spaceborne data systems.

  13. Object-based Dimensionality Reduction in Land Surface Phenology Classification

    Directory of Open Access Journals (Sweden)

    Brian E. Bunker

    2016-11-01

    Full Text Available Unsupervised classification or clustering of multi-decadal land surface phenology provides a spatio-temporal synopsis of natural and agricultural vegetation response to environmental variability and anthropogenic activities. Notwithstanding the detailed temporal information available in calibrated bi-monthly normalized difference vegetation index (NDVI and comparable time series, typical pre-classification workflows average a pixel’s bi-monthly index within the larger multi-decadal time series. While this process is one practical way to reduce the dimensionality of time series with many hundreds of image epochs, it effectively dampens temporal variation from both intra and inter-annual observations related to land surface phenology. Through a novel application of object-based segmentation aimed at spatial (not temporal dimensionality reduction, all 294 image epochs from a Moderate Resolution Imaging Spectroradiometer (MODIS bi-monthly NDVI time series covering the northern Fertile Crescent were retained (in homogenous landscape units as unsupervised classification inputs. Given the inherent challenges of in situ or manual image interpretation of land surface phenology classes, a cluster validation approach based on transformed divergence enabled comparison between traditional and novel techniques. Improved intra-annual contrast was clearly manifest in rain-fed agriculture and inter-annual trajectories showed increased cluster cohesion, reducing the overall number of classes identified in the Fertile Crescent study area from 24 to 10. Given careful segmentation parameters, this spatial dimensionality reduction technique augments the value of unsupervised learning to generate homogeneous land surface phenology units. By combining recent scalable computational approaches to image segmentation, future work can pursue new global land surface phenology products based on the high temporal resolution signatures of vegetation index time series.

  14. Can-Evo-Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences.

    Science.gov (United States)

    Ali, Safdar; Majid, Abdul

    2015-04-01

    The diagnostic of human breast cancer is an intricate process and specific indicators may produce negative results. In order to avoid misleading results, accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches are proposed for prediction of breast cancer. To this end, we developed a novel classifier stacking based evolutionary ensemble system "Can-Evo-Ens" for predicting amino acid sequences associated with breast cancer. In this paper, first, we selected four diverse-type of ML algorithms of Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. In order to exploit the decision spaces, the preliminary predictions of base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimal combine the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using Particle Swarm Optimization technique. Our experiments have demonstrated the robustness of Can-Evo-Ens system for independent validation dataset. The proposed system has achieved the highest value of Area Under Curve (AUC) of ROC Curve of 99.95% for cancer prediction. The comparative results revealed that proposed approach is better than individual ML approaches and conventional ensemble approaches of AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed novel system would have a major impact on the fields of Biomedical, Genomics, Proteomics, Bioinformatics, and Drug Development. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Hydrophobicity classification of polymeric materials based on fractal dimension

    Directory of Open Access Journals (Sweden)

    Daniel Thomazini

    2008-12-01

    Full Text Available This study proposes a new method to obtain hydrophobicity classification (HC in high voltage polymer insulators. In the method mentioned, the HC was analyzed by fractal dimension (fd and its processing time was evaluated having as a goal the application in mobile devices. Texture images were created from spraying solutions produced of mixtures of isopropyl alcohol and distilled water in proportions, which ranged from 0 to 100% volume of alcohol (%AIA. Based on these solutions, the contact angles of the drops were measured and the textures were used as patterns for fractal dimension calculations.

  16. Parametric classification of handvein patterns based on texture features

    Science.gov (United States)

    Al Mahafzah, Harbi; Imran, Mohammad; Supreetha Gowda H., D.

    2018-04-01

    In this paper, we have developed Biometric recognition system adopting hand based modality Handvein,which has the unique pattern for each individual and it is impossible to counterfeit and fabricate as it is an internal feature. We have opted in choosing feature extraction algorithms such as LBP-visual descriptor, LPQ-blur insensitive texture operator, Log-Gabor-Texture descriptor. We have chosen well known classifiers such as KNN and SVM for classification. We have experimented and tabulated results of single algorithm recognition rate for Handvein under different distance measures and kernel options. The feature level fusion is carried out which increased the performance level.

  17. MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS

    Science.gov (United States)

    Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...

  18. CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods.

    Science.gov (United States)

    Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng

    2017-05-18

    Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).

  19. Forest Classification Based on Forest texture in Northwest Yunnan Province

    Science.gov (United States)

    Wang, Jinliang; Gao, Yan; Wang, Xiaohua; Fu, Lei

    2014-03-01

    Forest texture is an intrinsic characteristic and an important visual feature of a forest ecological system. Full utilization of forest texture will be a great help in increasing the accuracy of forest classification based on remote sensed data. Taking Shangri-La as a study area, forest classification has been based on the texture. The results show that: (1) From the texture abundance, texture boundary, entropy as well as visual interpretation, the combination of Grayscale-gradient co-occurrence matrix and wavelet transformation is much better than either one of both ways of forest texture information extraction; (2) During the forest texture information extraction, the size of the texture-suitable window determined by the semi-variogram method depends on the forest type (evergreen broadleaf forest is 3×3, deciduous broadleaf forest is 5×5, etc.). (3)While classifying forest based on forest texture information, the texture factor assembly differs among forests: Variance Heterogeneity and Correlation should be selected when the window is between 3×3 and 5×5 Mean, Correlation, and Entropy should be used when the window in the range of 7×7 to 19×19 and Correlation, Second Moment, and Variance should be used when the range is larger than 21×21.

  20. Forest Classification Based on Forest texture in Northwest Yunnan Province

    International Nuclear Information System (INIS)

    Wang, Jinliang; Gao, Yan; Fu, Lei; Wang, Xiaohua

    2014-01-01

    Forest texture is an intrinsic characteristic and an important visual feature of a forest ecological system. Full utilization of forest texture will be a great help in increasing the accuracy of forest classification based on remote sensed data. Taking Shangri-La as a study area, forest classification has been based on the texture. The results show that: (1) From the texture abundance, texture boundary, entropy as well as visual interpretation, the combination of Grayscale-gradient co-occurrence matrix and wavelet transformation is much better than either one of both ways of forest texture information extraction; (2) During the forest texture information extraction, the size of the texture-suitable window determined by the semi-variogram method depends on the forest type (evergreen broadleaf forest is 3×3, deciduous broadleaf forest is 5×5, etc.). (3)While classifying forest based on forest texture information, the texture factor assembly differs among forests: Variance Heterogeneity and Correlation should be selected when the window is between 3×3 and 5×5; Mean, Correlation, and Entropy should be used when the window in the range of 7×7 to 19×19; and Correlation, Second Moment, and Variance should be used when the range is larger than 21×21

  1. Task Classification Based Energy-Aware Consolidation in Clouds

    Directory of Open Access Journals (Sweden)

    HeeSeok Choi

    2016-01-01

    Full Text Available We consider a cloud data center, in which the service provider supplies virtual machines (VMs on hosts or physical machines (PMs to its subscribers for computation in an on-demand fashion. For the cloud data center, we propose a task consolidation algorithm based on task classification (i.e., computation-intensive and data-intensive and resource utilization (e.g., CPU and RAM. Furthermore, we design a VM consolidation algorithm to balance task execution time and energy consumption without violating a predefined service level agreement (SLA. Unlike the existing research on VM consolidation or scheduling that applies none or single threshold schemes, we focus on a double threshold (upper and lower scheme, which is used for VM consolidation. More specifically, when a host operates with resource utilization below the lower threshold, all the VMs on the host will be scheduled to be migrated to other hosts and then the host will be powered down, while when a host operates with resource utilization above the upper threshold, a VM will be migrated to avoid using 100% of resource utilization. Based on experimental performance evaluations with real-world traces, we prove that our task classification based energy-aware consolidation algorithm (TCEA achieves a significant energy reduction without incurring predefined SLA violations.

  2. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes a gender classification based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying bag condition as addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying wavelet transform. Three different sets of feature are proposed in this method. First, Spatio-temporal distance that is dealing with the distance of different parts of the human body (like feet, knees, hand, Human Height and shoulder) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficient of human body respectively. To extract these two sets of feature we divided the human body into two parts, upper and lower body part, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize their discriminating significance. Finally k-Nearest Neighbor is applied as a classification method. Experimental results demonstrate that our approach is providing more realistic scenario and relatively better performance compared with the existing approaches.

  3. On Ensemble Nonlinear Kalman Filtering with Symmetric Analysis Ensembles

    KAUST Repository

    Luo, Xiaodong

    2010-09-19

    The ensemble square root filter (EnSRF) [1, 2, 3, 4] is a popular method for data assimilation in high dimensional systems (e.g., geophysics models). Essentially the EnSRF is a Monte Carlo implementation of the conventional Kalman filter (KF) [5, 6]. It is mainly different from the KF at the prediction steps, where it is some ensembles, rather then the means and covariance matrices, of the system state that are propagated forward. In doing this, the EnSRF is computationally more efficient than the KF, since propagating a covariance matrix forward in high dimensional systems is prohibitively expensive. In addition, the EnSRF is also very convenient in implementation. By propagating the ensembles of the system state, the EnSRF can be directly applied to nonlinear systems without any change in comparison to the assimilation procedures in linear systems. However, by adopting the Monte Carlo method, the EnSRF also incurs certain sampling errors. One way to alleviate this problem is to introduce certain symmetry to the ensembles, which can reduce the sampling errors and spurious modes in evaluation of the means and covariances of the ensembles [7]. In this contribution, we present two methods to produce symmetric ensembles. One is based on the unscented transform [8, 9], which leads to the unscented Kalman filter (UKF) [8, 9] and its variant, the ensemble unscented Kalman filter (EnUKF) [7]. The other is based on Stirling’s interpolation formula (SIF), which results in the divided difference filter (DDF) [10]. Here we propose a simplified divided difference filter (sDDF) in the context of ensemble filtering. The similarity and difference between the sDDF and the EnUKF will be discussed. Numerical experiments will also be conducted to investigate the performance of the sDDF and the EnUKF, and compare them to a well‐established EnSRF, the ensemble transform Kalman filter (ETKF) [2].

  4. Joint Probability-Based Neuronal Spike Train Classification

    Directory of Open Access Journals (Sweden)

    Yan Chen

    2009-01-01

    Full Text Available Neuronal spike trains are used by the nervous system to encode and transmit information. Euclidean distance-based methods (EDBMs have been applied to quantify the similarity between temporally-discretized spike trains and model responses. In this study, using the same discretization procedure, we developed and applied a joint probability-based method (JPBM to classify individual spike trains of slowly adapting pulmonary stretch receptors (SARs. The activity of individual SARs was recorded in anaesthetized, paralysed adult male rabbits, which were artificially-ventilated at constant rate and one of three different volumes. Two-thirds of the responses to the 600 stimuli presented at each volume were used to construct three response models (one for each stimulus volume consisting of a series of time bins, each with spike probabilities. The remaining one-third of the responses where used as test responses to be classified into one of the three model responses. This was done by computing the joint probability of observing the same series of events (spikes or no spikes, dictated by the test response in a given model and determining which probability of the three was highest. The JPBM generally produced better classification accuracy than the EDBM, and both performed well above chance. Both methods were similarly affected by variations in discretization parameters, response epoch duration, and two different response alignment strategies. Increasing bin widths increased classification accuracy, which also improved with increased observation time, but primarily during periods of increasing lung inflation. Thus, the JPBM is a simple and effective method performing spike train classification.

  5. Quantum ensembles of quantum classifiers.

    Science.gov (United States)

    Schuld, Maria; Petruccione, Francesco

    2018-02-09

    Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.

  6. Ensemble-based assimilation of fractional snow-covered area satellite retrievals to estimate the snow distribution at Arctic sites

    Science.gov (United States)

    Aalstad, Kristoffer; Westermann, Sebastian; Vikhamar Schuler, Thomas; Boike, Julia; Bertino, Laurent

    2018-01-01

    With its high albedo, low thermal conductivity and large water storing capacity, snow strongly modulates the surface energy and water balance, which makes it a critical factor in mid- to high-latitude and mountain environments. However, estimating the snow water equivalent (SWE) is challenging in remote-sensing applications already at medium spatial resolutions of 1 km. We present an ensemble-based data assimilation framework that estimates the peak subgrid SWE distribution (SSD) at the 1 km scale by assimilating fractional snow-covered area (fSCA) satellite retrievals in a simple snow model forced by downscaled reanalysis data. The basic idea is to relate the timing of the snow cover depletion (accessible from satellite products) to the peak SSD. Peak subgrid SWE is assumed to be lognormally distributed, which can be translated to a modeled time series of fSCA through the snow model. Assimilation of satellite-derived fSCA facilitates the estimation of the peak SSD, while taking into account uncertainties in both the model and the assimilated data sets. As an extension to previous studies, our method makes use of the novel (to snow data assimilation) ensemble smoother with multiple data assimilation (ES-MDA) scheme combined with analytical Gaussian anamorphosis to assimilate time series of Moderate Resolution Imaging Spectroradiometer (MODIS) and Sentinel-2 fSCA retrievals. The scheme is applied to Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway) where field measurements of fSCA and SWE distributions are available. The method is able to successfully recover accurate estimates of peak SSD on most of the occasions considered. Through the ES-MDA assimilation, the root-mean-square error (RMSE) for the fSCA, peak mean SWE and peak subgrid coefficient of variation is improved by around 75, 60 and 20 %, respectively, when compared to the prior, yielding RMSEs of 0.01, 0.09 m water equivalent (w.e.) and 0.13, respectively. The ES-MDA either outperforms or at least

  7. Ensemble-based assimilation of fractional snow-covered area satellite retrievals to estimate the snow distribution at Arctic sites

    Directory of Open Access Journals (Sweden)

    K. Aalstad

    2018-01-01

    Full Text Available With its high albedo, low thermal conductivity and large water storing capacity, snow strongly modulates the surface energy and water balance, which makes it a critical factor in mid- to high-latitude and mountain environments. However, estimating the snow water equivalent (SWE is challenging in remote-sensing applications already at medium spatial resolutions of 1 km. We present an ensemble-based data assimilation framework that estimates the peak subgrid SWE distribution (SSD at the 1 km scale by assimilating fractional snow-covered area (fSCA satellite retrievals in a simple snow model forced by downscaled reanalysis data. The basic idea is to relate the timing of the snow cover depletion (accessible from satellite products to the peak SSD. Peak subgrid SWE is assumed to be lognormally distributed, which can be translated to a modeled time series of fSCA through the snow model. Assimilation of satellite-derived fSCA facilitates the estimation of the peak SSD, while taking into account uncertainties in both the model and the assimilated data sets. As an extension to previous studies, our method makes use of the novel (to snow data assimilation ensemble smoother with multiple data assimilation (ES-MDA scheme combined with analytical Gaussian anamorphosis to assimilate time series of Moderate Resolution Imaging Spectroradiometer (MODIS and Sentinel-2 fSCA retrievals. The scheme is applied to Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway where field measurements of fSCA and SWE distributions are available. The method is able to successfully recover accurate estimates of peak SSD on most of the occasions considered. Through the ES-MDA assimilation, the root-mean-square error (RMSE for the fSCA, peak mean SWE and peak subgrid coefficient of variation is improved by around 75, 60 and 20 %, respectively, when compared to the prior, yielding RMSEs of 0.01, 0.09 m water equivalent (w.e. and 0.13, respectively. The ES-MDA either

  8. Dynamic neuronal ensembles: Issues in representing structure change in object-oriented, biologically-based brain models

    Energy Technology Data Exchange (ETDEWEB)

    Vahie, S.; Zeigler, B.P.; Cho, H. [Univ. of Arizona, Tucson, AZ (United States)

    1996-12-31

    This paper describes the structure of dynamic neuronal ensembles (DNEs). DNEs represent a new paradigm for learning, based on biological neural networks that use variable structures. We present a computational neural element that demonstrates biological neuron functionality such as neurotransmitter feedback absolute refractory period and multiple output potentials. More specifically, we will develop a network of neural elements that have the ability to dynamically strengthen, weaken, add and remove interconnections. We demonstrate that the DNE is capable of performing dynamic modifications to neuron connections and exhibiting biological neuron functionality. In addition to its applications for learning, DNEs provide an excellent environment for testing and analysis of biological neural systems. An example of habituation and hyper-sensitization in biological systems, using a neural circuit from a snail is presented and discussed. This paper provides an insight into the DNE paradigm using models developed and simulated in DEVS.

  9. A novel approach for baseline correction in 1H-MRS signals based on ensemble empirical mode decomposition.

    Science.gov (United States)

    Parto Dezfouli, Mohammad Ali; Dezfouli, Mohsen Parto; Rad, Hamidreza Saligheh

    2014-01-01

    Proton magnetic resonance spectroscopy ((1)H-MRS) is a non-invasive diagnostic tool for measuring biochemical changes in the human body. Acquired (1)H-MRS signals may be corrupted due to a wideband baseline signal generated by macromolecules. Recently, several methods have been developed for the correction of such baseline signals, however most of them are not able to estimate baseline in complex overlapped signal. In this study, a novel automatic baseline correction method is proposed for (1)H-MRS spectra based on ensemble empirical mode decomposition (EEMD). This investigation was applied on both the simulated data and the in-vivo (1)H-MRS of human brain signals. Results justify the efficiency of the proposed method to remove the baseline from (1)H-MRS signals.

  10. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications,like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systemstypically extract features from the input sound. Using too many features increases complexity

  11. Chemometric classification of casework arson samples based on gasoline content.

    Science.gov (United States)

    Sinkov, Nikolai A; Sandercock, P Mark L; Harynuk, James J

    2014-02-01

    Detection and identification of ignitable liquids (ILs) in arson debris is a critical part of arson investigations. The challenge of this task is due to the complex and unpredictable chemical nature of arson debris, which also contains pyrolysis products from the fire. ILs, most commonly gasoline, are complex chemical mixtures containing hundreds of compounds that will be consumed or otherwise weathered by the fire to varying extents depending on factors such as temperature, air flow, the surface on which IL was placed, etc. While methods such as ASTM E-1618 are effective, data interpretation can be a costly bottleneck in the analytical process for some laboratories. In this study, we address this issue through the application of chemometric tools. Prior to the application of chemometric tools such as PLS-DA and SIMCA, issues of chromatographic alignment and variable selection need to be addressed. Here we use an alignment strategy based on a ladder consisting of perdeuterated n-alkanes. Variable selection and model optimization was automated using a hybrid backward elimination (BE) and forward selection (FS) approach guided by the cluster resolution (CR) metric. In this work, we demonstrate the automated construction, optimization, and application of chemometric tools to casework arson data. The resulting PLS-DA and SIMCA classification models, trained with 165 training set samples, have provided classification of 55 validation set samples based on gasoline content with 100% specificity and sensitivity. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  12. Interactive classification and content-based retrieval of tissue images

    Science.gov (United States)

    Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof

    2002-11-01

    We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues in pixel, region and image levels. Pixel level features are generated using unsupervised clustering of color and texture values. Region level features include shape information and statistics of pixel level feature values. Image level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.

  13. Drunk driving detection based on classification of multivariate time series.

    Science.gov (United States)

    Li, Zhenlong; Jin, Xue; Zhao, Xiaohua

    2015-09-01

    This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract the features. A piecewise linear representation was used to represent multivariate time series. A bottom-up algorithm was then employed to separate multivariate time series. The slope and time interval of each segment were extracted as the features for classification. Third, a support vector machine classifier was used to classify driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is feasible and effective. The approach has implications for drunk driving detection. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.

  14. [Galaxy/quasar classification based on nearest neighbor method].

    Science.gov (United States)

    Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun

    2011-09-01

    With the wide application of high-quality CCD in celestial spectrum imagery and the implementation of many large sky survey programs (e. g., Sloan Digital Sky Survey (SDSS), Two-degree-Field Galaxy Redshift Survey (2dF), Spectroscopic Survey Telescope (SST), Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and Large Synoptic Survey Telescope (LSST) program, etc.), celestial observational data are coming into the world like torrential rain. Therefore, to utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognizing galaxies and quasars from spectra based on nearest neighbor method. Galaxies and quasars are extragalactic objects, they are far away from earth, and their spectra are usually contaminated by various noise. Therefore, it is a typical problem to recognize these two types of spectra in automatic spectra classification. Furthermore, the utilized method, nearest neighbor, is one of the most typical, classic, mature algorithms in pattern recognition and data mining, and often is used as a benchmark in developing novel algorithm. For applicability in practice, it is shown that the recognition ratio of nearest neighbor method (NN) is comparable to the best results reported in the literature based on more complicated methods, and the superiority of NN is that this method does not need to be trained, which is useful in incremental learning and parallel computation in mass spectral data processing. In conclusion, the results in this work are helpful for studying galaxies and quasars spectra classification.

  15. Robust Pedestrian Classification Based on Hierarchical Kernel Sparse Representation

    Directory of Open Access Journals (Sweden)

    Rui Sun

    2016-08-01

    Full Text Available Vision-based pedestrian detection has become an active topic in computer vision and autonomous vehicles. It aims at detecting pedestrians appearing ahead of the vehicle using a camera so that autonomous vehicles can assess the danger and take action. Due to varied illumination and appearance, complex background and occlusion pedestrian detection in outdoor environments is a difficult problem. In this paper, we propose a novel hierarchical feature extraction and weighted kernel sparse representation model for pedestrian classification. Initially, hierarchical feature extraction based on a CENTRIST descriptor is used to capture discriminative structures. A max pooling operation is used to enhance the invariance of varying appearance. Then, a kernel sparse representation model is proposed to fully exploit the discrimination information embedded in the hierarchical local features, and a Gaussian weight function as the measure to effectively handle the occlusion in pedestrian images. Extensive experiments are conducted on benchmark databases, including INRIA, Daimler, an artificially generated dataset and a real occluded dataset, demonstrating the more robust performance of the proposed method compared to state-of-the-art pedestrian classification methods.

  16. Style-based classification of Chinese ink and wash paintings

    Science.gov (United States)

    Sheng, Jiachuan; Jiang, Jianmin

    2013-09-01

    Following the fact that a large collection of ink and wash paintings (IWP) is being digitized and made available on the Internet, their automated content description, analysis, and management are attracting attention across research communities. While existing research in relevant areas is primarily focused on image processing approaches, a style-based algorithm is proposed to classify IWPs automatically by their authors. As IWPs do not have colors or even tones, the proposed algorithm applies edge detection to locate the local region and detect painting strokes to enable histogram-based feature extraction and capture of important cues to reflect the styles of different artists. Such features are then applied to drive a number of neural networks in parallel to complete the classification, and an information entropy balanced fusion is proposed to make an integrated decision for the multiple neural network classification results in which the entropy is used as a pointer to combine the global and local features. Evaluations via experiments support that the proposed algorithm achieves good performances, providing excellent potential for computerized analysis and management of IWPs.

  17. On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis

    Directory of Open Access Journals (Sweden)

    Asriyanti Indah Pratiwi

    2018-01-01

    Full Text Available Sentiment analysis in a movie review is the needs of today lifestyle. Unfortunately, enormous features make the sentiment of analysis slow and less sensitive. Finding the optimum feature selection and classification is still a challenge. In order to handle an enormous number of features and provide better sentiment classification, an information-based feature selection and classification are proposed. The proposed method reduces more than 90% unnecessary features while the proposed classification scheme achieves 96% accuracy of sentiment classification. From the experimental results, it can be concluded that the combination of proposed feature selection and classification achieves the best performance so far.

  18. Simultaneous escaping of explicit and hidden free energy barriers: application of the orthogonal space random walk strategy in generalized ensemble based conformational sampling.

    Science.gov (United States)

    Zheng, Lianqing; Chen, Mengen; Yang, Wei

    2009-06-21

    To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures, etc. As usually observed, in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction and these residual free energy barriers could greatly abolish the sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimension subset of the target system; then the "Hamiltonian lagging" problem, which reveals the fact that necessary structural relaxation falls behind the move of the collective variable, may be likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.

  19. Three-dimensional visualization of ensemble weather forecasts - Part 2: Forecasting warm conveyor belt situations for aircraft-based field campaigns

    Science.gov (United States)

    Rautenhaus, M.; Grams, C. M.; Schäfler, A.; Westermann, R.

    2015-07-01

    We present the application of interactive three-dimensional (3-D) visualization of ensemble weather predictions to forecasting warm conveyor belt situations during aircraft-based atmospheric research campaigns. Motivated by forecast requirements of the T-NAWDEX-Falcon 2012 (THORPEX - North Atlantic Waveguide and Downstream Impact Experiment) campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts (WCBs) has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the European Centre for Medium Range Weather Forecasts (ECMWF) ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and grid spacing of the forecast wind field. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (3 to 7 days before take-off).

  20. Three-dimensional visualization of ensemble weather forecasts – Part 2: Forecasting warm conveyor belt situations for aircraft-based field campaigns

    Directory of Open Access Journals (Sweden)

    M. Rautenhaus

    2015-07-01

    Full Text Available We present the application of interactive three-dimensional (3-D visualization of ensemble weather predictions to forecasting warm conveyor belt situations during aircraft-based atmospheric research campaigns. Motivated by forecast requirements of the T-NAWDEX-Falcon 2012 (THORPEX – North Atlantic Waveguide and Downstream Impact Experiment campaign, a method to predict 3-D probabilities of the spatial occurrence of warm conveyor belts (WCBs has been developed. Probabilities are derived from Lagrangian particle trajectories computed on the forecast wind fields of the European Centre for Medium Range Weather Forecasts (ECMWF ensemble prediction system. Integration of the method into the 3-D ensemble visualization tool Met.3D, introduced in the first part of this study, facilitates interactive visualization of WCB features and derived probabilities in the context of the ECMWF ensemble forecast. We investigate the sensitivity of the method with respect to trajectory seeding and grid spacing of the forecast wind field. Furthermore, we propose a visual analysis method to quantitatively analyse the contribution of ensemble members to a probability region and, thus, to assist the forecaster in interpreting the obtained probabilities. A case study, revisiting a forecast case from T-NAWDEX-Falcon, illustrates the practical application of Met.3D and demonstrates the use of 3-D and uncertainty visualization for weather forecasting and for planning flight routes in the medium forecast range (3 to 7 days before take-off.

  1. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data.

    Directory of Open Access Journals (Sweden)

    Enrico Glaab

    Full Text Available Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

  2. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data.

    Science.gov (United States)

    Glaab, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M; Krasnogor, Natalio

    2012-01-01

    Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

  3. Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    LI Jian-Wei

    2014-08-01

    Full Text Available On the basis of the cluster validity function based on geometric probability in literature [1, 2], propose a cluster analysis method based on geometric probability to process large amount of data in rectangular area. The basic idea is top-down stepwise refinement, firstly categories then subcategories. On all clustering levels, use the cluster validity function based on geometric probability firstly, determine clusters and the gathering direction, then determine the center of clustering and the border of clusters. Through TM remote sensing image classification examples, compare with the supervision and unsupervised classification in ERDAS and the cluster analysis method based on geometric probability in two-dimensional square which is proposed in literature 2. Results show that the proposed method can significantly improve the classification accuracy.

  4. Bearing Fault Classification Based on Conditional Random Field

    Directory of Open Access Journals (Sweden)

    Guofeng Wang

    2013-01-01

    Full Text Available Condition monitoring of rolling element bearing is paramount for predicting the lifetime and performing effective maintenance of the mechanical equipment. To overcome the drawbacks of the hidden Markov model (HMM and improve the diagnosis accuracy, conditional random field (CRF model based classifier is proposed. In this model, the feature vectors sequences and the fault categories are linked by an undirected graphical model in which their relationship is represented by a global conditional probability distribution. In comparison with the HMM, the main advantage of the CRF model is that it can depict the temporal dynamic information between the observation sequences and state sequences without assuming the independence of the input feature vectors. Therefore, the interrelationship between the adjacent observation vectors can also be depicted and integrated into the model, which makes the classifier more robust and accurate than the HMM. To evaluate the effectiveness of the proposed method, four kinds of bearing vibration signals which correspond to normal, inner race pit, outer race pit and roller pit respectively are collected from the test rig. And the CRF and HMM models are built respectively to perform fault classification by taking the sub band energy features of wavelet packet decomposition (WPD as the observation sequences. Moreover, K-fold cross validation method is adopted to improve the evaluation accuracy of the classifier. The analysis and comparison under different fold times show that the accuracy rate of classification using the CRF model is higher than the HMM. This method brings some new lights on the accurate classification of the bearing faults.

  5. Ensemble Empirical Mode Decomposition based methodology for ultrasonic testing of coarse grain austenitic stainless steels.

    Science.gov (United States)

    Sharma, Govind K; Kumar, Anish; Jayakumar, T; Purnachandra Rao, B; Mariyappa, N

    2015-03-01

    A signal processing methodology is proposed in this paper for effective reconstruction of ultrasonic signals in coarse grained high scattering austenitic stainless steel. The proposed methodology is comprised of the Ensemble Empirical Mode Decomposition (EEMD) processing of ultrasonic signals and application of signal minimisation algorithm on selected Intrinsic Mode Functions (IMFs) obtained by EEMD. The methodology is applied to ultrasonic signals obtained from austenitic stainless steel specimens of different grain size, with and without defects. The influence of probe frequency and data length of a signal on EEMD decomposition is also investigated. For a particular sampling rate and probe frequency, the same range of IMFs can be used to reconstruct the ultrasonic signal, irrespective of the grain size in the range of 30-210 μm investigated in this study. This methodology is successfully employed for detection of defects in a 50mm thick coarse grain austenitic stainless steel specimens. Signal to noise ratio improvement of better than 15 dB is observed for the ultrasonic signal obtained from a 25 mm deep flat bottom hole in 200 μm grain size specimen. For ultrasonic signals obtained from defects at different depths, a minimum of 7 dB extra enhancement in SNR is achieved as compared to the sum of selected IMF approach. The application of minimisation algorithm with EEMD processed signal in the proposed methodology proves to be effective for adaptive signal reconstruction with improved signal to noise ratio. This methodology was further employed for successful imaging of defects in a B-scan. Copyright © 2014. Published by Elsevier B.V.

  6. Future hydroclimatological changes in South America based on an ensemble of regional climate models

    Science.gov (United States)

    Zaninelli, Pablo G.; Menéndez, Claudio G.; Falco, Magdalena; López-Franca, Noelia; Carril, Andrea F.

    2018-05-01

    Changes between two time slices (1961-1990 and 2071-2100) in hydroclimatological conditions for South America have been examined using an ensemble of regional climate models. Annual mean precipitation (P), evapotranspiration (E) and potential evapotranspiration (EP) are jointly considered through the balances of land water and energy. Drying or wetting conditions, associated with changes in land water availability and atmospheric demand, are analysed in the Budyko space. The water supply limit (E limited by P) is exceeded at about 2% of the grid points, while the energy limit to evapotranspiration (E = EP) is overall valid. Most of the continent, except for the southeast and some coastal areas, presents a shift toward drier conditions related to a decrease in water availability (the evaporation rate E/P increases) and, mostly over much of Brazil, to an increase in the aridity index (V = EP/P). These changes suggest less humid conditions with decreasing surface runoff over Amazonia and the Brazilian Highlands. In contrast, Argentina and the coasts of Ecuador and Peru are characterized by a tendency toward wetter conditions associated with an increase of water availability and a decrease of aridity index, primarily due to P increasing faster than both E and EP. This trend towards wetter soil conditions suggest that the chances of having larger periods of flooding and enhanced river discharges would increase over parts of southeastern South America. Interannual variability increases with V (for a given time slice) and with climate change (for a given aridity regimen). There are opposite interannual variability responses to the cliamte change in Argentina and Brazil by which the variability increases over the Brazilian Highlands and decreases in central-eastern Argentina.

  7. Design ensemble machine learning model for breast cancer diagnosis.

    Science.gov (United States)

    Hsieh, Sheau-Ling; Hsieh, Sung-Huai; Cheng, Po-Hsun; Chen, Chi-Huang; Hsu, Kai-Ping; Lee, I-Shun; Wang, Zhenyu; Lai, Feipei

    2012-10-01

    In this paper, we classify the breast cancer of medical diagnostic data. Information gain has been adapted for feature selections. Neural fuzzy (NF), k-nearest neighbor (KNN), quadratic classifier (QC), each single model scheme as well as their associated, ensemble ones have been developed for classifications. In addition, a combined ensemble model with these three schemes has been constructed for further validations. The experimental results indicate that the ensemble learning performs better than individual single ones. Moreover, the combined ensemble model illustrates the highest accuracy of classifications for the breast cancer among all models.

  8. A novel method for human age group classification based on

    Directory of Open Access Journals (Sweden)

    Anuradha Yarlagadda

    2015-10-01

    Full Text Available In the computer vision community, easy categorization of a person’s facial image into various age groups is often quite precise and is not pursued effectively. To address this problem, which is an important area of research, the present paper proposes an innovative method of age group classification system based on the Correlation Fractal Dimension of complex facial image. Wrinkles appear on the face with aging thereby changing the facial edges of the image. The proposed method is rotation and poses invariant. The present paper concentrates on developing an innovative technique that classifies facial images into four categories i.e. child image (0–15, young adult image (15–30, middle-aged adult image (31–50, and senior adult image (>50 based on correlation FD value of a facial edge image.

  9. Automatic classification of visual evoked potentials based on wavelet decomposition

    Science.gov (United States)

    Stasiakiewicz, Paweł; Dobrowolski, Andrzej P.; Tomczykiewicz, Kazimierz

    2017-04-01

    Diagnosis of part of the visual system, that is responsible for conducting compound action potential, is generally based on visual evoked potentials generated as a result of stimulation of the eye by external light source. The condition of patient's visual path is assessed by set of parameters that describe the time domain characteristic extremes called waves. The decision process is compound therefore diagnosis significantly depends on experience of a doctor. The authors developed a procedure - based on wavelet decomposition and linear discriminant analysis - that ensures automatic classification of visual evoked potentials. The algorithm enables to assign individual case to normal or pathological class. The proposed classifier has a 96,4% sensitivity at 10,4% probability of false alarm in a group of 220 cases and area under curve ROC equals to 0,96 which, from the medical point of view, is a very good result.

  10. Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

    Science.gov (United States)

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453

  11. Evaluating model performance of an ensemble-based chemical data assimilation system during INTEX-B field mission

    Directory of Open Access Journals (Sweden)

    A. F. Arellano Jr.

    2007-11-01

    Full Text Available We present a global chemical data assimilation system using a global atmosphere model, the Community Atmosphere Model (CAM3 with simplified chemistry and the Data Assimilation Research Testbed (DART assimilation package. DART is a community software facility for assimilation studies using the ensemble Kalman filter approach. Here, we apply the assimilation system to constrain global tropospheric carbon monoxide (CO by assimilating meteorological observations of temperature and horizontal wind velocity and satellite CO retrievals from the Measurement of Pollution in the Troposphere (MOPITT satellite instrument. We verify the system performance using independent CO observations taken on board the NSF/NCAR C-130 and NASA DC-8 aircrafts during the April 2006 part of the Intercontinental Chemical Transport Experiment (INTEX-B. Our evaluations show that MOPITT data assimilation provides significant improvements in terms of capturing the observed CO variability relative to no MOPITT assimilation (i.e. the correlation improves from 0.62 to 0.71, significant at 99% confidence. The assimilation provides evidence of median CO loading of about 150 ppbv at 700 hPa over the NE Pacific during April 2006. This is marginally higher than the modeled CO with no MOPITT assimilation (~140 ppbv. Our ensemble-based estimates of model uncertainty also show model overprediction over the source region (i.e. China and underprediction over the NE Pacific, suggesting model errors that cannot be readily explained by emissions alone. These results have important implications for improving regional chemical forecasts and for inverse modeling of CO sources and further demonstrate the utility of the assimilation system in comparing non-coincident measurements, e.g. comparing satellite retrievals of CO with in-situ aircraft measurements.

  12. Neighborhood Hypergraph Based Classification Algorithm for Incomplete Information System

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2015-01-01

    Full Text Available The problem of classification in incomplete information system is a hot issue in intelligent information processing. Hypergraph is a new intelligent method for machine learning. However, it is hard to process the incomplete information system by the traditional hypergraph, which is due to two reasons: (1 the hyperedges are generated randomly in traditional hypergraph model; (2 the existing methods are unsuitable to deal with incomplete information system, for the sake of missing values in incomplete information system. In this paper, we propose a novel classification algorithm for incomplete information system based on hypergraph model and rough set theory. Firstly, we initialize the hypergraph. Second, we classify the training set by neighborhood hypergraph. Third, under the guidance of rough set, we replace the poor hyperedges. After that, we can obtain a good classifier. The proposed approach is tested on 15 data sets from UCI machine learning repository. Furthermore, it is compared with some existing methods, such as C4.5, SVM, NavieBayes, and KNN. The experimental results show that the proposed algorithm has better performance via Precision, Recall, AUC, and F-measure.

  13. Comparison Effectiveness of Pixel Based Classification and Object Based Classification Using High Resolution Image In Floristic Composition Mapping (Study Case: Gunung Tidar Magelang City)

    Science.gov (United States)

    Ardha Aryaguna, Prama; Danoedoro, Projo

    2016-11-01

    Developments of analysis remote sensing have same way with development of technology especially in sensor and plane. Now, a lot of image have high spatial and radiometric resolution, that's why a lot information. Vegetation object analysis such floristic composition got a lot advantage of that development. Floristic composition can be interpreted using a lot of method such pixel based classification and object based classification. The problems for pixel based method on high spatial resolution image are salt and paper who appear in result of classification. The purpose of this research are compare effectiveness between pixel based classification and object based classification for composition vegetation mapping on high resolution image Worldview-2. The results show that pixel based classification using majority 5×5 kernel windows give the highest accuracy between another classifications. The highest accuracy is 73.32% from image Worldview-2 are being radiometric corrected level surface reflectance, but for overall accuracy in every class, object based are the best between another methods. Reviewed from effectiveness aspect, pixel based are more effective then object based for vegetation composition mapping in Tidar forest.

  14. Simple adaptive sparse representation based classification schemes for EEG based brain-computer interface applications.

    Science.gov (United States)

    Shin, Younghak; Lee, Seungchan; Ahn, Minkyu; Cho, Hohyun; Jun, Sung Chan; Lee, Heung-No

    2015-11-01

    One of the main problems related to electroencephalogram (EEG) based brain-computer interface (BCI) systems is the non-stationarity of the underlying EEG signals. This results in the deterioration of the classification performance during experimental sessions. Therefore, adaptive classification techniques are required for EEG based BCI applications. In this paper, we propose simple adaptive sparse representation based classification (SRC) schemes. Supervised and unsupervised dictionary update techniques for new test data and a dictionary modification method by using the incoherence measure of the training data are investigated. The proposed methods are very simple and additional computation for the re-training of the classifier is not needed. The proposed adaptive SRC schemes are evaluated using two BCI experimental datasets. The proposed methods are assessed by comparing classification results with the conventional SRC and other adaptive classification methods. On the basis of the results, we find that the proposed adaptive schemes show relatively improved classification accuracy as compared to conventional methods without requiring additional computation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Radiological classification of renal angiomyolipomas based on 127 tumors

    Directory of Open Access Journals (Sweden)

    Prando Adilson

    2003-01-01

    Full Text Available PURPOSE: Demonstrate radiological findings of 127 angiomyolipomas (AMLs and propose a classification based on the radiological evidence of fat. MATERIALS AND METHODS: The imaging findings of 85 consecutive patients with AMLs: isolated (n = 73, multiple without tuberous sclerosis (TS (n = 4 and multiple with TS (n = 8, were retrospectively reviewed. Eighteen AMLs (14% presented with hemorrhage. All patients were submitted to a dedicated helical CT or magnetic resonance studies. All hemorrhagic and non-hemorrhagic lesions were grouped together since our objective was to analyze the presence of detectable fat. Out of 85 patients, 53 were monitored and 32 were treated surgically due to large perirenal component (n = 13, hemorrhage (n = 11 and impossibility of an adequate preoperative characterization (n = 8. There was not a case of renal cell carcinoma (RCC with fat component in this group of patients. RESULTS: Based on the presence and amount of detectable fat within the lesion, AMLs were classified in 4 distinct radiological patterns: Pattern-I, predominantly fatty (usually less than 2 cm in diameter and intrarenal: 54%; Pattern-II, partially fatty (intrarenal or exophytic: 29%; Pattern-III, minimally fatty (most exophytic and perirenal: 11%; and Pattern-IV, without fat (most exophytic and perirenal: 6%. CONCLUSIONS: This proposed classification might be useful to understand the imaging manifestations of AMLs, their differential diagnosis and determine when further radiological evaluation would be necessary. Small (< 1.5 cm, pattern-I AMLs tend to be intra-renal, homogeneous and predominantly fatty. As they grow they tend to be partially or completely exophytic and heterogeneous (patterns II and III. The rare pattern-IV AMLs, however, can be small or large, intra-renal or exophytic but are always homogeneous and hyperdense mass. Since no renal cell carcinoma was found in our series, from an evidence-based practice, all renal mass with detectable

  16. Representing Color Ensembles.

    Science.gov (United States)

    Chetverikov, Andrey; Campana, Gianluca; Kristjánsson, Árni

    2017-10-01

    Colors are rarely uniform, yet little is known about how people represent color distributions. We introduce a new method for studying color ensembles based on intertrial learning in visual search. Participants looked for an oddly colored diamond among diamonds with colors taken from either uniform or Gaussian color distributions. On test trials, the targets had various distances in feature space from the mean of the preceding distractor color distribution. Targets on test trials therefore served as probes into probabilistic representations of distractor colors. Test-trial response times revealed a striking similarity between the physical distribution of colors and their internal representations. The results demonstrate that the visual system represents color ensembles in a more detailed way than previously thought, coding not only mean and variance but, most surprisingly, the actual shape (uniform or Gaussian) of the distribution of colors in the environment.

  17. Tailored Random Graph Ensembles

    International Nuclear Information System (INIS)

    Roberts, E S; Annibale, A; Coolen, A C C

    2013-01-01

    Tailored graph ensembles are a developing bridge between biological networks and statistical mechanics. The aim is to use this concept to generate a suite of rigorous tools that can be used to quantify and compare the topology of cellular signalling networks, such as protein-protein interaction networks and gene regulation networks. We calculate exact and explicit formulae for the leading orders in the system size of the Shannon entropies of random graph ensembles constrained with degree distribution and degree-degree correlation. We also construct an ergodic detailed balance Markov chain with non-trivial acceptance probabilities which converges to a strictly uniform measure and is based on edge swaps that conserve all degrees. The acceptance probabilities can be generalized to define Markov chains that target any alternative desired measure on the space of directed or undirected graphs, in order to generate graphs with more sophisticated topological features.

  18. On Ensemble Nonlinear Kalman Filtering with Symmetric Analysis Ensembles

    KAUST Repository

    Luo, Xiaodong; Hoteit, Ibrahim; Moroz, Irene M.

    2010-01-01

    However, by adopting the Monte Carlo method, the EnSRF also incurs certain sampling errors. One way to alleviate this problem is to introduce certain symmetry to the ensembles, which can reduce the sampling errors and spurious modes in evaluation of the means and covariances of the ensembles [7]. In this contribution, we present two methods to produce symmetric ensembles. One is based on the unscented transform [8, 9], which leads to the unscented Kalman filter (UKF) [8, 9] and its variant, the ensemble unscented Kalman filter (EnUKF) [7]. The other is based on Stirling’s interpolation formula (SIF), which results in the divided difference filter (DDF) [10]. Here we propose a simplified divided difference filter (sDDF) in the context of ensemble filtering. The similarity and difference between the sDDF and the EnUKF will be discussed. Numerical experiments will also be conducted to investigate the performance of the sDDF and the EnUKF, and compare them to a well‐established EnSRF, the ensemble transform Kalman filter (ETKF) [2].

  19. Deep neural network and noise classification-based speech enhancement

    Science.gov (United States)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and Deep Neural Network (DNN) was proposed. Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. DNN was used to model the relationship between noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. GMM was trained with mel-frequency cepstrum coefficients (MFCC) and the parameters were estimated with an iterative expectation-maximization (EM) algorithm. Noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method could achieve better objective speech quality and smaller distortion under stationary and non-stationary conditions.

  20. A robust probabilistic collaborative representation based classification for multimodal biometrics

    Science.gov (United States)

    Zhang, Jing; Liu, Huanxi; Ding, Derui; Xiao, Jianli

    2018-04-01

    Most of the traditional biometric recognition systems perform recognition with a single biometric indicator. These systems have suffered noisy data, interclass variations, unacceptable error rates, forged identity, and so on. Due to these inherent problems, it is not valid that many researchers attempt to enhance the performance of unimodal biometric systems with single features. Thus, multimodal biometrics is investigated to reduce some of these defects. This paper proposes a new multimodal biometric recognition approach by fused faces and fingerprints. For more recognizable features, the proposed method extracts block local binary pattern features for all modalities, and then combines them into a single framework. For better classification, it employs the robust probabilistic collaborative representation based classifier to recognize individuals. Experimental results indicate that the proposed method has improved the recognition accuracy compared to the unimodal biometrics.

  1. Machine Learning Based Localization and Classification with Atomic Magnetometers

    Science.gov (United States)

    Deans, Cameron; Griffin, Lewis D.; Marmugi, Luca; Renzoni, Ferruccio

    2018-01-01

    We demonstrate identification of position, material, orientation, and shape of objects imaged by a Rb 85 atomic magnetometer performing electromagnetic induction imaging supported by machine learning. Machine learning maximizes the information extracted from the images created by the magnetometer, demonstrating the use of hidden data. Localization 2.6 times better than the spatial resolution of the imaging system and successful classification up to 97% are obtained. This circumvents the need of solving the inverse problem and demonstrates the extension of machine learning to diffusive systems, such as low-frequency electrodynamics in media. Automated collection of task-relevant information from quantum-based electromagnetic imaging will have a relevant impact from biomedicine to security.

  2. Fines Classification Based on Sensitivity to Pore-Fluid Chemistry

    KAUST Repository

    Jang, Junbong

    2015-12-28

    The 75-μm particle size is used to discriminate between fine and coarse grains. Further analysis of fine grains is typically based on the plasticity chart. Whereas pore-fluid-chemistry-dependent soil response is a salient and distinguishing characteristic of fine grains, pore-fluid chemistry is not addressed in current classification systems. Liquid limits obtained with electrically contrasting pore fluids (deionized water, 2-M NaCl brine, and kerosene) are combined to define the soil "electrical sensitivity." Liquid limit and electrical sensitivity can be effectively used to classify fine grains according to their fluid-soil response into no-, low-, intermediate-, or high-plasticity fine grains of low, intermediate, or high electrical sensitivity. The proposed methodology benefits from the accumulated experience with liquid limit in the field and addresses the needs of a broader range of geotechnical engineering problems. © ASCE.

  3. DNA methylation-based classification of central nervous system tumours

    DEFF Research Database (Denmark)

    Capper, David; Jones, David T.W.; Sill, Martin

    2018-01-01

    Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging - with substantial inter-observer variabil......Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging - with substantial inter......-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show...

  4. Fines classification based on sensitivity to pore-fluid chemistry

    Science.gov (United States)

    Jang, Junbong; Santamarina, J. Carlos

    2016-01-01

    The 75-μm particle size is used to discriminate between fine and coarse grains. Further analysis of fine grains is typically based on the plasticity chart. Whereas pore-fluid-chemistry-dependent soil response is a salient and distinguishing characteristic of fine grains, pore-fluid chemistry is not addressed in current classification systems. Liquid limits obtained with electrically contrasting pore fluids (deionized water, 2-M NaCl brine, and kerosene) are combined to define the soil “electrical sensitivity.” Liquid limit and electrical sensitivity can be effectively used to classify fine grains according to their fluid-soil response into no-, low-, intermediate-, or high-plasticity fine grains of low, intermediate, or high electrical sensitivity. The proposed methodology benefits from the accumulated experience with liquid limit in the field and addresses the needs of a broader range of geotechnical engineering problems.

  5. Fluorescent Binary Ensemble Based on Pyrene Derivative and Sodium Dodecyl Sulfate Assemblies as a Chemical Tongue for Discriminating Metal Ions and Brand Water.

    Science.gov (United States)

    Zhang, Lijun; Huang, Xinyan; Cao, Yuan; Xin, Yunhong; Ding, Liping

    2017-12-22

    Enormous effort has been put to the detection and recognition of various heavy metal ions due to their involvement in serious environmental pollution and many major diseases. The present work has developed a single fluorescent sensor ensemble that can distinguish and identify a variety of heavy metal ions. A pyrene-based fluorophore (PB) containing a metal ion receptor group was specially designed and synthesized. Anionic surfactant sodium dodecyl sulfate (SDS) assemblies can effectively adjust its fluorescence behavior. The selected binary ensemble based on PB/SDS assemblies can exhibit multiple emission bands and provide wavelength-based cross-reactive responses to a series of metal ions to realize pattern recognition ability. The combination of surfactant assembly modulation and the receptor for metal ions empowers the present sensor ensemble with strong discrimination power, which could well differentiate 13 metal ions, including Cu 2+ , Co 2+ , Ni 2+ , Cr 3+ , Hg 2+ , Fe 3+ , Zn 2+ , Cd 2+ , Al 3+ , Pb 2+ , Ca 2+ , Mg 2+ , and Ba 2+ . Moreover, this single sensing ensemble could be further applied for identifying different brands of drinking water.

  6. New classification system-based visual outcome in Eales′ disease

    Directory of Open Access Journals (Sweden)

    Saxena Sandeep

    2007-01-01

    Full Text Available Purpose: A retrospective tertiary care center-based study was undertaken to evaluate the visual outcome in Eales′ disease, based on a new classification system, for the first time. Materials and Methods: One hundred and fifty-nine consecutive cases of Eales′ disease were included. All the eyes were staged according to the new classification: Stage 1: periphlebitis of small (1a and large (1b caliber vessels with superficial retinal hemorrhages; Stage 2a: capillary non-perfusion, 2b: neovascularization elsewhere/of the disc; Stage 3a: fibrovascular proliferation, 3b: vitreous hemorrhage; Stage 4a: traction/combined rhegmatogenous retinal detachment and 4b: rubeosis iridis, neovascular glaucoma, complicated cataract and optic atrophy. Visual acuity was graded as: Grade I 20/20 or better; Grade II 20/30 to 20/40; Grade III 20/60 to 20/120 and Grade IV 20/200 or worse. All the cases were managed by medical therapy, photocoagulation and/or vitreoretinal surgery. Visual acuity was converted into decimal scale, denoting 20/20=1 and 20/800=0.01. Paired t-test / Wilcoxon signed-rank tests were used for statistical analysis. Results: Vitreous hemorrhage was the commonest presenting feature (49.32%. Cases with Stages 1 to 3 and 4a and 4b achieved final visual acuity ranging from 20/15 to 20/40; 20/80 to 20/400 and 20/200 to 20/400, respectively. Statistically significant improvement in visual acuities was observed in all the stages of the disease except Stages 1a and 4b. Conclusion: Significant improvement in visual acuities was observed in the majority of stages of Eales′ disease following treatment. This study adds further to the little available evidences of treatment effects in literature and may have effect on patient care and health policy in Eales′ disease.

  7. Quality-Oriented Classification of Aircraft Material Based on SVM

    Directory of Open Access Journals (Sweden)

    Hongxia Cai

    2014-01-01

    Full Text Available The existing material classification is proposed to improve the inventory management. However, different materials have the different quality-related attributes, especially in the aircraft industry. In order to reduce the cost without sacrificing the quality, we propose a quality-oriented material classification system considering the material quality character, Quality cost, and Quality influence. Analytic Hierarchy Process helps to make feature selection and classification decision. We use the improved Kraljic Portfolio Matrix to establish the three-dimensional classification model. The aircraft materials can be divided into eight types, including general type, key type, risk type, and leveraged type. Aiming to improve the classification accuracy of various materials, the algorithm of Support Vector Machine is introduced. Finally, we compare the SVM and BP neural network in the application. The results prove that the SVM algorithm is more efficient and accurate and the quality-oriented material classification is valuable.

  8. One-day-ahead streamflow forecasting via super-ensembles of several neural network architectures based on the Multi-Level Diversity Model

    Science.gov (United States)

    Brochero, Darwin; Hajji, Islem; Pina, Jasson; Plana, Queralt; Sylvain, Jean-Daniel; Vergeynst, Jenna; Anctil, Francois

    2015-04-01

    Theories about generalization error with ensembles are mainly based on the diversity concept, which promotes resorting to many members of different properties to support mutually agreeable decisions. Kuncheva (2004) proposed the Multi Level Diversity Model (MLDM) to promote diversity in model ensembles, combining different data subsets, input subsets, models, parameters, and including a combiner level in order to optimize the final ensemble. This work tests the hypothesis about the minimisation of the generalization error with ensembles of Neural Network (NN) structures. We used the MLDM to evaluate two different scenarios: (i) ensembles from a same NN architecture, and (ii) a super-ensemble built by a combination of sub-ensembles of many NN architectures. The time series used correspond to the 12 basins of the MOdel Parameter Estimation eXperiment (MOPEX) project that were used by Duan et al. (2006) and Vos (2013) as benchmark. Six architectures are evaluated: FeedForward NN (FFNN) trained with the Levenberg Marquardt algorithm (Hagan et al., 1996), FFNN trained with SCE (Duan et al., 1993), Recurrent NN trained with a complex method (Weins et al., 2008), Dynamic NARX NN (Leontaritis and Billings, 1985), Echo State Network (ESN), and leak integrator neuron (L-ESN) (Lukosevicius and Jaeger, 2009). Each architecture performs separately an Input Variable Selection (IVS) according to a forward stepwise selection (Anctil et al., 2009) using mean square error as objective function. Post-processing by Predictor Stepwise Selection (PSS) of the super-ensemble has been done following the method proposed by Brochero et al. (2011). IVS results showed that the lagged stream flow, lagged precipitation, and Standardized Precipitation Index (SPI) (McKee et al., 1993) were the most relevant variables. They were respectively selected as one of the firsts three selected variables in 66, 45, and 28 of the 72 scenarios. A relationship between aridity index (Arora, 2002) and NN

  9. GA Based Optimal Feature Extraction Method for Functional Data Classification

    OpenAIRE

    Jun Wan; Zehua Chen; Yingwu Chen; Zhidong Bai

    2010-01-01

    Classification is an interesting problem in functional data analysis (FDA), because many science and application problems end up with classification problems, such as recognition, prediction, control, decision making, management, etc. As the high dimension and high correlation in functional data (FD), it is a key problem to extract features from FD whereas keeping its global characters, which relates to the classification efficiency and precision to heavens. In this paper...

  10. Assessing an ensemble Kalman filter inference of Manning’s n coefficient of an idealized tidal inlet against a polynomial chaos-based MCMC

    KAUST Repository

    Siripatana, Adil

    2017-06-08

    Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean model, especially in the framework of parameter estimation. Based on Bayes rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without significant increase in the computational requirements. However, only approximate estimates are generally obtained by this approach due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning’s n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean model, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and

  11. Assessing an ensemble Kalman filter inference of Manning's n coefficient of an idealized tidal inlet against a polynomial chaos-based MCMC

    Science.gov (United States)

    Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim

    2017-08-01

    Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean model, especially in the framework of parameter estimation. Based on Bayes rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without significant increase in the computational requirements. However, only approximate estimates are generally obtained by this approach due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean model, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and

  12. A One-Step-Ahead Smoothing-Based Joint Ensemble Kalman Filter for State-Parameter Estimation of Hydrological Models

    KAUST Repository

    El Gharamti, Mohamad; Ait-El-Fquih, Boujemaa; Hoteit, Ibrahim

    2015-01-01

    The ensemble Kalman filter (EnKF) recursively integrates field data into simulation models to obtain a better characterization of the model’s state and parameters. These are generally estimated following a state-parameters joint augmentation

  13. The generalization ability of online SVM classification based on Markov sampling.

    Science.gov (United States)

    Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang

    2015-03-01

    In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish the bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present the numerical studies on the learning ability of online SVM classification based on Markov sampling for benchmark repository. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of training samples is larger.

  14. Classification of types of stuttering symptoms based on brain activity.

    Directory of Open Access Journals (Sweden)

    Jing Jiang

    Full Text Available Among the non-fluencies seen in speech, some are more typical (MT of stuttering speakers, whereas others are less typical (LT and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is which type (LT, MT whole-word repetitions (WWR should be placed in. In this study, a sentence completion task was performed by twenty stuttering patients who were scanned using an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT types with WWR put aside. Pattern classification was employed to train a patient-specific single trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated by using test data that were independent of the training data. In a subsequent analysis, the classification model, just established, was used to determine which type the WWR should be placed in. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that made most contribution to the separation of the types were: the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT; and the left putamen and right cerebellum which showed the opposite activity pattern. The results also showed that the brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and the LT types, and support the widely-used MT/LT symptom grouping scheme. In addition, WWR play a similar role as the LT, and thus should be placed in the LT type.

  15. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated. We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  16. User Classification in Crowdsourcing-Based Cooperative Spectrum Sensing

    Directory of Open Access Journals (Sweden)

    Linbo Zhai

    2017-07-01

    Full Text Available This paper studies cooperative spectrum sensing based on crowdsourcing in cognitive radio networks. Since intelligent mobile users such as smartphones and tablets can sense the wireless spectrum, channel sensing tasks can be assigned to these mobile users. This is referred to as the crowdsourcing method. However, there may be some malicious mobile users that send false sensing reports deliberately, for their own purposes. False sensing reports will influence decisions about channel state. Therefore, it is necessary to classify mobile users in order to distinguish malicious users. According to the sensing reports, mobile users should not just be divided into two classes (honest and malicious. There are two reasons for this: on the one hand, honest users in different positions may have different sensing outcomes, as shadowing, multi-path fading, and other issues may influence the sensing results; on the other hand, there may be more than one type of malicious users, acting differently in the network. Therefore, it is necessary to classify mobile users into more than two classes. Due to the lack of prior information of the number of user classes, this paper casts the problem of mobile user classification as a dynamic clustering problem that is NP-hard. The paper uses the interdistance-to-intradistance ratio of clusters as the fitness function, and aims to maximize the fitness function. To cast this optimization problem, this paper proposes a distributed algorithm for user classification in order to obtain bounded close-to-optimal solutions, and analyzes the approximation ratio of the proposed algorithm. Simulations show the distributed algorithm achieves higher performance than other algorithms.

  17. Classification of Types of Stuttering Symptoms Based on Brain Activity

    Science.gov (United States)

    Jiang, Jing; Lu, Chunming; Peng, Danling; Zhu, Chaozhe; Howell, Peter

    2012-01-01

    Among the non-fluencies seen in speech, some are more typical (MT) of stuttering speakers, whereas others are less typical (LT) and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is which type (LT, MT) whole-word repetitions (WWR) should be placed in. In this study, a sentence completion task was performed by twenty stuttering patients who were scanned using an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT types with WWR put aside. Pattern classification was employed to train a patient-specific single trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated by using test data that were independent of the training data. In a subsequent analysis, the classification model, just established, was used to determine which type the WWR should be placed in. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that made most contribution to the separation of the types were: the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT; and the left putamen and right cerebellum which showed the opposite activity pattern. The results also showed that the brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and the LT types, and support the widely-used MT/LT symptom grouping scheme. In addition, WWR play a similar role as the LT, and thus should be placed in the LT type. PMID:22761887

  18. Credit scoring using ensemble of various classifiers on reduced feature set

    Directory of Open Access Journals (Sweden)

    Dahiya Shashi

    2015-01-01

    Full Text Available Credit scoring methods are widely used for evaluating loan applications in financial and banking institutions. Credit score identifies if applicant customers belong to good risk applicant group or a bad risk applicant group. These decisions are based on the demographic data of the customers, overall business by the customer with bank, and loan payment history of the loan applicants. The advantages of using credit scoring models include reducing the cost of credit analysis, enabling faster credit decisions and diminishing possible risk. Many statistical and machine learning techniques such as Logistic Regression, Support Vector Machines, Neural Networks and Decision tree algorithms have been used independently and as hybrid credit scoring models. This paper proposes an ensemble based technique combining seven individual models to increase the classification accuracy. Feature selection has also been used for selecting important attributes for classification. Cross classification was conducted using three data partitions. German credit dataset having 1000 instances and 21 attributes is used in the present study. The results of the experiments revealed that the ensemble model yielded a very good accuracy when compared to individual models. In all three different partitions, the ensemble model was able to classify more than 80% of the loan customers as good creditors correctly. Also, for 70:30 partition there was a good impact of feature selection on the accuracy of classifiers. The results were improved for almost all individual models including the ensemble model.

  19. Seismic assessment of existing RC buildings under alternative ground motion ensembles compatible to EC8 and NTC 2008

    NARCIS (Netherlands)

    Tanganelli, Marco; Viti, Stefania; Mariani, V.; Pianigiani, Maria

    2017-01-01

    This work investigates the effects of the choice of different ensembles of ground motions on the seismic assessment of existing RC buildings through nonlinear dynamic analysis. Nowadays indeed, all the main International Seismic Codes provide a soil classification which is based on the shear wave

  20. Sequence-based classification and identification of Fungi.

    Science.gov (United States)

    Hibbett, David; Abarenkov, Kessy; Kõljalg, Urmas; Öpik, Maarja; Chai, Benli; Cole, James; Wang, Qiong; Crous, Pedro; Robert, Vincent; Helgason, Thorunn; Herr, Joshua R; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, R Henrik; Oono, Ryoko; Schoch, Conrad; Smyth, Christopher; Walker, Donald M; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M

    Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable validPUBLICation of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.

  1. A multi-stage intelligent approach based on an ensemble of two-way interaction model for forecasting the global horizontal radiation of India

    International Nuclear Information System (INIS)

    Jiang, He; Dong, Yao; Xiao, Ling

    2017-01-01

    Highlights: • Ensemble learning system is proposed to forecast the global solar radiation. • LASSO is utilized as feature selection method for subset model. • GSO is used to select the weight vector aggregating the response of subset model. • A simple and efficient algorithm is designed based on thresholding function. • Theoretical analysis focusing on error rate is provided. - Abstract: Forecasting of effective solar irradiation has developed a huge interest in recent decades, mainly due to its various applications in grid connect photovoltaic installations. This paper develops and investigates an ensemble learning based multistage intelligent approach to forecast 5 days global horizontal radiation at four given locations of India. The two-way interaction model is considered with purpose of detecting the associated correlation between the features. The main structure of the novel method is the ensemble learning, which is based on Divide and Conquer principle, is applied to enhance the forecasting accuracy and model stability. An efficient feature selection method LASSO is performed in the input space with the regularization parameter selected by Cross-Validation. A weight vector which best represents the importance of each individual model in ensemble system is provided by glowworm swarm optimization. The combination of feature selection and parameter selection are helpful in creating the diversity of the ensemble learning. In order to illustrate the validity of the proposed method, the datasets at four different locations of the India are split into training and test datasets. The results of the real data experiments demonstrate the efficiency and efficacy of the proposed method comparing with other competitors.

  2. A Cutting Pattern Recognition Method for Shearers Based on Improved Ensemble Empirical Mode Decomposition and a Probabilistic Neural Network

    Directory of Open Access Journals (Sweden)

    Jing Xu

    2015-10-01

    Full Text Available In order to guarantee the stable operation of shearers and promote construction of an automatic coal mining working face, an online cutting pattern recognition method with high accuracy and speed based on Improved Ensemble Empirical Mode Decomposition (IEEMD and Probabilistic Neural Network (PNN is proposed. An industrial microphone is installed on the shearer and the cutting sound is collected as the recognition criterion to overcome the disadvantages of giant size, contact measurement and low identification rate of traditional detectors. To avoid end-point effects and get rid of undesirable intrinsic mode function (IMF components in the initial signal, IEEMD is conducted on the sound. The end-point continuation based on the practical storage data is performed first to overcome the end-point effect. Next the average correlation coefficient, which is calculated by the correlation of the first IMF with others, is introduced to select essential IMFs. Then the energy and standard deviation of the reminder IMFs are extracted as features and PNN is applied to classify the cutting patterns. Finally, a simulation example, with an accuracy of 92.67%, and an industrial application prove the efficiency and correctness of the proposed method.

  3. Multimodel GCM-RCM Ensemble-Based Projections of Temperature and Precipitation over West Africa for the Early 21st Century

    Directory of Open Access Journals (Sweden)

    I. Diallo

    2012-01-01

    Full Text Available Reliable climate change scenarios are critical for West Africa, whose economy relies mostly on agriculture and, in this regard, multimodel ensembles are believed to provide the most robust climate change information. Toward this end, we analyze and intercompare the performance of a set of four regional climate models (RCMs driven by two global climate models (GCMs (for a total of 4 different GCM-RCM pairs in simulating present day and future climate over West Africa. The results show that the individual RCM members as well as their ensemble employing the same driving fields exhibit different biases and show mixed results in terms of outperforming the GCM simulation of seasonal temperature and precipitation, indicating a substantial sensitivity of RCMs to regional and local processes. These biases are reduced and GCM simulations improved upon by averaging all four RCM simulations, suggesting that multi-model RCM ensembles based on different driving GCMs help to compensate systematic errors from both the nested and the driving models. This confirms the importance of the multi-model approach for improving robustness of climate change projections. Illustrative examples of such ensemble reveal that the western Sahel undergoes substantial drying in future climate projections mostly due to a decrease in peak monsoon rainfall.

  4. Proposal of a novel ensemble learning based segmentation with a shape prior and its application to spleen segmentation from a 3D abdominal CT volume

    International Nuclear Information System (INIS)

    Shindo, Kiyo; Shimizu, Akinobu; Kobatake, Hidefumi; Nawano, Shigeru; Shinozaki, Kenji

    2010-01-01

    An organ segmentation learned by a conventional ensemble learning algorithm suffers from unnatural errors because each voxel is classified independently in the segmentation process. This paper proposes a novel ensemble learning algorithm that can take into account global shape and location of organs. It estimates the shape and location of an organ from a given image by combining an intermediate segmentation result with a statistical shape model. Once an ensemble learning algorithm could not improve the segmentation performance in the iterative learning process, it estimates the shape and location by finding an optimal model parameter set with maximum degree of correspondence between a statistical shape model and the intermediate segmentation result. Novel weak classifiers are generated based on a signed distance from a boundary of the estimated shape and a distance from a barycenter of the intermediate segmentation result. Subsequently it continues the learning process with the novel weak classifiers. This paper presents experimental results where the proposed ensemble learning algorithm generates a segmentation process that can extract a spleen from a 3D CT image more precisely than a conventional one. (author)

  5. Representing and Reasoning with the Internet of Things: a Modular Rule-Based Model for Ensembles of Context-Aware Smart Things

    Directory of Open Access Journals (Sweden)

    S. W. Loke

    2016-03-01

    Full Text Available Context-aware smart things are capable of computational behaviour based on sensing the physical world, inferring context from the sensed data, and acting on the sensed context. A collection of such things can form what we call a thing-ensemble, when they have the ability to communicate with one another (over a short range network such as Bluetooth, or the Internet, i.e. the Internet of Things (IoT concept, sense each other, and when each of them might play certain roles with respect to each other. Each smart thing in a thing-ensemble might have its own context-aware behaviours which when integrated with other smart things yield behaviours that are not straightforward to reason with. We present Sigma, a language of operators, inspired from modular logic programming, for specifying and reasoning with combined behaviours among smart things in a thing-ensemble. We show numerous examples of the use of Sigma for describing a range of behaviours over a diverse range of thing-ensembles, from sensor networks to smart digital frames, demonstrating the versatility of our approach. We contend that our operator approach abstracts away low-level communication and protocol details, and allows systems of context-aware things to be designed and built in a compositional and incremental manner.

  6. Improving Generalization Based on l1-Norm Regularization for EEG-Based Motor Imagery Classification

    Directory of Open Access Journals (Sweden)

    Yuwei Zhao

    2018-05-01

    Full Text Available Multichannel electroencephalography (EEG is widely used in typical brain-computer interface (BCI systems. In general, a number of parameters are essential for a EEG classification algorithm due to redundant features involved in EEG signals. However, the generalization of the EEG method is often adversely affected by the model complexity, considerably coherent with its number of undetermined parameters, further leading to heavy overfitting. To decrease the complexity and improve the generalization of EEG method, we present a novel l1-norm-based approach to combine the decision value obtained from each EEG channel directly. By extracting the information from different channels on independent frequency bands (FB with l1-norm regularization, the method proposed fits the training data with much less parameters compared to common spatial pattern (CSP methods in order to reduce overfitting. Moreover, an effective and efficient solution to minimize the optimization object is proposed. The experimental results on dataset IVa of BCI competition III and dataset I of BCI competition IV show that, the proposed method contributes to high classification accuracy and increases generalization performance for the classification of MI EEG. As the training set ratio decreases from 80 to 20%, the average classification accuracy on the two datasets changes from 85.86 and 86.13% to 84.81 and 76.59%, respectively. The classification performance and generalization of the proposed method contribute to the practical application of MI based BCI systems.

  7. Data Stream Classification Based on the Gamma Classifier

    Directory of Open Access Journals (Sweden)

    Abril Valeria Uriarte-Arcia

    2015-01-01

    Full Text Available The ever increasing data generation confronts us with the problem of handling online massive amounts of information. One of the biggest challenges is how to extract valuable information from these massive continuous data streams during single scanning. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories, which are both supervised pattern recognition models. The proposed method is capable of handling the space and time constrain inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.

  8. DNA methylation-based classification of central nervous system tumours.

    Science.gov (United States)

    Capper, David; Jones, David T W; Sill, Martin; Hovestadt, Volker; Schrimpf, Daniel; Sturm, Dominik; Koelsche, Christian; Sahm, Felix; Chavez, Lukas; Reuss, David E; Kratz, Annekathrin; Wefers, Annika K; Huang, Kristin; Pajtler, Kristian W; Schweizer, Leonille; Stichel, Damian; Olar, Adriana; Engel, Nils W; Lindenberg, Kerstin; Harter, Patrick N; Braczynski, Anne K; Plate, Karl H; Dohmen, Hildegard; Garvalov, Boyan K; Coras, Roland; Hölsken, Annett; Hewer, Ekkehard; Bewerunge-Hudler, Melanie; Schick, Matthias; Fischer, Roger; Beschorner, Rudi; Schittenhelm, Jens; Staszewski, Ori; Wani, Khalida; Varlet, Pascale; Pages, Melanie; Temming, Petra; Lohmann, Dietmar; Selt, Florian; Witt, Hendrik; Milde, Till; Witt, Olaf; Aronica, Eleonora; Giangaspero, Felice; Rushing, Elisabeth; Scheurlen, Wolfram; Geisenberger, Christoph; Rodriguez, Fausto J; Becker, Albert; Preusser, Matthias; Haberler, Christine; Bjerkvig, Rolf; Cryan, Jane; Farrell, Michael; Deckert, Martina; Hench, Jürgen; Frank, Stephan; Serrano, Jonathan; Kannan, Kasthuri; Tsirigos, Aristotelis; Brück, Wolfgang; Hofer, Silvia; Brehmer, Stefanie; Seiz-Rosenhagen, Marcel; Hänggi, Daniel; Hans, Volkmar; Rozsnoki, Stephanie; Hansford, Jordan R; Kohlhof, Patricia; Kristensen, Bjarne W; Lechner, Matt; Lopes, Beatriz; Mawrin, Christian; Ketter, Ralf; Kulozik, Andreas; Khatib, Ziad; Heppner, Frank; Koch, Arend; Jouvet, Anne; Keohane, Catherine; Mühleisen, Helmut; Mueller, Wolf; Pohl, Ute; Prinz, Marco; Benner, Axel; Zapatka, Marc; Gottardo, Nicholas G; Driever, Pablo Hernáiz; Kramm, Christof M; Müller, Hermann L; Rutkowski, Stefan; von Hoff, Katja; Frühwald, Michael C; Gnekow, Astrid; Fleischhack, Gudrun; Tippelt, Stephan; Calaminus, Gabriele; Monoranu, Camelia-Maria; Perry, Arie; Jones, Chris; Jacques, Thomas S; Radlwimmer, Bernhard; Gessi, Marco; Pietsch, Torsten; Schramm, Johannes; Schackert, Gabriele; Westphal, Manfred; Reifenberger, Guido; Wesseling, Pieter; Weller, Michael; Collins, Vincent Peter; Blümcke, Ingmar; Bendszus, Martin; Debus, Jürgen; Huang, Annie; Jabado, Nada; Northcott, Paul A; Paulus, Werner; Gajjar, Amar; Robinson, Giles W; Taylor, Michael D; Jaunmuktane, Zane; Ryzhova, Marina; Platten, Michael; Unterberg, Andreas; Wick, Wolfgang; Karajannis, Matthias A; Mittelbronn, Michel; Acker, Till; Hartmann, Christian; Aldape, Kenneth; Schüller, Ulrich; Buslei, Rolf; Lichter, Peter; Kool, Marcel; Herold-Mende, Christel; Ellison, David W; Hasselblatt, Martin; Snuderl, Matija; Brandner, Sebastian; Korshunov, Andrey; von Deimling, Andreas; Pfister, Stefan M

    2018-03-22

    Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging-with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.

  9. Estimation of Compaction Parameters Based on Soil Classification

    Science.gov (United States)

    Lubis, A. S.; Muis, Z. A.; Hastuty, I. P.; Siregar, I. M.

    2018-02-01

    Factors that must be considered in compaction of the soil works were the type of soil material, field control, maintenance and availability of funds. Those problems then raised the idea of how to estimate the density of the soil with a proper implementation system, fast, and economical. This study aims to estimate the compaction parameter i.e. the maximum dry unit weight (γ dmax) and optimum water content (Wopt) based on soil classification. Each of 30 samples were being tested for its properties index and compaction test. All of the data’s from the laboratory test results, were used to estimate the compaction parameter values by using linear regression and Goswami Model. From the research result, the soil types were A4, A-6, and A-7 according to AASHTO and SC, SC-SM, and CL based on USCS. By linear regression, the equation for estimation of the maximum dry unit weight (γdmax *)=1,862-0,005*FINES- 0,003*LL and estimation of the optimum water content (wopt *)=- 0,607+0,362*FINES+0,161*LL. By Goswami Model (with equation Y=mLogG+k), for estimation of the maximum dry unit weight (γdmax *) with m=-0,376 and k=2,482, for estimation of the optimum water content (wopt *) with m=21,265 and k=-32,421. For both of these equations a 95% confidence interval was obtained.

  10. Toward a Safety Risk-Based Classification of Unmanned Aircraft

    Science.gov (United States)

    Torres-Pomales, Wilfredo

    2016-01-01

    There is a trend of growing interest and demand for greater access of unmanned aircraft (UA) to the National Airspace System (NAS) as the ongoing development of UA technology has created the potential for significant economic benefits. However, the lack of a comprehensive and efficient UA regulatory framework has constrained the number and kinds of UA operations that can be performed. This report presents initial results of a study aimed at defining a safety-risk-based UA classification as a plausible basis for a regulatory framework for UA operating in the NAS. Much of the study up to this point has been at a conceptual high level. The report includes a survey of contextual topics, analysis of safety risk considerations, and initial recommendations for a risk-based approach to safe UA operations in the NAS. The next phase of the study will develop and leverage deeper clarity and insight into practical engineering and regulatory considerations for ensuring that UA operations have an acceptable level of safety.

  11. Superpixel-based classification of gastric chromoendoscopy images

    Science.gov (United States)

    Boschetto, Davide; Grisan, Enrico

    2017-03-01

    Chromoendoscopy (CH) is a gastroenterology imaging modality that involves the staining of tissues with methylene blue, which reacts with the internal walls of the gastrointestinal tract, improving the visual contrast in mucosal surfaces and thus enhancing a doctor's ability to screen precancerous lesions or early cancer. This technique helps identify areas that can be targeted for biopsy or treatment and in this work we will focus on gastric cancer detection. Gastric chromoendoscopy for cancer detection has several taxonomies available, one of which classifies CH images into three classes (normal, metaplasia, dysplasia) based on color, shape and regularity of pit patterns. Computer-assisted diagnosis is desirable to help us improve the reliability of the tissue classification and abnormalities detection. However, traditional computer vision methodologies, mainly segmentation, do not translate well to the specific visual characteristics of a gastroenterology imaging scenario. We propose the exploitation of a first unsupervised segmentation via superpixel, which groups pixels into perceptually meaningful atomic regions, used to replace the rigid structure of the pixel grid. For each superpixel, a set of features is extracted and then fed to a random forest based classifier, which computes a model used to predict the class of each superpixel. The average general accuracy of our model is 92.05% in the pixel domain (86.62% in the superpixel domain), while detection accuracies on the normal and abnormal class are respectively 85.71% and 95%. Eventually, the whole image class can be predicted image through a majority vote on each superpixel's predicted class.

  12. Tumor Classification Using High-Order Gene Expression Profiles Based on Multilinear ICA

    Directory of Open Access Journals (Sweden)

    Ming-gang Du

    2009-01-01

    Full Text Available Motivation. Independent Components Analysis (ICA maximizes the statistical independence of the representational components of a training gene expression profiles (GEP ensemble, but it cannot distinguish relations between the different factors, or different modes, and it is not available to high-order GEP Data Mining. In order to generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high order GEP. Firstly, we introduce the basis conceptions and operations of tensor and recommend Support Vector Machine (SVM classifier and Multilinear-ICA. Secondly, the higher score genes of original high order GEP are selected by using t-statistics and tabulate tensors. Thirdly, the tensors are performed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes. Results. To show the validity of the proposed method, we apply it to tumor classification using high order GEP. Though we only use three datasets, the experimental results show that the method is effective and feasible. Through this survey, we hope to gain some insight into the problem of high order GEP tumor classification, in aid of further developing more effective tumor classification algorithms.

  13. Risk assessment of agricultural water requirement based on a multi-model ensemble framework, southwest of Iran

    Science.gov (United States)

    Zamani, Reza; Akhond-Ali, Ali-Mohammad; Roozbahani, Abbas; Fattahi, Rouhollah

    2017-08-01

    Water shortage and climate change are the most important issues of sustainable agricultural and water resources development. Given the importance of water availability in crop production, the present study focused on risk assessment of climate change impact on agricultural water requirement in southwest of Iran, under two emission scenarios (A2 and B1) for the future period (2025-2054). A multi-model ensemble framework based on mean observed temperature-precipitation (MOTP) method and a combined probabilistic approach Long Ashton Research Station-Weather Generator (LARS-WG) and change factor (CF) have been used for downscaling to manage the uncertainty of outputs of 14 general circulation models (GCMs). The results showed an increasing temperature in all months and irregular changes of precipitation (either increasing or decreasing) in the future period. In addition, the results of the calculated annual net water requirement for all crops affected by climate change indicated an increase between 4 and 10 %. Furthermore, an increasing process is also expected regarding to the required water demand volume. The most and the least expected increase in the water demand volume is about 13 and 5 % for A2 and B1 scenarios, respectively. Considering the results and the limited water resources in the study area, it is crucial to provide water resources planning in order to reduce the negative effects of climate change. Therefore, the adaptation scenarios with the climate change related to crop pattern and water consumption should be taken into account.

  14. A Combined Methodology to Eliminate Artifacts in Multichannel Electrogastrogram Based on Independent Component Analysis and Ensemble Empirical Mode Decomposition.

    Science.gov (United States)

    Sengottuvel, S; Khan, Pathan Fayaz; Mariyappa, N; Patel, Rajesh; Saipriya, S; Gireesan, K

    2018-06-01

    Cutaneous measurements of electrogastrogram (EGG) signals are heavily contaminated by artifacts due to cardiac activity, breathing, motion artifacts, and electrode drifts whose effective elimination remains an open problem. A common methodology is proposed by combining independent component analysis (ICA) and ensemble empirical mode decomposition (EEMD) to denoise gastric slow-wave signals in multichannel EGG data. Sixteen electrodes are fixed over the upper abdomen to measure the EGG signals under three gastric conditions, namely, preprandial, postprandial immediately, and postprandial 2 h after food for three healthy subjects and a subject with a gastric disorder. Instantaneous frequencies of intrinsic mode functions that are obtained by applying the EEMD technique are analyzed to individually identify and remove each of the artifacts. A critical investigation on the proposed ICA-EEMD method reveals its ability to provide a higher attenuation of artifacts and lower distortion than those obtained by the ICA-EMD method and conventional techniques, like bandpass and adaptive filtering. Characteristic changes in the slow-wave frequencies across the three gastric conditions could be determined from the denoised signals for all the cases. The results therefore encourage the use of the EEMD-based technique for denoising gastric signals to be used in clinical practice.

  15. Faults Diagnostics of Railway Axle Bearings Based on IMF’s Confidence Index Algorithm for Ensemble EMD

    Science.gov (United States)

    Yi, Cai; Lin, Jianhui; Zhang, Weihua; Ding, Jianming

    2015-01-01

    As train loads and travel speeds have increased over time, railway axle bearings have become critical elements which require more efficient non-destructive inspection and fault diagnostics methods. This paper presents a novel and adaptive procedure based on ensemble empirical mode decomposition (EEMD) and Hilbert marginal spectrum for multi-fault diagnostics of axle bearings. EEMD overcomes the limitations that often hypothesize about data and computational efforts that restrict the application of signal processing techniques. The outputs of this adaptive approach are the intrinsic mode functions that are treated with the Hilbert transform in order to obtain the Hilbert instantaneous frequency spectrum and marginal spectrum. Anyhow, not all the IMFs obtained by the decomposition should be considered into Hilbert marginal spectrum. The IMFs’ confidence index arithmetic proposed in this paper is fully autonomous, overcoming the major limit of selection by user with experience, and allows the development of on-line tools. The effectiveness of the improvement is proven by the successful diagnosis of an axle bearing with a single fault or multiple composite faults, e.g., outer ring fault, cage fault and pin roller fault. PMID:25970256

  16. Complex catalysts from self-repairing ensembles to highly reactive air-based oxidation systems

    Science.gov (United States)

    Craig L. Hill; Laurent Delannoy; Dean C. Duncan; Ira A. Weinstock; Roman F. Renneke; Richard S. Reiner; Rajai H. Atalla; Jong Woo Han; Daniel A. Hillesheim; Rui Cao; Travis M. Anderson; Nelya M. Okun; Djamaladdin G. Musaev; Yurii V. Geletii

    2007-01-01

    Progress in four interrelated catalysis research efforts in our laboratory are summarized: (1) catalytic photochemical functionalization of unactivated CeH bonds by polyoxometalates (POMs); (2) self-repairing catalysts; (3) catalysts for air-based oxidations under ambient conditions; and (4) terminal oxo complexes of the late-transition metal elements and their...

  17. Identifying climate analogues for precipitation extremes for Denmark based on RCM simulations from the ENSEMBLES database

    DEFF Research Database (Denmark)

    Arnbjerg-Nielsen, Karsten; Funder, S. G.; Madsen, H.

    2015-01-01

    Climate analogues, also denoted Space-For-Time, may be used to identify regions where the present climatic conditions resemble conditions of a past or future state of another location or region based on robust climate variable statistics in combination with projections of how these statistics cha...

  18. An Integrated Ensemble-Based Operational Framework to Predict Urban Flooding: A Case Study of Hurricane Sandy in the Passaic and Hackensack River Basins

    Science.gov (United States)

    Saleh, F.; Ramaswamy, V.; Georgas, N.; Blumberg, A. F.; Wang, Y.

    2016-12-01

    Advances in computational resources and modeling techniques are opening the path to effectively integrate existing complex models. In the context of flood prediction, recent extreme events have demonstrated the importance of integrating components of the hydrosystem to better represent the interactions amongst different physical processes and phenomena. As such, there is a pressing need to develop holistic and cross-disciplinary modeling frameworks that effectively integrate existing models and better represent the operative dynamics. This work presents a novel Hydrologic-Hydraulic-Hydrodynamic Ensemble (H3E) flood prediction framework that operationally integrates existing predictive models representing coastal (New York Harbor Observing and Prediction System, NYHOPS), hydrologic (US Army Corps of Engineers Hydrologic Modeling System, HEC-HMS) and hydraulic (2-dimensional River Analysis System, HEC-RAS) components. The state-of-the-art framework is forced with 125 ensemble meteorological inputs from numerical weather prediction models including the Global Ensemble Forecast System, the European Centre for Medium-Range Weather Forecasts (ECMWF), the Canadian Meteorological Centre (CMC), the Short Range Ensemble Forecast (SREF) and the North American Mesoscale Forecast System (NAM). The framework produces, within a 96-hour forecast horizon, on-the-fly Google Earth flood maps that provide critical information for decision makers and emergency preparedness managers. The utility of the framework was demonstrated by retrospectively forecasting an extreme flood event, hurricane Sandy in the Passaic and Hackensack watersheds (New Jersey, USA). Hurricane Sandy caused significant damage to a number of critical facilities in this area including the New Jersey Transit's main storage and maintenance facility. The results of this work demonstrate that ensemble based frameworks provide improved flood predictions and useful information about associated uncertainties, thus

  19. Knowledge-based sea ice classification by polarimetric SAR

    DEFF Research Database (Denmark)

    Skriver, Henning; Dierking, Wolfgang

    2004-01-01

    Polarimetric SAR images acquired at C- and L-band over sea ice in the Greenland Sea, Baltic Sea, and Beaufort Sea have been analysed with respect to their potential for ice type classification. The polarimetric data were gathered by the Danish EMISAR and the US AIRSAR which both are airborne...... systems. A hierarchical classification scheme was chosen for sea ice because our knowledge about magnitudes, variations, and dependences of sea ice signatures can be directly considered. The optimal sequence of classification rules and the rules themselves depend on the ice conditions/regimes. The use...... of the polarimetric phase information improves the classification only in the case of thin ice types but is not necessary for thicker ice (above about 30 cm thickness)...

  20. Palm-vein classification based on principal orientation features.

    Directory of Open Access Journals (Sweden)

    Yujia Zhou

    Full Text Available Personal recognition using palm-vein patterns has emerged as a promising alternative for human recognition because of its uniqueness, stability, live body identification, flexibility, and difficulty to cheat. With the expanding application of palm-vein pattern recognition, the corresponding growth of the database has resulted in a long response time. To shorten the response time of identification, this paper proposes a simple and useful classification for palm-vein identification based on principal direction features. In the registration process, the Gaussian-Radon transform is adopted to extract the orientation matrix and then compute the principal direction of a palm-vein image based on the orientation matrix. The database can be classified into six bins based on the value of the principal direction. In the identification process, the principal direction of the test sample is first extracted to ascertain the corresponding bin. One-by-one matching with the training samples is then performed in the bin. To improve recognition efficiency while maintaining better recognition accuracy, two neighborhood bins of the corresponding bin are continuously searched to identify the input palm-vein image. Evaluation experiments are conducted on three different databases, namely, PolyU, CASIA, and the database of this study. Experimental results show that the searching range of one test sample in PolyU, CASIA and our database by the proposed method for palm-vein identification can be reduced to 14.29%, 14.50%, and 14.28%, with retrieval accuracy of 96.67%, 96.00%, and 97.71%, respectively. With 10,000 training samples in the database, the execution time of the identification process by the traditional method is 18.56 s, while that by the proposed approach is 3.16 s. The experimental results confirm that the proposed approach is more efficient than the traditional method, especially for a large database.

  1. Trace elements based classification on clinkers. Application to Spanish clinkers

    Directory of Open Access Journals (Sweden)

    Tamás, F. D.

    2001-12-01

    Full Text Available The qualitative identification to determine the origin (i.e. manufacturing factory of Spanish clinkers is described. The classification of clinkers produced in different factories can be based on their trace element content. Approximately fifteen clinker sorts are analysed, collected from 11 Spanish cement factories to determine their Mg, Sr, Ba, Mn, Ti, Zr, Zn and V content. An expert system formulated by a binary decision tree is designed based on the collected data. The performance of the obtained classifier was measured by ten-fold cross validation. The results show that the proposed method is useful to identify an easy-to-use expert system that is able to determine the origin of the clinker based on its trace element content.

    En el presente trabajo se describe el procedimiento de identificación cualitativa de clínkeres españoles con el objeto de determinar su origen (fábrica. Esa clasificación de los clínkeres se basa en el contenido de sus elementos traza. Se analizaron 15 clínkeres diferentes procedentes de 11 fábricas de cemento españolas, determinándose los contenidos en Mg, Sr, Ba, Mn, Ti, Zr, Zn y V. Se ha diseñado un sistema experto mediante un árbol de decisión binario basado en los datos recogidos. La clasificación obtenida fue examinada mediante la validación cruzada de 10 valores. Los resultados obtenidos muestran que el modelo propuesto es válido para identificar, de manera fácil, un sistema experto capaz de determinar el origen de un clínker basándose en el contenido de sus elementos traza.

  2. Event-Based User Classification in Weibo Media

    Directory of Open Access Journals (Sweden)

    Liang Guo

    2014-01-01

    Full Text Available Weibo media, known as the real-time microblogging services, has attracted massive attention and support from social network users. Weibo platform offers an opportunity for people to access information and changes the way people acquire and disseminate information significantly. Meanwhile, it enables people to respond to the social events in a more convenient way. Much of the information in Weibo media is related to some events. Users who post different contents, and exert different behavior or attitude may lead to different contribution to the specific event. Therefore, classifying the large amount of uncategorized social circles generated in Weibo media automatically from the perspective of events has been a promising task. Under this circumstance, in order to effectively organize and manage the huge amounts of users, thereby further managing their contents, we address the task of user classification in a more granular, event-based approach in this paper. By analyzing real data collected from Sina Weibo, we investigate the Weibo properties and utilize both content information and social network information to classify the numerous users into four primary groups: celebrities, organizations/media accounts, grassroots stars, and ordinary individuals. The experiments results show that our method identifies the user categories accurately.

  3. Event-based user classification in Weibo media.

    Science.gov (United States)

    Guo, Liang; Wang, Wendong; Cheng, Shiduan; Que, Xirong

    2014-01-01

    Weibo media, known as the real-time microblogging services, has attracted massive attention and support from social network users. Weibo platform offers an opportunity for people to access information and changes the way people acquire and disseminate information significantly. Meanwhile, it enables people to respond to the social events in a more convenient way. Much of the information in Weibo media is related to some events. Users who post different contents, and exert different behavior or attitude may lead to different contribution to the specific event. Therefore, classifying the large amount of uncategorized social circles generated in Weibo media automatically from the perspective of events has been a promising task. Under this circumstance, in order to effectively organize and manage the huge amounts of users, thereby further managing their contents, we address the task of user classification in a more granular, event-based approach in this paper. By analyzing real data collected from Sina Weibo, we investigate the Weibo properties and utilize both content information and social network information to classify the numerous users into four primary groups: celebrities, organizations/media accounts, grassroots stars, and ordinary individuals. The experiments results show that our method identifies the user categories accurately.

  4. Radar-Derived Quantitative Precipitation Estimation Based on Precipitation Classification

    Directory of Open Access Journals (Sweden)

    Lili Yang

    2016-01-01

    Full Text Available A method for improving radar-derived quantitative precipitation estimation is proposed. Tropical vertical profiles of reflectivity (VPRs are first determined from multiple VPRs. Upon identifying a tropical VPR, the event can be further classified as either tropical-stratiform or tropical-convective rainfall by a fuzzy logic (FL algorithm. Based on the precipitation-type fields, the reflectivity values are converted into rainfall rate using a Z-R relationship. In order to evaluate the performance of this rainfall classification scheme, three experiments were conducted using three months of data and two study cases. In Experiment I, the Weather Surveillance Radar-1988 Doppler (WSR-88D default Z-R relationship was applied. In Experiment II, the precipitation regime was separated into convective and stratiform rainfall using the FL algorithm, and corresponding Z-R relationships were used. In Experiment III, the precipitation regime was separated into convective, stratiform, and tropical rainfall, and the corresponding Z-R relationships were applied. The results show that the rainfall rates obtained from all three experiments match closely with the gauge observations, although Experiment II could solve the underestimation, when compared to Experiment I. Experiment III significantly reduced this underestimation and generated the most accurate radar estimates of rain rate among the three experiments.

  5. Treatment of esophageal motility disorders based on the chicago classification.

    Science.gov (United States)

    Maradey-Romero, Carla; Gabbard, Scott; Fass, Ronnie

    2014-12-01

    The Chicago Classification divides esophageal motor disorders based on the recorded value of the integrated relaxation pressure (IRP). The first group includes those with an elevated mean IRP that is associated with peristaltic abnormalities such as achalasia and esophagogastric junction outflow obstruction. The second group includes those with a normal mean IRP that is associated with esophageal hypermotility disorders such as distal esophageal spasm, hypercontractile esophagus (jackhammer esophagus), and hypertensive peristalsis (nutcracker esophagus). The third group includes those with a normal mean IRP that is associated with esophageal hypomotility peristaltic abnormalities such as absent peristalsis, weak peristalsis with small or large breaks, and frequent failed peristalsis. The therapeutic options vary greatly between the different groups of esophageal motor disorders. In achalasia patients, potential treatment strategies comprise medical therapy (calcium channel blockers, nitrates, and phosphodiesterase 5 inhibitors), endoscopic procedures (botulinum toxin A injection, pneumatic dilation, or peroral endoscopic myotomy) or surgery (Heller myotomy). Patients with a normal IRP and esophageal hypermotility disorder are candidates for medical therapy (nitrates, calcium channel blockers, phosphodiesterase 5 inhibitors, cimetropium/ipratropium bromide, proton pump inhibitors, benzodiazepines, tricyclic antidepressants, trazodone, selective serotonin reuptake inhibitors, and serotonin-norepinephrine reuptake inhibitors), endoscopic procedures (botulinum toxin A injection and peroral endoscopic myotomy), or surgery (Heller myotomy). Lastly, in patients with a normal IRP and esophageal hypomotility disorder, treatment is primarily focused on controlling the presence of gastroesophageal reflux with proton pump inhibitors and lifestyle modifications (soft and liquid diet and eating in the upright position) to address patient's dysphagia.

  6. China's Classification-Based Forest Management: Procedures, Problems, and Prospects

    Science.gov (United States)

    Dai, Limin; Zhao, Fuqiang; Shao, Guofan; Zhou, Li; Tang, Lina

    2009-06-01

    China’s new Classification-Based Forest Management (CFM) is a two-class system, including Commodity Forest (CoF) and Ecological Welfare Forest (EWF) lands, so named according to differences in their distinct functions and services. The purposes of CFM are to improve forestry economic systems, strengthen resource management in a market economy, ease the conflicts between wood demands and public welfare, and meet the diversified needs for forest services in China. The formative process of China’s CFM has involved a series of trials and revisions. China’s central government accelerated the reform of CFM in the year 2000 and completed the final version in 2003. CFM was implemented at the provincial level with the aid of subsidies from the central government. About a quarter of the forestland in China was approved as National EWF lands by the State Forestry Administration in 2006 and 2007. Logging is prohibited on National EWF lands, and their landowners or managers receive subsidies of about 70 RMB (US10) per hectare from the central government. CFM represents a new forestry strategy in China and its implementation inevitably faces challenges in promoting the understanding of forest ecological services, generalizing nationwide criteria for identifying EWF and CoF lands, setting up forest-specific compensation mechanisms for ecological benefits, enhancing the knowledge of administrators and the general public about CFM, and sustaining EWF lands under China’s current forestland tenure system. CFM does, however, offer a viable pathway toward sustainable forest management in China.

  7. Classification of CT brain images based on deep learning networks.

    Science.gov (United States)

    Gao, Xiaohong W; Hui, Rui; Tian, Zengmin

    2017-01-01

    While computerised tomography (CT) may have been the first imaging tool to study human brain, it has not yet been implemented into clinical decision making process for diagnosis of Alzheimer's disease (AD). On the other hand, with the nature of being prevalent, inexpensive and non-invasive, CT does present diagnostic features of AD to a great extent. This study explores the significance and impact on the application of the burgeoning deep learning techniques to the task of classification of CT brain images, in particular utilising convolutional neural network (CNN), aiming at providing supplementary information for the early diagnosis of Alzheimer's disease. Towards this end, three categories of CT images (N = 285) are clustered into three groups, which are AD, lesion (e.g. tumour) and normal ageing. In addition, considering the characteristics of this collection with larger thickness along the direction of depth (z) (~3-5 mm), an advanced CNN architecture is established integrating both 2D and 3D CNN networks. The fusion of the two CNN networks is subsequently coordinated based on the average of Softmax scores obtained from both networks consolidating 2D images along spatial axial directions and 3D segmented blocks respectively. As a result, the classification accuracy rates rendered by this elaborated CNN architecture are 85.2%, 80% and 95.3% for classes of AD, lesion and normal respectively with an average of 87.6%. Additionally, this improved CNN network appears to outperform the others when in comparison with 2D version only of CNN network as well as a number of state of the art hand-crafted approaches. As a result, these approaches deliver accuracy rates in percentage of 86.3, 85.6 ± 1.10, 86.3 ± 1.04, 85.2 ± 1.60, 83.1 ± 0.35 for 2D CNN, 2D SIFT, 2D KAZE, 3D SIFT and 3D KAZE respectively. The two major contributions of the paper constitute a new 3-D approach while applying deep learning technique to extract signature information

  8. World Music Ensemble: Kulintang

    Science.gov (United States)

    Beegle, Amy C.

    2012-01-01

    As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…

  9. Basic Hand Gestures Classification Based on Surface Electromyography

    Directory of Open Access Journals (Sweden)

    Aleksander Palkowski

    2016-01-01

    Full Text Available This paper presents an innovative classification system for hand gestures using 2-channel surface electromyography analysis. The system developed uses the Support Vector Machine classifier, for which the kernel function and parameter optimisation are conducted additionally by the Cuckoo Search swarm algorithm. The system developed is compared with standard Support Vector Machine classifiers with various kernel functions. The average classification rate of 98.12% has been achieved for the proposed method.

  10. Ligand and structure-based classification models for Prediction of P-glycoprotein inhibitors

    DEFF Research Database (Denmark)

    Klepsch, Freya; Poongavanam, Vasanthanathan; Ecker, Gerhard Franz

    2014-01-01

    an algorithm based on Euclidean distance. Results show that random forest and SVM performed best for classification of P-gp inhibitors and non-inhibitors, correctly predicting 73/75 % of the external test set compounds. Classification based on the docking experiments using the scoring function Chem...

  11. Using ensemble weather forecast in a risk based real time optimization of urban drainage systems

    DEFF Research Database (Denmark)

    Courdent, Vianney Augustin Thomas; Vezzaro, Luca; Mikkelsen, Peter Steen

    2015-01-01

    Global Real Time Control (RTC) of urban drainage system is increasingly seen as cost-effective solution in order to respond to increasing performance demand (e.g. reduction of Combined Sewer Overflow, protection of sensitive areas as bathing water etc.). The Dynamic Overflow Risk Assessment (DORA......) strategy was developed to operate Urban Drainage Systems (UDS) in order to minimize the expected overflow risk by considering the water volume presently stored in the drainage network, the expected runoff volume based on a 2-hours radar forecast model and an estimated uncertainty of the runoff forecast....... However, such temporal horizon (1-2 hours) is relatively short when used for the operation of large storage facilities, which may require a few days to be emptied. This limits the performance of the optimization and control in reducing combined sewer overflow and in preparing for possible flooding. Based...

  12. Automated Bug Assignment: Ensemble-based Machine Learning in Large Scale Industrial Contexts

    OpenAIRE

    Jonsson, Leif; Borg, Markus; Broman, David; Sandahl, Kristian; Eldh, Sigrid; Runeson, Per

    2016-01-01

    Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learni...

  13. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  14. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  15. Ensemble regression model-based anomaly detection for cyber-physical intrusion detection in smart grids

    DEFF Research Database (Denmark)

    Kosek, Anna Magdalena; Gehrke, Oliver

    2016-01-01

    The shift from centralised large production to distributed energy production has several consequences for current power system operation. The replacement of large power plants by growing numbers of distributed energy resources (DERs) increases the dependency of the power system on small scale......, distributed production. Many of these DERs can be accessed and controlled remotely, posing a cybersecurity risk. This paper investigates an intrusion detection system which evaluates the DER operation in order to discover unauthorized control actions. The proposed anomaly detection method is based...

  16. Research on Remote Sensing Image Classification Based on Feature Level Fusion

    Science.gov (United States)

    Yuan, L.; Zhu, G.

    2018-04-01

    Remote sensing image classification, as an important direction of remote sensing image processing and application, has been widely studied. However, in the process of existing classification algorithms, there still exists the phenomenon of misclassification and missing points, which leads to the final classification accuracy is not high. In this paper, we selected Sentinel-1A and Landsat8 OLI images as data sources, and propose a classification method based on feature level fusion. Compare three kind of feature level fusion algorithms (i.e., Gram-Schmidt spectral sharpening, Principal Component Analysis transform and Brovey transform), and then select the best fused image for the classification experimental. In the classification process, we choose four kinds of image classification algorithms (i.e. Minimum distance, Mahalanobis distance, Support Vector Machine and ISODATA) to do contrast experiment. We use overall classification precision and Kappa coefficient as the classification accuracy evaluation criteria, and the four classification results of fused image are analysed. The experimental results show that the fusion effect of Gram-Schmidt spectral sharpening is better than other methods. In four kinds of classification algorithms, the fused image has the best applicability to Support Vector Machine classification, the overall classification precision is 94.01 % and the Kappa coefficients is 0.91. The fused image with Sentinel-1A and Landsat8 OLI is not only have more spatial information and spectral texture characteristics, but also enhances the distinguishing features of the images. The proposed method is beneficial to improve the accuracy and stability of remote sensing image classification.

  17. A two-stage method of quantitative flood risk analysis for reservoir real-time operation using ensemble-based hydrologic forecasts

    Science.gov (United States)

    Liu, P.

    2013-12-01

    Quantitative analysis of the risk for reservoir real-time operation is a hard task owing to the difficulty of accurate description of inflow uncertainties. The ensemble-based hydrologic forecasts directly depict the inflows not only the marginal distributions but also their persistence via scenarios. This motivates us to analyze the reservoir real-time operating risk with ensemble-based hydrologic forecasts as inputs. A method is developed by using the forecast horizon point to divide the future time into two stages, the forecast lead-time and the unpredicted time. The risk within the forecast lead-time is computed based on counting the failure number of forecast scenarios, and the risk in the unpredicted time is estimated using reservoir routing with the design floods and the reservoir water levels of forecast horizon point. As a result, a two-stage risk analysis method is set up to quantify the entire flood risks by defining the ratio of the number of scenarios that excessive the critical value to the total number of scenarios. The China's Three Gorges Reservoir (TGR) is selected as a case study, where the parameter and precipitation uncertainties are implemented to produce ensemble-based hydrologic forecasts. The Bayesian inference, Markov Chain Monte Carlo, is used to account for the parameter uncertainty. Two reservoir operation schemes, the real operated and scenario optimization, are evaluated for the flood risks and hydropower profits analysis. With the 2010 flood, it is found that the improvement of the hydrologic forecast accuracy is unnecessary to decrease the reservoir real-time operation risk, and most risks are from the forecast lead-time. It is therefore valuable to decrease the avarice of ensemble-based hydrologic forecasts with less bias for a reservoir operational purpose.

  18. An ensemble based top performing approach for NCI-DREAM drug sensitivity prediction challenge.

    Directory of Open Access Journals (Sweden)

    Qian Wan

    Full Text Available We consider the problem of predicting sensitivity of cancer cell lines to new drugs based on supervised learning on genomic profiles. The genetic and epigenetic characterization of a cell line provides observations on various aspects of regulation including DNA copy number variations, gene expression, DNA methylation and protein abundance. To extract relevant information from the various data types, we applied a random forest based approach to generate sensitivity predictions from each type of data and combined the predictions in a linear regression model to generate the final drug sensitivity prediction. Our approach when applied to the NCI-DREAM drug sensitivity prediction challenge was a top performer among 47 teams and produced high accuracy predictions. Our results show that the incorporation of multiple genomic characterizations lowered the mean and variance of the estimated bootstrap prediction error. We also applied our approach to the Cancer Cell Line Encyclopedia database for sensitivity prediction and the ability to extract the top targets of an anti-cancer drug. The results illustrate the effectiveness of our approach in predicting drug sensitivity from heterogeneous genomic datasets.

  19. Intelligence system based classification approach for medical disease diagnosis

    Science.gov (United States)

    Sagir, Abdu Masanawa; Sathasivam, Saratha

    2017-08-01

    The prediction of breast cancer in women who have no signs or symptoms of the disease as well as survivability after undergone certain surgery has been a challenging problem for medical researchers. The decision about presence or absence of diseases depends on the physician's intuition, experience and skill for comparing current indicators with previous one than on knowledge rich data hidden in a database. This measure is a very crucial and challenging task. The goal is to predict patient condition by using an adaptive neuro fuzzy inference system (ANFIS) pre-processed by grid partitioning. To achieve an accurate diagnosis at this complex stage of symptom analysis, the physician may need efficient diagnosis system. A framework describes methodology for designing and evaluation of classification performances of two discrete ANFIS systems of hybrid learning algorithms least square estimates with Modified Levenberg-Marquardt and Gradient descent algorithms that can be used by physicians to accelerate diagnosis process. The proposed method's performance was evaluated based on training and test datasets with mammographic mass and Haberman's survival Datasets obtained from benchmarked datasets of University of California at Irvine's (UCI) machine learning repository. The robustness of the performance measuring total accuracy, sensitivity and specificity is examined. In comparison, the proposed method achieves superior performance when compared to conventional ANFIS based gradient descent algorithm and some related existing methods. The software used for the implementation is MATLAB R2014a (version 8.3) and executed in PC Intel Pentium IV E7400 processor with 2.80 GHz speed and 2.0 GB of RAM.

  20. An ensemble-based dynamic Bayesian averaging approach for discharge simulations using multiple global precipitation products and hydrological models

    Science.gov (United States)

    Qi, Wei; Liu, Junguo; Yang, Hong; Sweetapple, Chris

    2018-03-01

    Global precipitation products are very important datasets in flow simulations, especially in poorly gauged regions. Uncertainties resulting from precipitation products, hydrological models and their combinations vary with time and data magnitude, and undermine their application to flow simulations. However, previous studies have not quantified these uncertainties individually and explicitly. This study developed an ensemble-based dynamic Bayesian averaging approach (e-Bay) for deterministic discharge simulations using multiple global precipitation products and hydrological models. In this approach, the joint probability of precipitation products and hydrological models being correct is quantified based on uncertainties in maximum and mean estimation, posterior probability is quantified as functions of the magnitude and timing of discharges, and the law of total probability is implemented to calculate expected discharges. Six global fine-resolution precipitation products and two hydrological models of different complexities are included in an illustrative application. e-Bay can effectively quantify uncertainties and therefore generate better deterministic discharges than traditional approaches (weighted average methods with equal and varying weights and maximum likelihood approach). The mean Nash-Sutcliffe Efficiency values of e-Bay are up to 0.97 and 0.85 in training and validation periods respectively, which are at least 0.06 and 0.13 higher than traditional approaches. In addition, with increased training data, assessment criteria values of e-Bay show smaller fluctuations than traditional approaches and its performance becomes outstanding. The proposed e-Bay approach bridges the gap between global precipitation products and their pragmatic applications to discharge simulations, and is beneficial to water resources management in ungauged or poorly gauged regions across the world.

  1. A semiautomatic CT-based ensemble segmentation of lung tumors: comparison with oncologists' delineations and with the surgical specimen.

    Science.gov (United States)

    Rios Velazquez, Emmanuel; Aerts, Hugo J W L; Gu, Yuhua; Goldgof, Dmitry B; De Ruysscher, Dirk; Dekker, Andre; Korn, René; Gillies, Robert J; Lambin, Philippe

    2012-11-01

    To assess the clinical relevance of a semiautomatic CT-based ensemble segmentation method, by comparing it to pathology and to CT/PET manual delineations by five independent radiation oncologists in non-small cell lung cancer (NSCLC). For 20 NSCLC patients (stages Ib-IIIb) the primary tumor was delineated manually on CT/PET scans by five independent radiation oncologists and segmented using a CT based semi-automatic tool. Tumor volume and overlap fractions between manual and semiautomatic-segmented volumes were compared. All measurements were correlated with the maximal diameter on macroscopic examination of the surgical specimen. Imaging data are available on www.cancerdata.org. High overlap fractions were observed between the semi-automatically segmented volumes and the intersection (92.5±9.0, mean±SD) and union (94.2±6.8) of the manual delineations. No statistically significant differences in tumor volume were observed between the semiautomatic segmentation (71.4±83.2 cm(3), mean±SD) and manual delineations (81.9±94.1 cm(3); p=0.57). The maximal tumor diameter of the semiautomatic-segmented tumor correlated strongly with the macroscopic diameter of the primary tumor (r=0.96). Semiautomatic segmentation of the primary tumor on CT demonstrated high agreement with CT/PET manual delineations and strongly correlated with the macroscopic diameter considered as the "gold standard". This method may be used routinely in clinical practice and could be employed as a starting point for treatment planning, target definition in multi-center clinical trials or for high throughput data mining research. This method is particularly suitable for peripherally located tumors. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  2. Similarity-based multi-model ensemble approach for 1-15-day advance prediction of monsoon rainfall over India

    Science.gov (United States)

    Jaiswal, Neeru; Kishtawal, C. M.; Bhomia, Swati

    2018-04-01

    The southwest (SW) monsoon season (June, July, August and September) is the major period of rainfall over the Indian region. The present study focuses on the development of a new multi-model ensemble approach based on the similarity criterion (SMME) for the prediction of SW monsoon rainfall in the extended range. This approach is based on the assumption that training with the similar type of conditions may provide the better forecasts in spite of the sequential training which is being used in the conventional MME approaches. In this approach, the training dataset has been selected by matching the present day condition to the archived dataset and days with the most similar conditions were identified and used for training the model. The coefficients thus generated were used for the rainfall prediction. The precipitation forecasts from four general circulation models (GCMs), viz. European Centre for Medium-Range Weather Forecasts (ECMWF), United Kingdom Meteorological Office (UKMO), National Centre for Environment Prediction (NCEP) and China Meteorological Administration (CMA) have been used for developing the SMME forecasts. The forecasts of 1-5, 6-10 and 11-15 days were generated using the newly developed approach for each pentad of June-September during the years 2008-2013 and the skill of the model was analysed using verification scores, viz. equitable skill score (ETS), mean absolute error (MAE), Pearson's correlation coefficient and Nash-Sutcliffe model efficiency index. Statistical analysis of SMME forecasts shows superior forecast skill compared to the conventional MME and the individual models for all the pentads, viz. 1-5, 6-10 and 11-15 days.

  3. Ensemble-based computational approach discriminates functional activity of p53 cancer and rescue mutants.

    Directory of Open Access Journals (Sweden)

    Özlem Demir

    2011-10-01

    Full Text Available The tumor suppressor protein p53 can lose its function upon single-point missense mutations in the core DNA-binding domain ("cancer mutants". Activity can be restored by second-site suppressor mutations ("rescue mutants". This paper relates the functional activity of p53 cancer and rescue mutants to their overall molecular dynamics (MD, without focusing on local structural details. A novel global measure of protein flexibility for the p53 core DNA-binding domain, the number of clusters at a certain RMSD cutoff, was computed by clustering over 0.7 µs of explicitly solvated all-atom MD simulations. For wild-type p53 and a sample of p53 cancer or rescue mutants, the number of clusters was a good predictor of in vivo p53 functional activity in cell-based assays. This number-of-clusters (NOC metric was strongly correlated (r(2 = 0.77 with reported values of experimentally measured ΔΔG protein thermodynamic stability. Interpreting the number of clusters as a measure of protein flexibility: (i p53 cancer mutants were more flexible than wild-type protein, (ii second-site rescue mutations decreased the flexibility of cancer mutants, and (iii negative controls of non-rescue second-site mutants did not. This new method reflects the overall stability of the p53 core domain and can discriminate which second-site mutations restore activity to p53 cancer mutants.

  4. Hierarchical structure for audio-video based semantic classification of sports video sequences

    Science.gov (United States)

    Kolekar, M. H.; Sengupta, S.

    2005-07-01

    A hierarchical structure for sports event classification based on audio and video content analysis is proposed in this paper. Compared to the event classifications in other games, those of cricket are very challenging and yet unexplored. We have successfully solved cricket video classification problem using a six level hierarchical structure. The first level performs event detection based on audio energy and Zero Crossing Rate (ZCR) of short-time audio signal. In the subsequent levels, we classify the events based on video features using a Hidden Markov Model implemented through Dynamic Programming (HMM-DP) using color or motion as a likelihood function. For some of the game-specific decisions, a rule-based classification is also performed. Our proposed hierarchical structure can easily be applied to any other sports. Our results are very promising and we have moved a step forward towards addressing semantic classification problems in general.

  5. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of the today’s Internet, bringing financial damage to companies and annoying individual users. Spam emails are invading users without their consent and filling their mail boxes. They consume more network capacity as well as time in checking and deleting spam mails. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most of the users want to do right think to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, they are not yet eradicated. Also when the counter measures are over sensitive, even legitimate emails will be eliminated. Among the approaches developed to stop spam, filtering is the one of the most important technique. Many researches in spam filtering have been centered on the more sophisticated classifier-related issues. In recent days, Machine learning for spam classification is an important research issue. The effectiveness of the proposed work is explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms has also been presented.

  6. Reacting to different types of concept drift: the Accuracy Updated Ensemble algorithm.

    Science.gov (United States)

    Brzezinski, Dariusz; Stefanowski, Jerzy

    2014-01-01

    Data stream mining has been receiving increased attention due to its presence in a wide range of applications, such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification algorithms that cope with concept drift have been put forward, however, most of them specialize in one type of change. In this paper, we propose a new data stream classifier, called the Accuracy Updated Ensemble (AUE2), which aims at reacting equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms known from block-based ensembles with the incremental nature of Hoeffding Trees. The proposed algorithm is experimentally compared with 11 state-of-the-art stream methods, including single classifiers, block-based and online ensembles, and hybrid approaches in different drift scenarios. Out of all the compared algorithms, AUE2 provided best average classification accuracy while proving to be less memory consuming than other ensemble approaches. Experimental results show that AUE2 can be considered suitable for scenarios, involving many types of drift as well as static environments.

  7. Classification of urine sediment based on convolution neural network

    Science.gov (United States)

    Pan, Jingjing; Jiang, Cunbo; Zhu, Tiantian

    2018-04-01

    By designing a new convolution neural network framework, this paper breaks the constraints of the original convolution neural network framework requiring large training samples and samples of the same size. Move and cropping the input images, generate the same size of the sub-graph. And then, the generated sub-graph uses the method of dropout, increasing the diversity of samples and preventing the fitting generation. Randomly select some proper subset in the sub-graphic set and ensure that the number of elements in the proper subset is same and the proper subset is not the same. The proper subsets are used as input layers for the convolution neural network. Through the convolution layer, the pooling, the full connection layer and output layer, we can obtained the classification loss rate of test set and training set. In the red blood cells, white blood cells, calcium oxalate crystallization classification experiment, the classification accuracy rate of 97% or more.

  8. Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

    Science.gov (United States)

    Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

    Data base classification suffers from two well known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model by integrating a case based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is major based on the idea that the historic data base can be transformed into a smaller case-base together with a group of fuzzy decision rules. As a result, the model can be more accurately respond to the current data under classifying from the inductions by these smaller cases based fuzzy decision trees. Hit rate is applied as a performance measure and the effectiveness of our proposed model is demonstrated by experimentally compared with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among others.

  9. Performance analysis of a Principal Component Analysis ensemble classifier for Emotiv headset P300 spellers.

    Science.gov (United States)

    Elsawy, Amr S; Eldawlatly, Seif; Taher, Mohamed; Aly, Gamal M

    2014-01-01

    The current trend to use Brain-Computer Interfaces (BCIs) with mobile devices mandates the development of efficient EEG data processing methods. In this paper, we demonstrate the performance of a Principal Component Analysis (PCA) ensemble classifier for P300-based spellers. We recorded EEG data from multiple subjects using the Emotiv neuroheadset in the context of a classical oddball P300 speller paradigm. We compare the performance of the proposed ensemble classifier to the performance of traditional feature extraction and classifier methods. Our results demonstrate the capability of the PCA ensemble classifier to classify P300 data recorded using the Emotiv neuroheadset with an average accuracy of 86.29% on cross-validation data. In addition, offline testing of the recorded data reveals an average classification accuracy of 73.3% that is significantly higher than that achieved using traditional methods. Finally, we demonstrate the effect of the parameters of the P300 speller paradigm on the performance of the method.

  10. Data classification based on the hybrid intellectual technology

    Directory of Open Access Journals (Sweden)

    Demidova Liliya

    2018-01-01

    Full Text Available In this paper the data classification technique, implying the consistent application of the SVM and Parzen classifiers, has been suggested. The Parser classifier applies to data which can be both correctly and erroneously classified using the SVM classifier, and are located in the experimentally defined subareas near the hyperplane which separates the classes. A herewith, the SVM classifier is used with the default parameters values, and the optimal parameters values of the Parser classifier are determined using the genetic algorithm. The experimental results confirming the effectiveness of the proposed hybrid intellectual data classification technology have been presented.

  11. Woven fabric defects detection based on texture classification algorithm

    International Nuclear Information System (INIS)

    Ben Salem, Y.; Nasri, S.

    2011-01-01

    In this paper we have compared two famous methods in texture classification to solve the problem of recognition and classification of defects occurring in a textile manufacture. We have compared local binary patterns method with co-occurrence matrix. The classifier used is the support vector machines (SVM). The system has been tested using TILDA database. The results obtained are interesting and show that LBP is a good method for the problems of recognition and classifcation defects, it gives a good running time especially for the real time applications.

  12. Classification of Gait Types Based on the Duty-factor

    DEFF Research Database (Denmark)

    Fihl, Preben; Moeslund, Thomas B.

    2007-01-01

    on the speed of the human, the cameras setup etc. and hence a robust descriptor for gait classification. The dutyfactor is basically a matter of measuring the ground support of the feet with respect to the stride. We estimate this by comparing the incoming silhouettes to a database of silhouettes with known...... ground support. Silhouettes are extracted using the Codebook method and represented using Shape Contexts. The matching with database silhouettes is done using the Hungarian method. While manually estimated duty-factors show a clear classification the presented system contains misclassifications due...

  13. SVM-based Partial Discharge Pattern Classification for GIS

    Science.gov (United States)

    Ling, Yin; Bai, Demeng; Wang, Menglin; Gong, Xiaojin; Gu, Chao

    2018-01-01

    Partial discharges (PD) occur when there are localized dielectric breakdowns in small regions of gas insulated substations (GIS). It is of high importance to recognize the PD patterns, through which we can diagnose the defects caused by different sources so that predictive maintenance can be conducted to prevent from unplanned power outage. In this paper, we propose an approach to perform partial discharge pattern classification. It first recovers the PRPD matrices from the PRPD2D images; then statistical features are extracted from the recovered PRPD matrix and fed into SVM for classification. Experiments conducted on a dataset containing thousands of images demonstrates the high effectiveness of the method.

  14. An application-based classification to understand buyer-seller interaction in business services

    NARCIS (Netherlands)

    Valk, van der W.; Wynstra, J.Y.F.; Axelsson, B.

    2006-01-01

    Abstract: Purpose – Most existing classifications of business services have taken the perspective of the supplier as opposed to that of the buyer. To address this imbalance, the purpose of this paper is to propose a classification of business services based on how the buying company applies the

  15. Initial steps towards an evidence-based classification system for golfers with a physical impairment

    NARCIS (Netherlands)

    Stoter, Inge K.; Hettinga, Florentina J.; Altmann, Viola; Eisma, Wim; Arendzen, Hans; Bennett, Tony; van der Woude, Lucas H.; Dekker, Rienk

    2017-01-01

    Purpose: The present narrative review aims to make a first step towards an evidence-based classification system in handigolf following the International Paralympic Committee (IPC). It intends to create a conceptual framework of classification for handigolf and an agenda for future research. Method:

  16. Vision-Based Perception and Classification of Mosquitoes Using Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Masataka Fuchida

    2017-01-01

    Full Text Available The need for a novel automated mosquito perception and classification method is becoming increasingly essential in recent years, with steeply increasing number of mosquito-borne diseases and associated casualties. There exist remote sensing and GIS-based methods for mapping potential mosquito inhabitants and locations that are prone to mosquito-borne diseases, but these methods generally do not account for species-wise identification of mosquitoes in closed-perimeter regions. Traditional methods for mosquito classification involve highly manual processes requiring tedious sample collection and supervised laboratory analysis. In this research work, we present the design and experimental validation of an automated vision-based mosquito classification module that can deploy in closed-perimeter mosquito inhabitants. The module is capable of identifying mosquitoes from other bugs such as bees and flies by extracting the morphological features, followed by support vector machine-based classification. In addition, this paper presents the results of three variants of support vector machine classifier in the context of mosquito classification problem. This vision-based approach to the mosquito classification problem presents an efficient alternative to the conventional methods for mosquito surveillance, mapping and sample image collection. Experimental results involving classification between mosquitoes and a predefined set of other bugs using multiple classification strategies demonstrate the efficacy and validity of the proposed approach with a maximum recall of 98%.

  17. A multi-scale ensemble-based framework for forecasting compound coastal-riverine flooding: The Hackensack-Passaic watershed and Newark Bay

    Science.gov (United States)

    Saleh, F.; Ramaswamy, V.; Wang, Y.; Georgas, N.; Blumberg, A.; Pullen, J.

    2017-12-01

    Estuarine regions can experience compound impacts from coastal storm surge and riverine flooding. The challenges in forecasting flooding in such areas are multi-faceted due to uncertainties associated with meteorological drivers and interactions between hydrological and coastal processes. The objective of this work is to evaluate how uncertainties from meteorological predictions propagate through an ensemble-based flood prediction framework and translate into uncertainties in simulated inundation extents. A multi-scale framework, consisting of hydrologic, coastal and hydrodynamic models, was used to simulate two extreme flood events at the confluence of the Passaic and Hackensack rivers and Newark Bay. The events were Hurricane Irene (2011), a combination of inland flooding and coastal storm surge, and Hurricane Sandy (2012) where coastal storm surge was the dominant component. The hydrodynamic component of the framework was first forced with measured streamflow and ocean water level data to establish baseline inundation extents with the best available forcing data. The coastal and hydrologic models were then forced with meteorological predictions from 21 ensemble members of the Global Ensemble Forecast System (GEFS) to retrospectively represent potential future conditions up to 96 hours prior to the events. Inundation extents produced by the hydrodynamic model, forced with the 95th percentile of the ensemble-based coastal and hydrologic boundary conditions, were in good agreement with baseline conditions for both events. The USGS reanalysis of Hurricane Sandy inundation extents was encapsulated between the 50th and 95th percentile of the forecasted inundation extents, and that of Hurricane Irene was similar but with caveats associated with data availability and reliability. This work highlights the importance of accounting for meteorological uncertainty to represent a range of possible future inundation extents at high resolution (∼m).

  18. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  19. Colour based off-road environment and terrain type classification

    NARCIS (Netherlands)

    Jansen, P.; Mark, W. van der; Heuvel, J.C. van den; Groen, F.C.A.

    2005-01-01

    Terrain classification is an important problem that still remains to be solved for off-road autonomous robot vehicle guidance. Often, obstacle detection systems are used which cannot distinguish between solid obstacles such as rocks or soft obstacles such as tall patches of grass. Terrain

  20. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper propose a method of TS-MLP about emotion recognition of physiological signal.It can recognize emotion successfully by Tabu search which selects features of emotion’s physiological signals and multilayer perceptron that is used to classify emotion.Simulation shows that it has achieved good emotion classification performance.

  1. A vegetation-based hierarchical classification for seasonally pulsed ...

    African Journals Online (AJOL)

    A classification scheme is presented for seasonal floodplains of the Boro-Xudum distributary of the Okavango Delta, Botswana. This distributary is subject to an annual flood-pulse, the inundated area varying from a mean low of 3 600 km2 to a mean high of 5 400 km2 between 2000 and 2006. A stratified random sample of ...

  2. A Classification System for Hospital-Based Infection Outbreaks

    Directory of Open Access Journals (Sweden)

    Paul S. Ganney

    2010-01-01

    Full Text Available Outbreaks of infection within semi-closed environments such as hospitals, whether inherent in the environment (such as Clostridium difficile (C.Diff or Methicillinresistant Staphylococcus aureus (MRSA or imported from the wider community (such as Norwalk-like viruses (NLVs, are difficult to manage. As part of our work on modelling such outbreaks, we have developed a classification system to describe the impact of a particular outbreak upon an organization. This classification system may then be used in comparing appropriate computer models to real outbreaks, as well as in comparing different real outbreaks in, for example, the comparison of differing management and containment techniques and strategies. Data from NLV outbreaks in the Hull and East Yorkshire Hospitals NHS Trust (the Trust over several previous years are analysed and classified, both for infection within staff (where the end of infection date may not be known and within patients (where it generally is known. A classification system consisting of seven elements is described, along with a goodness-of-fit method for comparing a new classification to previously known ones, for use in evaluating a simulation against history and thereby determining how ‘realistic’ (or otherwise it is.

  3. A classification system for hospital-based infection outbreaks.

    Science.gov (United States)

    Ganney, Paul S; Madeo, Maurice; Phillips, Roger

    2010-12-01

    Outbreaks of infection within semi-closed environments such as hospitals, whether inherent in the environment (such as Clostridium difficile (C.Diff) or Methicillin-resistant Staphylococcus aureus (MRSA) or imported from the wider community (such as Norwalk-like viruses (NLVs)), are difficult to manage. As part of our work on modelling such outbreaks, we have developed a classification system to describe the impact of a particular outbreak upon an organization. This classification system may then be used in comparing appropriate computer models to real outbreaks, as well as in comparing different real outbreaks in, for example, the comparison of differing management and containment techniques and strategies. Data from NLV outbreaks in the Hull and East Yorkshire Hospitals NHS Trust (the Trust) over several previous years are analysed and classified, both for infection within staff (where the end of infection date may not be known) and within patients (where it generally is known). A classification system consisting of seven elements is described, along with a goodness-of-fit method for comparing a new classification to previously known ones, for use in evaluating a simulation against history and thereby determining how 'realistic' (or otherwise) it is.

  4. A probabilistic approach of the Flash Flood Early Warning System (FF-EWS) in Catalonia based on radar ensemble generation

    Science.gov (United States)

    Velasco, David; Sempere-Torres, Daniel; Corral, Carles; Llort, Xavier; Velasco, Enrique

    2010-05-01

    probabilistic component to the FF-EWS. As a first step, we have incorporated the uncertainty in rainfall estimates and forecasts based on an ensemble of equiprobable rainfall scenarios. The presented study has focused on a number of rainfall events and the performance of the FF-EWS evaluated in terms of its ability to produce probabilistic hazard warnings for decision-making support.

  5. Proposing a Hybrid Model Based on Robson's Classification for Better Impact on Trends of Cesarean Deliveries.

    Science.gov (United States)

    Hans, Punit; Rohatgi, Renu

    2017-06-01

    To construct a hybrid model classification for cesarean section (CS) deliveries based on the woman-characteristics (Robson's classification with additional layers of indications for CS, keeping in view low-resource settings available in India). This is a cross-sectional study conducted at Nalanda Medical College, Patna. All the women delivered from January 2016 to May 2016 in the labor ward were included. Results obtained were compared with the values obtained for India, from secondary analysis of WHO multi-country survey (2010-2011) by Joshua Vogel and colleagues' study published in "The Lancet Global Health." The three classifications (indication-based, Robson's and hybrid model) applied for categorization of the cesarean deliveries from the same sample of data and a semiqualitative evaluations done, considering the main characteristics, strengths and weaknesses of each classification system. The total number of women delivered during study period was 1462, out of which CS deliveries were 471. Overall, CS rate calculated for NMCH, hospital in this specified period, was 32.21% ( p  = 0.001). Hybrid model scored 23/23, and scores of Robson classification and indication-based classification were 21/23 and 10/23, respectively. Single-study centre and referral bias are the limitations of the study. Given the flexibility of the classifications, we constructed a hybrid model based on the woman-characteristics system with additional layers of other classification. Indication-based classification answers why, Robson classification answers on whom, while through our hybrid model we get to know why and on whom cesarean deliveries are being performed.

  6. Applying Topographic Classification, Based on the Hydrological Process, to Design Habitat Linkages for Climate Change

    Directory of Open Access Journals (Sweden)

    Yongwon Mo

    2017-11-01

    Full Text Available The use of biodiversity surrogates has been discussed in the context of designing habitat linkages to support the migration of species affected by climate change. Topography has been proposed as a useful surrogate in the coarse-filter approach, as the hydrological process caused by topography such as erosion and accumulation is the basis of ecological processes. However, some studies that have designed topographic linkages as habitat linkages, so far have focused much on the shape of the topography (morphometric topographic classification with little emphasis on the hydrological processes (generic topographic classification to find such topographic linkages. We aimed to understand whether generic classification was valid for designing these linkages. First, we evaluated whether topographic classification is more appropriate for describing actual (coniferous and deciduous and potential (mammals and amphibians habitat distributions. Second, we analyzed the difference in the linkages between the morphometric and generic topographic classifications. The results showed that the generic classification represented the actual distribution of the trees, but neither the morphometric nor the generic classification could represent the potential animal distributions adequately. Our study demonstrated that the topographic classes, according to the generic classification, were arranged successively according to the flow of water, nutrients, and sediment; therefore, it would be advantageous to secure linkages with a width of 1 km or more. In addition, the edge effect would be smaller than with the morphometric classification. Accordingly, we suggest that topographic characteristics, based on the hydrological process, are required to design topographic linkages for climate change.

  7. Intelligent classification of electrocardiogram (ECG) signal using extended Kalman Filter (EKF) based neuro fuzzy system.

    Science.gov (United States)

    Meau, Yeong Pong; Ibrahim, Fatimah; Narainasamy, Selvanathan A L; Omar, Razali

    2006-05-01

    This study presents the development of a hybrid system consisting of an ensemble of Extended Kalman Filter (EKF) based Multi Layer Perceptron Network (MLPN) and a one-pass learning Fuzzy Inference System using Look-up Table Scheme for the recognition of electrocardiogram (ECG) signals. This system can distinguish various types of abnormal ECG signals such as Ventricular Premature Cycle (VPC), T wave inversion (TINV), ST segment depression (STDP), and Supraventricular Tachycardia (SVT) from normal sinus rhythm (NSR) ECG signal.

  8. Hydrologic-Process-Based Soil Texture Classifications for Improved Visualization of Landscape Function

    Science.gov (United States)

    Groenendyk, Derek G.; Ferré, Ty P.A.; Thorp, Kelly R.; Rice, Amy K.

    2015-01-01

    Soils lie at the interface between the atmosphere and the subsurface and are a key component that control ecosystem services, food production, and many other processes at the Earth’s surface. There is a long-established convention for identifying and mapping soils by texture. These readily available, georeferenced soil maps and databases are used widely in environmental sciences. Here, we show that these traditional soil classifications can be inappropriate, contributing to bias and uncertainty in applications from slope stability to water resource management. We suggest a new approach to soil classification, with a detailed example from the science of hydrology. Hydrologic simulations based on common meteorological conditions were performed using HYDRUS-1D, spanning textures identified by the United States Department of Agriculture soil texture triangle. We consider these common conditions to be: drainage from saturation, infiltration onto a drained soil, and combined infiltration and drainage events. Using a k-means clustering algorithm, we created soil classifications based on the modeled hydrologic responses of these soils. The hydrologic-process-based classifications were compared to those based on soil texture and a single hydraulic property, Ks. Differences in classifications based on hydrologic response versus soil texture demonstrate that traditional soil texture classification is a poor predictor of hydrologic response. We then developed a QGIS plugin to construct soil maps combining a classification with georeferenced soil data from the Natural Resource Conservation Service. The spatial patterns of hydrologic response were more immediately informative, much simpler, and less ambiguous, for use in applications ranging from trafficability to irrigation management to flood control. The ease with which hydrologic-process-based classifications can be made, along with the improved quantitative predictions of soil responses and visualization of landscape

  9. Comparison of hand-craft feature based SVM and CNN based deep learning framework for automatic polyp classification.

    Science.gov (United States)

    Younghak Shin; Balasingham, Ilangko

    2017-07-01

    Colonoscopy is a standard method for screening polyps by highly trained physicians. Miss-detected polyps in colonoscopy are potential risk factor for colorectal cancer. In this study, we investigate an automatic polyp classification framework. We aim to compare two different approaches named hand-craft feature method and convolutional neural network (CNN) based deep learning method. Combined shape and color features are used for hand craft feature extraction and support vector machine (SVM) method is adopted for classification. For CNN approach, three convolution and pooling based deep learning framework is used for classification purpose. The proposed framework is evaluated using three public polyp databases. From the experimental results, we have shown that the CNN based deep learning framework shows better classification performance than the hand-craft feature based methods. It achieves over 90% of classification accuracy, sensitivity, specificity and precision.

  10. Cell-based therapy technology classifications and translational challenges

    Science.gov (United States)

    Mount, Natalie M.; Ward, Stephen J.; Kefalas, Panos; Hyllner, Johan

    2015-01-01

    Cell therapies offer the promise of treating and altering the course of diseases which cannot be addressed adequately by existing pharmaceuticals. Cell therapies are a diverse group across cell types and therapeutic indications and have been an active area of research for many years but are now strongly emerging through translation and towards successful commercial development and patient access. In this article, we present a description of a classification of cell therapies on the basis of their underlying technologies rather than the more commonly used classification by cell type because the regulatory path and manufacturing solutions are often similar within a technology area due to the nature of the methods used. We analyse the progress of new cell therapies towards clinical translation, examine how they are addressing the clinical, regulatory, manufacturing and reimbursement requirements, describe some of the remaining challenges and provide perspectives on how the field may progress for the future. PMID:26416686

  11. Advanced Atmospheric Ensemble Modeling Techniques

    Energy Technology Data Exchange (ETDEWEB)

    Buckley, R. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Chiswell, S. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Kurzeja, R. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Maze, G. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Viner, B. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Werth, D. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL)

    2017-09-29

    Ensemble modeling (EM), the creation of multiple atmospheric simulations for a given time period, has become an essential tool for characterizing uncertainties in model predictions. We explore two novel ensemble modeling techniques: (1) perturbation of model parameters (Adaptive Programming, AP), and (2) data assimilation (Ensemble Kalman Filter, EnKF). The current research is an extension to work from last year and examines transport on a small spatial scale (<100 km) in complex terrain, for more rigorous testing of the ensemble technique. Two different release cases were studied, a coastal release (SF6) and an inland release (Freon) which consisted of two release times. Observations of tracer concentration and meteorology are used to judge the ensemble results. In addition, adaptive grid techniques have been developed to reduce required computing resources for transport calculations. Using a 20- member ensemble, the standard approach generated downwind transport that was quantitatively good for both releases; however, the EnKF method produced additional improvement for the coastal release where the spatial and temporal differences due to interior valley heating lead to the inland movement of the plume. The AP technique showed improvements for both release cases, with more improvement shown in the inland release. This research demonstrated that transport accuracy can be improved when models are adapted to a particular location/time or when important local data is assimilated into the simulation and enhances SRNL’s capability in atmospheric transport modeling in support of its current customer base and local site missions, as well as our ability to attract new customers within the intelligence community.

  12. Support Vector Machine Based Tool for Plant Species Taxonomic Classification

    OpenAIRE

    Manimekalai .K; Vijaya.MS

    2014-01-01

    Plant species are living things and are generally categorized in terms of Domain, Kingdom, Phylum, Class, Order, Family, Genus and name of Species in a hierarchical fashion. This paper formulates the taxonomic leaf categorization problem as the hierarchical classification task and provides a suitable solution using a supervised learning technique namely support vector machine. Features are extracted from scanned images of plant leaves and trained using SVM. Only class, order, family of plants...

  13. Image Analysis and Classification Based on Soil Strength

    Science.gov (United States)

    2016-08-01

    Impact Hammer, which is light, easy to operate, and cost effective . The Clegg Impact Hammer measures stiffness of the soil surface by drop- ping a... effect on out-of-scene classifications. More statistical analy- sis should, however, be done to compare the measured field spectra, the WV2 training...DISCLAIMER: The contents of this report are not to be used for advertising , publication, or promotional purposes. Ci- tation of trade names does not

  14. Three-Class Mammogram Classification Based on Descriptive CNN Features

    Directory of Open Access Journals (Sweden)

    M. Mohsin Jadoon

    2017-01-01

    Full Text Available In this paper, a novel classification technique for large data set of mammograms using a deep learning method is proposed. The proposed model targets a three-class classification study (normal, malignant, and benign cases. In our model we have presented two methods, namely, convolutional neural network-discrete wavelet (CNN-DW and convolutional neural network-curvelet transform (CNN-CT. An augmented data set is generated by using mammogram patches. To enhance the contrast of mammogram images, the data set is filtered by contrast limited adaptive histogram equalization (CLAHE. In the CNN-DW method, enhanced mammogram images are decomposed as its four subbands by means of two-dimensional discrete wavelet transform (2D-DWT, while in the second method discrete curvelet transform (DCT is used. In both methods, dense scale invariant feature (DSIFT for all subbands is extracted. Input data matrix containing these subband features of all the mammogram patches is created that is processed as input to convolutional neural network (CNN. Softmax layer and support vector machine (SVM layer are used to train CNN for classification. Proposed methods have been compared with existing methods in terms of accuracy rate, error rate, and various validation assessment measures. CNN-DW and CNN-CT have achieved accuracy rate of 81.83% and 83.74%, respectively. Simulation results clearly validate the significance and impact of our proposed model as compared to other well-known existing techniques.

  15. Interrater reliability of a Pilates movement-based classification system.

    Science.gov (United States)

    Yu, Kwan Kenny; Tulloch, Evelyn; Hendrick, Paul

    2015-01-01

    To determine the interrater reliability for identification of a specific movement pattern using a Pilates Classification system. Videos of 5 subjects performing specific movement tasks were sent to raters trained in the DMA-CP classification system. Ninety-six raters completed the survey. Interrater reliability for the detection of a directional bias was excellent (Pi = 0.92, and K(free) = 0.89). Interrater reliability for classifying an individual into a specific subgroup was moderate (Pi = 0.64, K(free) = 0.55) however raters who had completed levels 1-4 of the DMA-CP training and reported using the assessment daily demonstrated excellent reliability (Pi = 0.89 and K(free) = 0.87). The reliability of the classification system demonstrated almost perfect agreement in determining the existence of a specific movement pattern and classifying into a subgroup for experienced raters. There was a trend for greater reliability associated with increased levels of training and experience of the raters. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. SB certification handout material requirements, test methods, responsibilities, and minimum classification levels for mixture-based specification for flexible base.

    Science.gov (United States)

    2012-10-01

    A handout with tables representing the material requirements, test methods, responsibilities, and minimum classification levels mixture-based specification for flexible base and details on aggregate and test methods employed, along with agency and co...

  17. An ensemble machine learning approach to predict survival in breast cancer.

    Science.gov (United States)

    Djebbari, Amira; Liu, Ziying; Phan, Sieu; Famili, Fazel

    2008-01-01

    Current breast cancer predictive signatures are not unique. Can we use this fact to our advantage to improve prediction? From the machine learning perspective, it is well known that combining multiple classifiers can improve classification performance. We propose an ensemble machine learning approach which consists of choosing feature subsets and learning predictive models from them. We then combine models based on certain model fusion criteria and we also introduce a tuning parameter to control sensitivity. Our method significantly improves classification performance with a particular emphasis on sensitivity which is critical to avoid misclassifying poor prognosis patients as good prognosis.

  18. Constraining a compositional flow model with flow-chemical data using an ensemble-based Kalman filter

    KAUST Repository

    Gharamti, M. E.; Kadoura, A.; Valstar, J.; Sun, S.; Hoteit, Ibrahim

    2014-01-01

    Isothermal compositional flow models require coupling transient compressible flows and advective transport systems of various chemical species in subsurface porous media. Building such numerical models is quite challenging and may be subject to many sources of uncertainties because of possible incomplete representation of some geological parameters that characterize the system's processes. Advanced data assimilation methods, such as the ensemble Kalman filter (EnKF), can be used to calibrate these models by incorporating available data. In this work, we consider the problem of estimating reservoir permeability using information about phase pressure as well as the chemical properties of fluid components. We carry out state-parameter estimation experiments using joint and dual updating schemes in the context of the EnKF with a two-dimensional single-phase compositional flow model (CFM). Quantitative and statistical analyses are performed to evaluate and compare the performance of the assimilation schemes. Our results indicate that including chemical composition data significantly enhances the accuracy of the permeability estimates. In addition, composition data provide more information to estimate system states and parameters than do standard pressure data. The dual state-parameter estimation scheme provides about 10% more accurate permeability estimates on average than the joint scheme when implemented with the same ensemble members, at the cost of twice more forward model integrations. At similar computational cost, the dual approach becomes only beneficial after using large enough ensembles.

  19. Constraining a compositional flow model with flow-chemical data using an ensemble-based Kalman filter

    KAUST Repository

    Gharamti, M. E.

    2014-03-01

    Isothermal compositional flow models require coupling transient compressible flows and advective transport systems of various chemical species in subsurface porous media. Building such numerical models is quite challenging and may be subject to many sources of uncertainties because of possible incomplete representation of some geological parameters that characterize the system\\'s processes. Advanced data assimilation methods, such as the ensemble Kalman filter (EnKF), can be used to calibrate these models by incorporating available data. In this work, we consider the problem of estimating reservoir permeability using information about phase pressure as well as the chemical properties of fluid components. We carry out state-parameter estimation experiments using joint and dual updating schemes in the context of the EnKF with a two-dimensional single-phase compositional flow model (CFM). Quantitative and statistical analyses are performed to evaluate and compare the performance of the assimilation schemes. Our results indicate that including chemical composition data significantly enhances the accuracy of the permeability estimates. In addition, composition data provide more information to estimate system states and parameters than do standard pressure data. The dual state-parameter estimation scheme provides about 10% more accurate permeability estimates on average than the joint scheme when implemented with the same ensemble members, at the cost of twice more forward model integrations. At similar computational cost, the dual approach becomes only beneficial after using large enough ensembles.

  20. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    Science.gov (United States)

    S K, Somasundaram; P, Alli

    2017-11-09

    of DR screening system using Bagging Ensemble Classifier (BEC) is investigated. With the help of voting the process in ML-BEC, bagging minimizes the error due to variance of the base classifier. With the publicly available retinal image databases, our classifier is trained with 25% of RI. Results show that the ensemble classifier can achieve better classification accuracy (CA) than single classification models. Empirical experiments suggest that the machine learning-based ensemble classifier is efficient for further reducing DR classification time (CT).

  1. Finding diversity for building one-day ahead Hydrological Ensemble Prediction System based on artificial neural network stacks

    Science.gov (United States)

    Brochero, Darwin; Anctil, Francois; Gagné, Christian; López, Karol

    2013-04-01

    In this study, we addressed the application of Artificial Neural Networks (ANN) in the context of Hydrological Ensemble Prediction Systems (HEPS). Such systems have become popular in the past years as a tool to include the forecast uncertainty in the decision making process. HEPS considers fundamentally the uncertainty cascade model [4] for uncertainty representation. Analogously, the machine learning community has proposed models of multiple classifier systems that take into account the variability in datasets, input space, model structures, and parametric configuration [3]. This approach is based primarily on the well-known "no free lunch theorem" [1]. Consequently, we propose a framework based on two separate but complementary topics: data stratification and input variable selection (IVS). Thus, we promote an ANN prediction stack in which each predictor is trained based on input spaces defined by the IVS application on different stratified sub-samples. All this, added to the inherent variability of classical ANN optimization, leads us to our ultimate goal: diversity in the prediction, defined as the complementarity of the individual predictors. The stratification application on the 12 basins used in this study, which originate from the second and third workshop of the MOPEX project [2], shows that the informativeness of the data is far more important than the quantity used for ANN training. Additionally, the input space variability leads to ANN stacks that outperform an ANN stack model trained with 100% of the available information but with a random selection of dataset used in the early stopping method (scenario R100P). The results show that from a deterministic view, the main advantage focuses on the efficient selection of the training information, which is an equally important concept for the calibration of conceptual hydrological models. On the other hand, the diversity achieved is reflected in a substantial improvement in the scores that define the

  2. Support Vector Machine and Parametric Wavelet-Based Texture Classification of Stem Cell Images

    National Research Council Canada - National Science Library

    Jeffreys, Christopher

    2004-01-01

    .... Since colony texture is a major discriminating feature in determining quality, we introduce a non-invasive, semi-automated texture-based stem cell colony classification methodology to aid researchers...

  3. Single-labelled music genre classification using content-based features

    CSIR Research Space (South Africa)

    Ajoodha, R

    2015-11-01

    Full Text Available In this paper we use content-based features to perform automatic classification of music pieces into genres. We categorise these features into four groups: features extracted from the Fourier transform’s magnitude spectrum, features designed...

  4. Desert plains classification based on Geomorphometrical parameters (Case study: Aghda, Yazd)

    Science.gov (United States)

    Tazeh, mahdi; Kalantari, Saeideh

    2013-04-01

    This research focuses on plains. There are several tremendous methods and classification which presented for plain classification. One of The natural resource based classification which is mostly using in Iran, classified plains into three types, Erosional Pediment, Denudation Pediment Aggradational Piedmont. The qualitative and quantitative factors to differentiate them from each other are also used appropriately. In this study effective Geomorphometrical parameters in differentiate landforms were applied for plain. Geomorphometrical parameters are calculable and can be extracted using mathematical equations and the corresponding relations on digital elevation model. Geomorphometrical parameters used in this study included Percent of Slope, Plan Curvature, Profile Curvature, Minimum Curvature, the Maximum Curvature, Cross sectional Curvature, Longitudinal Curvature and Gaussian Curvature. The results indicated that the most important affecting Geomorphometrical parameters for plain and desert classifications includes: Percent of Slope, Minimum Curvature, Profile Curvature, and Longitudinal Curvature. Key Words: Plain, Geomorphometry, Classification, Biophysical, Yazd Khezarabad.

  5. [Classification of cell-based medicinal products and legal implications: An overview and an update].

    Science.gov (United States)

    Scherer, Jürgen; Flory, Egbert

    2015-11-01

    In general, cell-based medicinal products do not represent a uniform class of medicinal products, but instead comprise medicinal products with diverse regulatory classification as advanced-therapy medicinal products (ATMP), medicinal products (MP), tissue preparations, or blood products. Due to the legal and scientific consequences of the development and approval of MPs, classification should be clarified as early as possible. This paper describes the legal situation in Germany and highlights specific criteria and concepts for classification, with a focus on, but not limited to, ATMPs and non-ATMPs. Depending on the stage of product development and the specific application submitted to a competent authority, legally binding classification is done by the German Länder Authorities, Paul-Ehrlich-Institut, or European Medicines Agency. On request by the applicants, the Committee for Advanced Therapies may issue scientific recommendations for classification.

  6. Polsar Land Cover Classification Based on Hidden Polarimetric Features in Rotation Domain and Svm Classifier

    Science.gov (United States)

    Tao, C.-S.; Chen, S.-W.; Li, Y.-Z.; Xiao, S.-P.

    2017-09-01

    Land cover classification is an important application for polarimetric synthetic aperture radar (PolSAR) data utilization. Rollinvariant polarimetric features such as H / Ani / text-decoration: overline">α / Span are commonly adopted in PolSAR land cover classification. However, target orientation diversity effect makes PolSAR images understanding and interpretation difficult. Only using the roll-invariant polarimetric features may introduce ambiguity in the interpretation of targets' scattering mechanisms and limit the followed classification accuracy. To address this problem, this work firstly focuses on hidden polarimetric feature mining in the rotation domain alo