WorldWideScience

Sample records for high-dimensional feature space

  1. A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

    Directory of Open Access Journals (Sweden)

    Yongjun Piao

    2015-01-01

    Full Text Available Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality increases rapidly day by day. Such a trend poses various challenges as these methods are not suitable to directly apply to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is considered to divide the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of each classifier are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms other methods.

  2. Lip-reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space.

    Science.gov (United States)

    Ma, Wei Ji; Zhou, Xiang; Ross, Lars A; Foxe, John J; Parra, Lucas C

    2009-01-01

    Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.

  3. Lip-reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space.

    Directory of Open Access Journals (Sweden)

    Wei Ji Ma

    Full Text Available Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness, one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.

  4. High Dimensional Classification Using Features Annealed Independence Rules.

    Science.gov (United States)

    Fan, Jianqing; Fan, Yingying

    2008-01-01

    Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classifications is largely poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as the random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as bad as the random guessing. Thus, it is paramountly important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics are proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.

  5. Manifold learning to interpret JET high-dimensional operational space

    International Nuclear Information System (INIS)

    Cannas, B; Fanni, A; Pau, A; Sias, G; Murari, A

    2013-01-01

    In this paper, the problem of visualization and exploration of JET high-dimensional operational space is considered. The data come from plasma discharges selected from JET campaigns from C15 (year 2005) up to C27 (year 2009). The aim is to learn the possible manifold structure embedded in the data and to create some representations of the plasma parameters on low-dimensional maps, which are understandable and which preserve the essential properties owned by the original data. A crucial issue for the design of such mappings is the quality of the dataset. This paper reports the details of the criteria used to properly select suitable signals downloaded from JET databases in order to obtain a dataset of reliable observations. Moreover, a statistical analysis is performed to recognize the presence of outliers. Finally data reduction, based on clustering methods, is performed to select a limited and representative number of samples for the operational space mapping. The high-dimensional operational space of JET is mapped using a widely used manifold learning method, the self-organizing maps. The results are compared with other data visualization methods. The obtained maps can be used to identify characteristic regions of the plasma scenario, allowing to discriminate between regions with high risk of disruption and those with low risk of disruption. (paper)

  6. On High Dimensional Searching Spaces and Learning Methods

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Choros, Kazimierz

    2017-01-01

    , and similarity functions and discuss the pros and cons of using each of them. Conventional similarity functions evaluate objects in the vector space. Contrarily, Weighted Feature Distance (WFD) functions compare data objects in both feature and vector spaces, preventing the system from being affected by some...

  7. Reinforcement learning on slow features of high-dimensional input streams.

    Directory of Open Access Journals (Sweden)

    Robert Legenstein

    Full Text Available Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

  8. The literary uses of high-dimensional space

    Directory of Open Access Journals (Sweden)

    Ted Underwood

    2015-12-01

    Full Text Available Debates over “Big Data” shed more heat than light in the humanities, because the term ascribes new importance to statistical methods without explaining how those methods have changed. What we badly need instead is a conversation about the substantive innovations that have made statistical modeling useful for disciplines where, in the past, it truly wasn’t. These innovations are partly technical, but more fundamentally expressed in what Leo Breiman calls a new “culture” of statistical modeling. Where 20th-century methods often required humanists to squeeze our unstructured texts, sounds, or images into some special-purpose data model, new methods can handle unstructured evidence more directly by modeling it in a high-dimensional space. This opens a range of research opportunities that humanists have barely begun to discuss. To date, topic modeling has received most attention, but in the long run, supervised predictive models may be even more important. I sketch their potential by describing how Jordan Sellers and I have begun to model poetic distinction in the long 19th century—revealing an arc of gradual change much longer than received literary histories would lead us to expect.

  9. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles; Schwartz, Scott; Chapkin, Robert S.; Carroll, Raymond J.; Ivanov, Ivan

    2012-01-01

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods: thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate utility of the thresholding and SVD feature selection methods to with respect to a recent infant intestinal gene expression and metagenomics dataset.

  10. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods: thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate utility of the thresholding and SVD feature selection methods to with respect to a recent infant intestinal gene expression and metagenomics dataset.

  11. Data analysis in high-dimensional sparse spaces

    DEFF Research Database (Denmark)

    Clemmensen, Line Katrine Harder

    classification techniques for high-dimensional problems are presented: Sparse discriminant analysis, sparse mixture discriminant analysis and orthogonality constrained support vector machines. The first two introduces sparseness to the well known linear and mixture discriminant analysis and thereby provide low...... are applied to classifications of fish species, ear canal impressions used in the hearing aid industry, microbiological fungi species, and various cancerous tissues and healthy tissues. In addition, novel applications of sparse regressions (also called the elastic net) to the medical, concrete, and food...

  12. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    KAUST Repository

    Tao, Yufei; Yi, Ke; Sheng, Cheng; Kalnis, Panos

    2010-01-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii

  13. Compound Structure-Independent Activity Prediction in High-Dimensional Target Space.

    Science.gov (United States)

    Balfer, Jenny; Hu, Ye; Bajorath, Jürgen

    2014-08-01

    Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Compact Representation of High-Dimensional Feature Vectors for Large-Scale Image Recognition and Retrieval.

    Science.gov (United States)

    Zhang, Yu; Wu, Jianxin; Cai, Jianfei

    2016-05-01

    In large-scale visual recognition and image retrieval tasks, feature vectors, such as Fisher vector (FV) or the vector of locally aggregated descriptors (VLAD), have achieved state-of-the-art results. However, the combination of the large numbers of examples and high-dimensional vectors necessitates dimensionality reduction, in order to reduce its storage and CPU costs to a reasonable range. In spite of the popularity of various feature compression methods, this paper shows that the feature (dimension) selection is a better choice for high-dimensional FV/VLAD than the feature (dimension) compression methods, e.g., product quantization. We show that strong correlation among the feature dimensions in the FV and the VLAD may not exist, which renders feature selection a natural choice. We also show that, many dimensions in FV/VLAD are noise. Throwing them away using feature selection is better than compressing them and useful dimensions altogether using feature compression methods. To choose features, we propose an efficient importance sorting algorithm considering both the supervised and unsupervised cases, for visual recognition and image retrieval, respectively. Combining with the 1-bit quantization, feature selection has achieved both higher accuracy and less computational cost than feature compression methods, such as product quantization, on the FV and the VLAD image representations.

  15. Aspects of high-dimensional theories in embedding spaces

    International Nuclear Information System (INIS)

    Maia, M.D.; Mecklenburg, W.

    1983-01-01

    The question of whether physical meaning may be attributed to the extra dimensions provided by embedding procedures as applied to physical space-times is discussed. The similarities and differences of the present picture to that of conventional Kaluza-Klein pictures are commented. (Author) [pt

  16. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost a NP-hard problem as the combinations of features escalate exponentially as the number of features increases. Unfortunately in data mining, as well as other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing the Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.

  17. A Feature Subset Selection Method Based On High-Dimensional Mutual Information

    Directory of Open Access Journals (Sweden)

    Chee Keong Kwoh

    2011-04-01

    Full Text Available Feature selection is an important step in building accurate classifiers and provides better understanding of the data sets. In this paper, we propose a feature subset selection method based on high-dimensional mutual information. We also propose to use the entropy of the class attribute as a criterion to determine the appropriate subset of features when building classifiers. We prove that if the mutual information between a feature set X and the class attribute Y equals to the entropy of Y , then X is a Markov Blanket of Y . We show that in some cases, it is infeasible to approximate the high-dimensional mutual information with algebraic combinations of pairwise mutual information in any forms. In addition, the exhaustive searches of all combinations of features are prerequisite for finding the optimal feature subsets for classifying these kinds of data sets. We show that our approach outperforms existing filter feature subset selection methods for most of the 24 selected benchmark data sets.

  18. AN EFFECTIVE MULTI-CLUSTERING ANONYMIZATION APPROACH USING DISCRETE COMPONENT TASK FOR NON-BINARY HIGH DIMENSIONAL DATA SPACES

    Directory of Open Access Journals (Sweden)

    L.V. Arun Shalin

    2016-01-01

    Full Text Available Clustering is a process of grouping elements together, designed in such a way that the elements assigned to similar data points in a cluster are more comparable to each other than the remaining data points in a cluster. During clustering certain difficulties related when dealing with high dimensional data are ubiquitous and abundant. Works concentrated using anonymization method for high dimensional data spaces failed to address the problem related to dimensionality reduction during the inclusion of non-binary databases. In this work we study methods for dimensionality reduction for non-binary database. By analyzing the behavior of dimensionality reduction for non-binary database, results in performance improvement with the help of tag based feature. An effective multi-clustering anonymization approach called Discrete Component Task Specific Multi-Clustering (DCTSM is presented for dimensionality reduction on non-binary database. To start with we present the analysis of attribute in the non-binary database and cluster projection identifies the sparseness degree of dimensions. Additionally with the quantum distribution on multi-cluster dimension, the solution for relevancy of attribute and redundancy on non-binary data spaces is provided resulting in performance improvement on the basis of tag based feature. Multi-clustering tag based feature reduction extracts individual features and are correspondingly replaced by the equivalent feature clusters (i.e. tag clusters. During training, the DCTSM approach uses multi-clusters instead of individual tag features and then during decoding individual features is replaced by corresponding multi-clusters. To measure the effectiveness of the method, experiments are conducted on existing anonymization method for high dimensional data spaces and compared with the DCTSM approach using Statlog German Credit Data Set. Improved tag feature extraction and minimum error rate compared to conventional anonymization

  19. A comprehensive analysis of earthquake damage patterns using high dimensional model representation feature selection

    Science.gov (United States)

    Taşkin Kaya, Gülşen

    2013-10-01

    Recently, earthquake damage assessment using satellite images has been a very popular ongoing research direction. Especially with the availability of very high resolution (VHR) satellite images, a quite detailed damage map based on building scale has been produced, and various studies have also been conducted in the literature. As the spatial resolution of satellite images increases, distinguishability of damage patterns becomes more cruel especially in case of using only the spectral information during classification. In order to overcome this difficulty, textural information needs to be involved to the classification to improve the visual quality and reliability of damage map. There are many kinds of textural information which can be derived from VHR satellite images depending on the algorithm used. However, extraction of textural information and evaluation of them have been generally a time consuming process especially for the large areas affected from the earthquake due to the size of VHR image. Therefore, in order to provide a quick damage map, the most useful features describing damage patterns needs to be known in advance as well as the redundant features. In this study, a very high resolution satellite image after Iran, Bam earthquake was used to identify the earthquake damage. Not only the spectral information, textural information was also used during the classification. For textural information, second order Haralick features were extracted from the panchromatic image for the area of interest using gray level co-occurrence matrix with different size of windows and directions. In addition to using spatial features in classification, the most useful features representing the damage characteristic were selected with a novel feature selection method based on high dimensional model representation (HDMR) giving sensitivity of each feature during classification. The method called HDMR was recently proposed as an efficient tool to capture the input

  20. Inference for feature selection using the Lasso with high-dimensional data

    DEFF Research Database (Denmark)

    Brink-Jensen, Kasper; Ekstrøm, Claus Thorn

    2014-01-01

    Penalized regression models such as the Lasso have proved useful for variable selection in many fields - especially for situations with high-dimensional data where the numbers of predictors far exceeds the number of observations. These methods identify and rank variables of importance but do...... not generally provide any inference of the selected variables. Thus, the variables selected might be the "most important" but need not be significant. We propose a significance test for the selection found by the Lasso. We introduce a procedure that computes inference and p-values for features chosen...... by the Lasso. This method rephrases the null hypothesis and uses a randomization approach which ensures that the error rate is controlled even for small samples. We demonstrate the ability of the algorithm to compute $p$-values of the expected magnitude with simulated data using a multitude of scenarios...

  1. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    Science.gov (United States)

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

  2. Variable kernel density estimation in high-dimensional feature spaces

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2017-02-01

    Full Text Available Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high...

  3. Distribution of high-dimensional entanglement via an intra-city free-space link.

    Science.gov (United States)

    Steinlechner, Fabian; Ecker, Sebastian; Fink, Matthias; Liu, Bo; Bavaresco, Jessica; Huber, Marcus; Scheidl, Thomas; Ursin, Rupert

    2017-07-24

    Quantum entanglement is a fundamental resource in quantum information processing and its distribution between distant parties is a key challenge in quantum communications. Increasing the dimensionality of entanglement has been shown to improve robustness and channel capacities in secure quantum communications. Here we report on the distribution of genuine high-dimensional entanglement via a 1.2-km-long free-space link across Vienna. We exploit hyperentanglement, that is, simultaneous entanglement in polarization and energy-time bases, to encode quantum information, and observe high-visibility interference for successive correlation measurements in each degree of freedom. These visibilities impose lower bounds on entanglement in each subspace individually and certify four-dimensional entanglement for the hyperentangled system. The high-fidelity transmission of high-dimensional entanglement under real-world atmospheric link conditions represents an important step towards long-distance quantum communications with more complex quantum systems and the implementation of advanced quantum experiments with satellite links.

  4. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    KAUST Repository

    Tao, Yufei

    2010-07-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms a LSB-forest that has strong quality guarantees, but improves dramatically the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders ofmagnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, which fails most of the existing solutions. We show that, using a LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial

  5. High-dimensional free-space optical communications based on orbital angular momentum coding

    Science.gov (United States)

    Zou, Li; Gu, Xiaofan; Wang, Le

    2018-03-01

    In this paper, we propose a high-dimensional free-space optical communication scheme using orbital angular momentum (OAM) coding. In the scheme, the transmitter encodes N-bits information by using a spatial light modulator to convert a Gaussian beam to a superposition mode of N OAM modes and a Gaussian mode; The receiver decodes the information through an OAM mode analyser which consists of a MZ interferometer with a rotating Dove prism, a photoelectric detector and a computer carrying out the fast Fourier transform. The scheme could realize a high-dimensional free-space optical communication, and decodes the information much fast and accurately. We have verified the feasibility of the scheme by exploiting 8 (4) OAM modes and a Gaussian mode to implement a 256-ary (16-ary) coding free-space optical communication to transmit a 256-gray-scale (16-gray-scale) picture. The results show that a zero bit error rate performance has been achieved.

  6. Individual-based models for adaptive diversification in high-dimensional phenotype spaces.

    Science.gov (United States)

    Ispolatov, Iaroslav; Madhok, Vaibhav; Doebeli, Michael

    2016-02-07

    Most theories of evolutionary diversification are based on equilibrium assumptions: they are either based on optimality arguments involving static fitness landscapes, or they assume that populations first evolve to an equilibrium state before diversification occurs, as exemplified by the concept of evolutionary branching points in adaptive dynamics theory. Recent results indicate that adaptive dynamics may often not converge to equilibrium points and instead generate complicated trajectories if evolution takes place in high-dimensional phenotype spaces. Even though some analytical results on diversification in complex phenotype spaces are available, to study this problem in general we need to reconstruct individual-based models from the adaptive dynamics generating the non-equilibrium dynamics. Here we first provide a method to construct individual-based models such that they faithfully reproduce the given adaptive dynamics attractor without diversification. We then show that a propensity to diversify can be introduced by adding Gaussian competition terms that generate frequency dependence while still preserving the same adaptive dynamics. For sufficiently strong competition, the disruptive selection generated by frequency-dependence overcomes the directional evolution along the selection gradient and leads to diversification in phenotypic directions that are orthogonal to the selection gradient. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. High-dimensional structured light coding/decoding for free-space optical communications free of obstructions.

    Science.gov (United States)

    Du, Jing; Wang, Jian

    2015-11-01

    Bessel beams carrying orbital angular momentum (OAM) with helical phase fronts exp(ilφ)(l=0;±1;±2;…), where φ is the azimuthal angle and l corresponds to the topological number, are orthogonal with each other. This feature of Bessel beams provides a new dimension to code/decode data information on the OAM state of light, and the theoretical infinity of topological number enables possible high-dimensional structured light coding/decoding for free-space optical communications. Moreover, Bessel beams are nondiffracting beams having the ability to recover by themselves in the face of obstructions, which is important for free-space optical communications relying on line-of-sight operation. By utilizing the OAM and nondiffracting characteristics of Bessel beams, we experimentally demonstrate 12 m distance obstruction-free optical m-ary coding/decoding using visible Bessel beams in a free-space optical communication system. We also study the bit error rate (BER) performance of hexadecimal and 32-ary coding/decoding based on Bessel beams with different topological numbers. After receiving 500 symbols at the receiver side, a zero BER of hexadecimal coding/decoding is observed when the obstruction is placed along the propagation path of light.

  8. A study of metaheuristic algorithms for high dimensional feature selection on microarray data

    Science.gov (United States)

    Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna

    2017-11-01

    Microarray systems enable experts to examine gene profile at molecular level using machine learning algorithms. It increases the potentials of classification and diagnosis of many diseases at gene expression level. Though, numerous difficulties may affect the efficiency of machine learning algorithms which includes vast number of genes features comprised in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection is necessary to be performed in the data pre-processing. Many feature selection algorithms are developed and applied on microarray which including the metaheuristic optimization algorithms. This paper discusses the application of the metaheuristics algorithms for feature selection in microarray dataset. This study reveals that, the algorithms have yield an interesting result with limited resources thereby saving computational expenses of machine learning algorithms.

  9. A proposed framework on hybrid feature selection techniques for handling high dimensional educational data

    Science.gov (United States)

    Shahiri, Amirah Mohamed; Husain, Wahidah; Rashid, Nur'Aini Abd

    2017-10-01

    Huge amounts of data in educational datasets may cause the problem in producing quality data. Recently, data mining approach are increasingly used by educational data mining researchers for analyzing the data patterns. However, many research studies have concentrated on selecting suitable learning algorithms instead of performing feature selection process. As a result, these data has problem with computational complexity and spend longer computational time for classification. The main objective of this research is to provide an overview of feature selection techniques that have been used to analyze the most significant features. Then, this research will propose a framework to improve the quality of students' dataset. The proposed framework uses filter and wrapper based technique to support prediction process in future study.

  10. TripAdvisor^{N-D}: A Tourism-Inspired High-Dimensional Space Exploration Framework with Overview and Detail.

    Science.gov (United States)

    Nam, Julia EunJu; Mueller, Klaus

    2013-02-01

    Gaining a true appreciation of high-dimensional space remains difficult since all of the existing high-dimensional space exploration techniques serialize the space travel in some way. This is not so foreign to us since we, when traveling, also experience the world in a serial fashion. But we typically have access to a map to help with positioning, orientation, navigation, and trip planning. Here, we propose a multivariate data exploration tool that compares high-dimensional space navigation with a sightseeing trip. It decomposes this activity into five major tasks: 1) Identify the sights: use a map to identify the sights of interest and their location; 2) Plan the trip: connect the sights of interest along a specifyable path; 3) Go on the trip: travel along the route; 4) Hop off the bus: experience the location, look around, zoom into detail; and 5) Orient and localize: regain bearings in the map. We describe intuitive and interactive tools for all of these tasks, both global navigation within the map and local exploration of the data distributions. For the latter, we describe a polygonal touchpad interface which enables users to smoothly tilt the projection plane in high-dimensional space to produce multivariate scatterplots that best convey the data relationships under investigation. Motion parallax and illustrative motion trails aid in the perception of these transient patterns. We describe the use of our system within two applications: 1) the exploratory discovery of data configurations that best fit a personal preference in the presence of tradeoffs and 2) interactive cluster analysis via cluster sculpting in N-D.

  11. Extending the Generalised Pareto Distribution for Novelty Detection in High-Dimensional Spaces.

    Science.gov (United States)

    Clifton, David A; Clifton, Lei; Hugueny, Samuel; Tarassenko, Lionel

    2014-01-01

    Novelty detection involves the construction of a "model of normality", and then classifies test data as being either "normal" or "abnormal" with respect to that model. For this reason, it is often termed one-class classification. The approach is suitable for cases in which examples of "normal" behaviour are commonly available, but in which cases of "abnormal" data are comparatively rare. When performing novelty detection, we are typically most interested in the tails of the normal model, because it is in these tails that a decision boundary between "normal" and "abnormal" areas of data space usually lies. Extreme value statistics provides an appropriate theoretical framework for modelling the tails of univariate (or low-dimensional) distributions, using the generalised Pareto distribution (GPD), which can be demonstrated to be the limiting distribution for data occurring within the tails of most practically-encountered probability distributions. This paper provides an extension of the GPD, allowing the modelling of probability distributions of arbitrarily high dimension, such as occurs when using complex, multimodel, multivariate distributions for performing novelty detection in most real-life cases. We demonstrate our extension to the GPD using examples from patient physiological monitoring, in which we have acquired data from hospital patients in large clinical studies of high-acuity wards, and in which we wish to determine "abnormal" patient data, such that early warning of patient physiological deterioration may be provided.

  12. Mining High-Dimensional Data

    Science.gov (United States)

    Wang, Wei; Yang, Jiong

    With the rapid growth of computational biology and e-commerce applications, high-dimensional data becomes very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and more crucial (2) the meaningfulness of the similarity measure in the high dimension space. In this chapter, we present several state-of-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.

  13. Normed kernel function-based fuzzy possibilistic C-means (NKFPCM) algorithm for high-dimensional breast cancer database classification with feature selection is based on Laplacian Score

    Science.gov (United States)

    Lestari, A. W.; Rustam, Z.

    2017-07-01

    In the last decade, breast cancer has become the focus of world attention as this disease is one of the primary leading cause of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, Fuzzy Kennel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high dimensional data of breast cancer using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the features candidates are evaluated using feature selection, where Laplacian Score is used. The results show the comparison accuracy and running time of FPCM and NKFPCM with and without feature selection.

  14. Computer-aided diagnosis for phase-contrast X-ray computed tomography: quantitative characterization of human patellar cartilage with high-dimensional geometric features.

    Science.gov (United States)

    Nagarajan, Mahesh B; Coan, Paola; Huber, Markus B; Diemoz, Paul C; Glaser, Christian; Wismüller, Axel

    2014-02-01

    Phase-contrast computed tomography (PCI-CT) has shown tremendous potential as an imaging modality for visualizing human cartilage with high spatial resolution. Previous studies have demonstrated the ability of PCI-CT to visualize (1) structural details of the human patellar cartilage matrix and (2) changes to chondrocyte organization induced by osteoarthritis. This study investigates the use of high-dimensional geometric features in characterizing such chondrocyte patterns in the presence or absence of osteoarthritic damage. Geometrical features derived from the scaling index method (SIM) and statistical features derived from gray-level co-occurrence matrices were extracted from 842 regions of interest (ROI) annotated on PCI-CT images of ex vivo human patellar cartilage specimens. These features were subsequently used in a machine learning task with support vector regression to classify ROIs as healthy or osteoarthritic; classification performance was evaluated using the area under the receiver-operating characteristic curve (AUC). SIM-derived geometrical features exhibited the best classification performance (AUC, 0.95 ± 0.06) and were most robust to changes in ROI size. These results suggest that such geometrical features can provide a detailed characterization of the chondrocyte organization in the cartilage matrix in an automated and non-subjective manner, while also enabling classification of cartilage as healthy or osteoarthritic with high accuracy. Such features could potentially serve as imaging markers for evaluating osteoarthritis progression and its response to different therapeutic intervention strategies.

  15. Unique features of space reactors

    International Nuclear Information System (INIS)

    Buden, D.

    1990-01-01

    This paper reports on space reactors that are designed to meet a unique set of requirements; they must be sufficiently compact to be launched in a rocket to their operational location, operate for many years without maintenance and servicing, operate in extreme environments, and reject heat by radiation to space. To meet these restrictions, operating temperatures are much greater than in terrestrial power plants, and the reactors tend to have a fast neutron spectrum. Currently, a new generation of space reactor power plants is being developed. The major effort is in the SP-100 program, where the power plant is being designed for seven years of full power, and no maintenance operation at a reactor outlet operating temperature of 1350 K

  16. Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints.

    Science.gov (United States)

    Vogt, Martin; Bajorath, Jürgen

    2008-01-01

    Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.

  17. Clustering high dimensional data

    DEFF Research Database (Denmark)

    Assent, Ira

    2012-01-01

    High-dimensional data, i.e., data described by a large number of attributes, pose specific challenges to clustering. The so-called ‘curse of dimensionality’, coined originally to describe the general increase in complexity of various computational problems as dimensionality increases, is known...... to render traditional clustering algorithms ineffective. The curse of dimensionality, among other effects, means that with increasing number of dimensions, a loss of meaningful differentiation between similar and dissimilar objects is observed. As high-dimensional objects appear almost alike, new approaches...... for clustering are required. Consequently, recent research has focused on developing techniques and clustering algorithms specifically for high-dimensional data. Still, open research issues remain. Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Each cluster...

  18. Emotion-based Music Rretrieval on a Well-reduced Audio Feature Space

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Chua, Bee Yong; Nanopoulos, Alexandros

    2009-01-01

    -emotion. However, the real-time systems that retrieve music over large music databases, can achieve order of magnitude performance increase, if applying multidimensional indexing over a dimensionally reduced audio feature space. To meet this performance achievement, in this paper, extensive studies are conducted......Music expresses emotion. A number of audio extracted features have influence on the perceived emotional expression of music. These audio features generate a high-dimensional space, on which music similarity retrieval can be performed effectively, with respect to human perception of the music...... on a number of dimensionality reduction algorithms, including both classic and novel approaches. The paper clearly envisages which dimensionality reduction techniques on the considered audio feature space, can preserve in average the accuracy of the emotion-based music retrieval....

  19. High dimensional entanglement

    CSIR Research Space (South Africa)

    Mc

    2012-07-01

    Full Text Available stream_source_info McLaren_2012.pdf.txt stream_content_type text/plain stream_size 2190 Content-Encoding ISO-8859-1 stream_name McLaren_2012.pdf.txt Content-Type text/plain; charset=ISO-8859-1 High dimensional... entanglement M. McLAREN1,2, F.S. ROUX1 & A. FORBES1,2,3 1. CSIR National Laser Centre, PO Box 395, Pretoria 0001 2. School of Physics, University of the Stellenbosch, Private Bag X1, 7602, Matieland 3. School of Physics, University of Kwazulu...

  20. hdm: High-dimensional metrics

    OpenAIRE

    Chernozhukov, Victor; Hansen, Christian; Spindler, Martin

    2016-01-01

    In this article the package High-dimensional Metrics (\\texttt{hdm}) is introduced. It is a collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e...

  1. Elucidating high-dimensional cancer hallmark annotation via enriched ontology.

    Science.gov (United States)

    Yan, Shankai; Wong, Ka-Chun

    2017-09-01

    Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as a complementary approach that can retrieve knowledge from massive text information, advancing numerous focused studies in cancer research. Nonetheless, the high-dimensional nature of cancer hallmark annotation imposes a unique challenge. To address the curse of dimensionality, we compared multiple cancer hallmark annotation methods on 1580 PubMed abstracts. Based on the insights, a novel approach, UDT-RF, which makes use of ontological features is proposed. It expands the feature space via the Medical Subject Headings (MeSH) ontology graph and utilizes novel feature selections for elucidating the high-dimensional cancer hallmark annotation space. To demonstrate its effectiveness, state-of-the-art methods are compared and evaluated by a multitude of performance metrics, revealing the full performance spectrum on the full set of cancer hallmarks. Several case studies are conducted, demonstrating how the proposed approach could reveal novel insights into cancers. https://github.com/cskyan/chmannot. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification.

    Science.gov (United States)

    Arif, Muhammad

    2012-06-01

    In pattern classification problems, feature extraction is an important step. Quality of features in discriminating different classes plays an important role in pattern classification problems. In real life, pattern classification may require high dimensional feature space and it is impossible to visualize the feature space if the dimension of feature space is greater than four. In this paper, we have proposed a Similarity-Dissimilarity plot which can project high dimensional space to a two dimensional space while retaining important characteristics required to assess the discrimination quality of the features. Similarity-dissimilarity plot can reveal information about the amount of overlap of features of different classes. Separable data points of different classes will also be visible on the plot which can be classified correctly using appropriate classifier. Hence, approximate classification accuracy can be predicted. Moreover, it is possible to know about whom class the misclassified data points will be confused by the classifier. Outlier data points can also be located on the similarity-dissimilarity plot. Various examples of synthetic data are used to highlight important characteristics of the proposed plot. Some real life examples from biomedical data are also used for the analysis. The proposed plot is independent of number of dimensions of the feature space.

  3. Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction.

    Science.gov (United States)

    Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen

    2018-04-01

    Computing conformations which are essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploring algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is converted into ultrafast shape recognition (USR) feature space firstly. Based on the USR feature space, the conformational space can be further converted into Underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of underestimation model, the tight lower bound estimate information can be used for exploration guidance, the invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploring problem of protein conformational space. LUE is applied to differential evolution (DE) algorithm, and metropolis Monte Carlo(MMC) algorithm which is available in the Rosetta; When LUE is applied to DE and MMC, it will be screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Searching Fragment Spaces with feature trees.

    Science.gov (United States)

    Lessel, Uta; Wellenzohn, Bernd; Lilienthal, Markus; Claussen, Holger

    2009-02-01

    Virtual combinatorial chemistry easily produces billions of compounds, for which conventional virtual screening cannot be performed even with the fastest methods available. An efficient solution for such a scenario is the generation of Fragment Spaces, which encode huge numbers of virtual compounds by their fragments/reagents and rules of how to combine them. Similarity-based searches can be performed in such spaces without ever fully enumerating all virtual products. Here we describe the generation of a huge Fragment Space encoding about 5 * 10(11) compounds based on established in-house synthesis protocols for combinatorial libraries, i.e., we encode practically evaluated combinatorial chemistry protocols in a machine readable form, rendering them accessible to in silico search methods. We show how such searches in this Fragment Space can be integrated as a first step in an overall workflow. It reduces the extremely huge number of virtual products by several orders of magnitude so that the resulting list of molecules becomes more manageable for further more elaborated and time-consuming analysis steps. Results of a case study are presented and discussed, which lead to some general conclusions for an efficient expansion of the chemical space to be screened in pharmaceutical companies.

  5. The formation method of the feature space for the identification of fatigued bills

    Science.gov (United States)

    Kang, Dongshik; Oshiro, Ayumu; Ozawa, Kenji; Mitsui, Ikugo

    2014-10-01

    Fatigued bills make a trouble such as the paper jam in a bill handling machine. In the discrimination of fatigued bills using an acoustic signal, the variation of an observed bill sound is considered to be one of causes in misclassification. Therefore a technique has demanded in order to make the classification of fatigued bills more efficient. In this paper, we proposed the algorithm that extracted feature quantity of bill sound from acoustic signal using the frequency difference, and carried out discrimination experiment of fatigued bill money by Support Vector Machine(SVM). The feature quantity of frequency difference can represent the frequency components of an acoustic signal is varied by the fatigued degree of bill money. The generalization performance of SVM does not depend on the size of dimensions of the feature space, even in a high dimensional feature space such as bill-acoustic signals. Furthermore, SVM can induce an optimal classifier which considers the combination of features by the virtue of polynomial kernel functions.

  6. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Hongchao Song

    2017-01-01

    Full Text Available Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE and an ensemble k-nearest neighbor graphs- (K-NNG- based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.

  7. Oversampling the Minority Class in the Feature Space.

    Science.gov (United States)

    Perez-Ortiz, Maria; Gutierrez, Pedro Antonio; Tino, Peter; Hervas-Martinez, Cesar

    2016-09-01

    The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combination of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie on the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS) (a Euclidean space isomorphic to the feature space) for oversampling purposes. The proposed method is framed in the context of support vector machines, where the imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.

  8. High-dimensional covariance estimation with high-dimensional data

    CERN Document Server

    Pourahmadi, Mohsen

    2013-01-01

    Methods for estimating sparse and large covariance matrices Covariance and correlation matrices play fundamental roles in every aspect of the analysis of multivariate data collected from a variety of fields including business and economics, health care, engineering, and environmental and physical sciences. High-Dimensional Covariance Estimation provides accessible and comprehensive coverage of the classical and modern approaches for estimating covariance matrices as well as their applications to the rapidly developing areas lying at the intersection of statistics and mac

  9. Quantum magnification of classical sub-Planck phase space features

    International Nuclear Information System (INIS)

    Hensinger, W.K.; Heckenberg, N.; Rubinsztein-Dunlop, H.; Delande, D.

    2002-01-01

    Full text: To understand the relationship between quantum mechanics and classical physics a crucial question to be answered is how distinct classical dynamical phase space features translate into the quantum picture. This problem becomes even more interesting if these phase space features occupy a much smaller volume than ℎ in a phase space spanned by two non-commuting variables such as position and momentum. The question whether phase space structures in quantum mechanics associated with sub-Planck scales have physical signatures has recently evoked a lot of discussion. Here we will show that sub-Planck classical dynamical phase space structures, for example regions of regular motion, can give rise to states whose phase space representation is of size ℎ or larger. This is illustrated using period-1 regions of regular motion (modes of oscillatory motion of a particle in a modulated well) whose volume is distinctly smaller than Planck's constant. They are magnified in the quantum picture and appear as states whose phase space representation is of size h or larger. Cold atoms provide an ideal test bed to probe such fundamental aspects of quantum and classical dynamics. In the experiment a Bose-Einstein condensate is loaded into a far detuned optical lattice. The lattice depth is modulated resulting in the emergence of regions of regular motion surrounded by chaotic motion in the phase space spanned by position and momentum of the atoms along the standing wave. Sub-Planck scaled phase space features in the classical phase space are magnified and appear as distinct broad peaks in the atomic momentum distribution. The corresponding quantum analysis shows states of size Ti which can be associated with much smaller classical dynamical phase space features. This effect may considered as the dynamical equivalent of the Goldstone and Jaffe theorem which predicts the existence of at least one bound state at a bend in a two or three dimensional spatial potential

  10. Variance inflation in high dimensional Support Vector Machines

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie; Hansen, Lars Kai

    2013-01-01

    Many important machine learning models, supervised and unsupervised, are based on simple Euclidean distance or orthogonal projection in a high dimensional feature space. When estimating such models from small training sets we face the problem that the span of the training data set input vectors...... the case of Support Vector Machines (SVMS) and we propose a non-parametric scheme to restore proper generalizability. We illustrate the algorithm and its ability to restore performance on a wide range of benchmark data sets....... follow a different probability law with less variance. While the problem and basic means to reconstruct and deflate are well understood in unsupervised learning, the case of supervised learning is less well understood. We here investigate the effect of variance inflation in supervised learning including...

  11. A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection.

    Science.gov (United States)

    Ceccarelli, Michele; d'Acierno, Antonio; Facchiano, Angelo

    2009-10-15

    Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high dimensional data. A large number of local maxima (a.k.a. peaks) have to be analyzed as part of computational pipelines aimed at the realization of efficient predictive and screening protocols. With this kind of data dimensions and samples size the risk of over-fitting and selection bias is pervasive. Therefore the development of bio-informatics methods based on unsupervised feature extraction can lead to general tools which can be applied to several fields of predictive proteomics. We propose a method for feature selection and extraction grounded on the theory of multi-scale spaces for high resolution spectra derived from analysis of serum. Then we use support vector machines for classification. In particular we use a database containing 216 samples spectra divided in 115 cancer and 91 control samples. The overall accuracy averaged over a large cross validation study is 98.18. The area under the ROC curve of the best selected model is 0.9962. We improved previous known results on the problem on the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm.

  12. Space Station services and design features for users

    Science.gov (United States)

    Kurzhals, Peter R.; Mckinney, Royce L.

    1987-01-01

    The operational design features and services planned for the NASA Space Station will furnish, in addition to novel opportunities and facilities, lower costs through interface standardization and automation and faster access by means of computer-aided integration and control processes. By furnishing a basis for large-scale space exploitation, the Space Station will possess industrial production and operational services capabilities that may be used by the private sector for commercial ventures; it could also ultimately support lunar and planetary exploration spacecraft assembly and launch facilities.

  13. A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space.

    Science.gov (United States)

    Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng

    2016-01-01

    To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can more rapidly and more

  14. Gene masking - a technique to improve accuracy for cancer classification with high dimensionality in microarray data.

    Science.gov (United States)

    Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok

    2016-12-05

    High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.

  15. The features of space-planning and outfitting decisions

    Energy Technology Data Exchange (ETDEWEB)

    Voronov, N.A.; Bezrukov, A.K.

    1982-01-01

    The features of space-planning and outfitting solutions for a primary housing which was assembled with the No 1 auxillary housing are examined. The primary factors which have given rise to an unusual design decision on the depth of the structure of the main housing (12 meters) are noted.

  16. Space moving target detection using time domain feature

    Science.gov (United States)

    Wang, Min; Chen, Jin-yong; Gao, Feng; Zhao, Jin-yu

    2018-01-01

    The traditional space target detection methods mainly use the spatial characteristics of the star map to detect the targets, which can not make full use of the time domain information. This paper presents a new space moving target detection method based on time domain features. We firstly construct the time spectral data of star map, then analyze the time domain features of the main objects (target, stars and the background) in star maps, finally detect the moving targets using single pulse feature of the time domain signal. The real star map target detection experimental results show that the proposed method can effectively detect the trajectory of moving targets in the star map sequence, and the detection probability achieves 99% when the false alarm rate is about 8×10-5, which outperforms those of compared algorithms.

  17. Modeling high dimensional multichannel brain signals

    KAUST Repository

    Hu, Lechuan

    2017-03-27

    In this paper, our goal is to model functional and effective (directional) connectivity in network of multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The primary challenges here are twofold: first, there are major statistical and computational difficulties for modeling and analyzing high dimensional multichannel brain signals; second, there is no set of universally-agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with sufficiently high order so that complex lead-lag temporal dynamics between the channels can be accurately characterized. However, such a model contains a large number of parameters. Thus, we will estimate the high dimensional VAR parameter space by our proposed hybrid LASSLE method (LASSO+LSE) which is imposes regularization on the first step (to control for sparsity) and constrained least squares estimation on the second step (to improve bias and mean-squared error of the estimator). Then to characterize connectivity between channels in a brain network, we will use various measures but put an emphasis on partial directed coherence (PDC) in order to capture directional connectivity between channels. PDC is a directed frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative all possible receivers in the network. Using the proposed modeling approach, we have achieved some insights on learning in a rat engaged in a non-spatial memory task.

  18. Modeling high dimensional multichannel brain signals

    KAUST Repository

    Hu, Lechuan; Fortin, Norbert; Ombao, Hernando

    2017-01-01

    In this paper, our goal is to model functional and effective (directional) connectivity in network of multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The primary challenges here are twofold: first, there are major statistical and computational difficulties for modeling and analyzing high dimensional multichannel brain signals; second, there is no set of universally-agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with sufficiently high order so that complex lead-lag temporal dynamics between the channels can be accurately characterized. However, such a model contains a large number of parameters. Thus, we will estimate the high dimensional VAR parameter space by our proposed hybrid LASSLE method (LASSO+LSE) which is imposes regularization on the first step (to control for sparsity) and constrained least squares estimation on the second step (to improve bias and mean-squared error of the estimator). Then to characterize connectivity between channels in a brain network, we will use various measures but put an emphasis on partial directed coherence (PDC) in order to capture directional connectivity between channels. PDC is a directed frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative all possible receivers in the network. Using the proposed modeling approach, we have achieved some insights on learning in a rat engaged in a non-spatial memory task.

  19. The feature-weighted receptive field: an interpretable encoding model for complex feature spaces.

    Science.gov (United States)

    St-Yves, Ghislain; Naselaris, Thomas

    2017-06-20

    We introduce the feature-weighted receptive field (fwRF), an encoding model designed to balance expressiveness, interpretability and scalability. The fwRF is organized around the notion of a feature map-a transformation of visual stimuli into visual features that preserves the topology of visual space (but not necessarily the native resolution of the stimulus). The key assumption of the fwRF model is that activity in each voxel encodes variation in a spatially localized region across multiple feature maps. This region is fixed for all feature maps; however, the contribution of each feature map to voxel activity is weighted. Thus, the model has two separable sets of parameters: "where" parameters that characterize the location and extent of pooling over visual features, and "what" parameters that characterize tuning to visual features. The "where" parameters are analogous to classical receptive fields, while "what" parameters are analogous to classical tuning functions. By treating these as separable parameters, the fwRF model complexity is independent of the resolution of the underlying feature maps. This makes it possible to estimate models with thousands of high-resolution feature maps from relatively small amounts of data. Once a fwRF model has been estimated from data, spatial pooling and feature tuning can be read-off directly with no (or very little) additional post-processing or in-silico experimentation. We describe an optimization algorithm for estimating fwRF models from data acquired during standard visual neuroimaging experiments. We then demonstrate the model's application to two distinct sets of features: Gabor wavelets and features supplied by a deep convolutional neural network. We show that when Gabor feature maps are used, the fwRF model recovers receptive fields and spatial frequency tuning functions consistent with known organizational principles of the visual cortex. We also show that a fwRF model can be used to regress entire deep

  20. Features of the Gravity Probe B Space Vehicle

    Science.gov (United States)

    Reeve, William; Green, Gaylord

    2007-04-01

    Space vehicle performance enabled successful relativity data collection throughout the Gravity Probe B mission. Precision pointing and drag-free translation control was maintained using proportional helium micro-thrusters. Electrical power was provided by rigid, double sided solar arrays. The 1.8 kelvin science instrument temperature was maintained using the largest cryogenic liquid helium dewar ever flown in space. The flight software successfully performed autonomous operations and safemode protection. Features of the Gravity Probe B Space Vehicle mechanisms include: 1) sixteen helium micro-thrusters, the first proportional thrusters flown in space, and large-orifice thruster isolation valves, 2) seven precision and high-authority mass trim mechanisms, 3) four non-pyrotechnic, highly reliable solar array deployment and release mechanism sets. Early incremental prototyping was used extensively to reduce spacecraft development risk. All spacecraft systems were redundant and provided multiple failure tolerance in critical systems. Lockheed Martin performed the spacecraft design, systems engineering, hardware and software integration, environmental testing and launch base operations, as well as on-orbit operations support for the Gravity Probe B space science experiment.

  1. High-Dimensional Metrics in R

    OpenAIRE

    Chernozhukov, Victor; Hansen, Chris; Spindler, Martin

    2016-01-01

    The package High-dimensional Metrics (\\Rpackage{hdm}) is an evolving collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e.g., treatment or poli...

  2. Reducing the Complexity of Genetic Fuzzy Classifiers in Highly-Dimensional Classification Problems

    Directory of Open Access Journals (Sweden)

    DimitrisG. Stavrakoudis

    2012-04-01

    Full Text Available This paper introduces the Fast Iterative Rule-based Linguistic Classifier (FaIRLiC, a Genetic Fuzzy Rule-Based Classification System (GFRBCS which targets at reducing the structural complexity of the resulting rule base, as well as its learning algorithm's computational requirements, especially when dealing with high-dimensional feature spaces. The proposed methodology follows the principles of the iterative rule learning (IRL approach, whereby a rule extraction algorithm (REA is invoked in an iterative fashion, producing one fuzzy rule at a time. The REA is performed in two successive steps: the first one selects the relevant features of the currently extracted rule, whereas the second one decides the antecedent part of the fuzzy rule, using the previously selected subset of features. The performance of the classifier is finally optimized through a genetic tuning post-processing stage. Comparative results in a hyperspectral remote sensing classification as well as in 12 real-world classification datasets indicate the effectiveness of the proposed methodology in generating high-performing and compact fuzzy rule-based classifiers, even for very high-dimensional feature spaces.

  3. Inferring biological tasks using Pareto analysis of high-dimensional data.

    Science.gov (United States)

    Hart, Yuval; Sheftel, Hila; Hausser, Jean; Szekely, Pablo; Ben-Moshe, Noa Bossel; Korem, Yael; Tendler, Avichai; Mayo, Avraham E; Alon, Uri

    2015-03-01

    We present the Pareto task inference method (ParTI; http://www.weizmann.ac.il/mcb/UriAlon/download/ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks.

  4. Feature extraction algorithm for space targets based on fractal theory

    Science.gov (United States)

    Tian, Balin; Yuan, Jianping; Yue, Xiaokui; Ning, Xin

    2007-11-01

    In order to offer a potential for extending the life of satellites and reducing the launch and operating costs, satellite servicing including conducting repairs, upgrading and refueling spacecraft on-orbit become much more frequently. Future space operations can be more economically and reliably executed using machine vision systems, which can meet real time and tracking reliability requirements for image tracking of space surveillance system. Machine vision was applied to the research of relative pose for spacecrafts, the feature extraction algorithm was the basis of relative pose. In this paper fractal geometry based edge extraction algorithm which can be used in determining and tracking the relative pose of an observed satellite during proximity operations in machine vision system was presented. The method gets the gray-level image distributed by fractal dimension used the Differential Box-Counting (DBC) approach of the fractal theory to restrain the noise. After this, we detect the consecutive edge using Mathematical Morphology. The validity of the proposed method is examined by processing and analyzing images of space targets. The edge extraction method not only extracts the outline of the target, but also keeps the inner details. Meanwhile, edge extraction is only processed in moving area to reduce computation greatly. Simulation results compared edge detection using the method which presented by us with other detection methods. The results indicate that the presented algorithm is a valid method to solve the problems of relative pose for spacecrafts.

  5. Supporting Dynamic Quantization for High-Dimensional Data Analytics.

    Science.gov (United States)

    Guzun, Gheorghi; Canahuate, Guadalupe

    2017-05-01

    Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to query to compute the distance/similarity. Furthermore, in order to enable interactive explorations of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set up to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big-data. We also propose a novel dynamic quantization called Query dependent Equi-Depth (QED) quantization and show its effectiveness on characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions. Gheorghi Guzun and Guadalupe Canahuate. 2017. Supporting Dynamic Quantization for High-Dimensional Data Analytics. In Proceedings of Ex-ploreDB'17, Chicago, IL, USA, May 14-19, 2017, 6 pages. https://doi.org/http://dx.doi.org/10.1145/3077331.3077336.

  6. Efficient Divide-And-Conquer Classification Based on Feature-Space Decomposition

    OpenAIRE

    Guo, Qi; Chen, Bo-Wei; Jiang, Feng; Ji, Xiangyang; Kung, Sun-Yuan

    2015-01-01

    This study presents a divide-and-conquer (DC) approach based on feature space decomposition for classification. When large-scale datasets are present, typical approaches usually employed truncated kernel methods on the feature space or DC approaches on the sample space. However, this did not guarantee separability between classes, owing to overfitting. To overcome such problems, this work proposes a novel DC approach on feature spaces consisting of three steps. Firstly, we divide the feature ...

  7. Modeling High-Dimensional Multichannel Brain Signals

    KAUST Repository

    Hu, Lechuan; Fortin, Norbert J.; Ombao, Hernando

    2017-01-01

    aspects: first, there are major statistical and computational challenges for modeling and analyzing high-dimensional multichannel brain signals; second, there is no set of universally agreed measures for characterizing connectivity. To model multichannel

  8. High dimensional neurocomputing growth, appraisal and applications

    CERN Document Server

    Tripathi, Bipin Kumar

    2015-01-01

    The book presents a coherent understanding of computational intelligence from the perspective of what is known as "intelligent computing" with high-dimensional parameters. It critically discusses the central issue of high-dimensional neurocomputing, such as quantitative representation of signals, extending the dimensionality of neuron, supervised and unsupervised learning and design of higher order neurons. The strong point of the book is its clarity and ability of the underlying theory to unify our understanding of high-dimensional computing where conventional methods fail. The plenty of application oriented problems are presented for evaluating, monitoring and maintaining the stability of adaptive learning machine. Author has taken care to cover the breadth and depth of the subject, both in the qualitative as well as quantitative way. The book is intended to enlighten the scientific community, ranging from advanced undergraduates to engineers, scientists and seasoned researchers in computational intelligenc...

  9. Asymptotically Honest Confidence Regions for High Dimensional

    DEFF Research Database (Denmark)

    Caner, Mehmet; Kock, Anders Bredahl

    While variable selection and oracle inequalities for the estimation and prediction error have received considerable attention in the literature on high-dimensional models, very little work has been done in the area of testing and construction of confidence bands in high-dimensional models. However...... develop an oracle inequality for the conservative Lasso only assuming the existence of a certain number of moments. This is done by means of the Marcinkiewicz-Zygmund inequality which in our context provides sharper bounds than Nemirovski's inequality. As opposed to van de Geer et al. (2014) we allow...

  10. Feature-space transformation improves supervised segmentation across scanners

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Achterberg, Hakim C.; de Bruijne, Marleen

    2015-01-01

    Image-segmentation techniques based on supervised classification generally perform well on the condition that training and test samples have the same feature distribution. However, if training and test images are acquired with different scanners or scanning parameters, their feature distributions...

  11. Space discretization in SN methods: Features, improvements and convergence patterns

    International Nuclear Information System (INIS)

    Coppa, G.G.M.; Lapenta, G.; Ravetto, P.

    1990-01-01

    A comparative analysis of the space discretization schemes currently used in S N methods is performed and special attention is devoted to direct integration techniques. Some improvements are proposed in one- and two-dimensional applications, which are based on suitable choices for the spatial variation of the collision source. A study of the convergence pattern is carried out for eigenvalue calculations within the space asymptotic approximation by means of both analytical and numerical investigations. (orig.) [de

  12. The additive hazards model with high-dimensional regressors

    DEFF Research Database (Denmark)

    Martinussen, Torben; Scheike, Thomas

    2009-01-01

    This paper considers estimation and prediction in the Aalen additive hazards model in the case where the covariate vector is high-dimensional such as gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study...... model. A standard PLS algorithm can also be constructed, but it turns out that the resulting predictor can only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well known primary biliary...

  13. High-dimensional cluster analysis with the Masked EM Algorithm

    Science.gov (United States)

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  14. Scalable Nearest Neighbor Algorithms for High Dimensional Data.

    Science.gov (United States)

    Muja, Marius; Lowe, David G

    2014-11-01

    For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.

  15. Introduction to high-dimensional statistics

    CERN Document Server

    Giraud, Christophe

    2015-01-01

    Ever-greater computing technologies have given rise to an exponentially growing volume of data. Today massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. However, analyzing such data has presented a challenge for statisticians and data analysts and has required the development of new statistical methods capable of separating the signal from the noise.Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for ha

  16. Estimating High-Dimensional Time Series Models

    DEFF Research Database (Denmark)

    Medeiros, Marcelo C.; Mendes, Eduardo F.

    We study the asymptotic properties of the Adaptive LASSO (adaLASSO) in sparse, high-dimensional, linear time-series models. We assume both the number of covariates in the model and candidate variables can increase with the number of observations and the number of candidate variables is, possibly......, larger than the number of observations. We show the adaLASSO consistently chooses the relevant variables as the number of observations increases (model selection consistency), and has the oracle property, even when the errors are non-Gaussian and conditionally heteroskedastic. A simulation study shows...

  17. High dimensional classifiers in the imbalanced case

    DEFF Research Database (Denmark)

    Bak, Britta Anker; Jensen, Jens Ledet

    We consider the binary classification problem in the imbalanced case where the number of samples from the two groups differ. The classification problem is considered in the high dimensional case where the number of variables is much larger than the number of samples, and where the imbalance leads...... to a bias in the classification. A theoretical analysis of the independence classifier reveals the origin of the bias and based on this we suggest two new classifiers that can handle any imbalance ratio. The analytical results are supplemented by a simulation study, where the suggested classifiers in some...

  18. Topology of high-dimensional manifolds

    Energy Technology Data Exchange (ETDEWEB)

    Farrell, F T [State University of New York, Binghamton (United States); Goettshe, L [Abdus Salam ICTP, Trieste (Italy); Lueck, W [Westfaelische Wilhelms-Universitaet Muenster, Muenster (Germany)

    2002-08-15

    The School on High-Dimensional Manifold Topology took place at the Abdus Salam ICTP, Trieste from 21 May 2001 to 8 June 2001. The focus of the school was on the classification of manifolds and related aspects of K-theory, geometry, and operator theory. The topics covered included: surgery theory, algebraic K- and L-theory, controlled topology, homology manifolds, exotic aspherical manifolds, homeomorphism and diffeomorphism groups, and scalar curvature. The school consisted of 2 weeks of lecture courses and one week of conference. Thwo-part lecture notes volume contains the notes of most of the lecture courses.

  19. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

  20. CT features of invasion of sublingual space by malignant oropharyngeal tumors

    International Nuclear Information System (INIS)

    Wei Yi; Xiao Jiahe; Zhou Xiangping; Deng Kaihong

    2003-01-01

    Objective: To investigate the CT features of the invasion of sublingual space by malignant oropharyngeal tumors in order to provide more accurate information for clinical treatment. Methods: Fifty-eight cases of pathologically proven malignant oropharyngeal tumors were collected and retrospectively analyzed. Results: Among all the cases, invasion of sublingual space by malignant oropharyngeal tumors could be seen in 14 cases, of which, 7 cases got access to sublingual space through tongue base, 3 cases through parapharyngeal space, 2 cases through pterygomandibular raphe, 2 cases through uncertain routes. Invasion of sublingual space manifested on CT scan as obliteration of fat plane in sublingual space and involvement of the sublingual vessels in the space. Conclusion: Malignant oropharyngeal tumors can invade the adjacent sublingual space via tongue base, pterygomandibular raphe, and parapharyngeal space. The invasion of sublingual space by malignant oropharyngeal tumors manifests in CT as effacement of sublingual fat plane and envelopment of hyoid artery

  1. Approximating high-dimensional dynamics by barycentric coordinates with linear programming

    Energy Technology Data Exchange (ETDEWEB)

    Hirata, Yoshito, E-mail: yoshito@sat.t.u-tokyo.ac.jp; Aihara, Kazuyuki; Suzuki, Hideyuki [Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505 (Japan); Department of Mathematical Informatics, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656 (Japan); CREST, JST, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012 (Japan); Shiro, Masanori [Department of Mathematical Informatics, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656 (Japan); Mathematical Neuroinformatics Group, Advanced Industrial Science and Technology, Tsukuba, Ibaraki 305-8568 (Japan); Takahashi, Nozomu; Mas, Paloma [Center for Research in Agricultural Genomics (CRAG), Consorci CSIC-IRTA-UAB-UB, Barcelona 08193 (Spain)

    2015-01-15

    The increasing development of novel methods and techniques facilitates the measurement of high-dimensional time series but challenges our ability for accurate modeling and predictions. The use of a general mathematical model requires the inclusion of many parameters, which are difficult to be fitted for relatively short high-dimensional time series observed. Here, we propose a novel method to accurately model a high-dimensional time series. Our method extends the barycentric coordinates to high-dimensional phase space by employing linear programming, and allowing the approximation errors explicitly. The extension helps to produce free-running time-series predictions that preserve typical topological, dynamical, and/or geometric characteristics of the underlying attractors more accurately than the radial basis function model that is widely used. The method can be broadly applied, from helping to improve weather forecasting, to creating electronic instruments that sound more natural, and to comprehensively understanding complex biological data.

  2. Approximating high-dimensional dynamics by barycentric coordinates with linear programming

    International Nuclear Information System (INIS)

    Hirata, Yoshito; Aihara, Kazuyuki; Suzuki, Hideyuki; Shiro, Masanori; Takahashi, Nozomu; Mas, Paloma

    2015-01-01

    The increasing development of novel methods and techniques facilitates the measurement of high-dimensional time series but challenges our ability for accurate modeling and predictions. The use of a general mathematical model requires the inclusion of many parameters, which are difficult to be fitted for relatively short high-dimensional time series observed. Here, we propose a novel method to accurately model a high-dimensional time series. Our method extends the barycentric coordinates to high-dimensional phase space by employing linear programming, and allowing the approximation errors explicitly. The extension helps to produce free-running time-series predictions that preserve typical topological, dynamical, and/or geometric characteristics of the underlying attractors more accurately than the radial basis function model that is widely used. The method can be broadly applied, from helping to improve weather forecasting, to creating electronic instruments that sound more natural, and to comprehensively understanding complex biological data

  3. Approximating high-dimensional dynamics by barycentric coordinates with linear programming.

    Science.gov (United States)

    Hirata, Yoshito; Shiro, Masanori; Takahashi, Nozomu; Aihara, Kazuyuki; Suzuki, Hideyuki; Mas, Paloma

    2015-01-01

    The increasing development of novel methods and techniques facilitates the measurement of high-dimensional time series but challenges our ability for accurate modeling and predictions. The use of a general mathematical model requires the inclusion of many parameters, which are difficult to be fitted for relatively short high-dimensional time series observed. Here, we propose a novel method to accurately model a high-dimensional time series. Our method extends the barycentric coordinates to high-dimensional phase space by employing linear programming, and allowing the approximation errors explicitly. The extension helps to produce free-running time-series predictions that preserve typical topological, dynamical, and/or geometric characteristics of the underlying attractors more accurately than the radial basis function model that is widely used. The method can be broadly applied, from helping to improve weather forecasting, to creating electronic instruments that sound more natural, and to comprehensively understanding complex biological data.

  4. High-Dimensional Adaptive Particle Swarm Optimization on Heterogeneous Systems

    International Nuclear Information System (INIS)

    Wachowiak, M P; Sarlo, B B; Foster, A E Lambe

    2014-01-01

    Much work has recently been reported in parallel GPU-based particle swarm optimization (PSO). Motivated by the encouraging results of these investigations, while also recognizing the limitations of GPU-based methods for big problems using a large amount of data, this paper explores the efficacy of employing other types of parallel hardware for PSO. Most commodity systems feature a variety of architectures whose high-performance capabilities can be exploited. In this paper, high-dimensional problems and those that employ a large amount of external data are explored within the context of heterogeneous systems. Large problems are decomposed into constituent components, and analyses are undertaken of which components would benefit from multi-core or GPU parallelism. The current study therefore provides another demonstration that ''supercomputing on a budget'' is possible when subtasks of large problems are run on hardware most suited to these tasks. Experimental results show that large speedups can be achieved on high dimensional, data-intensive problems. Cost functions must first be analysed for parallelization opportunities, and assigned hardware based on the particular task

  5. High-dimensional single-cell cancer biology.

    Science.gov (United States)

    Irish, Jonathan M; Doxie, Deon B

    2014-01-01

    Cancer cells are distinguished from each other and from healthy cells by features that drive clonal evolution and therapy resistance. New advances in high-dimensional flow cytometry make it possible to systematically measure mechanisms of tumor initiation, progression, and therapy resistance on millions of cells from human tumors. Here we describe flow cytometry techniques that enable a "single-cell " view of cancer. High-dimensional techniques like mass cytometry enable multiplexed single-cell analysis of cell identity, clinical biomarkers, signaling network phospho-proteins, transcription factors, and functional readouts of proliferation, cell cycle status, and apoptosis. This capability pairs well with a signaling profiles approach that dissects mechanism by systematically perturbing and measuring many nodes in a signaling network. Single-cell approaches enable study of cellular heterogeneity of primary tissues and turn cell subsets into experimental controls or opportunities for new discovery. Rare populations of stem cells or therapy-resistant cancer cells can be identified and compared to other types of cells within the same sample. In the long term, these techniques will enable tracking of minimal residual disease (MRD) and disease progression. By better understanding biological systems that control development and cell-cell interactions in healthy and diseased contexts, we can learn to program cells to become therapeutic agents or target malignant signaling events to specifically kill cancer cells. Single-cell approaches that provide deep insight into cell signaling and fate decisions will be critical to optimizing the next generation of cancer treatments combining targeted approaches and immunotherapy.

  6. Clustering high dimensional data using RIA

    Energy Technology Data Exchange (ETDEWEB)

    Aziz, Nazrina [School of Quantitative Sciences, College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah (Malaysia)

    2015-05-15

    Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can efficiently be retrieved. However, identifying cluster in high dimensionality data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is some traditional functions cannot capture the pattern dissimilarity among objects. In this article, we used an alternative dissimilarity measurement called Robust Influence Angle (RIA) in the partitioning method. RIA is developed using eigenstructure of the covariance matrix and robust principal component score. We notice that, it can obtain cluster easily and hence avoid the curse of dimensionality. It is also manage to cluster large data sets with mixed numeric and categorical value.

  7. Built spaces and features associated with user satisfaction in maternity waiting homes in Malawi.

    Science.gov (United States)

    McIntosh, Nathalie; Gruits, Patricia; Oppel, Eva; Shao, Amie

    2018-07-01

    To assess satisfaction with maternity waiting home built spaces and features in women who are at risk for underutilizing maternity waiting homes (i.e. residential facilities that temporarily house near-term pregnant mothers close to healthcare facilities that provide obstetrical care). Specifically we wanted to answer the questions: (1) Are built spaces and features associated with maternity waiting home user satisfaction? (2) Can built spaces and features designed to improve hygiene, comfort, privacy and function improve maternity waiting home user satisfaction? And (3) Which built spaces and features are most important for maternity waiting home user satisfaction? A cross-sectional study comparing satisfaction with standard and non-standard maternity waiting home designs. Between December 2016 and February 2017 we surveyed expectant mothers at two maternity waiting homes that differed in their design of built spaces and features. We used bivariate analyses to assess if built spaces and features were associated with satisfaction. We compared ratings of built spaces and features between the two maternity waiting homes using chi-squares and t-tests to assess if design features to improve hygiene, comfort, privacy and function were associated with higher satisfaction. We used exploratory robust regression analysis to examine the relationship between built spaces and features and maternity waiting home satisfaction. Two maternity waiting homes in Malawi, one that incorporated non-standardized design features to improve hygiene, comfort, privacy, and function (Kasungu maternity waiting home) and the other that had a standard maternity waiting home design (Dowa maternity waiting home). 322 expectant mothers at risk for underutilizing maternity waiting homes (i.e. first-time mothers and those with no pregnancy risk factors) who had stayed at the Kasungu or Dowa maternity waiting homes. There were significant differences in ratings of built spaces and features between the

  8. A modular CUDA-based framework for scale-space feature detection in video streams

    International Nuclear Information System (INIS)

    Kinsner, M; Capson, D; Spence, A

    2010-01-01

    Multi-scale image processing techniques enable extraction of features where the size of a feature is either unknown or changing, but the requirement to process image data at multiple scale levels imposes a substantial computational load. This paper describes the architecture and emerging results from the implementation of a GPGPU-accelerated scale-space feature detection framework for video processing. A discrete scale-space representation is generated for image frames within a video stream, and multi-scale feature detection metrics are applied to detect ridges and Gaussian blobs at video frame rates. A modular structure is adopted, in which common feature extraction tasks such as non-maximum suppression and local extrema search may be reused across a variety of feature detectors. Extraction of ridge and blob features is achieved at faster than 15 frames per second on video sequences from a machine vision system, utilizing an NVIDIA GTX 480 graphics card. By design, the framework is easily extended to additional feature classes through the inclusion of feature metrics to be applied to the scale-space representation, and using common post-processing modules to reduce the required CPU workload. The framework is scalable across multiple and more capable GPUs, and enables previously intractable image processing at video frame rates using commodity computational hardware.

  9. An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity

    Directory of Open Access Journals (Sweden)

    Yu Wang

    2014-01-01

    Full Text Available Feature space heterogeneity often exists in many real world data sets so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might dynamically change over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH, to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.

  10. Space-dependent step features: Transient breakdown of slow-roll, homogeneity, and isotropy during inflation

    International Nuclear Information System (INIS)

    Lerner, Rose N.; McDonald, John

    2009-01-01

    A step feature in the inflaton potential can model a transient breakdown of slow-roll inflation. Here we generalize the step feature to include space-dependence, allowing it also to model a breakdown of homogeneity and isotropy. The space-dependent inflaton potential generates a classical curvature perturbation mode characterized by the wave number of the step inhomogeneity. For inhomogeneities small compared with the horizon at the step, space-dependence has a small effect on the curvature perturbation. Therefore, the smoothly oscillating quantum power spectrum predicted by the homogeneous step is robust with respect to subhorizon space-dependence. For inhomogeneities equal to or greater than the horizon at the step, the space-dependent classical mode can dominate, producing a curvature perturbation in which modes of wave number determined by the step inhomogeneity are superimposed on the oscillating power spectrum. Generation of a space-dependent step feature may therefore provide a mechanism to introduce primordial anisotropy into the curvature perturbation. Space-dependence also modifies the quantum fluctuations, in particular, via resonancelike features coming from mode coupling to amplified superhorizon modes. However, these effects are small relative to the classical modes.

  11. Quantum correlation of high dimensional system in a dephasing environment

    Science.gov (United States)

    Ji, Yinghua; Ke, Qiang; Hu, Juju

    2018-05-01

    For a high dimensional spin-S system embedded in a dephasing environment, we theoretically analyze the time evolutions of quantum correlation and entanglement via Frobenius norm and negativity. The quantum correlation dynamics can be considered as a function of the decoherence parameters, including the ratio between the system oscillator frequency ω0 and the reservoir cutoff frequency ωc , and the different environment temperature. It is shown that the quantum correlation can not only measure nonclassical correlation of the considered system, but also perform a better robustness against the dissipation. In addition, the decoherence presents the non-Markovian features and the quantum correlation freeze phenomenon. The former is much weaker than that in the sub-Ohmic or Ohmic thermal reservoir environment.

  12. Overall feature of EAST operation space by using simple Core-SOL-Divertor model

    International Nuclear Information System (INIS)

    Hiwatari, R.; Hatayama, A.; Zhu, S.; Takizuka, T.; Tomita, Y.

    2005-01-01

    We have developed a simple Core-SOL-Divertor (C-S-D) model to investigate qualitatively the overall features of the operational space for the integrated core and edge plasma. To construct the simple C-S-D model, a simple core plasma model of ITER physics guidelines and a two-point SOL-divertor model are used. The simple C-S-D model is applied to the study of the EAST operational space with lower hybrid current drive experiments under various kinds of trade-off for the basic plasma parameters. Effective methods for extending the operation space are also presented. As shown by this study for the EAST operation space, it is evident that the C-S-D model is a useful tool to understand qualitatively the overall features of the plasma operation space. (author)

  13. Evaluating Clustering in Subspace Projections of High Dimensional Data

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Günnemann, Stephan; Assent, Ira

    2009-01-01

    Clustering high dimensional data is an emerging research field. Subspace clustering or projected clustering group similar objects in subspaces, i.e. projections, of the full space. In the past decade, several clustering paradigms have been developed in parallel, without thorough evaluation...... and comparison between these paradigms on a common basis. Conclusive evaluation and comparison is challenged by three major issues. First, there is no ground truth that describes the "true" clusters in real world data. Second, a large variety of evaluation measures have been used that reflect different aspects...... of the clustering result. Finally, in typical publications authors have limited their analysis to their favored paradigm only, while paying other paradigms little or no attention. In this paper, we take a systematic approach to evaluate the major paradigms in a common framework. We study representative clustering...

  14. Visual scan-path analysis with feature space transient fixation moments

    Science.gov (United States)

    Dempere-Marco, Laura; Hu, Xiao-Peng; Yang, Guang-Zhong

    2003-05-01

    The study of eye movements provides useful insight into the cognitive processes underlying visual search tasks. The analysis of the dynamics of eye movements has often been approached from a purely spatial perspective. In many cases, however, it may not be possible to define meaningful or consistent dynamics without considering the features underlying the scan paths. In this paper, the definition of the feature space has been attempted through the concept of visual similarity and non-linear low dimensional embedding, which defines a mapping from the image space into a low dimensional feature manifold that preserves the intrinsic similarity of image patterns. This has enabled the definition of perceptually meaningful features without the use of domain specific knowledge. Based on this, this paper introduces a new concept called Feature Space Transient Fixation Moments (TFM). The approach presented tackles the problem of feature space representation of visual search through the use of TFM. We demonstrate the practical values of this concept for characterizing the dynamics of eye movements in goal directed visual search tasks. We also illustrate how this model can be used to elucidate the fundamental steps involved in skilled search tasks through the evolution of transient fixation moments.

  15. An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection

    International Nuclear Information System (INIS)

    Zhang, Liangwei; Lin, Jing; Karim, Ramin

    2015-01-01

    The accuracy of traditional anomaly detection techniques implemented on full-dimensional spaces degrades significantly as dimensionality increases, thereby hampering many real-world applications. This work proposes an approach to selecting meaningful feature subspace and conducting anomaly detection in the corresponding subspace projection. The aim is to maintain the detection accuracy in high-dimensional circumstances. The suggested approach assesses the angle between all pairs of two lines for one specific anomaly candidate: the first line is connected by the relevant data point and the center of its adjacent points; the other line is one of the axis-parallel lines. Those dimensions which have a relatively small angle with the first line are then chosen to constitute the axis-parallel subspace for the candidate. Next, a normalized Mahalanobis distance is introduced to measure the local outlier-ness of an object in the subspace projection. To comprehensively compare the proposed algorithm with several existing anomaly detection techniques, we constructed artificial datasets with various high-dimensional settings and found the algorithm displayed superior accuracy. A further experiment on an industrial dataset demonstrated the applicability of the proposed algorithm in fault detection tasks and highlighted another of its merits, namely, to provide preliminary interpretation of abnormality through feature ordering in relevant subspaces. - Highlights: • An anomaly detection approach for high-dimensional reliability data is proposed. • The approach selects relevant subspaces by assessing vectorial angles. • The novel ABSAD approach displays superior accuracy over other alternatives. • Numerical illustration approves its efficacy in fault detection applications

  16. Large anterior temporal Virchow-Robin spaces: unique MR imaging features

    Energy Technology Data Exchange (ETDEWEB)

    Lim, Anthony T. [Monash University, Neuroradiology Service, Monash Imaging, Monash Health, Melbourne, Victoria (Australia); Chandra, Ronil V. [Monash University, Neuroradiology Service, Monash Imaging, Monash Health, Melbourne, Victoria (Australia); Monash University, Department of Surgery, Faculty of Medicine, Nursing and Health Sciences, Melbourne (Australia); Trost, Nicholas M. [St Vincent' s Hospital, Neuroradiology Service, Melbourne (Australia); McKelvie, Penelope A. [St Vincent' s Hospital, Anatomical Pathology, Melbourne (Australia); Stuckey, Stephen L. [Monash University, Neuroradiology Service, Monash Imaging, Monash Health, Melbourne, Victoria (Australia); Monash University, Southern Clinical School, Faculty of Medicine, Nursing and Health Sciences, Melbourne (Australia)

    2015-05-01

    Large Virchow-Robin (VR) spaces may mimic cystic tumor. The anterior temporal subcortical white matter is a recently described preferential location, with only 18 reported cases. Our aim was to identify unique MR features that could increase prospective diagnostic confidence. Thirty-nine cases were identified between November 2003 and February 2014. Demographic, clinical data and the initial radiological report were retrospectively reviewed. Two neuroradiologists reviewed all MR imaging; a neuropathologist reviewed histological data. Median age was 58 years (range 24-86 years); the majority (69 %) was female. There were no clinical symptoms that could be directly referable to the lesion. Two thirds were considered to be VR spaces on the initial radiological report. Mean maximal size was 9 mm (range 5-17 mm); majority (79 %) had perilesional T2 or fluid-attenuated inversion recovery (FLAIR) hyperintensity. The following were identified as potential unique MR features: focal cortical distortion by an adjacent branch of the middle cerebral artery (92 %), smaller adjacent VR spaces (26 %), and a contiguous cerebrospinal fluid (CSF) intensity tract (21 %). Surgery was performed in three asymptomatic patients; histopathology confirmed VR spaces. Unique MR features were retrospectively identified in all three patients. Large anterior temporal lobe VR spaces commonly demonstrate perilesional T2 or FLAIR signal and can be misdiagnosed as cystic tumor. Potential unique MR features that could increase prospective diagnostic confidence include focal cortical distortion by an adjacent branch of the middle cerebral artery, smaller adjacent VR spaces, and a contiguous CSF intensity tract. (orig.)

  17. High-dimensional orbital angular momentum entanglement concentration based on Laguerre–Gaussian mode selection

    International Nuclear Information System (INIS)

    Zhang, Wuhong; Su, Ming; Wu, Ziwen; Lu, Meng; Huang, Bingwei; Chen, Lixiang

    2013-01-01

    Twisted photons enable the definition of a Hilbert space beyond two dimensions by orbital angular momentum (OAM) eigenstates. Here we propose a feasible entanglement concentration experiment, to enhance the quality of high-dimensional entanglement shared by twisted photon pairs. Our approach is started from the full characterization of entangled spiral bandwidth, and is then based on the careful selection of the Laguerre–Gaussian (LG) modes with specific radial and azimuthal indices p and ℓ. In particular, we demonstrate the possibility of high-dimensional entanglement concentration residing in the OAM subspace of up to 21 dimensions. By means of LabVIEW simulations with spatial light modulators, we show that the Shannon dimensionality could be employed to quantify the quality of the present concentration. Our scheme holds promise in quantum information applications defined in high-dimensional Hilbert space. (letter)

  18. Detection of Subtle Context-Dependent Model Inaccuracies in High-Dimensional Robot Domains.

    Science.gov (United States)

    Mendoza, Juan Pablo; Simmons, Reid; Veloso, Manuela

    2016-12-01

    Autonomous robots often rely on models of their sensing and actions for intelligent decision making. However, when operating in unconstrained environments, the complexity of the world makes it infeasible to create models that are accurate in every situation. This article addresses the problem of using potentially large and high-dimensional sets of robot execution data to detect situations in which a robot model is inaccurate-that is, detecting context-dependent model inaccuracies in a high-dimensional context space. To find inaccuracies tractably, the robot conducts an informed search through low-dimensional projections of execution data to find parametric Regions of Inaccurate Modeling (RIMs). Empirical evidence from two robot domains shows that this approach significantly enhances the detection power of existing RIM-detection algorithms in high-dimensional spaces.

  19. Modeling High-Dimensional Multichannel Brain Signals

    KAUST Repository

    Hu, Lechuan

    2017-12-12

    Our goal is to model and measure functional and effective (directional) connectivity in multichannel brain physiological signals (e.g., electroencephalograms, local field potentials). The difficulties from analyzing these data mainly come from two aspects: first, there are major statistical and computational challenges for modeling and analyzing high-dimensional multichannel brain signals; second, there is no set of universally agreed measures for characterizing connectivity. To model multichannel brain signals, our approach is to fit a vector autoregressive (VAR) model with potentially high lag order so that complex lead-lag temporal dynamics between the channels can be captured. Estimates of the VAR model will be obtained by our proposed hybrid LASSLE (LASSO + LSE) method which combines regularization (to control for sparsity) and least squares estimation (to improve bias and mean-squared error). Then we employ some measures of connectivity but put an emphasis on partial directed coherence (PDC) which can capture the directional connectivity between channels. PDC is a frequency-specific measure that explains the extent to which the present oscillatory activity in a sender channel influences the future oscillatory activity in a specific receiver channel relative to all possible receivers in the network. The proposed modeling approach provided key insights into potential functional relationships among simultaneously recorded sites during performance of a complex memory task. Specifically, this novel method was successful in quantifying patterns of effective connectivity across electrode locations, and in capturing how these patterns varied across trial epochs and trial types.

  20. A qualitative numerical study of high dimensional dynamical systems

    Science.gov (United States)

    Albers, David James

    Since Poincare, the father of modern mathematical dynamical systems, much effort has been exerted to achieve a qualitative understanding of the physical world via a qualitative understanding of the functions we use to model the physical world. In this thesis, we construct a numerical framework suitable for a qualitative, statistical study of dynamical systems using the space of artificial neural networks. We analyze the dynamics along intervals in parameter space, separating the set of neural networks into roughly four regions: the fixed point to the first bifurcation; the route to chaos; the chaotic region; and a transition region between chaos and finite-state neural networks. The study is primarily with respect to high-dimensional dynamical systems. We make the following general conclusions as the dimension of the dynamical system is increased: the probability of the first bifurcation being of type Neimark-Sacker is greater than ninety-percent; the most probable route to chaos is via a cascade of bifurcations of high-period periodic orbits, quasi-periodic orbits, and 2-tori; there exists an interval of parameter space such that hyperbolicity is violated on a countable, Lebesgue measure 0, "increasingly dense" subset; chaos is much more likely to persist with respect to parameter perturbation in the chaotic region of parameter space as the dimension is increased; moreover, as the number of positive Lyapunov exponents is increased, the likelihood that any significant portion of these positive exponents can be perturbed away decreases with increasing dimension. The maximum Kaplan-Yorke dimension and the maximum number of positive Lyapunov exponents increases linearly with dimension. The probability of a dynamical system being chaotic increases exponentially with dimension. The results with respect to the first bifurcation and the route to chaos comment on previous results of Newhouse, Ruelle, Takens, Broer, Chenciner, and Iooss. Moreover, results regarding the high-dimensional

  1. High-Dimensional Quantum Information Processing with Linear Optics

    Science.gov (United States)

    Fitzpatrick, Casey A.

    Quantum information processing (QIP) is an interdisciplinary field concerned with the development of computers and information processing systems that utilize quantum mechanical properties of nature to carry out their function. QIP systems have become vastly more practical since the turn of the century. Today, QIP applications span imaging, cryptographic security, computation, and simulation (quantum systems that mimic other quantum systems). Many important strategies improve quantum versions of classical information system hardware, such as single photon detectors and quantum repeaters. Another more abstract strategy engineers high-dimensional quantum state spaces, so that each successful event carries more information than traditional two-level systems allow. Photonic states in particular bring the added advantages of weak environmental coupling and data transmission near the speed of light, allowing for simpler control and lower system design complexity. In this dissertation, numerous novel, scalable designs for practical high-dimensional linear-optical QIP systems are presented. First, a correlated photon imaging scheme using orbital angular momentum (OAM) states to detect rotational symmetries in objects using measurements, as well as building images out of those interactions is reported. Then, a statistical detection method using chains of OAM superpositions distributed according to the Fibonacci sequence is established and expanded upon. It is shown that the approach gives rise to schemes for sorting, detecting, and generating the recursively defined high-dimensional states on which some quantum cryptographic protocols depend. Finally, an ongoing study based on a generalization of the standard optical multiport for applications in quantum computation and simulation is reported upon. The architecture allows photons to reverse momentum inside the device. This in turn enables realistic implementation of controllable linear-optical scattering vertices for

  2. High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps

    International Nuclear Information System (INIS)

    Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.; Chen, Xiao

    2017-01-01

    This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recast this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.

  3. Finding and Visualizing Relevant Subspaces for Clustering High-Dimensional Astronomical Data Using Connected Morphological Operators

    NARCIS (Netherlands)

    Ferdosi, Bilkis J.; Buddelmeijer, Hugo; Trager, Scott; Wilkinson, Michael H.F.; Roerdink, Jos B.T.M.

    2010-01-01

    Data sets in astronomy are growing to enormous sizes. Modern astronomical surveys provide not only image data but also catalogues of millions of objects (stars, galaxies), each object with hundreds of associated parameters. Exploration of this very high-dimensional data space poses a huge challenge.

  4. Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest.

    Science.gov (United States)

    Ma, Suliang; Chen, Mingxuan; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

    2018-04-16

    Mechanical faults of high-voltage circuit breakers (HVCBs) always happen over long-term operation, so extracting the fault features and identifying the fault type have become a key issue for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variable and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method with the origin feature space reached 93.33% and reached up to 95.56% with optimized input feature vector of classifier. This indicates that feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods.

  5. Reduced basis ANOVA methods for partial differential equations with high-dimensional random inputs

    Energy Technology Data Exchange (ETDEWEB)

    Liao, Qifeng, E-mail: liaoqf@shanghaitech.edu.cn [School of Information Science and Technology, ShanghaiTech University, Shanghai 200031 (China); Lin, Guang, E-mail: guanglin@purdue.edu [Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907 (United States)

    2016-07-15

    In this paper we present a reduced basis ANOVA approach for partial deferential equations (PDEs) with random inputs. The ANOVA method combined with stochastic collocation methods provides model reduction in high-dimensional parameter space through decomposing high-dimensional inputs into unions of low-dimensional inputs. In this work, to further reduce the computational cost, we investigate spatial low-rank structures in the ANOVA-collocation method, and develop efficient spatial model reduction techniques using hierarchically generated reduced bases. We present a general mathematical framework of the methodology, validate its accuracy and demonstrate its efficiency with numerical experiments.

  6. A systematic exploration of the micro-blog feature space for teens stress detection.

    Science.gov (United States)

    Zhao, Liang; Li, Qi; Xue, Yuanyuan; Jia, Jia; Feng, Ling

    2016-01-01

    In the modern stressful society, growing teenagers experience severe stress from different aspects from school to friends, from self-cognition to inter-personal relationship, which negatively influences their smooth and healthy development. Being timely and accurately aware of teenagers psychological stress and providing effective measures to help immature teenagers to cope with stress are highly valuable to both teenagers and human society. Previous work demonstrates the feasibility to sense teenagers' stress from their tweeting contents and context on the open social media platform-micro-blog. However, a tweet is still too short for teens to express their stressful status in a comprehensive way. Considering the topic continuity from the tweeting content to the follow-up comments and responses between the teenager and his/her friends, we combine the content of comments and responses under the tweet to supplement the tweet content. Also, such friends' caring comments like "what happened?", "Don't worry!", "Cheer up!", etc. provide hints to teenager's stressful status. Hence, in this paper, we propose to systematically explore the micro-blog feature space, comprised of four kinds of features [tweeting content features (FW), posting features (FP), interaction features (FI), and comment-response features (FC) between teenagers and friends] for teenager' stress category and stress level detection. We extract and analyze these feature values and their impacts on teens stress detection. We evaluate the framework through a real user study of 36 high school students aged 17. Different classifiers are employed to detect potential stress categories and corresponding stress levels. Experimental results show that all the features in the feature space positively affect stress detection, and linguistic negative emotion, proportion of negative sentences, friends' caring comments and teen's reply rate play more significant roles than the rest features. Micro-blog platform provides

  7. Quality and efficiency in high dimensional Nearest neighbor search

    KAUST Repository

    Tao, Yufei; Yi, Ke; Sheng, Cheng; Kalnis, Panos

    2009-01-01

    Nearest neighbor (NN) search in high dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow sub-linearly with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution fulfills both requirements, except locality sensitive hashing (LSH). The existing LSH implementations are either rigorous or adhoc. Rigorous-LSH ensures good quality of query results, but requires expensive space and query cost. Although adhoc-LSH is more efficient, it abandons quality control, i.e., the neighbor it outputs can be arbitrarily bad. As a result, currently no method is able to ensure both quality and efficiency simultaneously in practice. Motivated by this, we propose a new access method called the locality sensitive B-tree (LSB-tree) that enables fast highdimensional NN search with excellent quality. The combination of several LSB-trees leads to a structure called the LSB-forest that ensures the same result quality as rigorous-LSH, but reduces its space and query cost dramatically. The LSB-forest also outperforms adhoc-LSH, even though the latter has no quality guarantee. Besides its appealing theoretical properties, the LSB-tree itself also serves as an effective index that consumes linear space, and supports efficient updates. Our extensive experiments confirm that the LSB-tree is faster than (i) the state of the art of exact NN search by two orders of magnitude, and (ii) the best (linear-space) method of approximate retrieval by an order of magnitude, and at the same time, returns neighbors with much better quality. © 2009 ACM.

  8. Computing and visualizing time-varying merge trees for high-dimensional data

    Energy Technology Data Exchange (ETDEWEB)

    Oesterling, Patrick [Univ. of Leipzig (Germany); Heine, Christian [Univ. of Kaiserslautern (Germany); Weber, Gunther H. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Morozov, Dmitry [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Scheuermann, Gerik [Univ. of Leipzig (Germany)

    2017-06-03

    We introduce a new method that identifies and tracks features in arbitrary dimensions using the merge tree -- a structure for identifying topological features based on thresholding in scalar fields. This method analyzes the evolution of features of the function by tracking changes in the merge tree and relates features by matching subtrees between consecutive time steps. Using the time-varying merge tree, we present a structural visualization of the changing function that illustrates both features and their temporal evolution. We demonstrate the utility of our approach by applying it to temporal cluster analysis of high-dimensional point clouds.

  9. Feature-Space Clustering for fMRI Meta-Analysis

    DEFF Research Database (Denmark)

    Goutte, Cyril; Hansen, Lars Kai; Liptrot, Mathew G.

    2001-01-01

    MRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiment, where features are obtained from a possibly large number of single......-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on a fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular......, shows interesting differences between individual voxel analysis performed with traditional methods. © 2001 Wiley-Liss, Inc....

  10. Features of public open spaces and physical activity among children: findings from the CLAN study.

    Science.gov (United States)

    Timperio, Anna; Giles-Corti, Billie; Crawford, David; Andrianopoulos, Nick; Ball, Kylie; Salmon, Jo; Hume, Clare

    2008-11-01

    To examine associations between features of public open spaces, and children's physical activity. 163 children aged 8-9 years and 334 adolescents aged 13-15 years from Melbourne, Australia participated in 2004. A Geographic Information System was used to identify all public open spaces (POS) within 800 m of participants' homes and their closest POS. The features of all POS identified were audited in 2004/5. Accelerometers measured moderate-to-vigorous physical activity (MVPA) after school and on weekends. Linear regression analyses examined associations between features of the closest POS and participants' MVPA. Most participants had a POS within 800 m of their home. The presence of playgrounds was positively associated with younger boys' weekend MVPA (B=24.9 min/day; pPOS were associated with participants' MVPA, although mixed associations were evident. Further research is required to clarify these complex relationships.

  11. Model-based Clustering of High-Dimensional Data in Astrophysics

    Science.gov (United States)

    Bouveyron, C.

    2016-05-01

    The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase of the measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in mass or stream. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces which is mainly due to their dramatical over-parametrization. The recent developments in model-based classification overcome these drawbacks and allow to efficiently classify high-dimensional data, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.

  12. An alternative to scale-space representation for extracting local features in image recognition

    DEFF Research Database (Denmark)

    Andersen, Hans Jørgen; Nguyen, Phuong Giang

    2012-01-01

    In image recognition, the common approach for extracting local features using a scale-space representation has usually three main steps; first interest points are extracted at different scales, next from a patch around each interest point the rotation is calculated with corresponding orientation...... and compensation, and finally a descriptor is computed for the derived patch (i.e. feature of the patch). To avoid the memory and computational intensive process of constructing the scale-space, we use a method where no scale-space is required This is done by dividing the given image into a number of triangles...... with sizes dependent on the content of the image, at the location of each triangle. In this paper, we will demonstrate that by rotation of the interest regions at the triangles it is possible in grey scale images to achieve a recognition precision comparable with that of MOPS. The test of the proposed method...

  13. Effects of dependence in high-dimensional multiple testing problems

    Directory of Open Access Journals (Sweden)

    van de Wiel Mark A

    2008-02-01

    Full Text Available Abstract Background We consider effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular the False Discovery Rate (FDR control procedures. Recent simulation studies consider only simple correlation structures among variables, which is hardly inspired by real data features. Our aim is to systematically study effects of several network features like sparsity and correlation strength by imposing dependence structures among variables using random correlation matrices. Results We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as Benjamin-Hochberg FDR, Storey's q-value, SAM and resampling based FDR procedures. False Non-discovery Rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR to the level claimed under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable. Conclusion We discuss a new method for efficient guided simulation of dependent data, which satisfy imposed network constraints as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criterions and is useful for testing a potentially new method on π0 or FDR estimation in a dependency context.

  14. On equivalent parameter learning in simplified feature space based on Bayesian asymptotic analysis.

    Science.gov (United States)

    Yamazaki, Keisuke

    2012-07-01

    Parametric models for sequential data, such as hidden Markov models, stochastic context-free grammars, and linear dynamical systems, are widely used in time-series analysis and structural data analysis. Computation of the likelihood function is one of primary considerations in many learning methods. Iterative calculation of the likelihood such as the model selection is still time-consuming though there are effective algorithms based on dynamic programming. The present paper studies parameter learning in a simplified feature space to reduce the computational cost. Simplifying data is a common technique seen in feature selection and dimension reduction though an oversimplified space causes adverse learning results. Therefore, we mathematically investigate a condition of the feature map to have an asymptotically equivalent convergence point of estimated parameters, referred to as the vicarious map. As a demonstration to find vicarious maps, we consider the feature space, which limits the length of data, and derive a necessary length for parameter learning in hidden Markov models. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Spiral CT features and anatomic basis of posterior pararenal space involvement in acute pancreatitis

    International Nuclear Information System (INIS)

    Min Pengqiu; Yan Zhihan; Yang Hengxuan; Liu Zaiyi; Song Bin; Wu Bing; Zhang Jin; Liu Rongbo

    2005-01-01

    Objective: To evaluate spiral CT features and anatomic basis of the posterior pararenal space (PPS) involvement in acute pancreatitis (AP). Methods: CT images of 87 cases with AP were retrospectively studied with focus on spiral CT features, incidence of the PPS involvement, and its correlations with the posterior renal fascia or lateroconal fascia. Results: Our study showed that the incidence of the PPS involvement was 47% (41/87), with Grade A 53% (46/87), Grade B 24%(21/87), and Grade C 23% (20/87), and Grade 0 53% (46/87), Grade I 22% (19/87), and Grade II 25% (22/87), respectively. The pancreatitis fluid collection in the PPS was continuous with that in the anterior pararenal space or with the fluid between the two laminae of the posterior renal fascia. In 3 follow-up cases, pseudocysts in the PPS were continuous with that in anterior pararenal space below the cone of renal fascia. Conclusion: Spiral CT features of the PPS involvement varies from mild inflammatory changes to fluid collection or phlegmonous mass. Fluid within anterior pararenal space in AP flows into the PPS by three routes. (authors)

  16. Optimal Feature Space Selection in Detecting Epileptic Seizure based on Recurrent Quantification Analysis and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saleh LAshkari

    2016-06-01

    Full Text Available Selecting optimal features based on nature of the phenomenon and high discriminant ability is very important in the data classification problems. Since it doesn't require any assumption about stationary condition and size of the signal and the noise in Recurrent Quantification Analysis (RQA, it may be useful for epileptic seizure Detection. In this study, RQA was used to discriminate ictal EEG from the normal EEG where optimal features selected by combination of algorithm genetic and Bayesian Classifier. Recurrence plots of hundred samples in each two categories were obtained with five distance norms in this study: Euclidean, Maximum, Minimum, Normalized and Fixed Norm. In order to choose optimal threshold for each norm, ten threshold of ε was generated and then the best feature space was selected by genetic algorithm in combination with a bayesian classifier. The results shown that proposed method is capable of discriminating the ictal EEG from the normal EEG where for Minimum norm and 0.1˂ε˂1, accuracy was 100%. In addition, the sensitivity of proposed framework to the ε and the distance norm parameters was low. The optimal feature presented in this study is Trans which it was selected in most feature spaces with high accuracy.

  17. TH-CD-207A-07: Prediction of High Dimensional State Subject to Respiratory Motion: A Manifold Learning Approach

    International Nuclear Information System (INIS)

    Liu, W; Sawant, A; Ruan, D

    2016-01-01

    Purpose: The development of high dimensional imaging systems (e.g. volumetric MRI, CBCT, photogrammetry systems) in image-guided radiotherapy provides important pathways to the ultimate goal of real-time volumetric/surface motion monitoring. This study aims to develop a prediction method for the high dimensional state subject to respiratory motion. Compared to conventional linear dimension reduction based approaches, our method utilizes manifold learning to construct a descriptive feature submanifold, where more efficient and accurate prediction can be performed. Methods: We developed a prediction framework for high-dimensional state subject to respiratory motion. The proposed method performs dimension reduction in a nonlinear setting to permit more descriptive features compared to its linear counterparts (e.g., classic PCA). Specifically, a kernel PCA is used to construct a proper low-dimensional feature manifold, where low-dimensional prediction is performed. A fixed-point iterative pre-image estimation method is applied subsequently to recover the predicted value in the original state space. We evaluated and compared the proposed method with PCA-based method on 200 level-set surfaces reconstructed from surface point clouds captured by the VisionRT system. The prediction accuracy was evaluated with respect to root-mean-squared-error (RMSE) for both 200ms and 600ms lookahead lengths. Results: The proposed method outperformed PCA-based approach with statistically higher prediction accuracy. In one-dimensional feature subspace, our method achieved mean prediction accuracy of 0.86mm and 0.89mm for 200ms and 600ms lookahead lengths respectively, compared to 0.95mm and 1.04mm from PCA-based method. The paired t-tests further demonstrated the statistical significance of the superiority of our method, with p-values of 6.33e-3 and 5.78e-5, respectively. Conclusion: The proposed approach benefits from the descriptiveness of a nonlinear manifold and the prediction

  18. Medical X-ray Image Hierarchical Classification Using a Merging and Splitting Scheme in Feature Space.

    Science.gov (United States)

    Fesharaki, Nooshin Jafari; Pourghassem, Hossein

    2013-07-01

    Due to the daily mass production and the widespread variation of medical X-ray images, it is necessary to classify these for searching and retrieving proposes, especially for content-based medical image retrieval systems. In this paper, a medical X-ray image hierarchical classification structure based on a novel merging and splitting scheme and using shape and texture features is proposed. In the first level of the proposed structure, to improve the classification performance, similar classes with regard to shape contents are grouped based on merging measures and shape features into the general overlapped classes. In the next levels of this structure, the overlapped classes split in smaller classes based on the classification performance of combination of shape and texture features or texture features only. Ultimately, in the last levels, this procedure is also continued forming all the classes, separately. Moreover, to optimize the feature vector in the proposed structure, we use orthogonal forward selection algorithm according to Mahalanobis class separability measure as a feature selection and reduction algorithm. In other words, according to the complexity and inter-class distance of each class, a sub-space of the feature space is selected in each level and then a supervised merging and splitting scheme is applied to form the hierarchical classification. The proposed structure is evaluated on a database consisting of 2158 medical X-ray images of 18 classes (IMAGECLEF 2005 database) and accuracy rate of 93.6% in the last level of the hierarchical structure for an 18-class classification problem is obtained.

  19. Assessing spacing impact on coherent features in a wind turbine array boundary layer

    Directory of Open Access Journals (Sweden)

    N. Ali

    2018-02-01

    intermediate scales are responsible for features seen in the original profile. The variation in streamwise and spanwise spacing leads to changes in the background structure of the turbulence, where the color map based on barycentric map and Reynolds stress anisotropy tensor provides an alternate perspective on the nature of the perturbations within the wind turbine array. The impact of the streamwise and spanwise spacings on power produced is quantified, where the maximum production corresponds with the case of greatest turbine spacing.

  20. Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

    Science.gov (United States)

    Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Leclère, Vincent; Salmon, Joseph

    2017-10-01

    In high dimensional settings, sparse structures are crucial for efficiency, both in term of memory, computation and performance. It is customary to consider ℓ 1 penalty to enforce sparsity in such scenarios. Sparsity enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter trading data fitting versus sparsity. For the Lasso theory to hold this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled-Lasso, Square-root Lasso, Concomitant Lasso estimation for instance, and could be of interest for uncertainty quantification. In this work, after illustrating numerical difficulties for the Concomitant Lasso formulation, we propose a modification we coined Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than the one for the Lasso. We leverage on standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules to achieve speed efficiency, by eliminating early irrelevant features.

  1. High-Dimensional Function Approximation With Neural Networks for Large Volumes of Data.

    Science.gov (United States)

    Andras, Peter

    2018-02-01

    Approximation of high-dimensional functions is a challenge for neural networks due to the curse of dimensionality. Often the data for which the approximated function is defined resides on a low-dimensional manifold and in principle the approximation of the function over this manifold should improve the approximation performance. It has been show that projecting the data manifold into a lower dimensional space, followed by the neural network approximation of the function over this space, provides a more precise approximation of the function than the approximation of the function with neural networks in the original data space. However, if the data volume is very large, the projection into the low-dimensional space has to be based on a limited sample of the data. Here, we investigate the nature of the approximation error of neural networks trained over the projection space. We show that such neural networks should have better approximation performance than neural networks trained on high-dimensional data even if the projection is based on a relatively sparse sample of the data manifold. We also find that it is preferable to use a uniformly distributed sparse sample of the data for the purpose of the generation of the low-dimensional projection. We illustrate these results considering the practical neural network approximation of a set of functions defined on high-dimensional data including real world data as well.

  2. A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

    Science.gov (United States)

    Wang, Xueyi

    2012-02-08

    The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.

  3. Multivariate statistics high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic

  4. Hierarchical low-rank approximation for high dimensional approximation

    KAUST Repository

    Nouy, Anthony

    2016-01-01

    Tensor methods are among the most prominent tools for the numerical solution of high-dimensional problems where functions of multiple variables have to be approximated. Such high-dimensional approximation problems naturally arise in stochastic analysis and uncertainty quantification. In many practical situations, the approximation of high-dimensional functions is made computationally tractable by using rank-structured approximations. In this talk, we present algorithms for the approximation in hierarchical tensor format using statistical methods. Sparse representations in a given tensor format are obtained with adaptive or convex relaxation methods, with a selection of parameters using crossvalidation methods.

  5. Hierarchical low-rank approximation for high dimensional approximation

    KAUST Repository

    Nouy, Anthony

    2016-01-07

    Tensor methods are among the most prominent tools for the numerical solution of high-dimensional problems where functions of multiple variables have to be approximated. Such high-dimensional approximation problems naturally arise in stochastic analysis and uncertainty quantification. In many practical situations, the approximation of high-dimensional functions is made computationally tractable by using rank-structured approximations. In this talk, we present algorithms for the approximation in hierarchical tensor format using statistical methods. Sparse representations in a given tensor format are obtained with adaptive or convex relaxation methods, with a selection of parameters using crossvalidation methods.

  6. Detection of Coronal Mass Ejections Using Multiple Features and Space-Time Continuity

    Science.gov (United States)

    Zhang, Ling; Yin, Jian-qin; Lin, Jia-ben; Feng, Zhi-quan; Zhou, Jin

    2017-07-01

    Coronal Mass Ejections (CMEs) release tremendous amounts of energy in the solar system, which has an impact on satellites, power facilities and wireless transmission. To effectively detect a CME in Large Angle Spectrometric Coronagraph (LASCO) C2 images, we propose a novel algorithm to locate the suspected CME regions, using the Extreme Learning Machine (ELM) method and taking into account the features of the grayscale and the texture. Furthermore, space-time continuity is used in the detection algorithm to exclude the false CME regions. The algorithm includes three steps: i) define the feature vector which contains textural and grayscale features of a running difference image; ii) design the detection algorithm based on the ELM method according to the feature vector; iii) improve the detection accuracy rate by using the decision rule of the space-time continuum. Experimental results show the efficiency and the superiority of the proposed algorithm in the detection of CMEs compared with other traditional methods. In addition, our algorithm is insensitive to most noise.

  7. Spherical phantom for research of radiation situation in outer space. Design-structural special features

    International Nuclear Information System (INIS)

    Kartsev, I.S.; Eremenko, V.G.; Petrov, V.I.; Polenov, B.V.; Yudin, V.N.; Akatov, Yu.A.; Petrov, V.M.; Shurshakov, V.A.

    2005-01-01

    The design-structural features of the updated spherical phantom applied within the frameworks of the space experiment Matreshka-R at the Russian segment of International space station during ISS-8 and ISS-9 expeditions are described. The replacement of 48 polyethylene containers with TLD and STD assemblies by 16 cases installed from external side of the phantom and 4 tissue-equivalent caps of the central disk by 4 cases with detector assemblies is carried out. The updated tissue-equivalent phantom contains the active dosemeter based on 5 MOS detectors. The phantom cover is made from the non-flammable material NT-7. The basic characteristics of the flight specimen of the phantom are presented. The results of its on-Earth testing and real space flights are analyzed [ru

  8. Explorations on High Dimensional Landscapes: Spin Glasses and Deep Learning

    Science.gov (United States)

    Sagun, Levent

    This thesis deals with understanding the structure of high-dimensional and non-convex energy landscapes. In particular, its focus is on the optimization of two classes of functions: homogeneous polynomials and loss functions that arise in machine learning. In the first part, the notion of complexity of a smooth, real-valued function is studied through its critical points. Existing theoretical results predict that certain random functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This section provides empirical evidence for convergence of gradient descent to local minima whose energies are near the predicted threshold justifying the existing asymptotic theory. Moreover, it is empirically shown that a similar phenomenon may hold for deep learning loss functions. Furthermore, there is a comparative analysis of gradient descent and its stochastic version showing that in high dimensional regimes the latter is a mere speedup. The next study focuses on the halting time of an algorithm at a given stopping condition. Given an algorithm, the normalized fluctuations of the halting time follow a distribution that remains unchanged even when the input data is sampled from a new distribution. Two qualitative classes are observed: a Gumbel-like distribution that appears in Google searches, human decision times, and spin glasses and a Gaussian-like distribution that appears in conjugate gradient method, deep learning with MNIST and random input data. Following the universality phenomenon, the Hessian of the loss functions of deep learning is studied. The spectrum is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. Empirical evidence is presented for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data. Furthermore, an algorithm is proposed such that it would

  9. Do features of public open spaces vary according to neighbourhood socio-economic status?

    Science.gov (United States)

    Crawford, David; Timperio, Anna; Giles-Corti, Billie; Ball, Kylie; Hume, Clare; Roberts, Rebecca; Andrianopoulos, Nick; Salmon, Jo

    2008-12-01

    This study examined the relations between neighbourhood socio-economic status and features of public open spaces (POS) hypothesised to influence children's physical activity. Data were from the first follow-up of the Children Living in Active Neighbourhoods (CLAN) Study, which involved 540 families of 5-6 and 10-12-year-old children in Melbourne, Australia. The Socio-Economic Index for Areas Index (SEIFA) of Relative Socio-economic Advantage/Disadvantage was used to assign a socioeconomic index score to each child's neighbourhood, based on postcode. Participant addresses were geocoded using a Geographic Information System. The Open Space 2002 spatial data set was used to identify all POS within an 800 m radius of each participant's home. The features of each of these POS (1497) were audited. Variability of POS features was examined across quintiles of neighbourhood SEIFA. Compared with POS in lower socioeconomic neighbourhoods, POS in the highest socioeconomic neighbourhoods had more amenities (e.g. picnic tables and drink fountains) and were more likely to have trees that provided shade, a water feature (e.g. pond, creek), walking and cycling paths, lighting, signage regarding dog access and signage restricting other activities. There were no differences across neighbourhoods in the number of playgrounds or the number of recreation facilities (e.g. number of sports catered for on courts and ovals, the presence of other facilities such as athletics tracks, skateboarding facility and swimming pool). This study suggests that POS in high socioeconomic neighbourhoods possess more features that are likely to promote physical activity amongst children.

  10. Characterization of discontinuities in high-dimensional stochastic problems on adaptive sparse grids

    International Nuclear Information System (INIS)

    Jakeman, John D.; Archibald, Richard; Xiu Dongbin

    2011-01-01

    In this paper we present a set of efficient algorithms for detection and identification of discontinuities in high dimensional space. The method is based on extension of polynomial annihilation for discontinuity detection in low dimensions. Compared to the earlier work, the present method poses significant improvements for high dimensional problems. The core of the algorithms relies on adaptive refinement of sparse grids. It is demonstrated that in the commonly encountered cases where a discontinuity resides on a small subset of the dimensions, the present method becomes 'optimal', in the sense that the total number of points required for function evaluations depends linearly on the dimensionality of the space. The details of the algorithms will be presented and various numerical examples are utilized to demonstrate the efficacy of the method.

  11. Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

    Science.gov (United States)

    Karim, Mohammad Ehsanul; Pang, Menglan; Platt, Robert W

    2018-03-01

    The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. Utilizing patient's health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of postmyocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator, and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of machine learning and high-dimensional propensity score algorithms generally perform slightly better than both in terms of mean squared error, when a bias-based analysis is used.

  12. A biologically inspired scale-space for illumination invariant feature detection

    International Nuclear Information System (INIS)

    Vonikakis, Vasillios; Chrysostomou, Dimitrios; Kouskouridas, Rigas; Gasteratos, Antonios

    2013-01-01

    This paper presents a new illumination invariant operator, combining the nonlinear characteristics of biological center-surround cells with the classic difference of Gaussians operator. It specifically targets the underexposed image regions, exhibiting increased sensitivity to low contrast, while not affecting performance in the correctly exposed ones. The proposed operator can be used to create a scale-space, which in turn can be a part of a SIFT-based detector module. The main advantage of this illumination invariant scale-space is that, using just one global threshold, keypoints can be detected in both dark and bright image regions. In order to evaluate the degree of illumination invariance that the proposed, as well as other, existing, operators exhibit, a new benchmark dataset is introduced. It features a greater variety of imaging conditions, compared to existing databases, containing real scenes under various degrees and combinations of uniform and non-uniform illumination. Experimental results show that the proposed detector extracts a greater number of features, with a high level of repeatability, compared to other approaches, for both uniform and non-uniform illumination. This, along with its simple implementation, renders the proposed feature detector particularly appropriate for outdoor vision systems, working in environments under uncontrolled illumination conditions. (paper)

  13. Research on Optimal Observation Scale for Damaged Buildings after Earthquake Based on Optimal Feature Space

    Science.gov (United States)

    Chen, J.; Chen, W.; Dou, A.; Li, W.; Sun, Y.

    2018-04-01

    A new information extraction method of damaged buildings rooted in optimal feature space is put forward on the basis of the traditional object-oriented method. In this new method, ESP (estimate of scale parameter) tool is used to optimize the segmentation of image. Then the distance matrix and minimum separation distance of all kinds of surface features are calculated through sample selection to find the optimal feature space, which is finally applied to extract the image of damaged buildings after earthquake. The overall extraction accuracy reaches 83.1 %, the kappa coefficient 0.813. The new information extraction method greatly improves the extraction accuracy and efficiency, compared with the traditional object-oriented method, and owns a good promotional value in the information extraction of damaged buildings. In addition, the new method can be used for the information extraction of different-resolution images of damaged buildings after earthquake, then to seek the optimal observation scale of damaged buildings through accuracy evaluation. It is supposed that the optimal observation scale of damaged buildings is between 1 m and 1.2 m, which provides a reference for future information extraction of damaged buildings.

  14. EEMD Independent Extraction for Mixing Features of Rotating Machinery Reconstructed in Phase Space

    Directory of Open Access Journals (Sweden)

    Zaichao Ma

    2015-04-01

    Full Text Available Empirical Mode Decomposition (EMD, due to its adaptive decomposition property for the non-linear and non-stationary signals, has been widely used in vibration analyses for rotating machinery. However, EMD suffers from mode mixing, which is difficult to extract features independently. Although the improved EMD, well known as the ensemble EMD (EEMD, has been proposed, mode mixing is alleviated only to a certain degree. Moreover, EEMD needs to determine the amplitude of added noise. In this paper, we propose Phase Space Ensemble Empirical Mode Decomposition (PSEEMD integrating Phase Space Reconstruction (PSR and Manifold Learning (ML for modifying EEMD. We also provide the principle and detailed procedure of PSEEMD, and the analyses on a simulation signal and an actual vibration signal derived from a rubbing rotor are performed. The results show that PSEEMD is more efficient and convenient than EEMD in extracting the mixing features from the investigated signal and in optimizing the amplitude of the necessary added noise. Additionally PSEEMD can extract the weak features interfered with a certain amount of noise.

  15. Feature-space-based FMRI analysis using the optimal linear transformation.

    Science.gov (United States)

    Sun, Fengrong; Morris, Drew; Lee, Wayne; Taylor, Margot J; Mills, Travis; Babyn, Paul S

    2010-09-01

    The optimal linear transformation (OLT), an image analysis technique of feature space, was first presented in the field of MRI. This paper proposes a method of extending OLT from MRI to functional MRI (fMRI) to improve the activation-detection performance over conventional approaches of fMRI analysis. In this method, first, ideal hemodynamic response time series for different stimuli were generated by convolving the theoretical hemodynamic response model with the stimulus timing. Second, constructing hypothetical signature vectors for different activity patterns of interest by virtue of the ideal hemodynamic responses, OLT was used to extract features of fMRI data. The resultant feature space had particular geometric clustering properties. It was then classified into different groups, each pertaining to an activity pattern of interest; the applied signature vector for each group was obtained by averaging. Third, using the applied signature vectors, OLT was applied again to generate fMRI composite images with high SNRs for the desired activity patterns. Simulations and a blocked fMRI experiment were employed for the method to be verified and compared with the general linear model (GLM)-based analysis. The simulation studies and the experimental results indicated the superiority of the proposed method over the GLM-based analysis in detecting brain activities.

  16. An Unbiased Distance-based Outlier Detection Approach for High-dimensional Data

    DEFF Research Database (Denmark)

    Nguyen, Hoang Vu; Gopalkrishnan, Vivekanand; Assent, Ira

    2011-01-01

    than a global property. Different from existing approaches, it is not grid-based and dimensionality unbiased. Thus, its performance is impervious to grid resolution as well as the curse of dimensionality. In addition, our approach ranks the outliers, allowing users to select the number of desired...... outliers, thus mitigating the issue of high false alarm rate. Extensive empirical studies on real datasets show that our approach efficiently and effectively detects outliers, even in high-dimensional spaces....

  17. Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features

    Science.gov (United States)

    Bouboulis, Pantelis; Chouvardas, Symeon; Theodoridis, Sergios

    2018-04-01

    We present a novel diffusion scheme for online kernel-based learning over networks. So far, a major drawback of any online learning algorithm, operating in a reproducing kernel Hilbert space (RKHS), is the need for updating a growing number of parameters as time iterations evolve. Besides complexity, this leads to an increased need of communication resources, in a distributed setting. In contrast, the proposed method approximates the solution as a fixed-size vector (of larger dimension than the input space) using Random Fourier Features. This paves the way to use standard linear combine-then-adapt techniques. To the best of our knowledge, this is the first time that a complete protocol for distributed online learning in RKHS is presented. Conditions for asymptotic convergence and boundness of the networkwise regret are also provided. The simulated tests illustrate the performance of the proposed scheme.

  18. Harnessing high-dimensional hyperentanglement through a biphoton frequency comb

    Science.gov (United States)

    Xie, Zhenda; Zhong, Tian; Shrestha, Sajan; Xu, Xinan; Liang, Junlin; Gong, Yan-Xiao; Bienfang, Joshua C.; Restelli, Alessandro; Shapiro, Jeffrey H.; Wong, Franco N. C.; Wei Wong, Chee

    2015-08-01

    Quantum entanglement is a fundamental resource for secure information processing and communications, and hyperentanglement or high-dimensional entanglement has been separately proposed for its high data capacity and error resilience. The continuous-variable nature of the energy-time entanglement makes it an ideal candidate for efficient high-dimensional coding with minimal limitations. Here, we demonstrate the first simultaneous high-dimensional hyperentanglement using a biphoton frequency comb to harness the full potential in both the energy and time domain. Long-postulated Hong-Ou-Mandel quantum revival is exhibited, with up to 19 time-bins and 96.5% visibilities. We further witness the high-dimensional energy-time entanglement through Franson revivals, observed periodically at integer time-bins, with 97.8% visibility. This qudit state is observed to simultaneously violate the generalized Bell inequality by up to 10.95 standard deviations while observing recurrent Clauser-Horne-Shimony-Holt S-parameters up to 2.76. Our biphoton frequency comb provides a platform for photon-efficient quantum communications towards the ultimate channel capacity through energy-time-polarization high-dimensional encoding.

  19. Analysing spatially extended high-dimensional dynamics by recurrence plots

    Energy Technology Data Exchange (ETDEWEB)

    Marwan, Norbert, E-mail: marwan@pik-potsdam.de [Potsdam Institute for Climate Impact Research, 14412 Potsdam (Germany); Kurths, Jürgen [Potsdam Institute for Climate Impact Research, 14412 Potsdam (Germany); Humboldt Universität zu Berlin, Institut für Physik (Germany); Nizhny Novgorod State University, Department of Control Theory, Nizhny Novgorod (Russian Federation); Foerster, Saskia [GFZ German Research Centre for Geosciences, Section 1.4 Remote Sensing, Telegrafenberg, 14473 Potsdam (Germany)

    2015-05-08

    Recurrence plot based measures of complexity are capable tools for characterizing complex dynamics. In this letter we show the potential of selected recurrence plot measures for the investigation of even high-dimensional dynamics. We apply this method on spatially extended chaos, such as derived from the Lorenz96 model and show that the recurrence plot based measures can qualitatively characterize typical dynamical properties such as chaotic or periodic dynamics. Moreover, we demonstrate its power by analysing satellite image time series of vegetation cover with contrasting dynamics as a spatially extended and potentially high-dimensional example from the real world. - Highlights: • We use recurrence plots for analysing partially extended dynamics. • We investigate the high-dimensional chaos of the Lorenz96 model. • The approach distinguishes different spatio-temporal dynamics. • We use the method for studying vegetation cover time series.

  20. Do features of public open spaces vary between urban and rural areas?

    Science.gov (United States)

    Veitch, Jenny; Salmon, Jo; Ball, Kylie; Crawford, David; Timperio, Anna

    2013-02-01

    Parks are an important setting for physical activity and specific park features have been shown to be associated with park visitation and physical activity. Most park-based research has been conducted in urban settings with few studies examining rural parks. This study examined differences in features of parks in urban compared with rural areas. In 2009/10 a tool was developed to audit 433 urban and 195 rural parks located in disadvantaged areas of Victoria, Australia. Features assessed included: access; lighting/safety; aesthetics; amenities; paths; outdoor courts/ovals; informal play spaces; and playgrounds (number, diversity, age appropriateness and safety of play equipment). Rural parks scored higher for aesthetics compared with urban parks (5.08 vs 4.44). Urban parks scored higher for access (4.64 vs 3.89), lighting/safety (2.01 vs 1.76), and diversity of play equipment (7.37 vs 6.24), and were more likely to have paths suitable for walking/cycling (58.8% vs 40.9%) and play equipment for older children (68.2% vs 17.1%). Although the findings cannot be generalized to all urban and rural parks, the results may be used to inform advocacy for park development in rural areas to create parks that are more supportive of physical activity for children and adults. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Diffraction of SH-waves by topographic features in a layered transversely isotropic half-space

    Science.gov (United States)

    Ba, Zhenning; Liang, Jianwen; Zhang, Yanju

    2017-01-01

    The scattering of plane SH-waves by topographic features in a layered transversely isotropic (TI) half-space is investigated by using an indirect boundary element method (IBEM). Firstly, the anti-plane dynamic stiffness matrix of the layered TI half-space is established and the free fields are solved by using the direct stiffness method. Then, Green's functions are derived for uniformly distributed loads acting on an inclined line in a layered TI half-space and the scattered fields are constructed with the deduced Green's functions. Finally, the free fields are added to the scattered ones to obtain the global dynamic responses. The method is verified by comparing results with the published isotropic ones. Both the steady-state and transient dynamic responses are evaluated and discussed. Numerical results in the frequency domain show that surface motions for the TI media can be significantly different from those for the isotropic case, which are strongly dependent on the anisotropy property, incident angle and incident frequency. Results in the time domain show that the material anisotropy has important effects on the maximum duration and maximum amplitudes of the time histories.

  2. High-dimensional model estimation and model selection

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    I will review concepts and algorithms from high-dimensional statistics for linear model estimation and model selection. I will particularly focus on the so-called p>>n setting where the number of variables p is much larger than the number of samples n. I will focus mostly on regularized statistical estimators that produce sparse models. Important examples include the LASSO and its matrix extension, the Graphical LASSO, and more recent non-convex methods such as the TREX. I will show the applicability of these estimators in a diverse range of scientific applications, such as sparse interaction graph recovery and high-dimensional classification and regression problems in genomics.

  3. Statistical Analysis for High-Dimensional Data : The Abel Symposium 2014

    CERN Document Server

    Bühlmann, Peter; Glad, Ingrid; Langaas, Mette; Richardson, Sylvia; Vannucci, Marina

    2016-01-01

    This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on...

  4. Single cell proteomics in biomedicine: High-dimensional data acquisition, visualization, and analysis.

    Science.gov (United States)

    Su, Yapeng; Shi, Qihui; Wei, Wei

    2017-02-01

    New insights on cellular heterogeneity in the last decade provoke the development of a variety of single cell omics tools at a lightning pace. The resultant high-dimensional single cell data generated by these tools require new theoretical approaches and analytical algorithms for effective visualization and interpretation. In this review, we briefly survey the state-of-the-art single cell proteomic tools with a particular focus on data acquisition and quantification, followed by an elaboration of a number of statistical and computational approaches developed to date for dissecting the high-dimensional single cell data. The underlying assumptions, unique features, and limitations of the analytical methods with the designated biological questions they seek to answer will be discussed. Particular attention will be given to those information theoretical approaches that are anchored in a set of first principles of physics and can yield detailed (and often surprising) predictions. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Complete fold annotation of the human proteome using a novel structural feature space.

    Science.gov (United States)

    Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  6. A hybridized K-means clustering approach for high dimensional ...

    African Journals Online (AJOL)

    International Journal of Engineering, Science and Technology ... Due to incredible growth of high dimensional dataset, conventional data base querying methods are inadequate to extract useful information, so researchers nowadays ... Recently cluster analysis is a popularly used data analysis method in number of areas.

  7. On Robust Information Extraction from High-Dimensional Data

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2014-01-01

    Roč. 9, č. 1 (2014), s. 131-144 ISSN 1452-4864 Grant - others:GA ČR(CZ) GA13-01930S Institutional support: RVO:67985807 Keywords : data mining * high-dimensional data * robust econometrics * outliers * machine learning Subject RIV: IN - Informatics, Computer Science

  8. Inference in High-dimensional Dynamic Panel Data Models

    DEFF Research Database (Denmark)

    Kock, Anders Bredahl; Tang, Haihan

    We establish oracle inequalities for a version of the Lasso in high-dimensional fixed effects dynamic panel data models. The inequalities are valid for the coefficients of the dynamic and exogenous regressors. Separate oracle inequalities are derived for the fixed effects. Next, we show how one can...

  9. Pricing High-Dimensional American Options Using Local Consistency Conditions

    NARCIS (Netherlands)

    Berridge, S.J.; Schumacher, J.M.

    2004-01-01

    We investigate a new method for pricing high-dimensional American options. The method is of finite difference type but is also related to Monte Carlo techniques in that it involves a representative sampling of the underlying variables.An approximating Markov chain is built using this sampling and

  10. Irregular grid methods for pricing high-dimensional American options

    NARCIS (Netherlands)

    Berridge, S.J.

    2004-01-01

    This thesis proposes and studies numerical methods for pricing high-dimensional American options; important examples being basket options, Bermudan swaptions and real options. Four new methods are presented and analysed, both in terms of their application to various test problems, and in terms of

  11. Multi-Scale Singularity Trees: Soft-Linked Scale-Space Hierarchies

    DEFF Research Database (Denmark)

    Somchaipeng, Kerawit; Sporring, Jon; Kreiborg, Sven

    2005-01-01

    We consider images as manifolds embedded in a hybrid of a high dimensional space of coordinates and features. Using the proposed energy functional and mathematical landmarks, images are partitioned into segments. The nesting of image segments occurring at catastrophe points in the scale-space is ...

  12. A Cp-theory problem book special features of function spaces

    CERN Document Server

    Tkachuk, Vladimir V

    2014-01-01

    The books in Vladimir Tkachuk’s A Cp-Theory Problem Book series will be the ‘go to’ texts for basic reference to Cp-theory. This second volume, Special Features of Function Spaces, gives a reasonably complete coverage of Cp-theory, systematically introducing each of the major topics and providing  500 carefully selected problems and exercises with complete solutions. Bonus results and open problems are also given. The text is designed to bring a dedicated reader from basic topological principles to the frontiers of modern research covering a wide variety of topics in Cp-theory and general topology at the professional level. The first volume, Topological and Function Spaces © 2011, provided an introduction from scratch to Cp-theory and general topology, preparing the reader for a professional understanding of Cp-theory in the last section of its main text. This second volume continues from the first, and can be used as a textbook for courses in both Cp-theory and general topology as well as a referenc...

  13. Linear sign in cystic brain lesions ≥5 mm: A suggestive feature of perivascular space.

    Science.gov (United States)

    Sung, Jinkyeong; Jang, Jinhee; Choi, Hyun Seok; Jung, So-Lyung; Ahn, Kook-Jin; Kim, Bum-Soo

    2017-11-01

    To determine the prevalence of a linear sign within enlarged perivascular space (EPVS) and chronic lacunar infarction (CLI) ≥ 5 mm on T2-weighted imaging (T2WI) and time-of-flight (TOF) magnetic resonance angiography (MRA), and to evaluate the diagnostic value of the linear signs for EPVS over CLI. This study included 101 patients with cystic lesions ≥ 5 mm on brain MRI including TOF MRA. After classification of cystic lesions into EPVS or CLI, two readers assessed linear signs on T2WI and TOF MRA. We compared the prevalence and the diagnostic performance of linear signs. Among 46 EPVS and 51 CLI, 84 lesions (86.6%) were in basal ganglia. The prevalence of T2 and TOF linear signs was significantly higher in the EPVS than in the CLI (P linear signs showed high sensitivity (> 80%). TOF linear sign showed significantly higher specificity (100%) and accuracy (92.8% and 90.7%) than T2 linear sign (P linear signs were more frequently observed in EPVS than CLI. They showed high sensitivity in differentiation of them, especially for basal ganglia. TOF sign showed higher specificity and accuracy than T2 sign. • Linear sign is a suggestive feature of EPVS. • Time-of-flight magnetic resonance angiography can reveal the lenticulostriate artery within perivascular spaces. • Linear sign helps differentiation of EPVS and CLI, especially in basal ganglia.

  14. Estimate of the largest Lyapunov characteristic exponent of a high dimensional atmospheric global circulation model: a sensitivity analysis

    International Nuclear Information System (INIS)

    Guerrieri, A.

    2009-01-01

    In this report the largest Lyapunov characteristic exponent of a high dimensional atmospheric global circulation model of intermediate complexity has been estimated numerically. A sensitivity analysis has been carried out by varying the equator-to-pole temperature difference, the space resolution and the value of some parameters employed by the model. Chaotic and non-chaotic regimes of circulation have been found. [it

  15. Hypergraph-based anomaly detection of high-dimensional co-occurrences.

    Science.gov (United States)

    Silva, Jorge; Willett, Rebecca

    2009-03-01

    This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on using a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs which allow edges to connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm for detecting anomalies directly on the hypergraph domain without any feature selection or dimensionality reduction is presented. The resulting estimate can be used to calculate a measure of anomalousness based on the False Discovery Rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings, and requires no tuning, bandwidth or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and it is shown that it can outperform other state-of-the-art methods.

  16. Using Simplified Thermal Inertia to Determine the Theoretical Dry Line in Feature Space for Evapotranspiration Retrieval

    Directory of Open Access Journals (Sweden)

    Sujuan Mi

    2015-08-01

    Full Text Available With the development of quantitative remote sensing, regional evapotranspiration (ET modeling based on the feature space has made substantial progress. Among those feature space based evapotranspiration models, accurate determination of the dry/wet lines remains a challenging task. This paper reports the development of a new model, named DDTI (Determination of Dry line by Thermal Inertia, which determines the theoretical dry line based on the relationship between the thermal inertia and the soil moisture. The Simplified Thermal Inertia value estimated in the North China Plain is consistent with the value measured in the laboratory. Three evaluation methods, which are based on the comparison of the locations of the theoretical dry line determined by two models (DDTI model and the heat energy balance model, the comparison of ET results, and the comparison of the evaporative fraction between the estimates from the two models and the in situ measurements, were used to assess the performance of the new model DDTI. The location of the theoretical dry line determined by DDTI is more reasonable than that determined by the heat energy balance model. ET estimated from DDTI has an RMSE (Root Mean Square Error of 56.77 W/m2 and a bias of 27.17 W/m2; while the heat energy balance model estimated ET with an RMSE of 83.36 W/m2 and a bias of −38.42 W/m2. When comparing the coeffcient of determination for the two models with the observations from Yucheng, DDTI demonstrated ET with an R2 of 0.9065; while the heat energy balance model has an R2 of 0.7729. When compared with the in situ measurements of evaporative fraction (EF at Yucheng Experimental Station, the ET model based on DDTI reproduces the pixel scale EF with an RMSE of 0.149, much lower than that based on the heat energy balance model which has an RMSE of 0.220. Also, the EF bias between the DDTI model and the in situ measurements is 0.064, lower than the EF bias of the heat energy balance model

  17. High Dimensional Modulation and MIMO Techniques for Access Networks

    DEFF Research Database (Denmark)

    Binti Othman, Maisara

    Exploration of advanced modulation formats and multiplexing techniques for next generation optical access networks are of interest as promising solutions for delivering multiple services to end-users. This thesis addresses this from two different angles: high dimensionality carrierless...... the capacity per wavelength of the femto-cell network. Bit rate up to 1.59 Gbps with fiber-wireless transmission over 1 m air distance is demonstrated. The results presented in this thesis demonstrate the feasibility of high dimensionality CAP in increasing the number of dimensions and their potentially......) optical access network. 2 X 2 MIMO RoF employing orthogonal frequency division multiplexing (OFDM) with 5.6 GHz RoF signaling over all-vertical cavity surface emitting lasers (VCSEL) WDM passive optical networks (PONs). We have employed polarization division multiplexing (PDM) to further increase...

  18. HSM: Heterogeneous Subspace Mining in High Dimensional Data

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Seidl, Thomas

    2009-01-01

    Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional...... challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant...... for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines...

  19. Analysis of chaos in high-dimensional wind power system.

    Science.gov (United States)

    Wang, Cong; Zhang, Hongli; Fan, Wenhui; Ma, Ping

    2018-01-01

    A comprehensive analysis on the chaos of a high-dimensional wind power system is performed in this study. A high-dimensional wind power system is more complex than most power systems. An 11-dimensional wind power system proposed by Huang, which has not been analyzed in previous studies, is investigated. When the systems are affected by external disturbances including single parameter and periodic disturbance, or its parameters changed, chaotic dynamics of the wind power system is analyzed and chaotic parameters ranges are obtained. Chaos existence is confirmed by calculation and analysis of all state variables' Lyapunov exponents and the state variable sequence diagram. Theoretical analysis and numerical simulations show that the wind power system chaos will occur when parameter variations and external disturbances change to a certain degree.

  20. HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS.

    Science.gov (United States)

    Fan, Jianqing; Liao, Yuan; Mincheva, Martina

    2011-01-01

    The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.

  1. High-dimensional data in economics and their (robust) analysis

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2017-01-01

    Roč. 12, č. 1 (2017), s. 171-183 ISSN 1452-4864 R&D Projects: GA ČR GA17-07384S Institutional support: RVO:67985556 Keywords : econometrics * high-dimensional data * dimensionality reduction * linear regression * classification analysis * robustness Subject RIV: BA - General Mathematics OBOR OECD: Business and management http://library.utia.cas.cz/separaty/2017/SI/kalina-0474076.pdf

  2. High-dimensional Data in Economics and their (Robust) Analysis

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2017-01-01

    Roč. 12, č. 1 (2017), s. 171-183 ISSN 1452-4864 R&D Projects: GA ČR GA17-07384S Grant - others:GA ČR(CZ) GA13-01930S Institutional support: RVO:67985807 Keywords : econometrics * high-dimensional data * dimensionality reduction * linear regression * classification analysis * robustness Subject RIV: BB - Applied Statistics, Operational Research OBOR OECD: Statistics and probability

  3. Quantifying high dimensional entanglement with two mutually unbiased bases

    Directory of Open Access Journals (Sweden)

    Paul Erker

    2017-07-01

    Full Text Available We derive a framework for quantifying entanglement in multipartite and high dimensional systems using only correlations in two unbiased bases. We furthermore develop such bounds in cases where the second basis is not characterized beyond being unbiased, thus enabling entanglement quantification with minimal assumptions. Furthermore, we show that it is feasible to experimentally implement our method with readily available equipment and even conservative estimates of physical parameters.

  4. High dimensional model representation method for fuzzy structural dynamics

    Science.gov (United States)

    Adhikari, S.; Chowdhury, R.; Friswell, M. I.

    2011-03-01

    Uncertainty propagation in multi-parameter complex structures possess significant computational challenges. This paper investigates the possibility of using the High Dimensional Model Representation (HDMR) approach when uncertain system parameters are modeled using fuzzy variables. In particular, the application of HDMR is proposed for fuzzy finite element analysis of linear dynamical systems. The HDMR expansion is an efficient formulation for high-dimensional mapping in complex systems if the higher order variable correlations are weak, thereby permitting the input-output relationship behavior to be captured by the terms of low-order. The computational effort to determine the expansion functions using the α-cut method scales polynomically with the number of variables rather than exponentially. This logic is based on the fundamental assumption underlying the HDMR representation that only low-order correlations among the input variables are likely to have significant impacts upon the outputs for most high-dimensional complex systems. The proposed method is first illustrated for multi-parameter nonlinear mathematical test functions with fuzzy variables. The method is then integrated with a commercial finite element software (ADINA). Modal analysis of a simplified aircraft wing with fuzzy parameters has been used to illustrate the generality of the proposed approach. In the numerical examples, triangular membership functions have been used and the results have been validated against direct Monte Carlo simulations. It is shown that using the proposed HDMR approach, the number of finite element function calls can be reduced without significantly compromising the accuracy.

  5. High-dimensional quantum cloning and applications to quantum hacking.

    Science.gov (United States)

    Bouchard, Frédéric; Fickler, Robert; Boyd, Robert W; Karimi, Ebrahim

    2017-02-01

    Attempts at cloning a quantum system result in the introduction of imperfections in the state of the copies. This is a consequence of the no-cloning theorem, which is a fundamental law of quantum physics and the backbone of security for quantum communications. Although perfect copies are prohibited, a quantum state may be copied with maximal accuracy via various optimal cloning schemes. Optimal quantum cloning, which lies at the border of the physical limit imposed by the no-signaling theorem and the Heisenberg uncertainty principle, has been experimentally realized for low-dimensional photonic states. However, an increase in the dimensionality of quantum systems is greatly beneficial to quantum computation and communication protocols. Nonetheless, no experimental demonstration of optimal cloning machines has hitherto been shown for high-dimensional quantum systems. We perform optimal cloning of high-dimensional photonic states by means of the symmetrization method. We show the universality of our technique by conducting cloning of numerous arbitrary input states and fully characterize our cloning machine by performing quantum state tomography on cloned photons. In addition, a cloning attack on a Bennett and Brassard (BB84) quantum key distribution protocol is experimentally demonstrated to reveal the robustness of high-dimensional states in quantum cryptography.

  6. Mining potential biomarkers associated with space flight in Caenorhabditis elegans experienced Shenzhou-8 mission with multiple feature selection techniques

    International Nuclear Information System (INIS)

    Zhao, Lei; Gao, Ying; Mi, Dong; Sun, Yeqing

    2016-01-01

    Highlights: • A combined algorithm is proposed to mine biomarkers of spaceflight in C. elegans. • This algorithm makes the feature selection more reliable and robust. • Apply this algorithm to predict 17 positive biomarkers to space environment stress. • The strategy can be used as a general method to select important features. - Abstract: To identify the potential biomarkers associated with space flight, a combined algorithm, which integrates the feature selection techniques, was used to deal with the microarray datasets of Caenorhabditis elegans obtained in the Shenzhou-8 mission. Compared with the ground control treatment, a total of 86 differentially expressed (DE) genes in responses to space synthetic environment or space radiation environment were identified by two filter methods. And then the top 30 ranking genes were selected by the random forest algorithm. Gene Ontology annotation and functional enrichment analyses showed that these genes were mainly associated with metabolism process. Furthermore, clustering analysis showed that 17 genes among these are positive, including 9 for space synthetic environment and 8 for space radiation environment only. These genes could be used as the biomarkers to reflect the space environment stresses. In addition, we also found that microgravity is the main stress factor to change the expression patterns of biomarkers for the short-duration spaceflight.

  7. Mining potential biomarkers associated with space flight in Caenorhabditis elegans experienced Shenzhou-8 mission with multiple feature selection techniques

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, Lei [Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian 116026 (China); Gao, Ying [Center of Medical Physics and Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Shushanhu Road 350, Hefei 230031 (China); Mi, Dong, E-mail: mid@dlmu.edu.cn [Department of Physics, Dalian Maritime University, Dalian 116026 (China); Sun, Yeqing, E-mail: yqsun@dlmu.edu.cn [Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian 116026 (China)

    2016-09-15

    Highlights: • A combined algorithm is proposed to mine biomarkers of spaceflight in C. elegans. • This algorithm makes the feature selection more reliable and robust. • Apply this algorithm to predict 17 positive biomarkers to space environment stress. • The strategy can be used as a general method to select important features. - Abstract: To identify the potential biomarkers associated with space flight, a combined algorithm, which integrates the feature selection techniques, was used to deal with the microarray datasets of Caenorhabditis elegans obtained in the Shenzhou-8 mission. Compared with the ground control treatment, a total of 86 differentially expressed (DE) genes in responses to space synthetic environment or space radiation environment were identified by two filter methods. And then the top 30 ranking genes were selected by the random forest algorithm. Gene Ontology annotation and functional enrichment analyses showed that these genes were mainly associated with metabolism process. Furthermore, clustering analysis showed that 17 genes among these are positive, including 9 for space synthetic environment and 8 for space radiation environment only. These genes could be used as the biomarkers to reflect the space environment stresses. In addition, we also found that microgravity is the main stress factor to change the expression patterns of biomarkers for the short-duration spaceflight.

  8. Covariance Method of the Tunneling Radiation from High Dimensional Rotating Black Holes

    Science.gov (United States)

    Li, Hui-Ling; Han, Yi-Wen; Chen, Shuai-Ru; Ding, Cong

    2018-04-01

    In this paper, Angheben-Nadalini-Vanzo-Zerbini (ANVZ) covariance method is used to study the tunneling radiation from the Kerr-Gödel black hole and Myers-Perry black hole with two independent angular momentum. By solving the Hamilton-Jacobi equation and separating the variables, the radial motion equation of a tunneling particle is obtained. Using near horizon approximation and the distance of the proper pure space, we calculate the tunneling rate and the temperature of Hawking radiation. Thus, the method of ANVZ covariance is extended to the research of high dimensional black hole tunneling radiation.

  9. High-dimensional quantum key distribution based on multicore fiber using silicon photonic integrated circuits

    DEFF Research Database (Denmark)

    Ding, Yunhong; Bacco, Davide; Dalgaard, Kjeld

    2017-01-01

    is intrinsically limited to 1 bit/photon. Here we propose and experimentally demonstrate, for the first time, a high-dimensional quantum key distribution protocol based on space division multiplexing in multicore fiber using silicon photonic integrated lightwave circuits. We successfully realized three mutually......-dimensional quantum states, and enables breaking the information efficiency limit of traditional quantum key distribution protocols. In addition, the silicon photonic circuits used in our work integrate variable optical attenuators, highly efficient multicore fiber couplers, and Mach-Zehnder interferometers, enabling...

  10. HBC-Evo: predicting human breast cancer by exploiting amino acid sequence-based feature spaces and evolutionary ensemble system.

    Science.gov (United States)

    Majid, Abdul; Ali, Safdar

    2015-01-01

    We developed genetic programming (GP)-based evolutionary ensemble system for the early diagnosis, prognosis and prediction of human breast cancer. This system has effectively exploited the diversity in feature and decision spaces. First, individual learners are trained in different feature spaces using physicochemical properties of protein amino acids. Their predictions are then stacked to develop the best solution during GP evolution process. Finally, results for HBC-Evo system are obtained with optimal threshold, which is computed using particle swarm optimization. Our novel approach has demonstrated promising results compared to state of the art approaches.

  11. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    International Nuclear Information System (INIS)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-01-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition WPD, and a mixed-domain feature set consisted of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and WPD energy spectrum is extracted to comprehensively characterize the properties of machinery running state(s). Then, the mixed-dimension feature set is inputted into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information of both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated within the dimension reduction process in a semi-supervised manner to improve sensitivity of the extracted fusion features. Lastly, the extracted fusion features are inputted into a pattern recognition algorithm to achieve the running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification. (paper)

  12. Using High-Dimensional Image Models to Perform Highly Undetectable Steganography

    Science.gov (United States)

    Pevný, Tomáš; Filler, Tomáš; Bas, Patrick

    This paper presents a complete methodology for designing practical and highly-undetectable stegosystems for real digital media. The main design principle is to minimize a suitably-defined distortion by means of efficient coding algorithm. The distortion is defined as a weighted difference of extended state-of-the-art feature vectors already used in steganalysis. This allows us to "preserve" the model used by steganalyst and thus be undetectable even for large payloads. This framework can be efficiently implemented even when the dimensionality of the feature set used by the embedder is larger than 107. The high dimensional model is necessary to avoid known security weaknesses. Although high-dimensional models might be problem in steganalysis, we explain, why they are acceptable in steganography. As an example, we introduce HUGO, a new embedding algorithm for spatial-domain digital images and we contrast its performance with LSB matching. On the BOWS2 image database and in contrast with LSB matching, HUGO allows the embedder to hide 7× longer message with the same level of security level.

  13. Vascular lesions of the lumbar epidural space: magnetic resonance imaging features of epidural cavernous hemangioma and epidural hematoma

    Directory of Open Access Journals (Sweden)

    Basile Júnior Roberto

    1999-01-01

    Full Text Available The authors report the magnetic resonance imaging diagnostic features in two cases with respectively lumbar epidural hematoma and cavernous hemangioma of the lumbar epidural space. Enhanced MRI T1-weighted scans show a hyperintense signal rim surrounding the vascular lesion. Non-enhanced T2-weighted scans showed hyperintense signal.

  14. Recognizing the Face of Johnny, Suzy, and Me: Insensitivity to the Spacing Among Features at 4 Years of Age

    Science.gov (United States)

    Mondloch, Catherine J.; Leis, Anishka; Maurer, Daphne

    2006-01-01

    Four-year-olds were tested for their ability to use differences in the spacing among features to recognize familiar faces. They were given a storybook depicting multiple views of 2 children. They returned to the laboratory 2 weeks later and used a "magic wand" to play a computer game that tested their ability to recognize the familiarized faces…

  15. Hawking radiation of a high-dimensional rotating black hole

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, Ren; Zhang, Lichun; Li, Huaifan; Wu, Yueqin [Shanxi Datong University, Institute of Theoretical Physics, Department of Physics, Datong (China)

    2010-01-15

    We extend the classical Damour-Ruffini method and discuss Hawking radiation spectrum of high-dimensional rotating black hole using Tortoise coordinate transformation defined by taking the reaction of the radiation to the spacetime into consideration. Under the condition that the energy and angular momentum are conservative, taking self-gravitation action into account, we derive Hawking radiation spectrums which satisfy unitary principle in quantum mechanics. It is shown that the process that the black hole radiates particles with energy {omega} is a continuous tunneling process. We provide a theoretical basis for further studying the physical mechanism of black-hole radiation. (orig.)

  16. On spectral distribution of high dimensional covariation matrices

    DEFF Research Database (Denmark)

    Heinrich, Claudio; Podolskij, Mark

    In this paper we present the asymptotic theory for spectral distributions of high dimensional covariation matrices of Brownian diffusions. More specifically, we consider N-dimensional Itô integrals with time varying matrix-valued integrands. We observe n equidistant high frequency data points...... of the underlying Brownian diffusion and we assume that N/n -> c in (0,oo). We show that under a certain mixed spectral moment condition the spectral distribution of the empirical covariation matrix converges in distribution almost surely. Our proof relies on method of moments and applications of graph theory....

  17. High-dimensional quantum channel estimation using classical light

    CSIR Research Space (South Africa)

    Mabena, Chemist M

    2017-11-01

    Full Text Available stream_source_info Mabena_20007_2017.pdf.txt stream_content_type text/plain stream_size 960 Content-Encoding UTF-8 stream_name Mabena_20007_2017.pdf.txt Content-Type text/plain; charset=UTF-8 PHYSICAL REVIEW A 96, 053860... (2017) High-dimensional quantum channel estimation using classical light Chemist M. Mabena CSIR National Laser Centre, P.O. Box 395, Pretoria 0001, South Africa and School of Physics, University of the Witwatersrand, Johannesburg 2000, South...

  18. Ghosts in high dimensional non-linear dynamical systems: The example of the hypercycle

    International Nuclear Information System (INIS)

    Sardanyes, Josep

    2009-01-01

    Ghost-induced delayed transitions are analyzed in high dimensional non-linear dynamical systems by means of the hypercycle model. The hypercycle is a network of catalytically-coupled self-replicating RNA-like macromolecules, and has been suggested to be involved in the transition from non-living to living matter in the context of earlier prebiotic evolution. It is demonstrated that, in the vicinity of the saddle-node bifurcation for symmetric hypercycles, the persistence time before extinction, T ε , tends to infinity as n→∞ (being n the number of units of the hypercycle), thus suggesting that the increase in the number of hypercycle units involves a longer resilient time before extinction because of the ghost. Furthermore, by means of numerical analysis the dynamics of three large hypercycle networks is also studied, focusing in their extinction dynamics associated to the ghosts. Such networks allow to explore the properties of the ghosts living in high dimensional phase space with n = 5, n = 10 and n = 15 dimensions. These hypercyclic networks, in agreement with other works, are shown to exhibit self-maintained oscillations governed by stable limit cycles. The bifurcation scenarios for these hypercycles are analyzed, as well as the effect of the phase space dimensionality in the delayed transition phenomena and in the scaling properties of the ghosts near bifurcation threshold

  19. Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems.

    Science.gov (United States)

    Geraci, Joseph; Dharsee, Moyez; Nuin, Paulo; Haslehurst, Alexandria; Koti, Madhuri; Feilotter, Harriet E; Evans, Ken

    2014-03-01

    We introduce a novel method for visualizing high dimensional data via a discrete dynamical system. This method provides a 2D representation of the relationship between subjects according to a set of variables without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a certain class of discrete dynamical systems collectively referred to as the chaos game that are closely related to iterative function systems. The goal of the algorithm was to create a human readable representation of high dimensional patient data that was capable of detecting unrevealed subclusters of patients from within anticipated classifications. This provides a mechanism to further pursue a more personalized exploration of pathology when used with medical data. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after some feature selection filter and before some model evaluation (e.g. clustering accuracy) protocol. In the version given here, a univariate features selection step is performed (in practice more complex feature selection methods are used), a discrete dynamical system is driven by this reduced set of variables (which results in a set of 2D cluster models), these models are evaluated for their accuracy (according to a user-defined binary classification) and finally a visual representation of the top classification models are returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning as the top performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and provide working code for, uses a discrete dynamical system to classify high dimensional data and provide a 2D representation of the relationship between subjects. We report results on three datasets (two in the article; one in the appendix) including a public lung cancer

  20. Multiple-output support vector machine regression with feature selection for arousal/valence space emotion assessment.

    Science.gov (United States)

    Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A

    2014-01-01

    Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a high range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled independently, the dimensions in these dimensional spaces. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages in modeling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include an stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are more informative into the arousal and valence space discrimination are the EEG, Electrooculogram/Electromiogram (EOG/EMG) and the Galvanic Skin Response (GSR).

  1. Multiple channel space lattice focusing and features of its use in applied RF linac

    International Nuclear Information System (INIS)

    Kushin, V.; Plotnikov, S.; Zarubin, A.; Bondarev, B.; Durkin, A.

    2000-01-01

    Nowadays the use of multiple channel accelerator systems is well known with some hundred channels helps us to increase total beam intensity proportional to the number of channels while the divergence of the total beam is roughly equal to the divergence of single channel. The accelerator structure for multiple beam linac must provide both transversal and longitudinal stability for every small beam taking into account Coulomb interactions of all the micro beams. The most convenient for accelerator structures with 100 and more beams are the systems that use RF focusing such as RFQ, APF and DTL with rectangular profiles. The common disadvantage of all those systems is connected with decreasing of focusing forces of RF field with particle velocity increase. Our analysis shows that the disadvantage may be overcome in structures with rectangular profiles. For this purpose some additional thin (3-5 mm) focusing electrodes called space lattices (SL) must be arranged within accelerator gaps. The distance between these electrodes is chosen roughly equal to the thickness of additional electrodes. The number of the electrodes must be increased with length of accelerator gaps and may be equal n=1,2...6 and even more. The arrangement of n thin electrodes in accelerator gaps helps us to reach qualitative change of accelerator structure parameters. Firstly, they make n times amplification of the sign-alternate component of RF focusing field without appreciable influence to phasing action of accelerating field. Secondly, introducing of additional electrodes that divide the gap on n small accelerator gaps provides beams shielding from each other within the region of beam acceleration in RF fields between drift tubes. The analysis shows that if n=4-6, it is possible to reach transversal stability of all particles independently of their input phases in RF field. On the other hand, the analysis shows that adiabatic change of synchronous phase at the input stage of acceleration helps us

  2. Construction of high-dimensional universal quantum logic gates using a Λ system coupled with a whispering-gallery-mode microresonator.

    Science.gov (United States)

    He, Ling Yan; Wang, Tie-Jun; Wang, Chuan

    2016-07-11

    High-dimensional quantum system provides a higher capacity of quantum channel, which exhibits potential applications in quantum information processing. However, high-dimensional universal quantum logic gates is difficult to achieve directly with only high-dimensional interaction between two quantum systems and requires a large number of two-dimensional gates to build even a small high-dimensional quantum circuits. In this paper, we propose a scheme to implement a general controlled-flip (CF) gate where the high-dimensional single photon serve as the target qudit and stationary qubits work as the control logic qudit, by employing a three-level Λ-type system coupled with a whispering-gallery-mode microresonator. In our scheme, the required number of interaction times between the photon and solid state system reduce greatly compared with the traditional method which decomposes the high-dimensional Hilbert space into 2-dimensional quantum space, and it is on a shorter temporal scale for the experimental realization. Moreover, we discuss the performance and feasibility of our hybrid CF gate, concluding that it can be easily extended to a 2n-dimensional case and it is feasible with current technology.

  3. Research of features and structure of electoral space of Ukraine in 2014 with the use of synthetic approach

    Directory of Open Access Journals (Sweden)

    M. M. Shelemba

    2015-02-01

    Full Text Available The article is aimed at the ground of expediency of the use of synthetic authorial model for research of features and structure of electoral space of Ukraine in 2014 year. Methodological principles of the use of synthetic model are expounded with the use of quality and quantitative methods researches of electoral space, among that methods of factor and cross­correlation analysis. A synthetic model (approach that is built on the basis of the use of the best scientific approaches takes into account features and progress of electoral space of Ukraine trends. The analysis of features and structure of electoral space of Ukraine is conducted in 2014 with the use of an offer model. The application author synthetic model allows the study of the use of association factor and correlation analysis to justify support to political parties during election campaigns, respectively, depending on the factors and the most important correlates. It was found that electoral choice depends on the actions of those factors in the highest degree the expectations of the region. This article has shown that the use of Ukraine at this stage of the investigated during election campaigns as the most significant social correlates of «Human Development Index» is reasonable and one that makes it possible to obtain reliable results. It is proved that a high level of correlation holds at a high level of support the party and, consequently, high sense of social correlates all variants of expert research.

  4. Class prediction for high-dimensional class-imbalanced data

    Directory of Open Access Journals (Sweden)

    Lusa Lara

    2010-10-01

    Full Text Available Abstract Background The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate if the high-dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance. Results Our results show that the evaluated classifiers are highly sensitive to class imbalance and that variable selection introduces an additional bias towards classification into the majority class. Most new samples are assigned to the majority class from the training set, unless the difference between the classes is very large. As a consequence, the class-specific predictive accuracies differ considerably. When the class imbalance is not too severe, down-sizing and asymmetric bagging embedding variable selection work well, while over-sampling does not. Variable normalization can further worsen the performance of the classifiers. Conclusions Our results show that matching the prevalence of the classes in training and test set does not guarantee good performance of classifiers and that the problems related to classification with class

  5. High-dimensional change-point estimation: Combining filtering with convex optimization

    OpenAIRE

    Soh, Yong Sheng; Chandrasekaran, Venkat

    2017-01-01

    We consider change-point estimation in a sequence of high-dimensional signals given noisy observations. Classical approaches to this problem such as the filtered derivative method are useful for sequences of scalar-valued signals, but they have undesirable scaling behavior in the high-dimensional setting. However, many high-dimensional signals encountered in practice frequently possess latent low-dimensional structure. Motivated by this observation, we propose a technique for high-dimensional...

  6. An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data.

    Science.gov (United States)

    Yu, Hualong; Ni, Jun

    2014-01-01

    Training classifiers on skewed data can be technically challenging tasks, especially if the data is high-dimensional simultaneously, the tasks can become more difficult. In biomedicine field, skewed data type often appears. In this study, we try to deal with this problem by combining asymmetric bagging ensemble classifier (asBagging) that has been presented in previous work and an improved random subspace (RS) generation strategy that is called feature subspace (FSS). Specifically, FSS is a novel method to promote the balance level between accuracy and diversity of base classifiers in asBagging. In view of the strong generalization capability of support vector machine (SVM), we adopt it to be base classifier. Extensive experiments on four benchmark biomedicine data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of Accuracy, F-measure, G-mean and AUC evaluation criterions, thus it can be regarded as an effective and efficient tool to deal with high-dimensional and imbalanced biomedical data.

  7. Latent class models for joint analysis of disease prevalence and high-dimensional semicontinuous biomarker data.

    Science.gov (United States)

    Zhang, Bo; Chen, Zhen; Albert, Paul S

    2012-01-01

    High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the 2 modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and complex mean-variance relationship in the biomarkers levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.

  8. Applying recursive numerical integration techniques for solving high dimensional integrals

    International Nuclear Information System (INIS)

    Ammon, Andreas; Genz, Alan; Hartung, Tobias; Jansen, Karl; Volmer, Julia; Leoevey, Hernan

    2016-11-01

    The error scaling for Markov-Chain Monte Carlo techniques (MCMC) with N samples behaves like 1/√(N). This scaling makes it often very time intensive to reduce the error of computed observables, in particular for applications in lattice QCD. It is therefore highly desirable to have alternative methods at hand which show an improved error scaling. One candidate for such an alternative integration technique is the method of recursive numerical integration (RNI). The basic idea of this method is to use an efficient low-dimensional quadrature rule (usually of Gaussian type) and apply it iteratively to integrate over high-dimensional observables and Boltzmann weights. We present the application of such an algorithm to the topological rotor and the anharmonic oscillator and compare the error scaling to MCMC results. In particular, we demonstrate that the RNI technique shows an error scaling in the number of integration points m that is at least exponential.

  9. Network Reconstruction From High-Dimensional Ordinary Differential Equations.

    Science.gov (United States)

    Chen, Shizhe; Shojaie, Ali; Witten, Daniela M

    2017-01-01

    We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.

  10. Applying recursive numerical integration techniques for solving high dimensional integrals

    Energy Technology Data Exchange (ETDEWEB)

    Ammon, Andreas [IVU Traffic Technologies AG, Berlin (Germany); Genz, Alan [Washington State Univ., Pullman, WA (United States). Dept. of Mathematics; Hartung, Tobias [King' s College, London (United Kingdom). Dept. of Mathematics; Jansen, Karl; Volmer, Julia [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany). John von Neumann-Inst. fuer Computing NIC; Leoevey, Hernan [Humboldt Univ. Berlin (Germany). Inst. fuer Mathematik

    2016-11-15

    The error scaling for Markov-Chain Monte Carlo techniques (MCMC) with N samples behaves like 1/√(N). This scaling makes it often very time intensive to reduce the error of computed observables, in particular for applications in lattice QCD. It is therefore highly desirable to have alternative methods at hand which show an improved error scaling. One candidate for such an alternative integration technique is the method of recursive numerical integration (RNI). The basic idea of this method is to use an efficient low-dimensional quadrature rule (usually of Gaussian type) and apply it iteratively to integrate over high-dimensional observables and Boltzmann weights. We present the application of such an algorithm to the topological rotor and the anharmonic oscillator and compare the error scaling to MCMC results. In particular, we demonstrate that the RNI technique shows an error scaling in the number of integration points m that is at least exponential.

  11. Reduced order surrogate modelling (ROSM) of high dimensional deterministic simulations

    Science.gov (United States)

    Mitry, Mina

    Often, computationally expensive engineering simulations can prohibit the engineering design process. As a result, designers may turn to a less computationally demanding approximate, or surrogate, model to facilitate their design process. However, owing to the the curse of dimensionality, classical surrogate models become too computationally expensive for high dimensional data. To address this limitation of classical methods, we develop linear and non-linear Reduced Order Surrogate Modelling (ROSM) techniques. Two algorithms are presented, which are based on a combination of linear/kernel principal component analysis and radial basis functions. These algorithms are applied to subsonic and transonic aerodynamic data, as well as a model for a chemical spill in a channel. The results of this thesis show that ROSM can provide a significant computational benefit over classical surrogate modelling, sometimes at the expense of a minor loss in accuracy.

  12. Asymptotics of empirical eigenstructure for high dimensional spiked covariance.

    Science.gov (United States)

    Wang, Weichen; Fan, Jianqing

    2017-06-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.

  13. Functionality of system components: Conservation of protein function in protein feature space

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Ussery, David; Brunak, Søren

    2003-01-01

    well on organisms other than the one on which it was trained. We evaluate the performance of such a method, ProtFun, which relies on protein features as its sole input, and show that the method gives similar performance for most eukaryotes and performs much better than anticipated on archaea......Many protein features useful for prediction of protein function can be predicted from sequence, including posttranslational modifications, subcellular localization, and physical/chemical properties. We show here that such protein features are more conserved among orthologs than paralogs, indicating...... they are crucial for protein function and thus subject to selective pressure. This means that a function prediction method based on sequence-derived features may be able to discriminate between proteins with different function even when they have highly similar structure. Also, such a method is likely to perform...

  14. Probing features in inflaton potential and reionization history with future CMB space observations

    Science.gov (United States)

    Hazra, Dhiraj Kumar; Paoletti, Daniela; Ballardini, Mario; Finelli, Fabio; Shafieloo, Arman; Smoot, George F.; Starobinsky, Alexei A.

    2018-02-01

    We consider the prospects of probing features in the primordial power spectrum with future Cosmic Microwave Background (CMB) polarization measurements. In the scope of the inflationary scenario, such features in the spectrum can be produced by local non-smooth pieces in an inflaton potential (smooth and quasi-flat in general) which in turn may originate from fast phase transitions during inflation in other quantum fields interacting with the inflaton. They can fit some outliers in the CMB temperature power spectrum which are unaddressed within the standard inflationary ΛCDM model. We consider Wiggly Whipped Inflation (WWI) as a theoretical framework leading to improvements in the fit to the Planck 2015 temperature and polarization data in comparison with the standard inflationary models, although not at a statistically significant level. We show that some type of features in the potential within the WWI models, leading to oscillations in the primordial power spectrum that extend to intermediate and small scales can be constrained with high confidence (at 3σ or higher confidence level) by an instrument as the Cosmic ORigins Explorer (CORE). In order to investigate the possible confusion between inflationary features and footprints from the reionization era, we consider an extended reionization history with monotonic increase of free electrons with decrease in redshift. We discuss the present constraints on this model of extended reionization and future predictions with CORE. We also project, to what extent, this extended reionization can create confusion in identifying inflationary features in the data.

  15. Feature Space Dimensionality Reduction for Real-Time Vision-Based Food Inspection

    Directory of Open Access Journals (Sweden)

    Mai Moussa CHETIMA

    2009-03-01

    Full Text Available Machine vision solutions are becoming a standard for quality inspection in several manufacturing industries. In the processed-food industry where the appearance attributes of the product are essential to customer’s satisfaction, visual inspection can be reliably achieved with machine vision. But such systems often involve the extraction of a larger number of features than those actually needed to ensure proper quality control, making the process less efficient and difficult to tune. This work experiments with several feature selection techniques in order to reduce the number of attributes analyzed by a real-time vision-based food inspection system. Identifying and removing as much irrelevant and redundant information as possible reduces the dimensionality of the data and allows classification algorithms to operate faster. In some cases, accuracy on classification can even be improved. Filter-based and wrapper-based feature selectors are experimentally evaluated on different bakery products to identify the best performing approaches.

  16. SOCIAL DISTANCES AS A FEATURE OF THE CONTEMPORARY RUSSIAN SOCIAL SPACE

    Directory of Open Access Journals (Sweden)

    Л А Беляева

    2018-12-01

    Full Text Available Social space is a theoretical construct that allows to consider many key problems of social development including the society’s consolidation. The author defines social space as a set of social statuses and distances. Their objective characteristics are interrelated with subjective indicators identified through the opinions of individuals. The balance of statuses and distances in society and the acceptability of this structure for the majority of population ensure the stability of society and effective social control. If this balance is disturbed, social tensions arise and threaten the stability and consolidation of society. Thus, the ideas of the theories of social space possess a considerable heuristic potential for revealing urgent problems of social development such as solidarity, social stratification and mobility, social networks and their interaction, connections of local communities within and with the world, interaction of structured social relations and individual and collective practices, genesis of social space as a result of social production represented by both things and relationships, etc. According to the theory of P. Bourdieu, the author con-siders social space as a structure of social statuses based on the set of different types of capital: economic, cultural, social, and symbolic. The author uses statistical data and results of the monitoring survey conducted on the all-Russian sample. The article proposes some tested empirical indicators that proved the increase of social distances in Russia due to the redistribution of economic capital and, as a consequence, of cultural and social capitals. Thus, the social space of Russia cannot be considered stable. To ensure its greater stability we need a set of measures to reduce social distances: re-industrialization to create high-tech jobs, development of digital economy, and improvement of the mass secondary and higher education system - these measures can create a basis for the

  17. FEATURES OF PSYCHOLOGICAL SPACE SOVEREIGNTY MAINTAINED BY PEOPLE WITH DIFFERENT ATTITUDE TO SOLITUDE

    Directory of Open Access Journals (Sweden)

    Nadezhda Alekseevna Garipova

    2017-06-01

    Practical implications. The results can be useful for developing psychocorrection sessions and trainings. The data can be helpful for specialists of Family Psychological Support centers and for instructors of “Ecological Psychology”, “Family Relations Psychology” disciplines. The study carried out is likely to be highly educational since many respondents participating in the survey admitted that they had never considered personal boundaries violation to be the reason for marital conflicts. They also lacked information concerning psychological space, how to regulate personal space boundaries and how to respond to other family members behavior in an adequate manner.

  18. Study on the construction of multi-dimensional Remote Sensing feature space for hydrological drought

    International Nuclear Information System (INIS)

    Xiang, Daxiang; Tan, Debao; Wen, Xiongfei; Shen, Shaohong; Li, Zhe; Cui, Yuanlai

    2014-01-01

    Hydrological drought refers to an abnormal water shortage caused by precipitation and surface water shortages or a groundwater imbalance. Hydrological drought is reflected in a drop of surface water, decrease of vegetation productivity, increase of temperature difference between day and night and so on. Remote sensing permits the observation of surface water, vegetation, temperature and other information from a macro perspective. This paper analyzes the correlation relationship and differentiation of both remote sensing and surface measured indicators, after the selection and extraction a series of representative remote sensing characteristic parameters according to the spectral characterization of surface features in remote sensing imagery, such as vegetation index, surface temperature and surface water from HJ-1A/B CCD/IRS data. Finally, multi-dimensional remote sensing features such as hydrological drought are built on a intelligent collaborative model. Further, for the Dong-ting lake area, two drought events are analyzed for verification of multi-dimensional features using remote sensing data with different phases and field observation data. The experiments results proved that multi-dimensional features are a good method for hydrological drought

  19. Fault-tolerant feature-based estimation of space debris rotational motion during active removal missions

    Science.gov (United States)

    Biondi, Gabriele; Mauro, Stefano; Pastorelli, Stefano; Sorli, Massimo

    2018-05-01

    One of the key functionalities required by an Active Debris Removal mission is the assessment of the target kinematics and inertial properties. Passive sensors, such as stereo cameras, are often included in the onboard instrumentation of a chaser spacecraft for capturing sequential photographs and for tracking features of the target surface. A plenty of methods, based on Kalman filtering, are available for the estimation of the target's state from feature positions; however, to guarantee the filter convergence, they typically require continuity of measurements and the capability of tracking a fixed set of pre-defined features of the object. These requirements clash with the actual tracking conditions: failures in feature detection often occur and the assumption of having some a-priori knowledge about the shape of the target could be restrictive in certain cases. The aim of the presented work is to propose a fault-tolerant alternative method for estimating the angular velocity and the relative magnitudes of the principal moments of inertia of the target. Raw data regarding the positions of the tracked features are processed to evaluate corrupted values of a 3-dimentional parameter which entirely describes the finite screw motion of the debris and which primarily is invariant on the particular set of considered features of the object. Missing values of the parameter are completely restored exploiting the typical periodicity of the rotational motion of an uncontrolled satellite: compressed sensing techniques, typically adopted for recovering images or for prognostic applications, are herein used in a completely original fashion for retrieving a kinematic signal that appears sparse in the frequency domain. Due to its invariance about the features, no assumptions are needed about the target's shape and continuity of the tracking. The obtained signal is useful for the indirect evaluation of an attitude signal that feeds an unscented Kalman filter for the estimation of

  20. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

    Science.gov (United States)

    Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel

    2011-05-09

    Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM

  1. High-dimensional atom localization via spontaneously generated coherence in a microwave-driven atomic system.

    Science.gov (United States)

    Wang, Zhiping; Chen, Jinyu; Yu, Benli

    2017-02-20

    We investigate the two-dimensional (2D) and three-dimensional (3D) atom localization behaviors via spontaneously generated coherence in a microwave-driven four-level atomic system. Owing to the space-dependent atom-field interaction, it is found that the detecting probability and precision of 2D and 3D atom localization behaviors can be significantly improved via adjusting the system parameters, the phase, amplitude, and initial population distribution. Interestingly, the atom can be localized in volumes that are substantially smaller than a cubic optical wavelength. Our scheme opens a promising way to achieve high-precision and high-efficiency atom localization, which provides some potential applications in high-dimensional atom nanolithography.

  2. Semi-Supervised Clustering for High-Dimensional and Sparse Features

    Science.gov (United States)

    Yan, Su

    2010-01-01

    Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…

  3. Non-retinotopic feature processing in the absence of retinotopic spatial layout and the construction of perceptual space from motion.

    Science.gov (United States)

    Ağaoğlu, Mehmet N; Herzog, Michael H; Oğmen, Haluk

    2012-10-15

    The spatial representation of a visual scene in the early visual system is well known. The optics of the eye map the three-dimensional environment onto two-dimensional images on the retina. These retinotopic representations are preserved in the early visual system. Retinotopic representations and processing are among the most prevalent concepts in visual neuroscience. However, it has long been known that a retinotopic representation of the stimulus is neither sufficient nor necessary for perception. Saccadic Stimulus Presentation Paradigm and the Ternus-Pikler displays have been used to investigate non-retinotopic processes with and without eye movements, respectively. However, neither of these paradigms eliminates the retinotopic representation of the spatial layout of the stimulus. Here, we investigated how stimulus features are processed in the absence of a retinotopic layout and in the presence of retinotopic conflict. We used anorthoscopic viewing (slit viewing) and pitted a retinotopic feature-processing hypothesis against a non-retinotopic feature-processing hypothesis. Our results support the predictions of the non-retinotopic feature-processing hypothesis and demonstrate the ability of the visual system to operate non-retinotopically at a fine feature processing level in the absence of a retinotopic spatial layout. Our results suggest that perceptual space is actively constructed from the perceptual dimension of motion. The implications of these findings for normal ecological viewing conditions are discussed. 2012 Elsevier Ltd. All rights reserved

  4. A sparse grid based method for generative dimensionality reduction of high-dimensional data

    Science.gov (United States)

    Bohn, Bastian; Garcke, Jochen; Griebel, Michael

    2016-03-01

    Generative dimensionality reduction methods play an important role in machine learning applications because they construct an explicit mapping from a low-dimensional space to the high-dimensional data space. We discuss a general framework to describe generative dimensionality reduction methods, where the main focus lies on a regularized principal manifold learning variant. Since most generative dimensionality reduction algorithms exploit the representer theorem for reproducing kernel Hilbert spaces, their computational costs grow at least quadratically in the number n of data. Instead, we introduce a grid-based discretization approach which automatically scales just linearly in n. To circumvent the curse of dimensionality of full tensor product grids, we use the concept of sparse grids. Furthermore, in real-world applications, some embedding directions are usually more important than others and it is reasonable to refine the underlying discretization space only in these directions. To this end, we employ a dimension-adaptive algorithm which is based on the ANOVA (analysis of variance) decomposition of a function. In particular, the reconstruction error is used to measure the quality of an embedding. As an application, the study of large simulation data from an engineering application in the automotive industry (car crash simulation) is performed.

  5. Progress in high-dimensional percolation and random graphs

    CERN Document Server

    Heydenreich, Markus

    2017-01-01

    This text presents an engaging exposition of the active field of high-dimensional percolation that will likely provide an impetus for future work. With over 90 exercises designed to enhance the reader’s understanding of the material, as well as many open problems, the book is aimed at graduate students and researchers who wish to enter the world of this rich topic.  The text may also be useful in advanced courses and seminars, as well as for reference and individual study. Part I, consisting of 3 chapters, presents a general introduction to percolation, stating the main results, defining the central objects, and proving its main properties. No prior knowledge of percolation is assumed. Part II, consisting of Chapters 4–9, discusses mean-field critical behavior by describing the two main techniques used, namely, differential inequalities and the lace expansion. In Parts I and II, all results are proved, making this the first self-contained text discussing high-dimensiona l percolation.  Part III, consist...

  6. High-dimensional quantum cryptography with twisted light

    International Nuclear Information System (INIS)

    Mirhosseini, Mohammad; Magaña-Loaiza, Omar S; O’Sullivan, Malcolm N; Rodenburg, Brandon; Malik, Mehul; Boyd, Robert W; Lavery, Martin P J; Padgett, Miles J; Gauthier, Daniel J

    2015-01-01

    Quantum key distribution (QKD) systems often rely on polarization of light for encoding, thus limiting the amount of information that can be sent per photon and placing tight bounds on the error rates that such a system can tolerate. Here we describe a proof-of-principle experiment that indicates the feasibility of high-dimensional QKD based on the transverse structure of the light field allowing for the transfer of more than 1 bit per photon. Our implementation uses the orbital angular momentum (OAM) of photons and the corresponding mutually unbiased basis of angular position (ANG). Our experiment uses a digital micro-mirror device for the rapid generation of OAM and ANG modes at 4 kHz, and a mode sorter capable of sorting single photons based on their OAM and ANG content with a separation efficiency of 93%. Through the use of a seven-dimensional alphabet encoded in the OAM and ANG bases, we achieve a channel capacity of 2.05 bits per sifted photon. Our experiment demonstrates that, in addition to having an increased information capacity, multilevel QKD systems based on spatial-mode encoding can be more resilient against intercept-resend eavesdropping attacks. (paper)

  7. Inference for High-dimensional Differential Correlation Matrices.

    Science.gov (United States)

    Cai, T Tony; Zhang, Anru

    2016-01-01

    Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with the breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.

  8. Bayesian Subset Modeling for High-Dimensional Generalized Linear Models

    KAUST Repository

    Liang, Faming

    2013-06-01

    This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association.

  9. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks.

    Science.gov (United States)

    Vlachas, Pantelis R; Byeon, Wonmin; Wan, Zhong Y; Sapsis, Themistoklis P; Koumoutsakos, Petros

    2018-05-01

    We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.

  10. Features of Virchow-Robin spaces in newly diagnosed multiple sclerosis patients

    Energy Technology Data Exchange (ETDEWEB)

    Etemadifar, Masoud [Department of Clinical and Biological Sciences, Division of Neurology, San Luigi Gonzaga School of Medicine, Orbassano (Torino), Turin (Italy); Department of Neurology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Isfahan Research Committee of Multiple Sclerosis (IRCOMS), Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Hekmatnia, Ali; Tayari, Nazila [Department of Radiology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Kazemi, Mojtaba [Department of Neurology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Ghazavi, Amirhossein [Department of Radiology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Akbari, Mojtaba [Department of Epidemiology and Statistics, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Maghzi, Amir-Hadi, E-mail: maghzi@edc.mui.ac.ir [Isfahan Research Committee of Multiple Sclerosis (IRCOMS), Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Neuroimmunology Unit, Centre for Neuroscience and Trauma, Blizard Institute of Cell and Molecular Science, Barts and the London School of Medicine and Dentistry, London (United Kingdom); Isfahan Neurosciences Research Center, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of)

    2011-11-15

    Background: Virchow-Robin spaces (VRSs) are perivascular pia-lined extensions of the subarachnoid space around the arteries and veins as they enter the brain parenchyma. These spaces are responsible for inflammatory processes within the brain. Objectives: This study was designed to shed more light on the location, size and shape of VRSs on 3 mm slice thickness, 1.5 Tesla MRI scans of newly diagnosed MS patients in Isfahan, Iran and compare the results with healthy age- and sex-matched controls. Methods: We evaluated MRI scans of 73 MS patients obtained within 3 months of MS onset and compared them with MRI scans from 73 age- and sex-matched healthy volunteers. Three mm section proton density, T2W and FLAIR MR images were obtained for all subjects. The location, size and shape of VRSs were compared between the two groups. Results: The total number of VRSs was significantly more in the MS group (p < 0.001). The distribution of VRSs were significantly more located in the high convexity areas in the MS group (p < 0.001), while there was no significant differences in other regions. The round shaped VRSs were significantly more detected on MRI scans of MS patients, and curvilinear shapes were significantly more frequently observed in healthy volunteers, however there were no significant differences for oval shaped VRSs between the two groups. The number of VRSs with the size over than 2 mm were significantly more observed in the MS groups compared to controls. We also observed some differences in the characteristics of VRSs between the genders in the MS group. Conclusion: The results of this study shed more light on the usefulness of VRSs as an MRI marker for the disease. In addition, according to our results VRSs might also have implication to determine the prognosis of the disease. However, larger studies with more advanced MRI techniques are required to confirm our results.

  11. Features of Virchow-Robin spaces in newly diagnosed multiple sclerosis patients

    International Nuclear Information System (INIS)

    Etemadifar, Masoud; Hekmatnia, Ali; Tayari, Nazila; Kazemi, Mojtaba; Ghazavi, Amirhossein; Akbari, Mojtaba; Maghzi, Amir-Hadi

    2011-01-01

    Background: Virchow-Robin spaces (VRSs) are perivascular pia-lined extensions of the subarachnoid space around the arteries and veins as they enter the brain parenchyma. These spaces are responsible for inflammatory processes within the brain. Objectives: This study was designed to shed more light on the location, size and shape of VRSs on 3 mm slice thickness, 1.5 Tesla MRI scans of newly diagnosed MS patients in Isfahan, Iran and compare the results with healthy age- and sex-matched controls. Methods: We evaluated MRI scans of 73 MS patients obtained within 3 months of MS onset and compared them with MRI scans from 73 age- and sex-matched healthy volunteers. Three mm section proton density, T2W and FLAIR MR images were obtained for all subjects. The location, size and shape of VRSs were compared between the two groups. Results: The total number of VRSs was significantly more in the MS group (p < 0.001). The distribution of VRSs were significantly more located in the high convexity areas in the MS group (p < 0.001), while there was no significant differences in other regions. The round shaped VRSs were significantly more detected on MRI scans of MS patients, and curvilinear shapes were significantly more frequently observed in healthy volunteers, however there were no significant differences for oval shaped VRSs between the two groups. The number of VRSs with the size over than 2 mm were significantly more observed in the MS groups compared to controls. We also observed some differences in the characteristics of VRSs between the genders in the MS group. Conclusion: The results of this study shed more light on the usefulness of VRSs as an MRI marker for the disease. In addition, according to our results VRSs might also have implication to determine the prognosis of the disease. However, larger studies with more advanced MRI techniques are required to confirm our results.

  12. Flexible feature-space-construction architecture and its VLSI implementation for multi-scale object detection

    Science.gov (United States)

    Luo, Aiwen; An, Fengwei; Zhang, Xiangyu; Chen, Lei; Huang, Zunkai; Jürgen Mattausch, Hans

    2018-04-01

    Feature extraction techniques are a cornerstone of object detection in computer-vision-based applications. The detection performance of vison-based detection systems is often degraded by, e.g., changes in the illumination intensity of the light source, foreground-background contrast variations or automatic gain control from the camera. In order to avoid such degradation effects, we present a block-based L1-norm-circuit architecture which is configurable for different image-cell sizes, cell-based feature descriptors and image resolutions according to customization parameters from the circuit input. The incorporated flexibility in both the image resolution and the cell size for multi-scale image pyramids leads to lower computational complexity and power consumption. Additionally, an object-detection prototype for performance evaluation in 65 nm CMOS implements the proposed L1-norm circuit together with a histogram of oriented gradients (HOG) descriptor and a support vector machine (SVM) classifier. The proposed parallel architecture with high hardware efficiency enables real-time processing, high detection robustness, small chip-core area as well as low power consumption for multi-scale object detection.

  13. High-dimensional statistical inference: From vector to matrix

    Science.gov (United States)

    Zhang, Anru

    Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems in including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary while leads to sharp results. It is shown that, in compressed sensing, delta kA 0, delta kA < 1/3 + epsilon, deltak A + thetak,kA < 1 + epsilon, or deltatkA< √(t - 1) / t + epsilon are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. Similar result also holds for matrix recovery. In addition, the conditions delta kA<1/3, deltak A+ thetak,kA<1, delta tkA < √(t - 1)/t and deltarM<1/3, delta rM+ thetar,rM<1, delta trM< √(t - 1)/ t are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case. For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The

  14. Genuinely high-dimensional nonlocality optimized by complementary measurements

    International Nuclear Information System (INIS)

    Lim, James; Ryu, Junghee; Yoo, Seokwon; Lee, Changhyoup; Bang, Jeongho; Lee, Jinhyoung

    2010-01-01

    Qubits exhibit extreme nonlocality when their state is maximally entangled and this is observed by mutually unbiased local measurements. This criterion does not hold for the Bell inequalities of high-dimensional systems (qudits), recently proposed by Collins-Gisin-Linden-Massar-Popescu and Son-Lee-Kim. Taking an alternative approach, called the quantum-to-classical approach, we derive a series of Bell inequalities for qudits that satisfy the criterion as for the qubits. In the derivation each d-dimensional subsystem is assumed to be measured by one of d possible measurements with d being a prime integer. By applying to two qubits (d=2), we find that a derived inequality is reduced to the Clauser-Horne-Shimony-Holt inequality when the degree of nonlocality is optimized over all the possible states and local observables. Further applying to two and three qutrits (d=3), we find Bell inequalities that are violated for the three-dimensionally entangled states but are not violated by any two-dimensionally entangled states. In other words, the inequalities discriminate three-dimensional (3D) entanglement from two-dimensional (2D) entanglement and in this sense they are genuinely 3D. In addition, for the two qutrits we give a quantitative description of the relations among the three degrees of complementarity, entanglement and nonlocality. It is shown that the degree of complementarity jumps abruptly to very close to its maximum as nonlocality starts appearing. These characteristics imply that complementarity plays a more significant role in the present inequality compared with the previously proposed inequality.

  15. Approximation of High-Dimensional Rank One Tensors

    KAUST Repository

    Bachmayr, Markus

    2013-11-12

    Many real world problems are high-dimensional in that their solution is a function which depends on many variables or parameters. This presents a computational challenge since traditional numerical techniques are built on model classes for functions based solely on smoothness. It is known that the approximation of smoothness classes of functions suffers from the so-called \\'curse of dimensionality\\'. Avoiding this curse requires new model classes for real world functions that match applications. This has led to the introduction of notions such as sparsity, variable reduction, and reduced modeling. One theme that is particularly common is to assume a tensor structure for the target function. This paper investigates how well a rank one function f(x 1,...,x d)=f 1(x 1)⋯f d(x d), defined on Ω=[0,1]d can be captured through point queries. It is shown that such a rank one function with component functions f j in W∞ r([0,1]) can be captured (in L ∞) to accuracy O(C(d,r)N -r) from N well-chosen point evaluations. The constant C(d,r) scales like d dr. The queries in our algorithms have two ingredients, a set of points built on the results from discrepancy theory and a second adaptive set of queries dependent on the information drawn from the first set. Under the assumption that a point z∈Ω with nonvanishing f(z) is known, the accuracy improves to O(dN -r). © 2013 Springer Science+Business Media New York.

  16. Statistical mechanics of complex neural systems and high dimensional data

    International Nuclear Information System (INIS)

    Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya

    2013-01-01

    Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks. (paper)

  17. Approximation of High-Dimensional Rank One Tensors

    KAUST Repository

    Bachmayr, Markus; Dahmen, Wolfgang; DeVore, Ronald; Grasedyck, Lars

    2013-01-01

    Many real world problems are high-dimensional in that their solution is a function which depends on many variables or parameters. This presents a computational challenge since traditional numerical techniques are built on model classes for functions based solely on smoothness. It is known that the approximation of smoothness classes of functions suffers from the so-called 'curse of dimensionality'. Avoiding this curse requires new model classes for real world functions that match applications. This has led to the introduction of notions such as sparsity, variable reduction, and reduced modeling. One theme that is particularly common is to assume a tensor structure for the target function. This paper investigates how well a rank one function f(x 1,...,x d)=f 1(x 1)⋯f d(x d), defined on Ω=[0,1]d can be captured through point queries. It is shown that such a rank one function with component functions f j in W∞ r([0,1]) can be captured (in L ∞) to accuracy O(C(d,r)N -r) from N well-chosen point evaluations. The constant C(d,r) scales like d dr. The queries in our algorithms have two ingredients, a set of points built on the results from discrepancy theory and a second adaptive set of queries dependent on the information drawn from the first set. Under the assumption that a point z∈Ω with nonvanishing f(z) is known, the accuracy improves to O(dN -r). © 2013 Springer Science+Business Media New York.

  18. Reducing the n-gram feature space of class C GPCRs to subtype-discriminating patterns

    Directory of Open Access Journals (Sweden)

    König Caroline

    2014-12-01

    Full Text Available G protein-coupled receptors (GPCRs are a large and heterogeneous superfamily of receptors that are key cell players for their role as extracellular signal transmitters. Class C GPCRs, in particular, are of great interest in pharmacology. The lack of knowledge about their full 3-D structure prompts the use of their primary amino acid sequences for the construction of robust classifiers, capable of discriminating their different subtypes. In this paper, we investigate the use of feature selection techniques to build Support Vector Machine (SVM-based classification models from selected receptor subsequences described as n-grams. We show that this approach to classification is useful for finding class C GPCR subtype-specific motifs.

  19. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

    Science.gov (United States)

    Dazard, Jean-Eudes; Rao, J Sunil

    2012-07-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise tests statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that : (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.

  20. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis

    KAUST Repository

    Bhadra, Anindya

    2013-04-22

    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.

  1. The Visualization and Analysis of POI Features under Network Space Supported by Kernel Density Estimation

    Directory of Open Access Journals (Sweden)

    YU Wenhao

    2015-01-01

    Full Text Available The distribution pattern and the distribution density of urban facility POIs are of great significance in the fields of infrastructure planning and urban spatial analysis. The kernel density estimation, which has been usually utilized for expressing these spatial characteristics, is superior to other density estimation methods (such as Quadrat analysis, Voronoi-based method, for that the Kernel density estimation considers the regional impact based on the first law of geography. However, the traditional kernel density estimation is mainly based on the Euclidean space, ignoring the fact that the service function and interrelation of urban feasibilities is carried out on the network path distance, neither than conventional Euclidean distance. Hence, this research proposed a computational model of network kernel density estimation, and the extension type of model in the case of adding constraints. This work also discussed the impacts of distance attenuation threshold and height extreme to the representation of kernel density. The large-scale actual data experiment for analyzing the different POIs' distribution patterns (random type, sparse type, regional-intensive type, linear-intensive type discusses the POI infrastructure in the city on the spatial distribution of characteristics, influence factors, and service functions.

  2. Linear sign in cystic brain lesions ≥5 mm. A suggestive feature of perivascular space

    Energy Technology Data Exchange (ETDEWEB)

    Sung, Jinkyeong [The Catholic University of Korea, Department of Radiology, Seoul St. Mary' s Hospital, College of Medicine, Seoul (Korea, Republic of); The Catholic University of Korea, Department of Radiology, St. Vincent' s Hospital, College of Medicine, Seoul (Korea, Republic of); Jang, Jinhee; Choi, Hyun Seok; Jung, So-Lyung; Ahn, Kook-Jin; Kim, Bum-soo [The Catholic University of Korea, Department of Radiology, Seoul St. Mary' s Hospital, College of Medicine, Seoul (Korea, Republic of)

    2017-11-15

    To determine the prevalence of a linear sign within enlarged perivascular space (EPVS) and chronic lacunar infarction (CLI) ≥ 5 mm on T2-weighted imaging (T2WI) and time-of-flight (TOF) magnetic resonance angiography (MRA), and to evaluate the diagnostic value of the linear signs for EPVS over CLI. This study included 101 patients with cystic lesions ≥ 5 mm on brain MRI including TOF MRA. After classification of cystic lesions into EPVS or CLI, two readers assessed linear signs on T2WI and TOF MRA. We compared the prevalence and the diagnostic performance of linear signs. Among 46 EPVS and 51 CLI, 84 lesions (86.6%) were in basal ganglia. The prevalence of T2 and TOF linear signs was significantly higher in the EPVS than in the CLI (P <.001). For the diagnosis of EPVS, T2 and TOF linear signs showed high sensitivity (> 80%). TOF linear sign showed significantly higher specificity (100%) and accuracy (92.8% and 90.7%) than T2 linear sign (P <.001). T2 and TOF linear signs were more frequently observed in EPVS than CLI. They showed high sensitivity in differentiation of them, especially for basal ganglia. TOF sign showed higher specificity and accuracy than T2 sign. (orig.)

  3. Features of space-charge-limited emission in foil-less diodes

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Ping; Yuan, Keliang; Liu, Guozhi [Department of Engineering Physics, Tsinghua University, Beijing 100084 (China); Science and Technology on High Power Microwave Laboratory, Northwest Institute of Nuclear Technology, Xi' an 710024 (China); Sun, Jun [Science and Technology on High Power Microwave Laboratory, Northwest Institute of Nuclear Technology, Xi' an 710024 (China)

    2014-12-15

    Space-charge-limited (SCL) current can always be obtained from the blade surface of annular cathodes in foil-less diodes which are widely used in O-type relativistic high power microwave generators. However, there is little theoretical analysis regarding it due to the mathematical complexity, and almost all formulas about the SCL current in foil-less diodes are based on numerical simulation results. This paper performs an initial trial in calculation of the SCL current from annular cathodes theoretically under the ultra-relativistic assumption and the condition of infinitely large guiding magnetic field. The numerical calculation based on the theoretical research is coherent with the particle-in-cell (PIC) simulation result to some extent under a diode voltage of 850 kV. Despite that the theoretical research gives a much larger current than the PIC simulation (41.3 kA for the former and 9.7 kA for the latter), which is induced by the ultra-relativistic assumption in the theoretical research, they both show the basic characteristic of emission from annular cathodes in foil-less diodes, i.e., the emission enhancement at the cathode blade edges, especially at the outer edge. This characteristic is confirmed to some extent in our experimental research of cathode plasma photographing under the same diode voltage and a guiding magnetic field of 4 T.

  4. Main circulator design features for HTR 100, HTR 500 and space heating plants

    International Nuclear Information System (INIS)

    Engel, J.; Glass, D.

    1988-01-01

    All design alternatives for modern high-temperature reactors have a common circulator concept: It is based on a vertical shaft design with a flying impeller. The circulators are equipped with active magnetic bearings and are driven by induction motors connected to variable-speed static converters. Due to their multiple functions during normal reactor operation and under accident conditions, extremely high requirements are made to safety-relevant circulators, since with the reactor pressurized as well as under depressurized conditions specified delivery heads and flow rates have to be ensured. The use of active magnetic bearings permits to obtain maintenance-free operation and functional safety to an extent which had not been achieved before. Magnetic bearings are therefore provided for the total range including primary gas circulators of a drive power of several MW as well as circulators for helium loops of reactor auxiliary systems. The essential feature for using active magnetic bearings is the retainer bearing technology, preventing contact between rotor and static circulator parts upon unintended deenergisation of the magnets. Results of current experiments are reported. Another aspect to be considered for reliable long-term operation for several decades is the effect of rotor dynamics. The various natural frequencies resulting from torsion and bending modes in view of a drive by a frequency-controlled induction motor have to be considered as well as the specific characteristics of the active magnetic bearings. Special attention has to be directed to the internal cooling loop so as to ensure that reactor temperature excursions in the event of deviation from normal operation can be overcome without damage. For circulator components exposed to temperature fields the design characteristics are determined by combining experimental and analytical methods. The coordination of all component parts is currently being optimized on a prototype circulator whose detailed

  5. Matrix correlations for high-dimensional data: The modified RV-coefficient

    NARCIS (Netherlands)

    Smilde, A.K.; Kiers, H.A.L.; Bijlsma, S.; Rubingh, C.M.; Erk, M.J. van

    2009-01-01

    Motivation: Modern functional genomics generates high-dimensional datasets. It is often convenient to have a single simple number characterizing the relationship between pairs of such high-dimensional datasets in a comprehensive way. Matrix correlations are such numbers and are appealing since they

  6. Key features of human episodic recollection in the cross-episode retrieval of rat hippocampus representations of space.

    Directory of Open Access Journals (Sweden)

    Eduard Kelemen

    2013-07-01

    Full Text Available Neurophysiological studies focus on memory retrieval as a reproduction of what was experienced and have established that neural discharge is replayed to express memory. However, cognitive psychology has established that recollection is not a verbatim replay of stored information. Recollection is constructive, the product of memory retrieval cues, the information stored in memory, and the subject's state of mind. We discovered key features of constructive recollection embedded in the rat CA1 ensemble discharge during an active avoidance task. Rats learned two task variants, one with the arena stable, the other with it rotating; each variant defined a distinct behavioral episode. During the rotating episode, the ensemble discharge of CA1 principal neurons was dynamically organized to concurrently represent space in two distinct codes. The code for spatial reference frame switched rapidly between representing the rat's current location in either the stationary spatial frame of the room or the rotating frame of the arena. The code for task variant switched less frequently between a representation of the current rotating episode and the stable episode from the rat's past. The characteristics and interplay of these two hippocampal codes revealed three key properties of constructive recollection. (1 Although the ensemble representations of the stable and rotating episodes were distinct, ensemble discharge during rotation occasionally resembled the stable condition, demonstrating cross-episode retrieval of the representation of the remote, stable episode. (2 This cross-episode retrieval at the level of the code for task variant was more likely when the rotating arena was about to match its orientation in the stable episode. (3 The likelihood of cross-episode retrieval was influenced by preretrieval information that was signaled at the level of the code for spatial reference frame. Thus key features of episodic recollection manifest in rat hippocampal

  7. Safe, Affordable, Convenient: Environmental Features of Malls and Other Public Spaces Used by Older Adults for Walking.

    Science.gov (United States)

    King, Diane K; Allen, Peg; Jones, Dina L; Marquez, David X; Brown, David R; Rosenberg, Dori; Janicek, Sarah; Allen, Laila; Belza, Basia

    2016-03-01

    Midlife and older adults use shopping malls for walking, but little research has examined mall characteristics that contribute to their walkability. We used modified versions of the Centers for Disease Control and Prevention (CDC)-Healthy Aging Research Network (HAN) Environmental Audit and the System for Observing Play and Recreation in Communities (SOPARC) tool to systematically observe 443 walkers in 10 shopping malls. We also observed 87 walkers in 6 community-based nonmall/nongym venues where older adults routinely walked for physical activity. All venues had public transit stops and accessible parking. All malls and 67% of nonmalls had wayfinding aids, and most venues (81%) had an established circuitous walking route and clean, well-maintained public restrooms (94%). All venues had level floor surfaces, and one-half had benches along the walking route. Venues varied in hours of access, programming, tripping hazards, traffic control near entrances, and lighting. Despite diversity in location, size, and purpose, the mall and nonmall venues audited shared numerous environmental features known to promote walking in older adults and few barriers to walking. Future research should consider programmatic features and outreach strategies to expand the use of malls and other suitable public spaces for walking.

  8. Safe, Affordable, Convenient: Environmental Features of Malls and Other Public Spaces Used by Older Adults for Walking

    Science.gov (United States)

    King, Diane K.; Allen, Peg; Jones, Dina L.; Marquez, David X.; Brown, David R.; Rosenberg, Dori; Janicek, Sarah; Allen, Laila; Belza, Basia

    2016-01-01

    Background Midlife and older adults use shopping malls for walking, but little research has examined mall characteristics that contribute to their walkability. Methods We used modified versions of the Centers for Disease Control and Prevention (CDC)-Healthy Aging Research Network (HAN) Environmental Audit and the System for Observing Play and Recreation in Communities (SOPARC) tool to systematically observe 443 walkers in 10 shopping malls. We also observed 87 walkers in 6 community-based nonmall/nongym venues where older adults routinely walked for physical activity. Results All venues had public transit stops and accessible parking. All malls and 67% of nonmalls had wayfinding aids, and most venues (81%) had an established circuitous walking route and clean, well-maintained public restrooms (94%). All venues had level floor surfaces, and one-half had benches along the walking route. Venues varied in hours of access, programming, tripping hazards, traffic control near entrances, and lighting. Conclusions Despite diversity in location, size, and purpose, the mall and nonmall venues audited shared numerous environmental features known to promote walking in older adults and few barriers to walking. Future research should consider programmatic features and outreach strategies to expand the use of malls and other suitable public spaces for walking. PMID:26181907

  9. From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data

    Science.gov (United States)

    Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.

    2007-11-01

    Genomic technologies will revolutionize drag discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, become rapidly large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquires that define, compare and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue of identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquires expressed as query-based comparisons by organizing and comparing results from biotechnologies to address applications in biomedicine.

  10. Exploration of High-Dimensional Scalar Function for Nuclear Reactor Safety Analysis and Visualization

    Energy Technology Data Exchange (ETDEWEB)

    Dan Maljovec; Bei Wang; Valerio Pascucci; Peer-Timo Bremer; Michael Pernice; Robert Nourgaliev

    2013-05-01

    The next generation of methodologies for nuclear reactor Probabilistic Risk Assessment (PRA) explicitly accounts for the time element in modeling the probabilistic system evolution and uses numerical simulation tools to account for possible dependencies between failure events. The Monte-Carlo (MC) and the Dynamic Event Tree (DET) approaches belong to this new class of dynamic PRA methodologies. A challenge of dynamic PRA algorithms is the large amount of data they produce which may be difficult to visualize and analyze in order to extract useful information. We present a software tool that is designed to address these goals. We model a large-scale nuclear simulation dataset as a high-dimensional scalar function defined over a discrete sample of the domain. First, we provide structural analysis of such a function at multiple scales and provide insight into the relationship between the input parameters and the output. Second, we enable exploratory analysis for users, where we help the users to differentiate features from noise through multi-scale analysis on an interactive platform, based on domain knowledge and data characterization. Our analysis is performed by exploiting the topological and geometric properties of the domain, building statistical models based on its topological segmentations and providing interactive visual interfaces to facilitate such explorations. We provide a user’s guide to our software tool by highlighting its analysis and visualization capabilities, along with a use case involving dataset from a nuclear reactor safety simulation.

  11. Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity.

    Science.gov (United States)

    Chang, Jinyuan; Zheng, Chao; Zhou, Wen-Xin; Zhou, Wen

    2017-12-01

    In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and the parametric bootstrap techniques to compute the critical values. Different from the existing tests that heavily rely on the structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy wide scope of applicability in practice. To enhance powers of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and an human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may provide assistance on detecting disease-associated gene-sets. The proposed methods have been implemented in an R-package HDtest and are available on CRAN. © 2017, The International Biometric Society.

  12. Biomarker identification and effect estimation on schizophrenia –a high dimensional data analysis

    Directory of Open Access Journals (Sweden)

    Yuanzhang eLi

    2015-05-01

    Full Text Available Biomarkers have been examined in schizophrenia research for decades. Medical morbidity and mortality rates, as well as personal and societal costs, are associated with schizophrenia patients. The identification of biomarkers and alleles, which often have a small effect individually, may help to develop new diagnostic tests for early identification and treatment. Currently, there is not a commonly accepted statistical approach to identify predictive biomarkers from high dimensional data. We used space Decomposition-Gradient-Regression method (DGR to select biomarkers, which are associated with the risk of schizophrenia. Then, we used the gradient scores, generated from the selected biomarkers, as the prediction factor in regression to estimate their effects. We also used an alternative approach, classification and regression tree (CART, to compare the biomarker selected by DGR and found about 70% of the selected biomarkers were the same. However, the advantage of DGR is that it can evaluate individual effects for each biomarker from their combined effect. In DGR analysis of serum specimens of US military service members with a diagnosis of schizophrenia from 1992 to 2005 and their controls, Alpha-1-Antitrypsin (AAT, Interleukin-6 receptor (IL-6r and Connective Tissue Growth Factor (CTGF were selected to identify schizophrenia for males; and Alpha-1-Antitrypsin (AAT, Apolipoprotein B (Apo B and Sortilin were selected for females. If these findings from military subjects are replicated by other studies, they suggest the possibility of a novel biomarker panel as an adjunct to earlier diagnosis and initiation of treatment.

  13. Integrating high dimensional bi-directional parsing models for gene mention tagging.

    Science.gov (United States)

    Hsu, Chun-Nan; Chang, Yu-Ming; Kuo, Cheng-Ju; Lin, Yu-Shi; Huang, Han-Shen; Chung, I-Fang

    2008-07-01

    Tagging gene and gene product mentions in scientific text is an important initial step of literature mining. In this article, we describe in detail our gene mention tagger participated in BioCreative 2 challenge and analyze what contributes to its good performance. Our tagger is based on the conditional random fields model (CRF), the most prevailing method for the gene mention tagging task in BioCreative 2. Our tagger is interesting because it accomplished the highest F-scores among CRF-based methods and second over all. Moreover, we obtained our results by mostly applying open source packages, making it easy to duplicate our results. We first describe in detail how we developed our CRF-based tagger. We designed a very high dimensional feature set that includes most of information that may be relevant. We trained bi-directional CRF models with the same set of features, one applies forward parsing and the other backward, and integrated two models based on the output scores and dictionary filtering. One of the most prominent factors that contributes to the good performance of our tagger is the integration of an additional backward parsing model. However, from the definition of CRF, it appears that a CRF model is symmetric and bi-directional parsing models will produce the same results. We show that due to different feature settings, a CRF model can be asymmetric and the feature setting for our tagger in BioCreative 2 not only produces different results but also gives backward parsing models slight but constant advantage over forward parsing model. To fully explore the potential of integrating bi-directional parsing models, we applied different asymmetric feature settings to generate many bi-directional parsing models and integrate them based on the output scores. Experimental results show that this integrated model can achieve even higher F-score solely based on the training corpus for gene mention tagging. Data sets, programs and an on-line service of our gene

  14. Counting and classifying attractors in high dimensional dynamical systems.

    Science.gov (United States)

    Bagley, R J; Glass, L

    1996-12-07

    Randomly connected Boolean networks have been used as mathematical models of neural, genetic, and immune systems. A key quantity of such networks is the number of basins of attraction in the state space. The number of basins of attraction changes as a function of the size of the network, its connectivity and its transition rules. In discrete networks, a simple count of the number of attractors does not reveal the combinatorial structure of the attractors. These points are illustrated in a reexamination of dynamics in a class of random Boolean networks considered previously by Kauffman. We also consider comparisons between dynamics in discrete networks and continuous analogues. A continuous analogue of a discrete network may have a different number of attractors for many different reasons. Some attractors in discrete networks may be associated with unstable dynamics, and several different attractors in a discrete network may be associated with a single attractor in the continuous case. Special problems in determining attractors in continuous systems arise when there is aperiodic dynamics associated with quasiperiodicity of deterministic chaos.

  15. Mitigating the Insider Threat Using High-Dimensional Search and Modeling

    National Research Council Canada - National Science Library

    Van Den Berg, Eric; Uphadyaya, Shambhu; Ngo, Phi H; Muthukrishnan, Muthu; Palan, Rajago

    2006-01-01

    In this project a system was built aimed at mitigating insider attacks centered around a high-dimensional search engine for correlating the large number of monitoring streams necessary for detecting insider attacks...

  16. Demonstration of pattern transfer into sub-100 nm polysilicon line/space features patterned with extreme ultraviolet lithography

    International Nuclear Information System (INIS)

    Cardinale, G. F.; Henderson, C. C.; Goldsmith, J. E. M.; Mangat, P. J. S.; Cobb, J.; Hector, S. D.

    1999-01-01

    In two separate experiments, we have successfully demonstrated the transfer of dense- and loose-pitch line/space (L/S) photoresist features, patterned with extreme ultraviolet (EUV) lithography, into an underlying hard mask material. In both experiments, a deep-UV photoresist (∼90 nm thick) was spin cast in bilayer format onto a hard mask (50-90 nm thick) and was subsequently exposed to EUV radiation using a 10x reduction EUV exposure system. The EUV reticle was fabricated at Motorola (Tempe, AZ) using a subtractive process with Ta-based absorbers on Mo/Si multilayer mask blanks. In the first set of experiments, following the EUV exposures, the L/S patterns were transferred first into a SiO 2 hard mask (60 nm thick) using a reactive ion etch (RIE), and then into polysilicon (350 nm thick) using a triode-coupled plasma RIE etcher at the University of California, Berkeley, microfabrication facilities. The latter etch process, which produced steep (>85 degree sign ) sidewalls, employed a HBr/Cl chemistry with a large (>10:1) etch selectivity of polysilicon to silicon dioxide. In the second set of experiments, hard mask films of SiON (50 nm thick) and SiO 2 (87 nm thick) were used. A RIE was performed at Motorola using a halogen gas chemistry that resulted in a hard mask-to-photoresist etch selectivity >3:1 and sidewall profile angles ≥85 degree sign . Line edge roughness (LER) and linewidth critical dimension (CD) measurements were performed using Sandia's GORA(c) CD digital image analysis software. Low LER values (6-9 nm, 3σ, one side) and good CD linearity (better than 10%) were demonstrated for the final pattern-transferred dense polysilicon L/S features from 80 to 175 nm. In addition, pattern transfer (into polysilicon) of loose-pitch (1:2) L/S features with CDs≥60 nm was demonstrated. (c) 1999 American Vacuum Society

  17. Using a High-Dimensional Graph of Semantic Space to Model Relationships among Words

    Directory of Open Access Journals (Sweden)

    Alice F Jackson

    2014-05-01

    Full Text Available The GOLD model (Graph Of Language Distribution is a network model constructed based on co-occurrence in a large corpus of natural language that may be used to explore what information may be present in a graph-structured model of language, and what information may be extracted through theoretically-driven algorithms as well as standard graph analysis methods. The present study will employ GOLD to examine two types of relationship between words: semantic similarity and associative relatedness. Semantic similarity refers to the degree of overlap in meaning between words, while associative relatedness refers to the degree to which two words occur in the same schematic context. It is expected that a graph structured model of language constructed based on co-occurrence should easily capture associative relatedness, because this type of relationship is thought to be present directly in lexical co-occurrence. However, it is hypothesized that semantic similarity may be extracted from the intersection of the set of first-order connections, because two words that are semantically similar may occupy similar thematic or syntactic roles across contexts and thus would co-occur lexically with the same set of nodes. Two versions the GOLD model that differed in terms of the co-occurence window, bigGOLD at the paragraph level and smallGOLD at the adjacent word level, were directly compared to the performance of a well-established distributional model, Latent Semantic Analysis (LSA. The superior performance of the GOLD models (big and small suggest that a single acquisition and storage mechanism, namely co-occurrence, can account for associative and conceptual relationships between words and is more psychologically plausible than models using singular value decomposition.

  18. Using a high-dimensional graph of semantic space to model relationships among words.

    Science.gov (United States)

    Jackson, Alice F; Bolger, Donald J

    2014-01-01

    The GOLD model (Graph Of Language Distribution) is a network model constructed based on co-occurrence in a large corpus of natural language that may be used to explore what information may be present in a graph-structured model of language, and what information may be extracted through theoretically-driven algorithms as well as standard graph analysis methods. The present study will employ GOLD to examine two types of relationship between words: semantic similarity and associative relatedness. Semantic similarity refers to the degree of overlap in meaning between words, while associative relatedness refers to the degree to which two words occur in the same schematic context. It is expected that a graph structured model of language constructed based on co-occurrence should easily capture associative relatedness, because this type of relationship is thought to be present directly in lexical co-occurrence. However, it is hypothesized that semantic similarity may be extracted from the intersection of the set of first-order connections, because two words that are semantically similar may occupy similar thematic or syntactic roles across contexts and thus would co-occur lexically with the same set of nodes. Two versions the GOLD model that differed in terms of the co-occurence window, bigGOLD at the paragraph level and smallGOLD at the adjacent word level, were directly compared to the performance of a well-established distributional model, Latent Semantic Analysis (LSA). The superior performance of the GOLD models (big and small) suggest that a single acquisition and storage mechanism, namely co-occurrence, can account for associative and conceptual relationships between words and is more psychologically plausible than models using singular value decomposition (SVD).

  19. Restoring the Generalizability of SVM Based Decoding in High Dimensional Neuroimage Data

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie; Hansen, Lars Kai

    2011-01-01

    Variance inflation is caused by a mismatch between linear projections of test and training data when projections are estimated on training sets smaller than the dimensionality of the feature space. We demonstrate that variance inflation can lead to an increased neuroimage decoding error rate...

  20. Motif distributions in phase-space networks for characterizing experimental two-phase flow patterns with chaotic features.

    Science.gov (United States)

    Gao, Zhong-Ke; Jin, Ning-De; Wang, Wen-Xu; Lai, Ying-Cheng

    2010-07-01

    The dynamics of two-phase flows have been a challenging problem in nonlinear dynamics and fluid mechanics. We propose a method to characterize and distinguish patterns from inclined water-oil flow experiments based on the concept of network motifs that have found great usage in network science and systems biology. In particular, we construct from measured time series phase-space complex networks and then calculate the distribution of a set of distinct network motifs. To gain insight, we first test the approach using time series from classical chaotic systems and find a universal feature: motif distributions from different chaotic systems are generally highly heterogeneous. Our main finding is that the distributions from experimental two-phase flows tend to be heterogeneous as well, suggesting the underlying chaotic nature of the flow patterns. Calculation of the maximal Lyapunov exponent provides further support for this. Motif distributions can thus be a feasible tool to understand the dynamics of realistic two-phase flow patterns.

  1. Object-Based Change Detection in Urban Areas: The Effects of Segmentation Strategy, Scale, and Feature Space on Unsupervised Methods

    Directory of Open Access Journals (Sweden)

    Lei Ma

    2016-09-01

    Full Text Available Object-based change detection (OBCD has recently been receiving increasing attention as a result of rapid improvements in the resolution of remote sensing data. However, some OBCD issues relating to the segmentation of high-resolution images remain to be explored. For example, segmentation units derived using different segmentation strategies, segmentation scales, feature space, and change detection methods have rarely been assessed. In this study, we have tested four common unsupervised change detection methods using different segmentation strategies and a series of segmentation scale parameters on two WorldView-2 images of urban areas. We have also evaluated the effect of adding extra textural and Normalized Difference Vegetation Index (NDVI information instead of using only spectral information. Our results indicated that change detection methods performed better at a medium scale than at a fine scale where close to the pixel size. Multivariate Alteration Detection (MAD always outperformed the other methods tested, at the same confidence level. The overall accuracy appeared to benefit from using a two-date segmentation strategy rather than single-date segmentation. Adding textural and NDVI information appeared to reduce detection accuracy, but the magnitude of this reduction was not consistent across the different unsupervised methods and segmentation strategies. We conclude that a two-date segmentation strategy is useful for change detection in high-resolution imagery, but that the optimization of thresholds is critical for unsupervised change detection methods. Advanced methods need be explored that can take advantage of additional textural or other parameters.

  2. Feature-space assessment of electrical impedance tomography coregistered with computed tomography in detecting multiple contrast targets

    International Nuclear Information System (INIS)

    Krishnan, Kalpagam; Liu, Jeff; Kohli, Kirpal

    2014-01-01

    Purpose: Fusion of electrical impedance tomography (EIT) with computed tomography (CT) can be useful as a clinical tool for providing additional physiological information about tissues, but requires suitable fusion algorithms and validation procedures. This work explores the feasibility of fusing EIT and CT images using an algorithm for coregistration. The imaging performance is validated through feature space assessment on phantom contrast targets. Methods: EIT data were acquired by scanning a phantom using a circuit, configured for injecting current through 16 electrodes, placed around the phantom. A conductivity image of the phantom was obtained from the data using electrical impedance and diffuse optical tomography reconstruction software (EIDORS). A CT image of the phantom was also acquired. The EIT and CT images were fused using a region of interest (ROI) coregistration fusion algorithm. Phantom imaging experiments were carried out on objects of different contrasts, sizes, and positions. The conductive medium of the phantoms was made of a tissue-mimicking bolus material that is routinely used in clinical radiation therapy settings. To validate the imaging performance in detecting different contrasts, the ROI of the phantom was filled with distilled water and normal saline. Spatially separated cylindrical objects of different sizes were used for validating the imaging performance in multiple target detection. Analyses of the CT, EIT and the EIT/CT phantom images were carried out based on the variations of contrast, correlation, energy, and homogeneity, using a gray level co-occurrence matrix (GLCM). A reference image of the phantom was simulated using EIDORS, and the performances of the CT and EIT imaging systems were evaluated and compared against the performance of the EIT/CT system using various feature metrics, detectability, and structural similarity index measures. Results: In detecting distilled and normal saline water in bolus medium, EIT as a stand

  3. High-Dimensional Additive Hazards Regression for Oral Squamous Cell Carcinoma Using Microarray Data: A Comparative Study

    Directory of Open Access Journals (Sweden)

    Omid Hamidi

    2014-01-01

    Full Text Available Microarray technology results in high-dimensional and low-sample size data sets. Therefore, fitting sparse models is substantial because only a small number of influential genes can reliably be identified. A number of variable selection approaches have been proposed for high-dimensional time-to-event data based on Cox proportional hazards where censoring is present. The present study applied three sparse variable selection techniques of Lasso, smoothly clipped absolute deviation and the smooth integration of counting, and absolute deviation for gene expression survival time data using the additive risk model which is adopted when the absolute effects of multiple predictors on the hazard function are of interest. The performances of used techniques were evaluated by time dependent ROC curve and bootstrap .632+ prediction error curves. The selected genes by all methods were highly significant (P<0.001. The Lasso showed maximum median of area under ROC curve over time (0.95 and smoothly clipped absolute deviation showed the lowest prediction error (0.105. It was observed that the selected genes by all methods improved the prediction of purely clinical model indicating the valuable information containing in the microarray features. So it was concluded that used approaches can satisfactorily predict survival based on selected gene expression measurements.

  4. TSAR: a program for automatic resonance assignment using 2D cross-sections of high dimensionality, high-resolution spectra

    Energy Technology Data Exchange (ETDEWEB)

    Zawadzka-Kazimierczuk, Anna; Kozminski, Wiktor [University of Warsaw, Faculty of Chemistry (Poland); Billeter, Martin, E-mail: martin.billeter@chem.gu.se [University of Gothenburg, Biophysics Group, Department of Chemistry and Molecular Biology (Sweden)

    2012-09-15

    While NMR studies of proteins typically aim at structure, dynamics or interactions, resonance assignments represent in almost all cases the initial step of the analysis. With increasing complexity of the NMR spectra, for example due to decreasing extent of ordered structure, this task often becomes both difficult and time-consuming, and the recording of high-dimensional data with high-resolution may be essential. Random sampling of the evolution time space, combined with sparse multidimensional Fourier transform (SMFT), allows for efficient recording of very high dimensional spectra ({>=}4 dimensions) while maintaining high resolution. However, the nature of this data demands for automation of the assignment process. Here we present the program TSAR (Tool for SMFT-based Assignment of Resonances), which exploits all advantages of SMFT input. Moreover, its flexibility allows to process data from any type of experiments that provide sequential connectivities. The algorithm was tested on several protein samples, including a disordered 81-residue fragment of the {delta} subunit of RNA polymerase from Bacillus subtilis containing various repetitive sequences. For our test examples, TSAR achieves a high percentage of assigned residues without any erroneous assignments.

  5. An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data

    Directory of Open Access Journals (Sweden)

    Datta Susmita

    2010-08-01

    Full Text Available Abstract Background Generally speaking, different classifiers tend to work well for certain types of data and conversely, it is usually not known a priori which algorithm will be optimal in any given classification application. In addition, for most classification problems, selecting the best performing classification algorithm amongst a number of competing algorithms is a difficult task for various reasons. As for example, the order of performance may depend on the performance measure employed for such a comparison. In this work, we present a novel adaptive ensemble classifier constructed by combining bagging and rank aggregation that is capable of adaptively changing its performance depending on the type of data that is being classified. The attractive feature of the proposed classifier is its multi-objective nature where the classification results can be simultaneously optimized with respect to several performance measures, for example, accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance as judged on test samples than a more naive approach that attempts to directly identify the optimal classifier based on the training data performances of the individual classifiers. Results We illustrate the proposed method with two simulated and two real-data examples. In all cases, the ensemble classifier performs at the level of the best individual classifier comprising the ensemble or better. Conclusions For complex high-dimensional datasets resulting from present day high-throughput experiments, it may be wise to consider a number of classification algorithms combined with dimension reduction techniques rather than a fixed standard algorithm set a priori.

  6. Effective traffic features selection algorithm for cyber-attacks samples

    Science.gov (United States)

    Li, Yihong; Liu, Fangzheng; Du, Zhenyu

    2018-05-01

    By studying the defense scheme of Network attacks, this paper propose an effective traffic features selection algorithm based on k-means++ clustering to deal with the problem of high dimensionality of traffic features which extracted from cyber-attacks samples. Firstly, this algorithm divide the original feature set into attack traffic feature set and background traffic feature set by the clustering. Then, we calculates the variation of clustering performance after removing a certain feature. Finally, evaluating the degree of distinctiveness of the feature vector according to the result. Among them, the effective feature vector is whose degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select out the effective features from the extracted original feature set. In this way, it can reduce the dimensionality of the features so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and it has some advantages over other selection algorithms.

  7. Rolling Bearing Fault Diagnosis Using Modified Neighborhood Preserving Embedding and Maximal Overlap Discrete Wavelet Packet Transform with Sensitive Features Selection

    Directory of Open Access Journals (Sweden)

    Fei Dong

    2018-01-01

    Full Text Available In order to enhance the performance of bearing fault diagnosis and classification, features extraction and features dimensionality reduction have become more important. The original statistical feature set was calculated from single branch reconstruction vibration signals obtained by using maximal overlap discrete wavelet packet transform (MODWPT. In order to reduce redundancy information of original statistical feature set, features selection by adjusted rand index and sum of within-class mean deviations (FSASD was proposed to select fault sensitive features. Furthermore, a modified features dimensionality reduction method, supervised neighborhood preserving embedding with label information (SNPEL, was proposed to realize low-dimensional representations for high-dimensional feature space. Finally, vibration signals collected from two experimental test rigs were employed to evaluate the performance of the proposed procedure. The results show that the effectiveness, adaptability, and superiority of the proposed procedure can serve as an intelligent bearing fault diagnosis system.

  8. Engineering two-photon high-dimensional states through quantum interference

    Science.gov (United States)

    Zhang, Yingwen; Roux, Filippus S.; Konrad, Thomas; Agnew, Megan; Leach, Jonathan; Forbes, Andrew

    2016-01-01

    Many protocols in quantum science, for example, linear optical quantum computing, require access to large-scale entangled quantum states. Such systems can be realized through many-particle qubits, but this approach often suffers from scalability problems. An alternative strategy is to consider a lesser number of particles that exist in high-dimensional states. The spatial modes of light are one such candidate that provides access to high-dimensional quantum states, and thus they increase the storage and processing potential of quantum information systems. We demonstrate the controlled engineering of two-photon high-dimensional states entangled in their orbital angular momentum through Hong-Ou-Mandel interference. We prepare a large range of high-dimensional entangled states and implement precise quantum state filtering. We characterize the full quantum state before and after the filter, and are thus able to determine that only the antisymmetric component of the initial state remains. This work paves the way for high-dimensional processing and communication of multiphoton quantum states, for example, in teleportation beyond qubits. PMID:26933685

  9. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix

    KAUST Repository

    Hu, Zongliang

    2017-09-27

    The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.

  10. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.

    Science.gov (United States)

    Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun

    2017-09-21

    The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.

  11. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix

    KAUST Repository

    Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun

    2017-01-01

    The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.

  12. Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.

    Science.gov (United States)

    Ghosh, Samiran; Wang, Yazhen

    2015-02-01

    The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems are drawing much attention recently due to its robustness and generalization capability. General theme here is to construct classifiers based on the training data in a high dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only few observations which lie close to the boundary of the classifier function. However when the number of observations are not very large (small n ) but the number of dimensions/features are large (large p ), then it is not necessary that all available features are of equal importance in the classification context. Possible selection of an useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such an optimal set of features could be selected. In short, we reverse the traditional sequential observation selection strategy of SVM to that of sequential feature selection. To achieve this we have modified the solution proposed by Zhu and Hastie (2005) in the context of import vector machine (IVM), to select an optimal sub-dimensional model to build the final classifier with sufficient accuracy.

  13. Linear stability theory as an early warning sign for transitions in high dimensional complex systems

    International Nuclear Information System (INIS)

    Piovani, Duccio; Grujić, Jelena; Jensen, Henrik Jeldtoft

    2016-01-01

    We analyse in detail a new approach to the monitoring and forecasting of the onset of transitions in high dimensional complex systems by application to the Tangled Nature model of evolutionary ecology and high dimensional replicator systems with a stochastic element. A high dimensional stability matrix is derived in the mean field approximation to the stochastic dynamics. This allows us to determine the stability spectrum about the observed quasi-stable configurations. From overlap of the instantaneous configuration vector of the full stochastic system with the eigenvectors of the unstable directions of the deterministic mean field approximation, we are able to construct a good early-warning indicator of the transitions occurring intermittently. (paper)

  14. Interface between path and orbital angular momentum entanglement for high-dimensional photonic quantum information.

    Science.gov (United States)

    Fickler, Robert; Lapkiewicz, Radek; Huber, Marcus; Lavery, Martin P J; Padgett, Miles J; Zeilinger, Anton

    2014-07-30

    Photonics has become a mature field of quantum information science, where integrated optical circuits offer a way to scale the complexity of the set-up as well as the dimensionality of the quantum state. On photonic chips, paths are the natural way to encode information. To distribute those high-dimensional quantum states over large distances, transverse spatial modes, like orbital angular momentum possessing Laguerre Gauss modes, are favourable as flying information carriers. Here we demonstrate a quantum interface between these two vibrant photonic fields. We create three-dimensional path entanglement between two photons in a nonlinear crystal and use a mode sorter as the quantum interface to transfer the entanglement to the orbital angular momentum degree of freedom. Thus our results show a flexible way to create high-dimensional spatial mode entanglement. Moreover, they pave the way to implement broad complex quantum networks where high-dimensionally entangled states could be distributed over distant photonic chips.

  15. Features of motivation of the crewmembers in an enclosed space at atmospheric pressure changes during breathing inert gases.

    Science.gov (United States)

    Komarevcev, Sergey

    Since the 1960s, our psychologists are working on experimenting with small groups in isolation .It was associated with the beginning of spaceflight and necessity to study of human behaviors in ways different from the natural habitat of man .Those, who study human behavior especially in isolation, know- that the behavior in isolation markedly different from that in the natural situations. It associated with the development of new, more adaptive behaviors (1) What are the differences ? First of all , isolation is achieved by the fact ,that the group is in a closed space. How experiments show - the crew members have changed the basic personality traits, such as motivation Statement of the problem and methods. In our experimentation we were interested in changing the features of human motivation (strength, stability and direction of motivation) in terms of a closed group in the modified atmosphere pressure and breathing inert gases. Also, we were interested in particular external and internal motivation of the individual in the circumstances. To conduct experimentation , we used an experimental barocomplex GVK -250 , which placed a group of six mans. A task was to spend fifteen days in isolation on barokomplex when breathing oxigen - xenon mixture of fifteen days in isolation on the same complex when breathing oxygen- helium mixture and fifteen days of isolation on the same complex when breathing normal air All this time, the subjects were isolated under conditions of atmospheric pressure changes , closer to what you normally deal divers. We assumed that breathing inert mixtures can change the strength and stability , and with it , the direction and stability of motivation. To check our results, we planned on using the battery of psychological techniques : 1. Schwartz technique that measures personal values and behavior in society, DORS procedure ( measurement of fatigue , monotony , satiety and stress ) and riffs that give the test once a week. Our assumption is

  16. Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation

    International Nuclear Information System (INIS)

    Tripathy, Rohit; Bilionis, Ilias; Gonzalez, Marcial

    2016-01-01

    Uncertainty quantification (UQ) tasks, such as model calibration, uncertainty propagation, and optimization under uncertainty, typically require several thousand evaluations of the underlying computer codes. To cope with the cost of simulations, one replaces the real response surface with a cheap surrogate based, e.g., on polynomial chaos expansions, neural networks, support vector machines, or Gaussian processes (GP). However, the number of simulations required to learn a generic multivariate response grows exponentially as the input dimension increases. This curse of dimensionality can only be addressed, if the response exhibits some special structure that can be discovered and exploited. A wide range of physical responses exhibit a special structure known as an active subspace (AS). An AS is a linear manifold of the stochastic space characterized by maximal response variation. The idea is that one should first identify this low dimensional manifold, project the high-dimensional input onto it, and then link the projection to the output. If the dimensionality of the AS is low enough, then learning the link function is a much easier problem than the original problem of learning a high-dimensional function. The classic approach to discovering the AS requires gradient information, a fact that severely limits its applicability. Furthermore, and partly because of its reliance to gradients, it is not able to handle noisy observations. The latter is an essential trait if one wants to be able to propagate uncertainty through stochastic simulators, e.g., through molecular dynamics codes. In this work, we develop a probabilistic version of AS which is gradient-free and robust to observational noise. Our approach relies on a novel Gaussian process regression with built-in dimensionality reduction. In particular, the AS is represented as an orthogonal projection matrix that serves as yet another covariance function hyper-parameter to be estimated from the data. To train the

  17. Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation

    Science.gov (United States)

    Tripathy, Rohit; Bilionis, Ilias; Gonzalez, Marcial

    2016-09-01

    Uncertainty quantification (UQ) tasks, such as model calibration, uncertainty propagation, and optimization under uncertainty, typically require several thousand evaluations of the underlying computer codes. To cope with the cost of simulations, one replaces the real response surface with a cheap surrogate based, e.g., on polynomial chaos expansions, neural networks, support vector machines, or Gaussian processes (GP). However, the number of simulations required to learn a generic multivariate response grows exponentially as the input dimension increases. This curse of dimensionality can only be addressed, if the response exhibits some special structure that can be discovered and exploited. A wide range of physical responses exhibit a special structure known as an active subspace (AS). An AS is a linear manifold of the stochastic space characterized by maximal response variation. The idea is that one should first identify this low dimensional manifold, project the high-dimensional input onto it, and then link the projection to the output. If the dimensionality of the AS is low enough, then learning the link function is a much easier problem than the original problem of learning a high-dimensional function. The classic approach to discovering the AS requires gradient information, a fact that severely limits its applicability. Furthermore, and partly because of its reliance to gradients, it is not able to handle noisy observations. The latter is an essential trait if one wants to be able to propagate uncertainty through stochastic simulators, e.g., through molecular dynamics codes. In this work, we develop a probabilistic version of AS which is gradient-free and robust to observational noise. Our approach relies on a novel Gaussian process regression with built-in dimensionality reduction. In particular, the AS is represented as an orthogonal projection matrix that serves as yet another covariance function hyper-parameter to be estimated from the data. To train the

  18. Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation

    Energy Technology Data Exchange (ETDEWEB)

    Tripathy, Rohit, E-mail: rtripath@purdue.edu; Bilionis, Ilias, E-mail: ibilion@purdue.edu; Gonzalez, Marcial, E-mail: marcial-gonzalez@purdue.edu

    2016-09-15

    Uncertainty quantification (UQ) tasks, such as model calibration, uncertainty propagation, and optimization under uncertainty, typically require several thousand evaluations of the underlying computer codes. To cope with the cost of simulations, one replaces the real response surface with a cheap surrogate based, e.g., on polynomial chaos expansions, neural networks, support vector machines, or Gaussian processes (GP). However, the number of simulations required to learn a generic multivariate response grows exponentially as the input dimension increases. This curse of dimensionality can only be addressed, if the response exhibits some special structure that can be discovered and exploited. A wide range of physical responses exhibit a special structure known as an active subspace (AS). An AS is a linear manifold of the stochastic space characterized by maximal response variation. The idea is that one should first identify this low dimensional manifold, project the high-dimensional input onto it, and then link the projection to the output. If the dimensionality of the AS is low enough, then learning the link function is a much easier problem than the original problem of learning a high-dimensional function. The classic approach to discovering the AS requires gradient information, a fact that severely limits its applicability. Furthermore, and partly because of its reliance to gradients, it is not able to handle noisy observations. The latter is an essential trait if one wants to be able to propagate uncertainty through stochastic simulators, e.g., through molecular dynamics codes. In this work, we develop a probabilistic version of AS which is gradient-free and robust to observational noise. Our approach relies on a novel Gaussian process regression with built-in dimensionality reduction. In particular, the AS is represented as an orthogonal projection matrix that serves as yet another covariance function hyper-parameter to be estimated from the data. To train the

  19. Scalable Clustering of High-Dimensional Data Technique Using SPCM with Ant Colony Optimization Intelligence

    Directory of Open Access Journals (Sweden)

    Thenmozhi Srinivasan

    2015-01-01

    Full Text Available Clusters of high-dimensional data techniques are emerging, according to data noisy and poor quality challenges. This paper has been developed to cluster data using high-dimensional similarity based PCM (SPCM, with ant colony optimization intelligence which is effective in clustering nonspatial data without getting knowledge about cluster number from the user. The PCM becomes similarity based by using mountain method with it. Though this is efficient clustering, it is checked for optimization using ant colony algorithm with swarm intelligence. Thus the scalable clustering technique is obtained and the evaluation results are checked with synthetic datasets.

  20. The validation and assessment of machine learning: a game of prediction from high-dimensional data

    DEFF Research Database (Denmark)

    Pers, Tune Hannes; Albrechtsen, A; Holst, C

    2009-01-01

    In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide to the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often...... the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively....

  1. Stream/Bounce Event Perception Reveals a Temporal Limit of Motion Correspondence Based on Surface Feature over Space and Time

    Directory of Open Access Journals (Sweden)

    Yousuke Kawachi

    2011-06-01

    Full Text Available We examined how stream/bounce event perception is affected by motion correspondence based on the surface features of moving objects passing behind an occlusion. In the stream/bounce display two identical objects moving across each other in a two-dimensional display can be perceived as either streaming through or bouncing off each other at coincidence. Here, surface features such as colour (Experiments 1 and 2 or luminance (Experiment 3 were switched between the two objects at coincidence. The moment of coincidence was invisible to observers due to an occluder. Additionally, the presentation of the moving objects was manipulated in duration after the feature switch at coincidence. The results revealed that a postcoincidence duration of approximately 200 ms was required for the visual system to stabilize judgments of stream/bounce events by determining motion correspondence between the objects across the occlusion on the basis of the surface feature. The critical duration was similar across motion speeds of objects and types of surface features. Moreover, controls (Experiments 4a–4c showed that cognitive bias based on feature (colour/luminance congruency across the occlusion could not fully account for the effects of surface features on the stream/bounce judgments. We discuss the roles of motion correspondence, visual feature processing, and attentive tracking in the stream/bounce judgments.

  2. An irregular grid approach for pricing high-dimensional American options

    NARCIS (Netherlands)

    Berridge, S.J.; Schumacher, J.M.

    2008-01-01

    We propose and test a new method for pricing American options in a high-dimensional setting. The method is centered around the approximation of the associated complementarity problem on an irregular grid. We approximate the partial differential operator on this grid by appealing to the SDE

  3. Reconstruction of high-dimensional states entangled in orbital angular momentum using mutually unbiased measurements

    CSIR Research Space (South Africa)

    Giovannini, D

    2013-06-01

    Full Text Available : QELS_Fundamental Science, San Jose, California United States, 9-14 June 2013 Reconstruction of High-Dimensional States Entangled in Orbital Angular Momentum Using Mutually Unbiased Measurements D. Giovannini1, ⇤, J. Romero1, 2, J. Leach3, A...

  4. Global communication schemes for the numerical solution of high-dimensional PDEs

    DEFF Research Database (Denmark)

    Hupp, Philipp; Heene, Mario; Jacob, Riko

    2016-01-01

    The numerical treatment of high-dimensional partial differential equations is among the most compute-hungry problems and in urgent need for current and future high-performance computing (HPC) systems. It is thus also facing the grand challenges of exascale computing such as the requirement...

  5. High-Dimensional Exploratory Item Factor Analysis by a Metropolis-Hastings Robbins-Monro Algorithm

    Science.gov (United States)

    Cai, Li

    2010-01-01

    A Metropolis-Hastings Robbins-Monro (MH-RM) algorithm for high-dimensional maximum marginal likelihood exploratory item factor analysis is proposed. The sequence of estimates from the MH-RM algorithm converges with probability one to the maximum likelihood solution. Details on the computer implementation of this algorithm are provided. The…

  6. Estimating the effect of a variable in a high-dimensional regression model

    DEFF Research Database (Denmark)

    Jensen, Peter Sandholt; Wurtz, Allan

    assume that the effect is identified in a high-dimensional linear model specified by unconditional moment restrictions. We consider  properties of the following methods, which rely on lowdimensional models to infer the effect: Extreme bounds analysis, the minimum t-statistic over models, Sala...

  7. Multi-Scale Factor Analysis of High-Dimensional Brain Signals

    KAUST Repository

    Ting, Chee-Ming; Ombao, Hernando; Salleh, Sh-Hussain

    2017-01-01

    In this paper, we develop an approach to modeling high-dimensional networks with a large number of nodes arranged in a hierarchical and modular structure. We propose a novel multi-scale factor analysis (MSFA) model which partitions the massive

  8. Spectrally-Corrected Estimation for High-Dimensional Markowitz Mean-Variance Optimization

    NARCIS (Netherlands)

    Z. Bai (Zhidong); H. Li (Hua); M.J. McAleer (Michael); W.-K. Wong (Wing-Keung)

    2016-01-01

    textabstractThis paper considers the portfolio problem for high dimensional data when the dimension and size are both large. We analyze the traditional Markowitz mean-variance (MV) portfolio by large dimension matrix theory, and find the spectral distribution of the sample covariance is the main

  9. Using Localised Quadratic Functions on an Irregular Grid for Pricing High-Dimensional American Options

    NARCIS (Netherlands)

    Berridge, S.J.; Schumacher, J.M.

    2004-01-01

    We propose a method for pricing high-dimensional American options on an irregular grid; the method involves using quadratic functions to approximate the local effect of the Black-Scholes operator.Once such an approximation is known, one can solve the pricing problem by time stepping in an explicit

  10. Multigrid for high dimensional elliptic partial differential equations on non-equidistant grids

    NARCIS (Netherlands)

    bin Zubair, H.; Oosterlee, C.E.; Wienands, R.

    2006-01-01

    This work presents techniques, theory and numbers for multigrid in a general d-dimensional setting. The main focus is the multigrid convergence for high-dimensional partial differential equations (PDEs). As a model problem we have chosen the anisotropic diffusion equation, on a unit hypercube. We

  11. An Irregular Grid Approach for Pricing High-Dimensional American Options

    NARCIS (Netherlands)

    Berridge, S.J.; Schumacher, J.M.

    2004-01-01

    We propose and test a new method for pricing American options in a high-dimensional setting.The method is centred around the approximation of the associated complementarity problem on an irregular grid.We approximate the partial differential operator on this grid by appealing to the SDE

  12. Pricing and hedging high-dimensional American options : an irregular grid approach

    NARCIS (Netherlands)

    Berridge, S.; Schumacher, H.

    2002-01-01

    We propose and test a new method for pricing American options in a high dimensional setting. The method is centred around the approximation of the associated variational inequality on an irregular grid. We approximate the partial differential operator on this grid by appealing to the SDE

  13. Tracking in Object Action Space

    DEFF Research Database (Denmark)

    Krüger, Volker; Herzog, Dennis

    2013-01-01

    the space of the object affordances, i.e., the space of possible actions that are applied on a given object. This way, 3D body tracking reduces to action tracking in the object (and context) primed parameter space of the object affordances. This reduces the high-dimensional joint-space to a low...

  14. Tactile Earth and Space Science Materials for Students with Visual Impairments: Contours, Craters, Asteroids, and Features of Mars

    Science.gov (United States)

    Rule, Audrey C.

    2011-01-01

    New tactile curriculum materials for teaching Earth and planetary science lessons on rotation=revolution, silhouettes of objects from different views, contour maps, impact craters, asteroids, and topographic features of Mars to 11 elementary and middle school students with sight impairments at a week-long residential summer camp are presented…

  15. Feature fusion using kernel joint approximate diagonalization of eigen-matrices for rolling bearing fault identification

    Science.gov (United States)

    Liu, Yongbin; He, Bing; Liu, Fang; Lu, Siliang; Zhao, Yilei

    2016-12-01

    Fault pattern identification is a crucial step for the intelligent fault diagnosis of real-time health conditions in monitoring a mechanical system. However, many challenges exist in extracting the effective feature from vibration signals for fault recognition. A new feature fusion method is proposed in this study to extract new features using kernel joint approximate diagonalization of eigen-matrices (KJADE). In the method, the input space that is composed of original features is mapped into a high-dimensional feature space by nonlinear mapping. Then, the new features can be estimated through the eigen-decomposition of the fourth-order cumulative kernel matrix obtained from the feature space. Therefore, the proposed method could be used to reduce data redundancy because it extracts the inherent pattern structure of different fault classes as it is nonlinear by nature. The integration evaluation factor of between-class and within-class scatters (SS) is employed to depict the clustering performance quantitatively, and the new feature subset extracted by the proposed method is fed into a multi-class support vector machine for fault pattern identification. Finally, the effectiveness of the proposed method is verified by experimental vibration signals with different bearing fault types and severities. Results of several cases show that the KJADE algorithm is efficient in feature fusion for bearing fault identification.

  16. Materials Science Research Hardware for Application on the International Space Station: an Overview of Typical Hardware Requirements and Features

    Science.gov (United States)

    Schaefer, D. A.; Cobb, S.; Fiske, M. R.; Srinivas, R.

    2000-01-01

    NASA's Marshall Space Flight Center (MSFC) is the lead center for Materials Science Microgravity Research. The Materials Science Research Facility (MSRF) is a key development effort underway at MSFC. The MSRF will be the primary facility for microgravity materials science research on board the International Space Station (ISS) and will implement the NASA Materials Science Microgravity Research Program. It will operate in the U.S. Laboratory Module and support U. S. Microgravity Materials Science Investigations. This facility is being designed to maintain the momentum of the U.S. role in microgravity materials science and support NASA's Human Exploration and Development of Space (HEDS) Enterprise goals and objectives for Materials Science. The MSRF as currently envisioned will consist of three Materials Science Research Racks (MSRR), which will be deployed to the International Space Station (ISS) in phases, Each rack is being designed to accommodate various Experiment Modules, which comprise processing facilities for peer selected Materials Science experiments. Phased deployment will enable early opportunities for the U.S. and International Partners, and support the timely incorporation of technology updates to the Experiment Modules and sensor devices.

  17. Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

    Directory of Open Access Journals (Sweden)

    András Király

    2014-01-01

    Full Text Available During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data and biclustering (applied to gene expression data analysis. The common limitation of both methodologies is the limited applicability for very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and freely available for researchers.

  18. Non-intrusive low-rank separated approximation of high-dimensional stochastic models

    KAUST Repository

    Doostan, Alireza; Validi, AbdoulAhad; Iaccarino, Gianluca

    2013-01-01

    This work proposes a sampling-based (non-intrusive) approach within the context of low-. rank separated representations to tackle the issue of curse-of-dimensionality associated with the solution of models, e.g., PDEs/ODEs, with high-dimensional random inputs. Under some conditions discussed in details, the number of random realizations of the solution, required for a successful approximation, grows linearly with respect to the number of random inputs. The construction of the separated representation is achieved via a regularized alternating least-squares regression, together with an error indicator to estimate model parameters. The computational complexity of such a construction is quadratic in the number of random inputs. The performance of the method is investigated through its application to three numerical examples including two ODE problems with high-dimensional random inputs. © 2013 Elsevier B.V.

  19. Non-intrusive low-rank separated approximation of high-dimensional stochastic models

    KAUST Repository

    Doostan, Alireza

    2013-08-01

    This work proposes a sampling-based (non-intrusive) approach within the context of low-. rank separated representations to tackle the issue of curse-of-dimensionality associated with the solution of models, e.g., PDEs/ODEs, with high-dimensional random inputs. Under some conditions discussed in details, the number of random realizations of the solution, required for a successful approximation, grows linearly with respect to the number of random inputs. The construction of the separated representation is achieved via a regularized alternating least-squares regression, together with an error indicator to estimate model parameters. The computational complexity of such a construction is quadratic in the number of random inputs. The performance of the method is investigated through its application to three numerical examples including two ODE problems with high-dimensional random inputs. © 2013 Elsevier B.V.

  20. A Shell Multi-dimensional Hierarchical Cubing Approach for High-Dimensional Cube

    Science.gov (United States)

    Zou, Shuzhi; Zhao, Li; Hu, Kongfa

    The pre-computation of data cubes is critical for improving the response time of OLAP systems and accelerating data mining tasks in large data warehouses. However, as the sizes of data warehouses grow, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In a high dimensional data warehouse, it might not be practical to build all these cuboids and their indices. In this paper, we propose a shell multi-dimensional hierarchical cubing algorithm, based on an extension of the previous minimal cubing approach. This method partitions the high dimensional data cube into low multi-dimensional hierarchical cube. Experimental results show that the proposed method is significantly more efficient than other existing cubing methods.

  1. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

    Science.gov (United States)

    Cai, T Tony; Zhang, Anru

    2016-09-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.

  2. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data*

    Science.gov (United States)

    Cai, T. Tony; Zhang, Anru

    2016-01-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data. PMID:27777471

  3. THE MOBILE SPACE AND MOBILE TARGETING ENVIRONMENT FOR INTERNET USERS: FEATURES OF MODEL SUBMISSION AND USING IN EDUCATION

    OpenAIRE

    V. Bykov

    2013-01-01

    Article submitted the results of the analysis of the use of mobile devices in education. The substantiation of the definition of user mobility in the Internet space, taking into account the variability of mobile devices and communications. The use of mobile devices in the educational process is based on the paradigm of open and equal access to quality education. Considered the technology of using different types of devices and their functions . The conditions of user mobility in the internet ...

  4. EPS-LASSO: Test for High-Dimensional Regression Under Extreme Phenotype Sampling of Continuous Traits.

    Science.gov (United States)

    Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen

    2018-01-25

    Extreme phenotype sampling (EPS) is a broadly-used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, despite many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase the power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there are extensive research and application related to LASSO, the statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. The comprehensive simulation shows EPS-LASSO outperforms existing methods with stable type I error and FDR control. EPS-LASSO can provide a consistent power for both low- and high-dimensional situations compared with the other methods dealing with high-dimensional situations. The power of EPS-LASSO is close to other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. hdeng2@tulane.edu. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please

  5. Controlling chaos in low and high dimensional systems with periodic parametric perturbations

    International Nuclear Information System (INIS)

    Mirus, K.A.; Sprott, J.C.

    1998-06-01

    The effect of applying a periodic perturbation to an accessible parameter of various chaotic systems is examined. Numerical results indicate that perturbation frequencies near the natural frequencies of the unstable periodic orbits of the chaotic systems can result in limit cycles for relatively small perturbations. Such perturbations can also control or significantly reduce the dimension of high-dimensional systems. Initial application to the control of fluctuations in a prototypical magnetic fusion plasma device will be reviewed

  6. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    OpenAIRE

    Zekić-Sušac, Marijana; Pfeifer, Sanja; Šarlija, Nataša

    2014-01-01

    Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART ...

  7. GAMLSS for high-dimensional data – a flexible approach based on boosting

    OpenAIRE

    Mayr, Andreas; Fenske, Nora; Hofner, Benjamin; Kneib, Thomas; Schmid, Matthias

    2010-01-01

    Generalized additive models for location, scale and shape (GAMLSS) are a popular semi-parametric modelling approach that, in contrast to conventional GAMs, regress not only the expected mean but every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSS are infeasible for high-dimensional data setups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algo...

  8. Preface [HD3-2015: International meeting on high-dimensional data-driven science

    International Nuclear Information System (INIS)

    2016-01-01

    A never-ending series of innovations in measurement technology and evolutions in information and communication technologies have led to the ongoing generation and accumulation of large quantities of high-dimensional data every day. While detailed data-centric approaches have been pursued in respective research fields, situations have been encountered where the same mathematical framework of high-dimensional data analysis can be found in a wide variety of seemingly unrelated research fields, such as estimation on the basis of undersampled Fourier transform in nuclear magnetic resonance spectroscopy in chemistry, in magnetic resonance imaging in medicine, and in astronomical interferometry in astronomy. In such situations, bringing diverse viewpoints together therefore becomes a driving force for the creation of innovative developments in various different research fields. This meeting focuses on “Sparse Modeling” (SpM) as a methodology for creation of innovative developments through the incorporation of a wide variety of viewpoints in various research fields. The objective of this meeting is to offer a forum where researchers with interest in SpM can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies for High-Dimensional Data-Driven science (HD 3 ). The meeting was held in Kyoto from 14-17 December 2015. We are pleased to publish 22 papers contributed by invited speakers in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of High-Dimensional Data-Driven science. (paper)

  9. A Bootstrap Based Measure Robust to the Choice of Normalization Methods for Detecting Rhythmic Features in High Dimensional Data.

    Science.gov (United States)

    Larriba, Yolanda; Rueda, Cristina; Fernández, Miguel A; Peddada, Shyamal D

    2018-01-01

    Motivation: Gene-expression data obtained from high throughput technologies are subject to various sources of noise and accordingly the raw data are pre-processed before formally analyzed. Normalization of the data is a key pre-processing step, since it removes systematic variations across arrays. There are numerous normalization methods available in the literature. Based on our experience, in the context of oscillatory systems, such as cell-cycle, circadian clock, etc., the choice of the normalization method may substantially impact the determination of a gene to be rhythmic. Thus rhythmicity of a gene can purely be an artifact of how the data were normalized. Since the determination of rhythmic genes is an important component of modern toxicological and pharmacological studies, it is important to determine truly rhythmic genes that are robust to the choice of a normalization method. Results: In this paper we introduce a rhythmicity measure and a bootstrap methodology to detect rhythmic genes in an oscillatory system. Although the proposed methodology can be used for any high-throughput gene expression data, in this paper we illustrate the proposed methodology using several publicly available circadian clock microarray gene-expression datasets. We demonstrate that the choice of normalization method has very little effect on the proposed methodology. Specifically, for any pair of normalization methods considered in this paper, the resulting values of the rhythmicity measure are highly correlated. Thus it suggests that the proposed measure is robust to the choice of a normalization method. Consequently, the rhythmicity of a gene is potentially not a mere artifact of the normalization method used. Lastly, as demonstrated in the paper, the proposed bootstrap methodology can also be used for simulating data for genes participating in an oscillatory system using a reference dataset. Availability: A user friendly code implemented in R language can be downloaded from http://www.eio.uva.es/~miguel/robustdetectionprocedure.html.

  10. Dissecting high-dimensional phenotypes with bayesian sparse factor analysis of genetic covariance matrices.

    Science.gov (United States)

    Runcie, Daniel E; Mukherjee, Sayan

    2013-07-01

    Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression where the number of traits assayed per individual can reach many thousand. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need consider only G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse - affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.

  11. Biomedical image representation approach using visualness and spatial information in a concept feature space for interactive region-of-interest-based retrieval.

    Science.gov (United States)

    Rahman, Md Mahmudur; Antani, Sameer K; Demner-Fushman, Dina; Thoma, George R

    2015-10-01

    This article presents an approach to biomedical image retrieval by mapping image regions to local concepts where images are represented in a weighted entropy-based concept feature space. The term "concept" refers to perceptually distinguishable visual patches that are identified locally in image regions and can be mapped to a glossary of imaging terms. Further, the visual significance (e.g., visualness) of concepts is measured as the Shannon entropy of pixel values in image patches and is used to refine the feature vector. Moreover, the system can assist the user in interactively selecting a region-of-interest (ROI) and searching for similar image ROIs. Further, a spatial verification step is used as a postprocessing step to improve retrieval results based on location information. The hypothesis that such approaches would improve biomedical image retrieval is validated through experiments on two different data sets, which are collected from open access biomedical literature.

  12. Performance Characterization of Loctite (Registered Trademark) 242 and 271 Liquid Locking Compounds (LLCs) as a Secondary Locking Feature for International Space Station (ISS) Fasteners

    Science.gov (United States)

    Dube, Michael J.; Gamwell, Wayne R.

    2011-01-01

    Several International Space Station (ISS) hardware components use Loctite (and other polymer based liquid locking compounds (LLCs)) as a means of meeting the secondary (redundant) locking feature requirement for fasteners. The primary locking method is the fastener preload, with the application of the Loctite compound which when cured is intended to resist preload reduction. The reliability of these compounds has been questioned due to a number of failures during ground testing. The ISS Program Manager requested the NASA Engineering and Safety Center (NESC) to characterize and quantify sensitivities of Loctite being used as a secondary locking feature. The findings and recommendations provided in this investigation apply to the anaerobic LLCs Loctite 242 and 271. No other anaerobic LLCs were evaluated for this investigation. This document contains the findings and recommendations of the NESC investigation

  13. Concepts, strategies and potentials using hypo-g and other features of the space environment for commercialization using higher plants

    Science.gov (United States)

    Krikorian, A. D.

    1985-01-01

    Opportunities for releasing, capturing, constructing and/or fixing the differential expressions or response potentials of the higher plant genome in the hypo-g environment for commercialization are explored. General strategies include improved plant-growing, crop and forestry production systems which conserve soil, water, labor and energy resources, and nutritional partitioning and mobilization of nutrients and synthates. Tissue and cell culture techniques of commercial potential include the growing and manipulation of cultured plant cells in vitro in a bioreactor to produce biologicals and secondary plants of economic value. The facilitation of plant breeding, the cloning of specific pathogen-free materials, the elimination of growing point or apex viruses, and the increase of plant yield are other O-g applications. The space environment may be advantageous in somatic embryogenesis, the culture of alkaloids, and the development of completely new crop plant germ plasm.

  14. Study of growth and development features of ten ground cover plants in Kish Island green space in warm season

    Directory of Open Access Journals (Sweden)

    S. Shooshtarian

    2016-05-01

    Full Text Available Having special ecological condition, Kish Island has a restricted range of native species of ornamental plants. Expansion of urban green space in this Island is great of importance due to its outstanding touristy position in the South of Iran. The purpose of this study was to investigate the growth and development of groundcover plants planted in four different regions of Kish Island and to recommend the most suitable and adaptable species for each region. Ten groundcover species included Festuca ovina L., Glaucium flavum Crantz., Frankenia thymifolia Desf., Sedum spurium Bieb., Sedum acre L., .Potentilla verna L., Carpobrotus acinaciformis (L. L. Bolus., Achillea millefolium L., Alternanthera dentata Moench. and Lampranthus spectabilis Haw. Evaluation of growth and development had been made by measurement of morphological characteristics such as height, covering area, leaf number and area, dry and fresh total weights and visual scoring. Physiological traits included proline and chlorophyll contents evaluated. This study was designed in factorial layout based on completely randomized blocks design with six replicates. Results showed that in terms of indices such as covering area, visual quality, height, total weight, and chlorophyll content, Pavioon and Sadaf plants had the most and the worst performances, respectively in comparison to other regions’ plants. Based on evaluated characteristics, C. acinaciformis, L. spectabilis and F. thymifolia had the most expansion and growth in all quadruplet regions and are recommend for planting in Kish Island and similar climates.

  15. THE MOBILE SPACE AND MOBILE TARGETING ENVIRONMENT FOR INTERNET USERS: FEATURES OF MODEL SUBMISSION AND USING IN EDUCATION

    Directory of Open Access Journals (Sweden)

    V. Bykov

    2013-08-01

    Full Text Available Article submitted the results of the analysis of the use of mobile devices in education. The substantiation of the definition of user mobility in the Internet space, taking into account the variability of mobile devices and communications. The use of mobile devices in the educational process is based on the paradigm of open and equal access to quality education. Considered the technology of using different types of devices and their functions . The conditions of user mobility in the internet environment, the factors influencing it, the creation and storage of mobile communications resources . Provided with basic mathematical model of user behavior in a virtual network. A model of migration as a user from device to device , and its geographic move , and then use the resulting model for the design of distance learning systems . Preliminary forecasts have been made on the development of education in the transition from the remote technology to open. It is assumed the appearance of new types of personal devices that will combine the power of a desktop PC and the autonomy of smartphones with constant access for broadband wireless connection to the Internet. The use of cloud technology to store and process information resources training helps centralize and synchronize data and access to them from different devices.

  16. Transforming high-dimensional potential energy surfaces into sum-of-products form using Monte Carlo methods

    Science.gov (United States)

    Schröder, Markus; Meyer, Hans-Dieter

    2017-08-01

    We propose a Monte Carlo method, "Monte Carlo Potfit," for transforming high-dimensional potential energy surfaces evaluated on discrete grid points into a sum-of-products form, more precisely into a Tucker form. To this end we use a variational ansatz in which we replace numerically exact integrals with Monte Carlo integrals. This largely reduces the numerical cost by avoiding the evaluation of the potential on all grid points and allows a treatment of surfaces up to 15-18 degrees of freedom. We furthermore show that the error made with this ansatz can be controlled and vanishes in certain limits. We present calculations on the potential of HFCO to demonstrate the features of the algorithm. To demonstrate the power of the method, we transformed a 15D potential of the protonated water dimer (Zundel cation) in a sum-of-products form and calculated the ground and lowest 26 vibrationally excited states of the Zundel cation with the multi-configuration time-dependent Hartree method.

  17. On-chip generation of high-dimensional entangled quantum states and their coherent control.

    Science.gov (United States)

    Kues, Michael; Reimer, Christian; Roztocki, Piotr; Cortés, Luis Romero; Sciara, Stefania; Wetzel, Benjamin; Zhang, Yanbing; Cino, Alfonso; Chu, Sai T; Little, Brent E; Moss, David J; Caspani, Lucia; Azaña, José; Morandotti, Roberto

    2017-06-28

    Optical quantum states based on entangled photons are essential for solving questions in fundamental physics and are at the heart of quantum information science. Specifically, the realization of high-dimensional states (D-level quantum systems, that is, qudits, with D > 2) and their control are necessary for fundamental investigations of quantum mechanics, for increasing the sensitivity of quantum imaging schemes, for improving the robustness and key rate of quantum communication protocols, for enabling a richer variety of quantum simulations, and for achieving more efficient and error-tolerant quantum computation. Integrated photonics has recently become a leading platform for the compact, cost-efficient, and stable generation and processing of non-classical optical states. However, so far, integrated entangled quantum sources have been limited to qubits (D = 2). Here we demonstrate on-chip generation of entangled qudit states, where the photons are created in a coherent superposition of multiple high-purity frequency modes. In particular, we confirm the realization of a quantum system with at least one hundred dimensions, formed by two entangled qudits with D = 10. Furthermore, using state-of-the-art, yet off-the-shelf telecommunications components, we introduce a coherent manipulation platform with which to control frequency-entangled states, capable of performing deterministic high-dimensional gate operations. We validate this platform by measuring Bell inequality violations and performing quantum state tomography. Our work enables the generation and processing of high-dimensional quantum states in a single spatial mode.

  18. Feature Selection via Chaotic Antlion Optimization.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, the advances in the available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the more informative markers along with performing a high-accuracy classification over the data can be a daunting task, particularly if the data are high dimensional. An often adopted approach is to formulate the feature selection problem as a biobjective optimization problem, with the aim of maximizing the performance of the data analysis model (the quality of the data training fitting while minimizing the number of features used.We propose an optimization approach for the feature selection problem that considers a "chaotic" version of the antlion optimizer method, a nature-inspired algorithm that mimics the hunting mechanism of antlions in nature. The balance between exploration of the search space and exploitation of the best solutions is a challenge in multi-objective optimization. The exploration/exploitation rate is controlled by the parameter I that limits the random walk range of the ants/prey. This variable is increased iteratively in a quasi-linear manner to decrease the exploration rate as the optimization progresses. The quasi-linear decrease in the variable I may lead to immature convergence in some cases and trapping in local minima in other cases. The chaotic system proposed here attempts to improve the tradeoff between exploration and exploitation. The methodology is evaluated using different chaotic maps on a number of feature selection datasets. To ensure generality, we used ten biological datasets, but we also used other types of data from various sources. The results are compared with the particle swarm optimizer and with genetic algorithm variants for feature selection using a set of quality metrics.

  19. High-dimensional chaos from self-sustained collisions of solitons

    Energy Technology Data Exchange (ETDEWEB)

    Yildirim, O. Ozgur, E-mail: donhee@seas.harvard.edu, E-mail: oozgury@gmail.com [Cavium, Inc., 600 Nickerson Rd., Marlborough, Massachusetts 01752 (United States); Ham, Donhee, E-mail: donhee@seas.harvard.edu, E-mail: oozgury@gmail.com [Harvard University, 33 Oxford St., Cambridge, Massachusetts 02138 (United States)

    2014-06-16

    We experimentally demonstrate chaos generation based on collisions of electrical solitons on a nonlinear transmission line. The nonlinear line creates solitons, and an amplifier connected to it provides gain to these solitons for their self-excitation and self-sustenance. Critically, the amplifier also provides a mechanism to enable and intensify collisions among solitons. These collisional interactions are of intrinsically nonlinear nature, modulating the phase and amplitude of solitons, thus causing chaos. This chaos generated by the exploitation of the nonlinear wave phenomena is inherently high-dimensional, which we also demonstrate.

  20. A novel algorithm of artificial immune system for high-dimensional function numerical optimization

    Institute of Scientific and Technical Information of China (English)

    DU Haifeng; GONG Maoguo; JIAO Licheng; LIU Ruochen

    2005-01-01

    Based on the clonal selection theory and immune memory theory, a novel artificial immune system algorithm, immune memory clonal programming algorithm (IMCPA), is put forward. Using the theorem of Markov chain, it is proved that IMCPA is convergent. Compared with some other evolutionary programming algorithms (like Breeder genetic algorithm), IMCPA is shown to be an evolutionary strategy capable of solving complex machine learning tasks, like high-dimensional function optimization, which maintains the diversity of the population and avoids prematurity to some extent, and has a higher convergence speed.

  1. Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.

    Science.gov (United States)

    Kong, Shengchun; Nan, Bin

    2014-01-01

    We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.

  2. High-dimensional data: p >> n in mathematical statistics and bio-medical applications

    OpenAIRE

    Van De Geer, Sara A.; Van Houwelingen, Hans C.

    2004-01-01

    The workshop 'High-dimensional data: p >> n in mathematical statistics and bio-medical applications' was held at the Lorentz Center in Leiden from 9 to 20 September 2002. This special issue of Bernoulli contains a selection of papers presented at that workshop. ¶ The introduction of high-throughput micro-array technology to measure gene-expression levels and the publication of the pioneering paper by Golub et al. (1999) has brought to life a whole new branch of data analysis under the name of...

  3. DataHigh: graphical user interface for visualizing and interacting with high-dimensional neural activity

    Science.gov (United States)

    Cowley, Benjamin R.; Kaufman, Matthew T.; Butler, Zachary S.; Churchland, Mark M.; Ryu, Stephen I.; Shenoy, Krishna V.; Yu, Byron M.

    2013-12-01

    Objective. Analyzing and interpreting the activity of a heterogeneous population of neurons can be challenging, especially as the number of neurons, experimental trials, and experimental conditions increases. One approach is to extract a set of latent variables that succinctly captures the prominent co-fluctuation patterns across the neural population. A key problem is that the number of latent variables needed to adequately describe the population activity is often greater than 3, thereby preventing direct visualization of the latent space. By visualizing a small number of 2-d projections of the latent space or each latent variable individually, it is easy to miss salient features of the population activity. Approach. To address this limitation, we developed a Matlab graphical user interface (called DataHigh) that allows the user to quickly and smoothly navigate through a continuum of different 2-d projections of the latent space. We also implemented a suite of additional visualization tools (including playing out population activity timecourses as a movie and displaying summary statistics, such as covariance ellipses and average timecourses) and an optional tool for performing dimensionality reduction. Main results. To demonstrate the utility and versatility of DataHigh, we used it to analyze single-trial spike count and single-trial timecourse population activity recorded using a multi-electrode array, as well as trial-averaged population activity recorded using single electrodes. Significance. DataHigh was developed to fulfil a need for visualization in exploratory neural data analysis, which can provide intuition that is critical for building scientific hypotheses and models of population activity.

  4. DataHigh: graphical user interface for visualizing and interacting with high-dimensional neural activity.

    Science.gov (United States)

    Cowley, Benjamin R; Kaufman, Matthew T; Butler, Zachary S; Churchland, Mark M; Ryu, Stephen I; Shenoy, Krishna V; Yu, Byron M

    2013-12-01

    Analyzing and interpreting the activity of a heterogeneous population of neurons can be challenging, especially as the number of neurons, experimental trials, and experimental conditions increases. One approach is to extract a set of latent variables that succinctly captures the prominent co-fluctuation patterns across the neural population. A key problem is that the number of latent variables needed to adequately describe the population activity is often greater than 3, thereby preventing direct visualization of the latent space. By visualizing a small number of 2-d projections of the latent space or each latent variable individually, it is easy to miss salient features of the population activity. To address this limitation, we developed a Matlab graphical user interface (called DataHigh) that allows the user to quickly and smoothly navigate through a continuum of different 2-d projections of the latent space. We also implemented a suite of additional visualization tools (including playing out population activity timecourses as a movie and displaying summary statistics, such as covariance ellipses and average timecourses) and an optional tool for performing dimensionality reduction. To demonstrate the utility and versatility of DataHigh, we used it to analyze single-trial spike count and single-trial timecourse population activity recorded using a multi-electrode array, as well as trial-averaged population activity recorded using single electrodes. DataHigh was developed to fulfil a need for visualization in exploratory neural data analysis, which can provide intuition that is critical for building scientific hypotheses and models of population activity.

  5. DataHigh: Graphical user interface for visualizing and interacting with high-dimensional neural activity

    Science.gov (United States)

    Cowley, Benjamin R.; Kaufman, Matthew T.; Butler, Zachary S.; Churchland, Mark M.; Ryu, Stephen I.; Shenoy, Krishna V.; Yu, Byron M.

    2014-01-01

    Objective Analyzing and interpreting the activity of a heterogeneous population of neurons can be challenging, especially as the number of neurons, experimental trials, and experimental conditions increases. One approach is to extract a set of latent variables that succinctly captures the prominent co-fluctuation patterns across the neural population. A key problem is that the number of latent variables needed to adequately describe the population activity is often greater than three, thereby preventing direct visualization of the latent space. By visualizing a small number of 2-d projections of the latent space or each latent variable individually, it is easy to miss salient features of the population activity. Approach To address this limitation, we developed a Matlab graphical user interface (called DataHigh) that allows the user to quickly and smoothly navigate through a continuum of different 2-d projections of the latent space. We also implemented a suite of additional visualization tools (including playing out population activity timecourses as a movie and displaying summary statistics, such as covariance ellipses and average timecourses) and an optional tool for performing dimensionality reduction. Main results To demonstrate the utility and versatility of DataHigh, we used it to analyze single-trial spike count and single-trial timecourse population activity recorded using a multi-electrode array, as well as trial-averaged population activity recorded using single electrodes. Significance DataHigh was developed to fulfill a need for visualization in exploratory neural data analysis, which can provide intuition that is critical for building scientific hypotheses and models of population activity. PMID:24216250

  6. Discriminative kernel feature extraction and learning for object recognition and detection

    DEFF Research Database (Denmark)

    Pan, Hong; Olsen, Søren Ingvor; Zhu, Yaping

    2015-01-01

    Feature extraction and learning is critical for object recognition and detection. By embedding context cue of image attributes into the kernel descriptors, we propose a set of novel kernel descriptors called context kernel descriptors (CKD). The motivation of CKD is to use the spatial consistency...... even in high-dimensional space. In addition, the latent connection between Rényi quadratic entropy and the mapping data in kernel feature space further facilitates us to capture the geometric structure as well as the information about the underlying labels of the CKD using CSQMI. Thus the resulting...... codebook and reduced CKD are discriminative. We report superior performance of our algorithm for object recognition on benchmark datasets like Caltech-101 and CIFAR-10, as well as for detection on a challenging chicken feet dataset....

  7. Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data.

    Science.gov (United States)

    Zhao, Yize; Kang, Jian; Long, Qi

    2018-01-01

    Ultra-high dimensional variable selection has become increasingly important in analysis of neuroimaging data. For example, in the Autism Brain Imaging Data Exchange (ABIDE) study, neuroscientists are interested in identifying important biomarkers for early detection of the autism spectrum disorder (ASD) using high resolution brain images that include hundreds of thousands voxels. However, most existing methods are not feasible for solving this problem due to their extensive computational costs. In this work, we propose a novel multiresolution variable selection procedure under a Bayesian probit regression framework. It recursively uses posterior samples for coarser-scale variable selection to guide the posterior inference on finer-scale variable selection, leading to very efficient Markov chain Monte Carlo (MCMC) algorithms. The proposed algorithms are computationally feasible for ultra-high dimensional data. Also, our model incorporates two levels of structural information into variable selection using Ising priors: the spatial dependence between voxels and the functional connectivity between anatomical brain regions. Applied to the resting state functional magnetic resonance imaging (R-fMRI) data in the ABIDE study, our methods identify voxel-level imaging biomarkers highly predictive of the ASD, which are biologically meaningful and interpretable. Extensive simulations also show that our methods achieve better performance in variable selection compared to existing methods.

  8. Energy Efficient MAC Scheme for Wireless Sensor Networks with High-Dimensional Data Aggregate

    Directory of Open Access Journals (Sweden)

    Seokhoon Kim

    2015-01-01

    Full Text Available This paper presents a novel and sustainable medium access control (MAC scheme for wireless sensor network (WSN systems that process high-dimensional aggregated data. Based on a preamble signal and buffer threshold analysis, it maximizes the energy efficiency of the wireless sensor devices which have limited energy resources. The proposed group management MAC (GM-MAC approach not only sets the buffer threshold value of a sensor device to be reciprocal to the preamble signal but also sets a transmittable group value to each sensor device by using the preamble signal of the sink node. The primary difference between the previous and the proposed approach is that existing state-of-the-art schemes use duty cycle and sleep mode to save energy consumption of individual sensor devices, whereas the proposed scheme employs the group management MAC scheme for sensor devices to maximize the overall energy efficiency of the whole WSN systems by minimizing the energy consumption of sensor devices located near the sink node. Performance evaluations show that the proposed scheme outperforms the previous schemes in terms of active time of sensor devices, transmission delay, control overhead, and energy consumption. Therefore, the proposed scheme is suitable for sensor devices in a variety of wireless sensor networking environments with high-dimensional data aggregate.

  9. The validation and assessment of machine learning: a game of prediction from high-dimensional data.

    Directory of Open Access Journals (Sweden)

    Tune H Pers

    Full Text Available In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide to the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.

  10. High-dimensional quantum key distribution with the entangled single-photon-added coherent state

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Yang [Zhengzhou Information Science and Technology Institute, Zhengzhou, 450001 (China); Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026 (China); Bao, Wan-Su, E-mail: 2010thzz@sina.com [Zhengzhou Information Science and Technology Institute, Zhengzhou, 450001 (China); Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026 (China); Bao, Hai-Ze; Zhou, Chun; Jiang, Mu-Sheng; Li, Hong-Wei [Zhengzhou Information Science and Technology Institute, Zhengzhou, 450001 (China); Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026 (China)

    2017-04-25

    High-dimensional quantum key distribution (HD-QKD) can generate more secure bits for one detection event so that it can achieve long distance key distribution with a high secret key capacity. In this Letter, we present a decoy state HD-QKD scheme with the entangled single-photon-added coherent state (ESPACS) source. We present two tight formulas to estimate the single-photon fraction of postselected events and Eve's Holevo information and derive lower bounds on the secret key capacity and the secret key rate of our protocol. We also present finite-key analysis for our protocol by using the Chernoff bound. Our numerical results show that our protocol using one decoy state can perform better than that of previous HD-QKD protocol with the spontaneous parametric down conversion (SPDC) using two decoy states. Moreover, when considering finite resources, the advantage is more obvious. - Highlights: • Implement the single-photon-added coherent state source into the high-dimensional quantum key distribution. • Enhance both the secret key capacity and the secret key rate compared with previous schemes. • Show an excellent performance in view of statistical fluctuations.

  11. Quantum secret sharing based on modulated high-dimensional time-bin entanglement

    International Nuclear Information System (INIS)

    Takesue, Hiroki; Inoue, Kyo

    2006-01-01

    We propose a scheme for quantum secret sharing (QSS) that uses a modulated high-dimensional time-bin entanglement. By modulating the relative phase randomly by {0,π}, a sender with the entanglement source can randomly change the sign of the correlation of the measurement outcomes obtained by two distant recipients. The two recipients must cooperate if they are to obtain the sign of the correlation, which is used as a secret key. We show that our scheme is secure against intercept-and-resend (IR) and beam splitting attacks by an outside eavesdropper thanks to the nonorthogonality of high-dimensional time-bin entangled states. We also show that a cheating attempt based on an IR attack by one of the recipients can be detected by changing the dimension of the time-bin entanglement randomly and inserting two 'vacant' slots between the packets. Then, cheating attempts can be detected by monitoring the count rate in the vacant slots. The proposed scheme has better experimental feasibility than previously proposed entanglement-based QSS schemes

  12. Similarity measurement method of high-dimensional data based on normalized net lattice subspace

    Institute of Scientific and Technical Information of China (English)

    Li Wenfa; Wang Gongming; Li Ke; Huang Su

    2017-01-01

    The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities occupies a large proportion of the similarity, leading to the dissimilarities between any results.A similarity measurement method of high-dimensional data based on normalized net lattice subspace is proposed.The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding interval.Only the component in the same or adjacent interval is used to calculate the similarity.To validate this meth-od, three data types are used, and seven common similarity measurement methods are compared. The experimental result indicates that the relative difference of the method is increasing with the di-mensionality and is approximately two or three orders of magnitude higher than the conventional method.In addition, the similarity range of this method in different dimensions is [0, 1], which is fit for similarity analysis after dimensionality reduction.

  13. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    Science.gov (United States)

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiving operator characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.

  14. High-dimensional quantum key distribution with the entangled single-photon-added coherent state

    International Nuclear Information System (INIS)

    Wang, Yang; Bao, Wan-Su; Bao, Hai-Ze; Zhou, Chun; Jiang, Mu-Sheng; Li, Hong-Wei

    2017-01-01

    High-dimensional quantum key distribution (HD-QKD) can generate more secure bits for one detection event so that it can achieve long distance key distribution with a high secret key capacity. In this Letter, we present a decoy state HD-QKD scheme with the entangled single-photon-added coherent state (ESPACS) source. We present two tight formulas to estimate the single-photon fraction of postselected events and Eve's Holevo information and derive lower bounds on the secret key capacity and the secret key rate of our protocol. We also present finite-key analysis for our protocol by using the Chernoff bound. Our numerical results show that our protocol using one decoy state can perform better than that of previous HD-QKD protocol with the spontaneous parametric down conversion (SPDC) using two decoy states. Moreover, when considering finite resources, the advantage is more obvious. - Highlights: • Implement the single-photon-added coherent state source into the high-dimensional quantum key distribution. • Enhance both the secret key capacity and the secret key rate compared with previous schemes. • Show an excellent performance in view of statistical fluctuations.

  15. High-Dimensional Single-Photon Quantum Gates: Concepts and Experiments.

    Science.gov (United States)

    Babazadeh, Amin; Erhard, Manuel; Wang, Feiran; Malik, Mehul; Nouroozi, Rahman; Krenn, Mario; Zeilinger, Anton

    2017-11-03

    Transformations on quantum states form a basic building block of every quantum information system. From photonic polarization to two-level atoms, complete sets of quantum gates for a variety of qubit systems are well known. For multilevel quantum systems beyond qubits, the situation is more challenging. The orbital angular momentum modes of photons comprise one such high-dimensional system for which generation and measurement techniques are well studied. However, arbitrary transformations for such quantum states are not known. Here we experimentally demonstrate a four-dimensional generalization of the Pauli X gate and all of its integer powers on single photons carrying orbital angular momentum. Together with the well-known Z gate, this forms the first complete set of high-dimensional quantum gates implemented experimentally. The concept of the X gate is based on independent access to quantum states with different parities and can thus be generalized to other photonic degrees of freedom and potentially also to other quantum systems.

  16. TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES.

    Science.gov (United States)

    Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn

    2017-09-01

    Scientists routinely compare gene expression levels in cases versus controls in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however statistical methods have been limited by the high dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related with Sparse Principal Component Analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from Schizophrenia patients and controls, we provide a novel list of genes implicated in Schizophrenia and reveal intriguing patterns in gene co-expression change for Schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene "relationship" matrices that are of practical interest, such as the weighted adjacency matrices.

  17. Generalized reduced rank latent factor regression for high dimensional tensor fields, and neuroimaging-genetic applications.

    Science.gov (United States)

    Tao, Chenyang; Nichols, Thomas E; Hua, Xue; Ching, Christopher R K; Rolls, Edmund T; Thompson, Paul M; Feng, Jianfeng

    2017-01-01

    We propose a generalized reduced rank latent factor regression model (GRRLF) for the analysis of tensor field responses and high dimensional covariates. The model is motivated by the need from imaging-genetic studies to identify genetic variants that are associated with brain imaging phenotypes, often in the form of high dimensional tensor fields. GRRLF identifies from the structure in the data the effective dimensionality of the data, and then jointly performs dimension reduction of the covariates, dynamic identification of latent factors, and nonparametric estimation of both covariate and latent response fields. After accounting for the latent and covariate effects, GRLLF performs a nonparametric test on the remaining factor of interest. GRRLF provides a better factorization of the signals compared with common solutions, and is less susceptible to overfitting because it exploits the effective dimensionality. The generality and the flexibility of GRRLF also allow various statistical models to be handled in a unified framework and solutions can be efficiently computed. Within the field of neuroimaging, it improves the sensitivity for weak signals and is a promising alternative to existing approaches. The operation of the framework is demonstrated with both synthetic datasets and a real-world neuroimaging example in which the effects of a set of genes on the structure of the brain at the voxel level were measured, and the results compared favorably with those from existing approaches. Copyright © 2016. Published by Elsevier Inc.

  18. Challenges and Approaches to Statistical Design and Inference in High Dimensional Investigations

    Science.gov (United States)

    Garrett, Karen A.; Allison, David B.

    2015-01-01

    Summary Advances in modern technologies have facilitated high-dimensional experiments (HDEs) that generate tremendous amounts of genomic, proteomic, and other “omic” data. HDEs involving whole-genome sequences and polymorphisms, expression levels of genes, protein abundance measurements, and combinations thereof have become a vanguard for new analytic approaches to the analysis of HDE data. Such situations demand creative approaches to the processes of statistical inference, estimation, prediction, classification, and study design. The novel and challenging biological questions asked from HDE data have resulted in many specialized analytic techniques being developed. This chapter discusses some of the unique statistical challenges facing investigators studying high-dimensional biology, and describes some approaches being developed by statistical scientists. We have included some focus on the increasing interest in questions involving testing multiple propositions simultaneously, appropriate inferential indicators for the types of questions biologists are interested in, and the need for replication of results across independent studies, investigators, and settings. A key consideration inherent throughout is the challenge in providing methods that a statistician judges to be sound and a biologist finds informative. PMID:19588106

  19. Challenges and approaches to statistical design and inference in high-dimensional investigations.

    Science.gov (United States)

    Gadbury, Gary L; Garrett, Karen A; Allison, David B

    2009-01-01

    Advances in modern technologies have facilitated high-dimensional experiments (HDEs) that generate tremendous amounts of genomic, proteomic, and other "omic" data. HDEs involving whole-genome sequences and polymorphisms, expression levels of genes, protein abundance measurements, and combinations thereof have become a vanguard for new analytic approaches to the analysis of HDE data. Such situations demand creative approaches to the processes of statistical inference, estimation, prediction, classification, and study design. The novel and challenging biological questions asked from HDE data have resulted in many specialized analytic techniques being developed. This chapter discusses some of the unique statistical challenges facing investigators studying high-dimensional biology and describes some approaches being developed by statistical scientists. We have included some focus on the increasing interest in questions involving testing multiple propositions simultaneously, appropriate inferential indicators for the types of questions biologists are interested in, and the need for replication of results across independent studies, investigators, and settings. A key consideration inherent throughout is the challenge in providing methods that a statistician judges to be sound and a biologist finds informative.

  20. Innovation Rather than Improvement: A Solvable High-Dimensional Model Highlights the Limitations of Scalar Fitness

    Science.gov (United States)

    Tikhonov, Mikhail; Monasson, Remi

    2018-01-01

    Much of our understanding of ecological and evolutionary mechanisms derives from analysis of low-dimensional models: with few interacting species, or few axes defining "fitness". It is not always clear to what extent the intuition derived from low-dimensional models applies to the complex, high-dimensional reality. For instance, most naturally occurring microbial communities are strikingly diverse, harboring a large number of coexisting species, each of which contributes to shaping the environment of others. Understanding the eco-evolutionary interplay in these systems is an important challenge, and an exciting new domain for statistical physics. Recent work identified a promising new platform for investigating highly diverse ecosystems, based on the classic resource competition model of MacArthur. Here, we describe how the same analytical framework can be used to study evolutionary questions. Our analysis illustrates how, at high dimension, the intuition promoted by a one-dimensional (scalar) notion of fitness can become misleading. Specifically, while the low-dimensional picture emphasizes organism cost or efficiency, we exhibit a regime where cost becomes irrelevant for survival, and link this observation to generic properties of high-dimensional geometry.

  1. AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data

    OpenAIRE

    Yu, Wenbao; Park, Taesung

    2014-01-01

    Motivation It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. Results We propose an AUC-based approach u...

  2. A kernel-based multi-feature image representation for histopathology image classification

    International Nuclear Information System (INIS)

    Moreno J; Caicedo J Gonzalez F

    2010-01-01

    This paper presents a novel strategy for building a high-dimensional feature space to represent histopathology image contents. Histogram features, related to colors, textures and edges, are combined together in a unique image representation space using kernel functions. This feature space is further enhanced by the application of latent semantic analysis, to model hidden relationships among visual patterns. All that information is included in the new image representation space. Then, support vector machine classifiers are used to assign semantic labels to images. Processing and classification algorithms operate on top of kernel functions, so that; the structure of the feature space is completely controlled using similarity measures and a dual representation. The proposed approach has shown a successful performance in a classification task using a dataset with 1,502 real histopathology images in 18 different classes. The results show that our approach for histological image classification obtains an improved average performance of 20.6% when compared to a conventional classification approach based on SVM directly applied to the original kernel.

  3. Manifold Learning with Self-Organizing Mapping for Feature Extraction of Nonlinear Faults in Rotating Machinery

    Directory of Open Access Journals (Sweden)

    Lin Liang

    2015-01-01

    Full Text Available A new method for extracting the low-dimensional feature automatically with self-organization mapping manifold is proposed for the detection of rotating mechanical nonlinear faults (such as rubbing, pedestal looseness. Under the phase space reconstructed by single vibration signal, the self-organization mapping (SOM with expectation maximization iteration algorithm is used to divide the local neighborhoods adaptively without manual intervention. After that, the local tangent space alignment algorithm is adopted to compress the high-dimensional phase space into low-dimensional feature space. The proposed method takes advantages of the manifold learning in low-dimensional feature extraction and adaptive neighborhood construction of SOM and can extract intrinsic fault features of interest in two dimensional projection space. To evaluate the performance of the proposed method, the Lorenz system was simulated and rotation machinery with nonlinear faults was obtained for test purposes. Compared with the holospectrum approaches, the results reveal that the proposed method is superior in identifying faults and effective for rotating machinery condition monitoring.

  4. A KERNEL-BASED MULTI-FEATURE IMAGE REPRESENTATION FOR HISTOPATHOLOGY IMAGE CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    J Carlos Moreno

    2010-09-01

    Full Text Available This paper presents a novel strategy for building a high-dimensional feature space to represent histopathology image contents. Histogram features, related to colors, textures and edges, are combined together in a unique image representation space using kernel functions. This feature space is further enhanced by the application of Latent Semantic Analysis, to model hidden relationships among visual patterns. All that information is included in the new image representation space. Then, Support Vector Machine classifiers are used to assign semantic labels to images. Processing and classification algorithms operate on top of kernel functions, so that, the structure of the feature space is completely controlled using similarity measures and a dual representation. The proposed approach has shown a successful performance in a classification task using a dataset with 1,502 real histopathology images in 18 different classes. The results show that our approach for histological image classification obtains an improved average performance of 20.6% when compared to a conventional classification approach based on SVM directly applied to the original kernel.

  5. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection.

    Science.gov (United States)

    Chen, Yifei; Sun, Yuxing; Han, Bing-Qing

    2015-01-01

    Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.

  6. High dimensional biological data retrieval optimization with NoSQL technology

    Science.gov (United States)

    2014-01-01

    Background High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. Conclusions The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data

  7. High dimensional biological data retrieval optimization with NoSQL technology.

    Science.gov (United States)

    Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike

    2014-01-01

    High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase on query performance on MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating

  8. Classification Influence of Features on Given Emotions and Its Application in Feature Selection

    Science.gov (United States)

    Xing, Yin; Chen, Chuang; Liu, Li-Long

    2018-04-01

    In order to solve the problem that there is a large amount of redundant data in high-dimensional speech emotion features, we analyze deeply the extracted speech emotion features and select better features. Firstly, a given emotion is classified by each feature. Secondly, the recognition rate is ranked in descending order. Then, the optimal threshold of features is determined by rate criterion. Finally, the better features are obtained. When applied in Berlin and Chinese emotional data set, the experimental results show that the feature selection method outperforms the other traditional methods.

  9. Penalized estimation for competing risks regression with applications to high-dimensional covariates

    DEFF Research Database (Denmark)

    Ambrogi, Federico; Scheike, Thomas H.

    2016-01-01

    of competing events. The direct binomial regression model of Scheike and others (2008. Predicting cumulative incidence probability by direct binomial regression. Biometrika 95: (1), 205-220) is reformulated in a penalized framework to possibly fit a sparse regression model. The developed approach is easily...... Research 19: (1), 29-51), the research regarding competing risks is less developed (Binder and others, 2009. Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics 25: (7), 890-896). The aim of this work is to consider how to do penalized regression in the presence...... implementable using existing high-performance software to do penalized regression. Results from simulation studies are presented together with an application to genomic data when the endpoint is progression-free survival. An R function is provided to perform regularized competing risks regression according...

  10. Entanglement dynamics of high-dimensional bipartite field states inside the cavities in dissipative environments

    Energy Technology Data Exchange (ETDEWEB)

    Tahira, Rabia; Ikram, Manzoor; Zubairy, M Suhail [Centre for Quantum Physics, COMSATS Institute of Information Technology, Islamabad (Pakistan); Bougouffa, Smail [Department of Physics, Faculty of Science, Taibah University, PO Box 30002, Madinah (Saudi Arabia)

    2010-02-14

    We investigate the phenomenon of sudden death of entanglement in a high-dimensional bipartite system subjected to dissipative environments with an arbitrary initial pure entangled state between two fields in the cavities. We find that in a vacuum reservoir, the presence of the state where one or more than one (two) photons in each cavity are present is a necessary condition for the sudden death of entanglement. Otherwise entanglement remains for infinite time and decays asymptotically with the decay of individual qubits. For pure two-qubit entangled states in a thermal environment, we observe that sudden death of entanglement always occurs. The sudden death time of the entangled states is related to the number of photons in the cavities, the temperature of the reservoir and the initial preparation of the entangled states.

  11. Entanglement dynamics of high-dimensional bipartite field states inside the cavities in dissipative environments

    International Nuclear Information System (INIS)

    Tahira, Rabia; Ikram, Manzoor; Zubairy, M Suhail; Bougouffa, Smail

    2010-01-01

    We investigate the phenomenon of sudden death of entanglement in a high-dimensional bipartite system subjected to dissipative environments with an arbitrary initial pure entangled state between two fields in the cavities. We find that in a vacuum reservoir, the presence of the state where one or more than one (two) photons in each cavity are present is a necessary condition for the sudden death of entanglement. Otherwise entanglement remains for infinite time and decays asymptotically with the decay of individual qubits. For pure two-qubit entangled states in a thermal environment, we observe that sudden death of entanglement always occurs. The sudden death time of the entangled states is related to the number of photons in the cavities, the temperature of the reservoir and the initial preparation of the entangled states.

  12. Time–energy high-dimensional one-side device-independent quantum key distribution

    International Nuclear Information System (INIS)

    Bao Hai-Ze; Bao Wan-Su; Wang Yang; Chen Rui-Ke; Ma Hong-Xin; Zhou Chun; Li Hong-Wei

    2017-01-01

    Compared with full device-independent quantum key distribution (DI-QKD), one-side device-independent QKD (1sDI-QKD) needs fewer requirements, which is much easier to meet. In this paper, by applying recently developed novel time–energy entropic uncertainty relations, we present a time–energy high-dimensional one-side device-independent quantum key distribution (HD-QKD) and provide the security proof against coherent attacks. Besides, we connect the security with the quantum steering. By numerical simulation, we obtain the secret key rate for Alice’s different detection efficiencies. The results show that our protocol can performance much better than the original 1sDI-QKD. Furthermore, we clarify the relation among the secret key rate, Alice’s detection efficiency, and the dispersion coefficient. Finally, we simply analyze its performance in the optical fiber channel. (paper)

  13. A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

    DEFF Research Database (Denmark)

    Abrahamsen, Trine Julie; Hansen, Lars Kai

    2011-01-01

    Small sample high-dimensional principal component analysis (PCA) suffers from variance inflation and lack of generalizability. It has earlier been pointed out that a simple leave-one-out variance renormalization scheme can cure the problem. In this paper we generalize the cure in two directions......: First, we propose a computationally less intensive approximate leave-one-out estimator, secondly, we show that variance inflation is also present in kernel principal component analysis (kPCA) and we provide a non-parametric renormalization scheme which can quite efficiently restore generalizability in kPCA....... As for PCA our analysis also suggests a simplified approximate expression. © 2011 Trine J. Abrahamsen and Lars K. Hansen....

  14. Diagonal Likelihood Ratio Test for Equality of Mean Vectors in High-Dimensional Data

    KAUST Repository

    Hu, Zongliang; Tong, Tiejun; Genton, Marc G.

    2017-01-01

    We propose a likelihood ratio test framework for testing normal mean vectors in high-dimensional data under two common scenarios: the one-sample test and the two-sample test with equal covariance matrices. We derive the test statistics under the assumption that the covariance matrices follow a diagonal matrix structure. In comparison with the diagonal Hotelling's tests, our proposed test statistics display some interesting characteristics. In particular, they are a summation of the log-transformed squared t-statistics rather than a direct summation of those components. More importantly, to derive the asymptotic normality of our test statistics under the null and local alternative hypotheses, we do not require the assumption that the covariance matrix follows a diagonal matrix structure. As a consequence, our proposed test methods are very flexible and can be widely applied in practice. Finally, simulation studies and a real data analysis are also conducted to demonstrate the advantages of our likelihood ratio test method.

  15. Characterization of differentially expressed genes using high-dimensional co-expression networks

    DEFF Research Database (Denmark)

    Coelho Goncalves de Abreu, Gabriel; Labouriau, Rodrigo S.

    2010-01-01

    We present a technique to characterize differentially expressed genes in terms of their position in a high-dimensional co-expression network. The set-up of Gaussian graphical models is used to construct representations of the co-expression network in such a way that redundancy and the propagation...... that allow to make effective inference in problems with high degree of complexity (e.g. several thousands of genes) and small number of observations (e.g. 10-100) as typically occurs in high throughput gene expression studies. Taking advantage of the internal structure of decomposable graphical models, we...... construct a compact representation of the co-expression network that allows to identify the regions with high concentration of differentially expressed genes. It is argued that differentially expressed genes located in highly interconnected regions of the co-expression network are less informative than...

  16. Discriminative topological features reveal biological network mechanisms

    Directory of Open Access Journals (Sweden)

    Levovitz Chaya

    2004-11-01

    Full Text Available Abstract Background Recent genomic and bioinformatic advances have motivated the development of numerous network models intending to describe graphs of biological, technological, and sociological origin. In most cases the success of a model has been evaluated by how well it reproduces a few key features of the real-world data, such as degree distributions, mean geodesic lengths, and clustering coefficients. Often pairs of models can reproduce these features with indistinguishable fidelity despite being generated by vastly different mechanisms. In such cases, these few target features are insufficient to distinguish which of the different models best describes real world networks of interest; moreover, it is not clear a priori that any of the presently-existing algorithms for network generation offers a predictive description of the networks inspiring them. Results We present a method to assess systematically which of a set of proposed network generation algorithms gives the most accurate description of a given biological network. To derive discriminative classifiers, we construct a mapping from the set of all graphs to a high-dimensional (in principle infinite-dimensional "word space". This map defines an input space for classification schemes which allow us to state unambiguously which models are most descriptive of a given network of interest. Our training sets include networks generated from 17 models either drawn from the literature or introduced in this work. We show that different duplication-mutation schemes best describe the E. coli genetic network, the S. cerevisiae protein interaction network, and the C. elegans neuronal network, out of a set of network models including a linear preferential attachment model and a small-world model. Conclusions Our method is a first step towards systematizing network models and assessing their predictability, and we anticipate its usefulness for a number of communities.

  17. An adaptive ANOVA-based PCKF for high-dimensional nonlinear inverse modeling

    Science.gov (United States)

    Li, Weixuan; Lin, Guang; Zhang, Dongxiao

    2014-02-01

    The probabilistic collocation-based Kalman filter (PCKF) is a recently developed approach for solving inverse problems. It resembles the ensemble Kalman filter (EnKF) in every aspect-except that it represents and propagates model uncertainty by polynomial chaos expansion (PCE) instead of an ensemble of model realizations. Previous studies have shown PCKF is a more efficient alternative to EnKF for many data assimilation problems. However, the accuracy and efficiency of PCKF depends on an appropriate truncation of the PCE series. Having more polynomial chaos basis functions in the expansion helps to capture uncertainty more accurately but increases computational cost. Selection of basis functions is particularly important for high-dimensional stochastic problems because the number of polynomial chaos basis functions required to represent model uncertainty grows dramatically as the number of input parameters (random dimensions) increases. In classic PCKF algorithms, the PCE basis functions are pre-set based on users' experience. Also, for sequential data assimilation problems, the basis functions kept in PCE expression remain unchanged in different Kalman filter loops, which could limit the accuracy and computational efficiency of classic PCKF algorithms. To address this issue, we present a new algorithm that adaptively selects PCE basis functions for different problems and automatically adjusts the number of basis functions in different Kalman filter loops. The algorithm is based on adaptive functional ANOVA (analysis of variance) decomposition, which approximates a high-dimensional function with the summation of a set of low-dimensional functions. Thus, instead of expanding the original model into PCE, we implement the PCE expansion on these low-dimensional functions, which is much less costly. We also propose a new adaptive criterion for ANOVA that is more suited for solving inverse problems. The new algorithm was tested with different examples and demonstrated

  18. Kernel based methods for accelerated failure time model with ultra-high dimensional data

    Directory of Open Access Journals (Sweden)

    Jiang Feng

    2010-12-01

    Full Text Available Abstract Background Most genomic data have ultra-high dimensions with more than 10,000 genes (probes. Regularization methods with L1 and Lp penalty have been extensively studied in survival analysis with high-dimensional genomic data. However, when the sample size n ≪ m (the number of genes, directly identifying a small subset of genes from ultra-high (m > 10, 000 dimensional data is time-consuming and not computationally efficient. In current microarray analysis, what people really do is select a couple of thousands (or hundreds of genes using univariate analysis or statistical tests, and then apply the LASSO-type penalty to further reduce the number of disease associated genes. This two-step procedure may introduce bias and inaccuracy and lead us to miss biologically important genes. Results The accelerated failure time (AFT model is a linear regression model and a useful alternative to the Cox model for survival analysis. In this paper, we propose a nonlinear kernel based AFT model and an efficient variable selection method with adaptive kernel ridge regression. Our proposed variable selection method is based on the kernel matrix and dual problem with a much smaller n × n matrix. It is very efficient when the number of unknown variables (genes is much larger than the number of samples. Moreover, the primal variables are explicitly updated and the sparsity in the solution is exploited. Conclusions Our proposed methods can simultaneously identify survival associated prognostic factors and predict survival outcomes with ultra-high dimensional genomic data. We have demonstrated the performance of our methods with both simulation and real data. The proposed method performs superbly with limited computational studies.

  19. Representing high-dimensional data to intelligent prostheses and other wearable assistive robots: A first comparison of tile coding and selective Kanerva coding.

    Science.gov (United States)

    Travnik, Jaden B; Pilarski, Patrick M

    2017-07-01

    Prosthetic devices have advanced in their capabilities and in the number and type of sensors included in their design. As the space of sensorimotor data available to a conventional or machine learning prosthetic control system increases in dimensionality and complexity, it becomes increasingly important that this data be represented in a useful and computationally efficient way. Well structured sensory data allows prosthetic control systems to make informed, appropriate control decisions. In this study, we explore the impact that increased sensorimotor information has on current machine learning prosthetic control approaches. Specifically, we examine the effect that high-dimensional sensory data has on the computation time and prediction performance of a true-online temporal-difference learning prediction method as embedded within a resource-limited upper-limb prosthesis control system. We present results comparing tile coding, the dominant linear representation for real-time prosthetic machine learning, with a newly proposed modification to Kanerva coding that we call selective Kanerva coding. In addition to showing promising results for selective Kanerva coding, our results confirm potential limitations to tile coding as the number of sensory input dimensions increases. To our knowledge, this study is the first to explicitly examine representations for realtime machine learning prosthetic devices in general terms. This work therefore provides an important step towards forming an efficient prosthesis-eye view of the world, wherein prompt and accurate representations of high-dimensional data may be provided to machine learning control systems within artificial limbs and other assistive rehabilitation technologies.

  20. Neural representations of emotion are organized around abstract event features.

    Science.gov (United States)

    Skerry, Amy E; Saxe, Rebecca

    2015-08-03

    Research on emotion attribution has tended to focus on the perception of overt expressions of at most five or six basic emotions. However, our ability to identify others' emotional states is not limited to perception of these canonical expressions. Instead, we make fine-grained inferences about what others feel based on the situations they encounter, relying on knowledge of the eliciting conditions for different emotions. In the present research, we provide convergent behavioral and neural evidence concerning the representations underlying these concepts. First, we find that patterns of activity in mentalizing regions contain information about subtle emotional distinctions conveyed through verbal descriptions of eliciting situations. Second, we identify a space of abstract situation features that well captures the emotion discriminations subjects make behaviorally and show that this feature space outperforms competing models in capturing the similarity space of neural patterns in these regions. Together, the data suggest that our knowledge of others' emotions is abstract and high dimensional, that brain regions selective for mental state reasoning support relatively subtle distinctions between emotion concepts, and that the neural representations in these regions are not reducible to more primitive affective dimensions such as valence and arousal. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Penalized feature selection and classification in bioinformatics

    OpenAIRE

    Ma, Shuangge; Huang, Jian

    2008-01-01

    In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifier and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classific...

  2. Comparing observer models and feature selection methods for a task-based statistical assessment of digital breast tomsynthesis in reconstruction space

    Science.gov (United States)

    Park, Subok; Zhang, George Z.; Zeng, Rongping; Myers, Kyle J.

    2014-03-01

    A task-based assessment of image quality1 for digital breast tomosynthesis (DBT) can be done in either the projected or reconstructed data space. As the choice of observer models and feature selection methods can vary depending on the type of task and data statistics, we previously investigated the performance of two channelized- Hotelling observer models in conjunction with 2D Laguerre-Gauss (LG) and two implementations of partial least squares (PLS) channels along with that of the Hotelling observer in binary detection tasks involving DBT projections.2, 3 The difference in these observers lies in how the spatial correlation in DBT angular projections is incorporated in the observer's strategy to perform the given task. In the current work, we extend our method to the reconstructed data space of DBT. We investigate how various model observers including the aforementioned compare for performing the binary detection of a spherical signal embedded in structured breast phantoms with the use of DBT slices reconstructed via filtered back projection. We explore how well the model observers incorporate the spatial correlation between different numbers of reconstructed DBT slices while varying the number of projections. For this, relatively small and large scan angles (24° and 96°) are used for comparison. Our results indicate that 1) given a particular scan angle, the number of projections needed to achieve the best performance for each observer is similar across all observer/channel combinations, i.e., Np = 25 for scan angle 96° and Np = 13 for scan angle 24°, and 2) given these sufficient numbers of projections, the number of slices for each observer to achieve the best performance differs depending on the channel/observer types, which is more pronounced in the narrow scan angle case.

  3. A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

    DEFF Research Database (Denmark)

    Pham, Ninh Dang; Pagh, Rasmus

    2012-01-01

    projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality...... neighbor are deteriorated in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random......Outlier mining in d-dimensional point sets is a fundamental and well studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments on concepts of distance or nearest...

  4. Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.

    Science.gov (United States)

    Serra, Angela; Coretto, Pietro; Fratello, Michele; Tagliaferri, Roberto; Stegle, Oliver

    2018-02-15

    Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns, across different samples, can be part of the same co-expression system, or they may share the same biological functions. Groups of genes are usually identified based on cluster analysis. Clustering methods rely on the similarity matrix between genes. A common choice to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task which is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise are: sampling variations, presents of outlying sample units, and the fact that in most cases the number of units is much larger than the number of genes. In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of the high-dimensionality, and data contamination. Computations are easy to implement and do not require hand tunings. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method is capable of remarkable performances. Our correlation metric is more robust to outliers compared with the existing alternatives in two gene expression datasets. It is also shown how the regularization allows to automatically detect and filter spurious correlations. The same regularization is also extended to other less robust correlation measures. Finally, we apply the ARACNE algorithm on the SyNTreN gene expression data. Sensitivity and specificity of the reconstructed network is compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input. The R

  5. A HYBRID FILTER AND WRAPPER FEATURE SELECTION APPROACH FOR DETECTING CONTAMINATION IN DRINKING WATER MANAGEMENT SYSTEM

    Directory of Open Access Journals (Sweden)

    S. VISALAKSHI

    2017-07-01

    Full Text Available Feature selection is an important task in predictive models which helps to identify the irrelevant features in the high - dimensional dataset. In this case of water contamination detection dataset, the standard wrapper algorithm alone cannot be applied because of the complexity. To overcome this computational complexity problem and making it lighter, filter-wrapper based algorithm has been proposed. In this work, reducing the feature space is a significant component of water contamination. The main findings are as follows: (1 The main goal is speeding up the feature selection process, so the proposed filter - based feature pre-selection is applied and guarantees that useful data are improbable to be detached in the initial stage which discussed briefly in this paper. (2 The resulting features are again filtered by using the Genetic Algorithm coded with Support Vector Machine method, where it facilitates to nutshell the subset of features with high accuracy and decreases the expense. Experimental results show that the proposed methods trim down redundant features effectively and achieved better classification accuracy.

  6. Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.

    Science.gov (United States)

    Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack

    2017-06-01

    In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.

  7. Diagonal Likelihood Ratio Test for Equality of Mean Vectors in High-Dimensional Data

    KAUST Repository

    Hu, Zongliang

    2017-10-27

    We propose a likelihood ratio test framework for testing normal mean vectors in high-dimensional data under two common scenarios: the one-sample test and the two-sample test with equal covariance matrices. We derive the test statistics under the assumption that the covariance matrices follow a diagonal matrix structure. In comparison with the diagonal Hotelling\\'s tests, our proposed test statistics display some interesting characteristics. In particular, they are a summation of the log-transformed squared t-statistics rather than a direct summation of those components. More importantly, to derive the asymptotic normality of our test statistics under the null and local alternative hypotheses, we do not require the assumption that the covariance matrix follows a diagonal matrix structure. As a consequence, our proposed test methods are very flexible and can be widely applied in practice. Finally, simulation studies and a real data analysis are also conducted to demonstrate the advantages of our likelihood ratio test method.

  8. Technical Report: Toward a Scalable Algorithm to Compute High-Dimensional Integrals of Arbitrary Functions

    International Nuclear Information System (INIS)

    Snyder, Abigail C.; Jiao, Yu

    2010-01-01

    Neutron experiments at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL) frequently generate large amounts of data (on the order of 106-1012 data points). Hence, traditional data analysis tools run on a single CPU take too long to be practical and scientists are unable to efficiently analyze all data generated by experiments. Our goal is to develop a scalable algorithm to efficiently compute high-dimensional integrals of arbitrary functions. This algorithm can then be used to integrate the four-dimensional integrals that arise as part of modeling intensity from the experiments at the SNS. Here, three different one-dimensional numerical integration solvers from the GNU Scientific Library were modified and implemented to solve four-dimensional integrals. The results of these solvers on a final integrand provided by scientists at the SNS can be compared to the results of other methods, such as quasi-Monte Carlo methods, computing the same integral. A parallelized version of the most efficient method can allow scientists the opportunity to more effectively analyze all experimental data.

  9. Computational Strategies for Dissecting the High-Dimensional Complexity of Adaptive Immune Repertoires

    Directory of Open Access Journals (Sweden)

    Enkelejda Miho

    2018-02-01

    Full Text Available The adaptive immune system recognizes antigens via an immense array of antigen-binding antibodies and T-cell receptors, the immune repertoire. The interrogation of immune repertoires is of high relevance for understanding the adaptive immune response in disease and infection (e.g., autoimmunity, cancer, HIV. Adaptive immune receptor repertoire sequencing (AIRR-seq has driven the quantitative and molecular-level profiling of immune repertoires, thereby revealing the high-dimensional complexity of the immune receptor sequence landscape. Several methods for the computational and statistical analysis of large-scale AIRR-seq data have been developed to resolve immune repertoire complexity and to understand the dynamics of adaptive immunity. Here, we review the current research on (i diversity, (ii clustering and network, (iii phylogenetic, and (iv machine learning methods applied to dissect, quantify, and compare the architecture, evolution, and specificity of immune repertoires. We summarize outstanding questions in computational immunology and propose future directions for systems immunology toward coupling AIRR-seq with the computational discovery of immunotherapeutics, vaccines, and immunodiagnostics.

  10. Construction of high-dimensional neural network potentials using environment-dependent atom pairs.

    Science.gov (United States)

    Jose, K V Jovan; Artrith, Nongnuch; Behler, Jörg

    2012-05-21

    An accurate determination of the potential energy is the crucial step in computer simulations of chemical processes, but using electronic structure methods on-the-fly in molecular dynamics (MD) is computationally too demanding for many systems. Constructing more efficient interatomic potentials becomes intricate with increasing dimensionality of the potential-energy surface (PES), and for numerous systems the accuracy that can be achieved is still not satisfying and far from the reliability of first-principles calculations. Feed-forward neural networks (NNs) have a very flexible functional form, and in recent years they have been shown to be an accurate tool to construct efficient PESs. High-dimensional NN potentials based on environment-dependent atomic energy contributions have been presented for a number of materials. Still, these potentials may be improved by a more detailed structural description, e.g., in form of atom pairs, which directly reflect the atomic interactions and take the chemical environment into account. We present an implementation of an NN method based on atom pairs, and its accuracy and performance are compared to the atom-based NN approach using two very different systems, the methanol molecule and metallic copper. We find that both types of NN potentials provide an excellent description of both PESs, with the pair-based method yielding a slightly higher accuracy making it a competitive alternative for addressing complex systems in MD simulations.

  11. Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions.

    Science.gov (United States)

    Xia, Yin; Cai, Tianxi; Cai, T Tony

    2018-01-01

    Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparisons of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion. Theoretical justifications are provided to guarantee the validity of the proposed tests and optimality results are established under sparsity assumptions on the regression coefficients. The proposed testing procedures are easy to implement. Numerical properties of the procedures are investigated through simulation and data analysis. The results show that the proposed tests maintain the desired error rates under the null and have good power under the alternative at moderate sample sizes. The procedures are applied to the Framingham Offspring study to investigate the interactions between smoking and cardiovascular related genetic mutations important for an inflammation marker.

  12. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    Directory of Open Access Journals (Sweden)

    Zekić-Sušac Marijana

    2014-09-01

    Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour on the same dataset in order to compare their efficiency in the sense of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess computing sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistical significance between the artificial neural network and the k-nearest neighbour model, while the difference among other methods was not statistically significant. Conclusions: Tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.

  13. High-dimensional neural network potentials for solvation: The case of protonated water clusters in helium

    Science.gov (United States)

    Schran, Christoph; Uhl, Felix; Behler, Jörg; Marx, Dominik

    2018-03-01

    The design of accurate helium-solute interaction potentials for the simulation of chemically complex molecules solvated in superfluid helium has long been a cumbersome task due to the rather weak but strongly anisotropic nature of the interactions. We show that this challenge can be met by using a combination of an effective pair potential for the He-He interactions and a flexible high-dimensional neural network potential (NNP) for describing the complex interaction between helium and the solute in a pairwise additive manner. This approach yields an excellent agreement with a mean absolute deviation as small as 0.04 kJ mol-1 for the interaction energy between helium and both hydronium and Zundel cations compared with coupled cluster reference calculations with an energetically converged basis set. The construction and improvement of the potential can be performed in a highly automated way, which opens the door for applications to a variety of reactive molecules to study the effect of solvation on the solute as well as the solute-induced structuring of the solvent. Furthermore, we show that this NNP approach yields very convincing agreement with the coupled cluster reference for properties like many-body spatial and radial distribution functions. This holds for the microsolvation of the protonated water monomer and dimer by a few helium atoms up to their solvation in bulk helium as obtained from path integral simulations at about 1 K.

  14. Multi-Scale Factor Analysis of High-Dimensional Brain Signals

    KAUST Repository

    Ting, Chee-Ming

    2017-05-18

    In this paper, we develop an approach to modeling high-dimensional networks with a large number of nodes arranged in a hierarchical and modular structure. We propose a novel multi-scale factor analysis (MSFA) model which partitions the massive spatio-temporal data defined over the complex networks into a finite set of regional clusters. To achieve further dimension reduction, we represent the signals in each cluster by a small number of latent factors. The correlation matrix for all nodes in the network are approximated by lower-dimensional sub-structures derived from the cluster-specific factors. To estimate regional connectivity between numerous nodes (within each cluster), we apply principal components analysis (PCA) to produce factors which are derived as the optimal reconstruction of the observed signals under the squared loss. Then, we estimate global connectivity (between clusters or sub-networks) based on the factors across regions using the RV-coefficient as the cross-dependence measure. This gives a reliable and computationally efficient multi-scale analysis of both regional and global dependencies of the large networks. The proposed novel approach is applied to estimate brain connectivity networks using functional magnetic resonance imaging (fMRI) data. Results on resting-state fMRI reveal interesting modular and hierarchical organization of human brain networks during rest.

  15. Enhanced spectral resolution by high-dimensional NMR using the filter diagonalization method and "hidden" dimensions.

    Science.gov (United States)

    Meng, Xi; Nguyen, Bao D; Ridge, Clark; Shaka, A J

    2009-01-01

    High-dimensional (HD) NMR spectra have poorer digital resolution than low-dimensional (LD) spectra, for a fixed amount of experiment time. This has led to "reduced-dimensionality" strategies, in which several LD projections of the HD NMR spectrum are acquired, each with higher digital resolution; an approximate HD spectrum is then inferred by some means. We propose a strategy that moves in the opposite direction, by adding more time dimensions to increase the information content of the data set, even if only a very sparse time grid is used in each dimension. The full HD time-domain data can be analyzed by the filter diagonalization method (FDM), yielding very narrow resonances along all of the frequency axes, even those with sparse sampling. Integrating over the added dimensions of HD FDM NMR spectra reconstitutes LD spectra with enhanced resolution, often more quickly than direct acquisition of the LD spectrum with a larger number of grid points in each of the fewer dimensions. If the extra-dimensions do not appear in the final spectrum, and are used solely to boost information content, we propose the moniker hidden-dimension NMR. This work shows that HD peaks have unmistakable frequency signatures that can be detected as single HD objects by an appropriate algorithm, even though their patterns would be tricky for a human operator to visualize or recognize, and even if digital resolution in an HD FT spectrum is very coarse compared with natural line widths.

  16. Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets

    Directory of Open Access Journals (Sweden)

    Shen Lu

    2013-04-01

    Full Text Available Since it takes time to do experiments in bioinformatics, biological datasets are sometimes small but with high dimensionality. From probability theory, in order to discover knowledge from a set of data, we have to have a sufficient number of samples. Otherwise, the error bounds can become too large to be useful. For the SOM (Self- Organizing Map algorithm, the initial map is based on the training data. In order to avoid the bias caused by the insufficient training data, in this paper we present an algorithm, called Multi-SOM. Multi-SOM builds a number of small self-organizing maps, instead of just one big map. Bayesian decision theory is used to make the final decision among similar neurons on different maps. In this way, we can better ensure that we can get a real random initial weight vector set, the map size is less of consideration and errors tend to average out. In our experiments as applied to microarray datasets which are highly intense data composed of genetic related information, the precision of Multi-SOMs is 10.58% greater than SOMs, and its recall is 11.07% greater than SOMs. Thus, the Multi-SOMs algorithm is practical.

  17. HDclassif : An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Laurent Berge

    2012-01-01

    Full Text Available This paper presents the R package HDclassif which is devoted to the clustering and the discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the idea of dimension reduction and model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA. In a similar manner, the associated clustering method iscalled high dimensional data clustering (HDDC and uses the expectation-maximization algorithm for inference. In order to correctly t the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than other model-based methods and this allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R codes allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is a free software and distributed under the general public license, as part of the R software project.

  18. Do MRI features at baseline predict radiographic joint space narrowing in the medial compartment of the osteoarthritic knee 2 years later?

    Energy Technology Data Exchange (ETDEWEB)

    Madan-Sharma, Ruby; Kornaat, Peter R.; Bloem, Johannes L.; Watt, Iain [Leiden University Medical Center, Department of Radiology, Leiden (Netherlands); Kloppenburg, Margreet; Botha-Scheepers, Stella A. [Leiden University Medical Center, Department of Rheumatology, Leiden (Netherlands); Graverand, Marie-Pierre Hellio le [Pfizer Groton, Groton, CT (United States)

    2008-09-15

    The purpose of the study was to relate magnetic resonance imaging (MRI) features at baseline with radiographically determined joint space narrowing (JSN) in the medial compartment of the knee after 2 years in a group of patients with symptomatic osteoarthritis at multiple joint sites. MRI of the knee and standardized radiographs were obtained at baseline and after 2 years in 186 patients (81% female; aged 43-76 years; mean 60 years). MRI was analyzed for bone marrow lesions, cysts, osteophytes, hyaline cartilage defects, joint effusion, and meniscal pathology in the medial compartment. Radiographs were scored semiquantitatively for JSN in the medial tibiofemoral joint using the Osteoarthritis Research Society International (OARSI) atlas. Radiological progression was defined as {>=}1 grade increase. Associations between baseline magnetic resonance (MR) parameters and subsequent radiographic JSN changes were assessed using logistic regression. Relative risk (RR) was then calculated. Radiographic progression of JSN was observed in 17 (9.1%) of 186 patients. Eleven patients had a Kellgren and Lawrence (KL) score of {>=}2. A significant association was observed between all patients and meniscal tears (RR 3.57; confidence interval (CI) 1.08-10.0) and meniscal subluxation (RR 2.73; CI 1.20-5.41), between KL<2 and meniscal subluxation (RR 11.3; CI 2.49-29.49) and KL {>=} 2 and meniscus tears (RR 8.91; CI 1.13-22.84) and radiographic JSN 2 years later. Follow-up MR in 15 of 17 patients with progressive JSN showed only new meniscal abnormalities and no progression of cartilage loss. Meniscal pathology (tears and/or meniscal subluxation) was the only MRI parameter to be associated with subsequent radiographic progression of JSN in the medial tibiofemoral compartment on a radiograph 2 years later, as assessed by the OARSI score. (orig.)

  19. Probabilistic numerical methods for high-dimensional stochastic control and valuation problems on electricity markets

    International Nuclear Information System (INIS)

    Langrene, Nicolas

    2014-01-01

    This thesis deals with the numerical solution of general stochastic control problems, with notable applications for electricity markets. We first propose a structural model for the price of electricity, allowing for price spikes well above the marginal fuel price under strained market conditions. This model allows to price and partially hedge electricity derivatives, using fuel forwards as hedging instruments. Then, we propose an algorithm, which combines Monte-Carlo simulations with local basis regressions, to solve general optimal switching problems. A comprehensive rate of convergence of the method is provided. Moreover, we manage to make the algorithm parsimonious in memory (and hence suitable for high dimensional problems) by generalizing to this framework a memory reduction method that avoids the storage of the sample paths. We illustrate this on the problem of investments in new power plants (our structural power price model allowing the new plants to impact the price of electricity). Finally, we study more general stochastic control problems (the control can be continuous and impact the drift and volatility of the state process), the solutions of which belong to the class of fully nonlinear Hamilton-Jacobi-Bellman equations, and can be handled via constrained Backward Stochastic Differential Equations, for which we develop a backward algorithm based on control randomization and parametric optimizations. A rate of convergence between the constraPned BSDE and its discrete version is provided, as well as an estimate of the optimal control. This algorithm is then applied to the problem of super replication of options under uncertain volatilities (and correlations). (author)

  20. Evaluation of a new high-dimensional miRNA profiling platform

    Directory of Open Access Journals (Sweden)

    Lamblin Anne-Francoise

    2009-08-01

    Full Text Available Abstract Background MicroRNAs (miRNAs are a class of approximately 22 nucleotide long, widely expressed RNA molecules that play important regulatory roles in eukaryotes. To investigate miRNA function, it is essential that methods to quantify their expression levels be available. Methods We evaluated a new miRNA profiling platform that utilizes Illumina's existing robust DASL chemistry as the basis for the assay. Using total RNA from five colon cancer patients and four cell lines, we evaluated the reproducibility of miRNA expression levels across replicates and with varying amounts of input RNA. The beta test version was comprised of 735 miRNA targets of Illumina's miRNA profiling application. Results Reproducibility between sample replicates within a plate was good (Spearman's correlation 0.91 to 0.98 as was the plate-to-plate reproducibility replicates run on different days (Spearman's correlation 0.84 to 0.98. To determine whether quality data could be obtained from a broad range of input RNA, data obtained from amounts ranging from 25 ng to 800 ng were compared to those obtained at 200 ng. No effect across the range of RNA input was observed. Conclusion These results indicate that very small amounts of starting material are sufficient to allow sensitive miRNA profiling using the Illumina miRNA high-dimensional platform. Nonlinear biases were observed between replicates, indicating the need for abundance-dependent normalization. Overall, the performance characteristics of the Illumina miRNA profiling system were excellent.

  1. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.

  2. Uncertainty quantification in transcranial magnetic stimulation via high-dimensional model representation.

    Science.gov (United States)

    Gomez, Luis J; Yücel, Abdulkadir C; Hernandez-Garcia, Luis; Taylor, Stephan F; Michielssen, Eric

    2015-01-01

    A computational framework for uncertainty quantification in transcranial magnetic stimulation (TMS) is presented. The framework leverages high-dimensional model representations (HDMRs), which approximate observables (i.e., quantities of interest such as electric (E) fields induced inside targeted cortical regions) via series of iteratively constructed component functions involving only the most significant random variables (i.e., parameters that characterize the uncertainty in a TMS setup such as the position and orientation of TMS coils, as well as the size, shape, and conductivity of the head tissue). The component functions of HDMR expansions are approximated via a multielement probabilistic collocation (ME-PC) method. While approximating each component function, a quasi-static finite-difference simulator is used to compute observables at integration/collocation points dictated by the ME-PC method. The proposed framework requires far fewer simulations than traditional Monte Carlo methods for providing highly accurate statistical information (e.g., the mean and standard deviation) about the observables. The efficiency and accuracy of the proposed framework are demonstrated via its application to the statistical characterization of E-fields generated by TMS inside cortical regions of an MRI-derived realistic head model. Numerical results show that while uncertainties in tissue conductivities have negligible effects on TMS operation, variations in coil position/orientation and brain size significantly affect the induced E-fields. Our numerical results have several implications for the use of TMS during depression therapy: 1) uncertainty in the coil position and orientation may reduce the response rates of patients; 2) practitioners should favor targets on the crest of a gyrus to obtain maximal stimulation; and 3) an increasing scalp-to-cortex distance reduces the magnitude of E-fields on the surface and inside the cortex.

  3. Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed

    Science.gov (United States)

    Landfors, Mattias; Philip, Philge; Rydén, Patrik; Stenberg, Per

    2011-01-01

    Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher

  4. Discovering highly informative feature set over high dimensions

    KAUST Repository

    Zhang, Chongsheng; Masseglia, Florent; Zhang, Xiangliang

    2012-01-01

    For many textual collections, the number of features is often overly large. These features can be very redundant, it is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory is one such tool for us to obtain this feature collection. With this paper, we mainly contribute to the improvement of efficiency for the process of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates the hopeless candidates at each forward selection step. We test our method through experiments on real-world data sets, showing that our proposal is very efficient. © 2012 IEEE.

  5. Discovering highly informative feature set over high dimensions

    KAUST Repository

    Zhang, Chongsheng

    2012-11-01

    For many textual collections, the number of features is often overly large. These features can be very redundant, it is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory is one such tool for us to obtain this feature collection. With this paper, we mainly contribute to the improvement of efficiency for the process of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates the hopeless candidates at each forward selection step. We test our method through experiments on real-world data sets, showing that our proposal is very efficient. © 2012 IEEE.

  6. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    Science.gov (United States)

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.

  7. High-level intuitive features (HLIFs) for intuitive skin lesion description.

    Science.gov (United States)

    Amelard, Robert; Glaister, Jeffrey; Wong, Alexander; Clausi, David A

    2015-03-01

    A set of high-level intuitive features (HLIFs) is proposed to quantitatively describe melanoma in standard camera images. Melanoma is the deadliest form of skin cancer. With rising incidence rates and subjectivity in current clinical detection methods, there is a need for melanoma decision support systems. Feature extraction is a critical step in melanoma decision support systems. Existing feature sets for analyzing standard camera images are comprised of low-level features, which exist in high-dimensional feature spaces and limit the system's ability to convey intuitive diagnostic rationale. The proposed HLIFs were designed to model the ABCD criteria commonly used by dermatologists such that each HLIF represents a human-observable characteristic. As such, intuitive diagnostic rationale can be conveyed to the user. Experimental results show that concatenating the proposed HLIFs with a full low-level feature set increased classification accuracy, and that HLIFs were able to separate the data better than low-level features with statistical significance. An example of a graphical interface for providing intuitive rationale is given.

  8. Fast Markov chain Monte Carlo sampling for sparse Bayesian inference in high-dimensional inverse problems using L1-type priors

    International Nuclear Information System (INIS)

    Lucka, Felix

    2012-01-01

    Sparsity has become a key concept for solving of high-dimensional inverse problems using variational regularization techniques. Recently, using similar sparsity-constraints in the Bayesian framework for inverse problems by encoding them in the prior distribution has attracted attention. Important questions about the relation between regularization theory and Bayesian inference still need to be addressed when using sparsity promoting inversion. A practical obstacle for these examinations is the lack of fast posterior sampling algorithms for sparse, high-dimensional Bayesian inversion. Accessing the full range of Bayesian inference methods requires being able to draw samples from the posterior probability distribution in a fast and efficient way. This is usually done using Markov chain Monte Carlo (MCMC) sampling algorithms. In this paper, we develop and examine a new implementation of a single component Gibbs MCMC sampler for sparse priors relying on L1-norms. We demonstrate that the efficiency of our Gibbs sampler increases when the level of sparsity or the dimension of the unknowns is increased. This property is contrary to the properties of the most commonly applied Metropolis–Hastings (MH) sampling schemes. We demonstrate that the efficiency of MH schemes for L1-type priors dramatically decreases when the level of sparsity or the dimension of the unknowns is increased. Practically, Bayesian inversion for L1-type priors using MH samplers is not feasible at all. As this is commonly believed to be an intrinsic feature of MCMC sampling, the performance of our Gibbs sampler also challenges common beliefs about the applicability of sample based Bayesian inference. (paper)

  9. Filaments of Meaning in Word Space

    OpenAIRE

    Karlgren, Jussi; Holst, Anders; Sahlgren, Magnus

    2008-01-01

    Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dimensionality of typical vector space models lead to unintuitive effects on modeling likeness of meaning and that the local structure of word spaces is where interesting semantic relations reside. We show that the local structure of word spaces has substantially different dimensionality and character than the global s...

  10. Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure.

    Science.gov (United States)

    Foroughi Pour, Ali; Dalton, Lori A

    2018-03-21

    Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and Leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data these algorithms outputted many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks.

  11. DataHigh: Graphical user interface for visualizing and interacting with high-dimensional neural activity

    OpenAIRE

    Cowley, Benjamin R.; Kaufman, Matthew T.; Churchland, Mark M.; Ryu, Stephen I.; Shenoy, Krishna V.; Yu, Byron M.

    2012-01-01

    The activity of tens to hundreds of neurons can be succinctly summarized by a smaller number of latent variables extracted using dimensionality reduction methods. These latent variables define a reduced-dimensional space in which we can study how population activity varies over time, across trials, and across experimental conditions. Ideally, we would like to visualize the population activity directly in the reduced-dimensional space, whose optimal dimensionality (as determined from the data)...

  12. Greedy algorithms for high-dimensional non-symmetric linear problems***

    Directory of Open Access Journals (Sweden)

    Cancès E.

    2013-12-01

    Full Text Available In this article, we present a family of numerical approaches to solve high-dimensional linear non-symmetric problems. The principle of these methods is to approximate a function which depends on a large number of variates by a sum of tensor product functions, each term of which is iteratively computed via a greedy algorithm ? . There exists a good theoretical framework for these methods in the case of (linear and nonlinear symmetric elliptic problems. However, the convergence results are not valid any more as soon as the problems under consideration are not symmetric. We present here a review of the main algorithms proposed in the literature to circumvent this difficulty, together with some new approaches. The theoretical convergence results and the practical implementation of these algorithms are discussed. Their behaviors are illustrated through some numerical examples. Dans cet article, nous présentons une famille de méthodes numériques pour résoudre des problèmes linéaires non symétriques en grande dimension. Le principe de ces approches est de représenter une fonction dépendant d’un grand nombre de variables sous la forme d’une somme de fonctions produit tensoriel, dont chaque terme est calculé itérativement via un algorithme glouton ? . Ces méthodes possèdent de bonnes propriétés théoriques dans le cas de problèmes elliptiques symétriques (linéaires ou non linéaires, mais celles-ci ne sont plus valables dès lors que les problèmes considérés ne sont plus symétriques. Nous présentons une revue des principaux algorithmes proposés dans la littérature pour contourner cette difficulté ainsi que de nouvelles approches que nous proposons. Les résultats de convergence théoriques et la mise en oeuvre pratique de ces algorithmes sont détaillés et leur comportement est illustré au travers d’exemples numériques.

  13. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs. Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are firstly obtained, which include the power spectrum, time-domain statistics, AR model, and the wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses the logistical regression model with Sparse Group Lasso penalized function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using a 10-fold cross-validation. Finally, the test data is classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from international BCI Competition IV reached 84.72%.

  14. Robust Learning of High-dimensional Biological Networks with Bayesian Networks

    Science.gov (United States)

    Nägele, Andreas; Dejori, Mathäus; Stetter, Martin

    Structure learning of Bayesian networks applied to gene expression data has become a potentially useful method to estimate interactions between genes. However, the NP-hardness of Bayesian network structure learning renders the reconstruction of the full genetic network with thousands of genes unfeasible. Consequently, the maximal network size is usually restricted dramatically to a small set of genes (corresponding with variables in the Bayesian network). Although this feature reduction step makes structure learning computationally tractable, on the downside, the learned structure might be adversely affected due to the introduction of missing genes. Additionally, gene expression data are usually very sparse with respect to the number of samples, i.e., the number of genes is much greater than the number of different observations. Given these problems, learning robust network features from microarray data is a challenging task. This chapter presents several approaches tackling the robustness issue in order to obtain a more reliable estimation of learned network features.

  15. Space space space

    CERN Document Server

    Trembach, Vera

    2014-01-01

    Space is an introduction to the mysteries of the Universe. Included are Task Cards for independent learning, Journal Word Cards for creative writing, and Hands-On Activities for reinforcing skills in Math and Language Arts. Space is a perfect introduction to further research of the Solar System.

  16. Enhanced, targeted sampling of high-dimensional free-energy landscapes using variationally enhanced sampling, with an application to chignolin.

    Science.gov (United States)

    Shaffer, Patrick; Valsson, Omar; Parrinello, Michele

    2016-02-02

    The capabilities of molecular simulations have been greatly extended by a number of widely used enhanced sampling methods that facilitate escaping from metastable states and crossing large barriers. Despite these developments there are still many problems which remain out of reach for these methods which has led to a vigorous effort in this area. One of the most important problems that remains unsolved is sampling high-dimensional free-energy landscapes and systems that are not easily described by a small number of collective variables. In this work we demonstrate a new way to compute free-energy landscapes of high dimensionality based on the previously introduced variationally enhanced sampling, and we apply it to the miniprotein chignolin.

  17. Enhanced, targeted sampling of high-dimensional free-energy landscapes using variationally enhanced sampling, with an application to chignolin

    Science.gov (United States)

    Shaffer, Patrick; Valsson, Omar; Parrinello, Michele

    2016-01-01

    The capabilities of molecular simulations have been greatly extended by a number of widely used enhanced sampling methods that facilitate escaping from metastable states and crossing large barriers. Despite these developments there are still many problems which remain out of reach for these methods which has led to a vigorous effort in this area. One of the most important problems that remains unsolved is sampling high-dimensional free-energy landscapes and systems that are not easily described by a small number of collective variables. In this work we demonstrate a new way to compute free-energy landscapes of high dimensionality based on the previously introduced variationally enhanced sampling, and we apply it to the miniprotein chignolin. PMID:26787868

  18. Feature extraction with deep neural networks by a generalized discriminant analysis.

    Science.gov (United States)

    Stuhlsatz, André; Lippel, Jens; Zielke, Thomas

    2012-04-01

    We present an approach to feature extraction that is a generalization of the classical linear discriminant analysis (LDA) on the basis of deep neural networks (DNNs). As for LDA, discriminative features generated from independent Gaussian class conditionals are assumed. This modeling has the advantages that the intrinsic dimensionality of the feature space is bounded by the number of classes and that the optimal discriminant function is linear. Unfortunately, linear transformations are insufficient to extract optimal discriminative features from arbitrarily distributed raw measurements. The generalized discriminant analysis (GerDA) proposed in this paper uses nonlinear transformations that are learnt by DNNs in a semisupervised fashion. We show that the feature extraction based on our approach displays excellent performance on real-world recognition and detection tasks, such as handwritten digit recognition and face detection. In a series of experiments, we evaluate GerDA features with respect to dimensionality reduction, visualization, classification, and detection. Moreover, we show that GerDA DNNs can preprocess truly high-dimensional input data to low-dimensional representations that facilitate accurate predictions even if simple linear predictors or measures of similarity are used.

  19. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    Directory of Open Access Journals (Sweden)

    Aiming Liu

    2017-11-01

    Full Text Available Motor Imagery (MI electroencephalography (EEG is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP and local characteristic-scale decomposition (LCD algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems.

  20. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    Science.gov (United States)

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  1. The long-term prospects of citizens managing urban green space: From place making to place-keeping? : Special feature:TURFGRASS

    NARCIS (Netherlands)

    Mattijssen, T.J.M.; van der Jagt, A.P.N.; Buijs, A.E.; Elands, B.H.M.; Erlwein, S.; Lafortezza, R.

    2017-01-01

    Abstract This paper discusses the long-term management or ‘place-keeping’ of urban green space by citizens and highlights enabling and constraining factors that play a crucial role in this continuity. While authorities have historically been in charge of managing public green spaces, there is an

  2. A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure.

    Science.gov (United States)

    Naghibi, Tofigh; Hoffmann, Sarah; Pfister, Beat

    2015-08-01

    Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem there are two main issues that need to be addressed: (i) Finding an appropriate measure function than can be fairly fast and robustly computed for high-dimensional data. (ii) A search strategy to optimize the measure over the subset space in a reasonable amount of time. In this article mutual information between features and class labels is considered to be the measure function. Two series expansions for mutual information are proposed, and it is shown that most heuristic criteria suggested in the literature are truncated approximations of these expansions. It is well-known that searching the whole subset space is an NP-hard problem. Here, instead of the conventional sequential search algorithms, we suggest a parallel search strategy based on semidefinite programming (SDP) that can search through the subset space in polynomial time. By exploiting the similarities between the proposed algorithm and an instance of the maximum-cut problem in graph theory, the approximation ratio of this algorithm is derived and is compared with the approximation ratio of the backward elimination method. The experiments show that it can be misleading to judge the quality of a measure solely based on the classification accuracy, without taking the effect of the non-optimum search strategy into account.

  3. Vacuum expectation values of high-dimensional operators and their contributions to the Bjorken and Ellis-Jaffe sum rules

    International Nuclear Information System (INIS)

    Oganesian, A.G.

    1998-01-01

    A method is proposed for estimating unknown vacuum expectation values of high-dimensional operators. The method is based on the idea that the factorization hypothesis is self-consistent. Results are obtained for all vacuum expectation values of dimension-7 operators, and some estimates for dimension-10 operators are presented as well. The resulting values are used to compute corrections of higher dimensions to the Bjorken and Ellis-Jaffe sum rules

  4. Feature Selection and Kernel Learning for Local Learning-Based Clustering.

    Science.gov (United States)

    Zeng, Hong; Cheung, Yiu-ming

    2011-08-01

    The performance of the most clustering algorithms highly relies on the representation of data in the input space or the Hilbert space of kernel methods. This paper is to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) (Wu and Schölkopf 2006) method, which can outperform the global learning-based ones when dealing with the high-dimensional data lying on manifold. Specifically, we associate a weight to each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to take into account the relevance of each feature or kernel for the clustering. Accordingly, the weights are estimated iteratively in the clustering process. We show that the resulting weighted regularization with an additional constraint on the weights is equivalent to a known sparse-promoting penalty. Hence, the weights of those irrelevant features or kernels can be shrunk toward zero. Extensive experiments show the efficacy of the proposed methods on the benchmark data sets.

  5. Multisymplectic Structure-Preserving in Simple Finite Element Method in High Dimensional Case

    Institute of Scientific and Technical Information of China (English)

    BAIYong-Qiang; LIUZhen; PEIMing; ZHENGZhu-Jun

    2003-01-01

    In this paper, we study a finite element scheme of some semi-linear elliptic boundary value problems in high-dhnensjonal space. With uniform mesh, we find that, the numerical scheme derived from finite element method can keep a preserved multisymplectic structure.

  6. Multisymplectic Structure-Preserving in Simple Finite Element Method in High Dimensional Case

    Institute of Scientific and Technical Information of China (English)

    BAI Yong-Qiang; LIU Zhen; PEI Ming; ZHENG Zhu-Jun

    2003-01-01

    In this paper, we study a finite element scheme of some semi-linear elliptic boundary value problems inhigh-dimensional space. With uniform mesh, we find that, the numerical scheme derived from finite element method cankeep a preserved multisymplectic structure.

  7. The French Space Operation Act: Scope and Main Features. Introduction to the Technical Regulation Considerations about the Implementation in the Launcher Field

    Science.gov (United States)

    Cahuzac, Francois

    2010-09-01

    This publication provides a presentation of the new French Space Operation Act(hereafter FSOA). The main objectives of FSOA are to institute a clarified legal regime for launch operations. The technical regulation associated to the act is set forth, in particular for the safety of persons and property, the protection of public health and the environment. First, we give an overview of the institutional and legal framework implemented in accordance with the act. The general purpose of this French Space Operation Act(hereafter FSOA) is to set up a coherent national regime of authorization and control of Space operations under the French jurisdiction or for which the French Government bears international liability either under UN Treaties principles(namely the 1967 Outer Space Treaty, the 1972 Liability Convention and the 1976 Registration Convention) or in accordance with its European commitments with the ESA organization and its Members States. For a given space operation, the operator must show that systems and procedures that he intends to implement are compliant with the technical regulation. The regime of authorization leads to a request of authorization for each launch operation. Thus, licences concerning operator management organization or a given space system can be obtained. These licences help to simplify the authorization file required for a given space operation. The technical regulation is presented in another article, and will be issued in 2010 by the French Minister in charge of space activities. A brief description of the organization associated to the implementation of the authorization regime in the launcher field is presented.

  8. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    Science.gov (United States)

    2015-01-01

    A. Porter and my advisor. The text is primarily written by me. Chapter 5 is a version of [46] where my contribution is all of the analytical ...inn Euclidean space, a variational method refers to using calculus of variation techniques to find the minimizer (or maximizer) of a functional (energy... geometric inter- pretation of modularity optimization contrasts with existing interpretations (e.g., probabilistic ones or in terms of the Potts model

  9. Understanding Legacy Features with Featureous

    DEFF Research Database (Denmark)

    Olszak, Andrzej; Jørgensen, Bo Nørregaard

    2011-01-01

    Java programs called Featureous that addresses this issue. Featureous allows a programmer to easily establish feature-code traceability links and to analyze their characteristics using a number of visualizations. Featureous is an extension to the NetBeans IDE, and can itself be extended by third...

  10. Calculation of high-dimensional fission-fusion potential-energy surfaces in the SHE region

    International Nuclear Information System (INIS)

    Moeller, Peter; Sierk, Arnold J.; Ichikawa, Takatoshi; Iwamoto, Akira

    2004-01-01

    We calculate in a macroscopic-microscopic model fission-fusion potential-energy surfaces relevant to the analysis of heavy-ion reactions employed to form heavy-element evaporation residues. We study these multidimensional potential-energy surfaces both inside and outside the touching point.Inside the point of contact we define the potential on a multi-million-point grid in 5D deformation space where elongation, merging projectile and target spheroidal shapes, neck radius and projectile/target mass asymmetry are independent shape variables. The same deformation space and the corresponding potential-energy surface also describe the shape evolution from the nuclear ground-state to separating fragments in fission, and the fast-fission trajectories in incomplete fusion.For separated nuclei we study the macroscopic-microscopic potential energy, that is the ''collision surface'' between a spheroidally deformed target and a spheroidally deformed projectile as a function of three coordinates which are: the relative location of the projectile center-of-mass with respect to the target center-of-mass and the spheroidal deformations of the target and the projectile. We limit our study to the most favorable relative positions of target and projectile, namely that the symmetry axes of the target and projectile are collinear

  11. Doubly sparse factor models for unifying feature transformation and feature selection

    International Nuclear Information System (INIS)

    Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato; Matsumoto, Narihisa; Sugase-Miyamoto, Yasuko

    2010-01-01

    A number of unsupervised learning methods for high-dimensional data are largely divided into two groups based on their procedures, i.e., (1) feature selection, which discards irrelevant dimensions of the data, and (2) feature transformation, which constructs new variables by transforming and mixing over all dimensions. We propose a method that both selects and transforms features in a common Bayesian inference procedure. Our method imposes a doubly automatic relevance determination (ARD) prior on the factor loading matrix. We propose a variational Bayesian inference for our model and demonstrate the performance of our method on both synthetic and real data.

  12. Doubly sparse factor models for unifying feature transformation and feature selection

    Energy Technology Data Exchange (ETDEWEB)

    Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato [ERATO, Okanoya Emotional Information Project, Japan Science Technology Agency, Saitama (Japan); Matsumoto, Narihisa; Sugase-Miyamoto, Yasuko, E-mail: okada@k.u-tokyo.ac.j [Human Technology Research Institute, National Institute of Advanced Industrial Science and Technology, Ibaraki (Japan)

    2010-06-01

    A number of unsupervised learning methods for high-dimensional data are largely divided into two groups based on their procedures, i.e., (1) feature selection, which discards irrelevant dimensions of the data, and (2) feature transformation, which constructs new variables by transforming and mixing over all dimensions. We propose a method that both selects and transforms features in a common Bayesian inference procedure. Our method imposes a doubly automatic relevance determination (ARD) prior on the factor loading matrix. We propose a variational Bayesian inference for our model and demonstrate the performance of our method on both synthetic and real data.

  13. Evaluating Stability and Comparing Output of Feature Selectors that Optimize Feature Subset Cardinality

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Novovičová, Jana

    2010-01-01

    Roč. 32, č. 11 (2010), s. 1921-1939 ISSN 0162-8828 R&D Projects: GA MŠk 1M0572; GA ČR GA102/08/0593; GA ČR GA102/07/1594 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection * feature stability * stability measures * similarity measures * sequential search * individual ranking * feature subset-size optimization * high dimensionality * small sample size Subject RIV: BD - Theory of Information Impact factor: 5.027, year: 2010 http://library.utia.cas.cz/separaty/2010/RO/somol-0348726.pdf

  14. Deterministic joint remote preparation of an equatorial hybrid state via high-dimensional Einstein-Podolsky-Rosen pairs: active versus passive receiver

    Science.gov (United States)

    Bich, Cao Thi; Dat, Le Thanh; Van Hop, Nguyen; An, Nguyen Ba

    2018-04-01

    Entanglement plays a vital and in many cases non-replaceable role in the quantum network communication. Here, we propose two new protocols to jointly and remotely prepare a special so-called bipartite equatorial state which is hybrid in the sense that it entangles two Hilbert spaces with arbitrary different dimensions D and N (i.e., a type of entanglement between a quDit and a quNit). The quantum channels required to do that are however not necessarily hybrid. In fact, we utilize four high-dimensional Einstein-Podolsky-Rosen pairs, two of which are quDit-quDit entanglements, while the other two are quNit-quNit ones. In the first protocol the receiver has to be involved actively in the process of remote state preparation, while in the second protocol the receiver is passive as he/she needs to participate only in the final step for reconstructing the target hybrid state. Each protocol meets a specific circumstance that may be encountered in practice and both can be performed with unit success probability. Moreover, the concerned equatorial hybrid entangled state can also be jointly prepared for two receivers at two separated locations by slightly modifying the initial particles' distribution, thereby establishing between them an entangled channel ready for a later use.

  15. Stable long-time semiclassical description of zero-point energy in high-dimensional molecular systems.

    Science.gov (United States)

    Garashchuk, Sophya; Rassolov, Vitaly A

    2008-07-14

    Semiclassical implementation of the quantum trajectory formalism [J. Chem. Phys. 120, 1181 (2004)] is further developed to give a stable long-time description of zero-point energy in anharmonic systems of high dimensionality. The method is based on a numerically cheap linearized quantum force approach; stabilizing terms compensating for the linearization errors are added into the time-evolution equations for the classical and nonclassical components of the momentum operator. The wave function normalization and energy are rigorously conserved. Numerical tests are performed for model systems of up to 40 degrees of freedom.

  16. Conjugate-Gradient Neural Networks in Classification of Multisource and Very-High-Dimensional Remote Sensing Data

    Science.gov (United States)

    Benediktsson, J. A.; Swain, P. H.; Ersoy, O. K.

    1993-01-01

    Application of neural networks to classification of remote sensing data is discussed. Conventional two-layer backpropagation is found to give good results in classification of remote sensing data but is not efficient in training. A more efficient variant, based on conjugate-gradient optimization, is used for classification of multisource remote sensing and geographic data and very-high-dimensional data. The conjugate-gradient neural networks give excellent performance in classification of multisource data, but do not compare as well with statistical methods in classification of very-high-dimentional data.

  17. Feature Article

    Indian Academy of Sciences (India)

    Home; Journals; Resonance – Journal of Science Education. Feature Article. Articles in Resonance – Journal of Science Education. Volume 1 Issue 1 January 1996 pp 80-85 Feature Article. What's New in Computers Windows 95 · Vijnan Shastri · More Details Fulltext PDF. Volume 1 Issue 1 January 1996 pp 86-89 Feature ...

  18. Phase space interrogation of the empirical response modes for seismically excited structures

    Science.gov (United States)

    Paul, Bibhas; George, Riya C.; Mishra, Sudib K.

    2017-07-01

    Conventional Phase Space Interrogation (PSI) for structural damage assessment relies on exciting the structure with low dimensional chaotic waveform, thereby, significantly limiting their applicability to large structures. The PSI technique is presently extended for structure subjected to seismic excitations. The high dimensionality of the phase space for seismic response(s) are overcome by the Empirical Mode Decomposition (EMD), decomposing the responses to a number of intrinsic low dimensional oscillatory modes, referred as Intrinsic Mode Functions (IMFs). Along with their low dimensionality, a few IMFs, retain sufficient information of the system dynamics to reflect the damage induced changes. The mutually conflicting nature of low-dimensionality and the sufficiency of dynamic information are taken care by the optimal choice of the IMF(s), which is shown to be the third/fourth IMFs. The optimal IMF(s) are employed for the reconstruction of the Phase space attractor following Taken's embedding theorem. The widely referred Changes in Phase Space Topology (CPST) feature is then employed on these Phase portrait(s) to derive the damage sensitive feature, referred as the CPST of the IMFs (CPST-IMF). The legitimacy of the CPST-IMF is established as a damage sensitive feature by assessing its variation with a number of damage scenarios benchmarked in the IASC-ASCE building. The damage localization capability, remarkable tolerance to noise contamination and the robustness under different seismic excitations of the feature are demonstrated.

  19. Tangent map intermittency as an approximate analysis of intermittency in a high dimensional fully stochastic dynamical system: The Tangled Nature model.

    Science.gov (United States)

    Diaz-Ruelas, Alvaro; Jeldtoft Jensen, Henrik; Piovani, Duccio; Robledo, Alberto

    2016-12-01

    It is well known that low-dimensional nonlinear deterministic maps close to a tangent bifurcation exhibit intermittency and this circumstance has been exploited, e.g., by Procaccia and Schuster [Phys. Rev. A 28, 1210 (1983)], to develop a general theory of 1/f spectra. This suggests it is interesting to study the extent to which the behavior of a high-dimensional stochastic system can be described by such tangent maps. The Tangled Nature (TaNa) Model of evolutionary ecology is an ideal candidate for such a study, a significant model as it is capable of reproducing a broad range of the phenomenology of macroevolution and ecosystems. The TaNa model exhibits strong intermittency reminiscent of punctuated equilibrium and, like the fossil record of mass extinction, the intermittency in the model is found to be non-stationary, a feature typical of many complex systems. We derive a mean-field version for the evolution of the likelihood function controlling the reproduction of species and find a local map close to tangency. This mean-field map, by our own local approximation, is able to describe qualitatively only one episode of the intermittent dynamics of the full TaNa model. To complement this result, we construct a complete nonlinear dynamical system model consisting of successive tangent bifurcations that generates time evolution patterns resembling those of the full TaNa model in macroscopic scales. The switch from one tangent bifurcation to the next in the sequences produced in this model is stochastic in nature, based on criteria obtained from the local mean-field approximation, and capable of imitating the changing set of types of species and total population in the TaNa model. The model combines full deterministic dynamics with instantaneous parameter random jumps at stochastically drawn times. In spite of the limitations of our approach, which entails a drastic collapse of degrees of freedom, the description of a high-dimensional model system in terms of a low

  20. Predicting Future High-Cost Schizophrenia Patients Using High-Dimensional Administrative Data

    Directory of Open Access Journals (Sweden)

    Yajuan Wang

    2017-06-01

    Full Text Available BackgroundThe burden of serious and persistent mental illness such as schizophrenia is substantial and requires health-care organizations to have adequate risk adjustment models to effectively allocate their resources to managing patients who are at the greatest risk. Currently available models underestimate health-care costs for those with mental or behavioral health conditions.ObjectivesThe study aimed to develop and evaluate predictive models for identification of future high-cost schizophrenia patients using advanced supervised machine learning methods.MethodsThis was a retrospective study using a payer administrative database. The study cohort consisted of 97,862 patients diagnosed with schizophrenia (ICD9 code 295.* from January 2009 to June 2014. Training (n = 34,510 and study evaluation (n = 30,077 cohorts were derived based on 12-month observation and prediction windows (PWs. The target was average total cost/patient/month in the PW. Three models (baseline, intermediate, final were developed to assess the value of different variable categories for cost prediction (demographics, coverage, cost, health-care utilization, antipsychotic medication usage, and clinical conditions. Scalable orthogonal regression, significant attribute selection in high dimensions method, and random forests regression were used to develop the models. The trained models were assessed in the evaluation cohort using the regression R2, patient classification accuracy (PCA, and cost accuracy (CA. The model performance was compared to the Centers for Medicare & Medicaid Services Hierarchical Condition Categories (CMS-HCC model.ResultsAt top 10% cost cutoff, the final model achieved 0.23 R2, 43% PCA, and 63% CA; in contrast, the CMS-HCC model achieved 0.09 R2, 27% PCA with 45% CA. The final model and the CMS-HCC model identified 33 and 22%, respectively, of total cost at the top 10% cost cutoff.ConclusionUsing advanced feature selection leveraging detailed

  1. Local Likelihood Approach for High-Dimensional Peaks-Over-Threshold Inference

    KAUST Repository

    Baki, Zhuldyzay

    2018-05-14

    Global warming is affecting the Earth climate year by year, the biggest difference being observable in increasing temperatures in the World Ocean. Following the long- term global ocean warming trend, average sea surface temperatures across the global tropics and subtropics have increased by 0.4–1◦C in the last 40 years. These rates become even higher in semi-enclosed southern seas, such as the Red Sea, threaten- ing the survival of thermal-sensitive species. As average sea surface temperatures are projected to continue to rise, careful study of future developments of extreme temper- atures is paramount for the sustainability of marine ecosystem and biodiversity. In this thesis, we use Extreme-Value Theory to study sea surface temperature extremes from a gridded dataset comprising 16703 locations over the Red Sea. The data were provided by Operational SST and Sea Ice Analysis (OSTIA), a satellite-based data system designed for numerical weather prediction. After pre-processing the data to account for seasonality and global trends, we analyze the marginal distribution of ex- tremes, defined as observations exceeding a high spatially varying threshold, using the Generalized Pareto distribution. This model allows us to extrapolate beyond the ob- served data to compute the 100-year return levels over the entire Red Sea, confirming the increasing trend of extreme temperatures. To understand the dynamics govern- ing the dependence of extreme temperatures in the Red Sea, we propose a flexible local approach based on R-Pareto processes, which extend the univariate Generalized Pareto distribution to the spatial setting. Assuming that the sea surface temperature varies smoothly over space, we perform inference based on the gradient score method over small regional neighborhoods, in which the data are assumed to be stationary in space. This approach allows us to capture spatial non-stationarity, and to reduce the overall computational cost by taking advantage of

  2. Sparse Learning of the Disease Severity Score for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Ivan Stojkovic

    2017-01-01

    Full Text Available Learning disease severity scores automatically from collected measurements may aid in the quality of both healthcare and scientific understanding. Some steps in that direction have been taken and machine learning algorithms for extracting scoring functions from data have been proposed. Given the rapid increase in both quantity and diversity of data measured and stored, the large amount of information is becoming one of the challenges for learning algorithms. In this work, we investigated the direction of the problem where the dimensionality of measured variables is large. Learning the severity score in such cases brings the issue of which of measured features are relevant. We have proposed a novel approach by combining desirable properties of existing formulations, which compares favorably to alternatives in accuracy and especially in the robustness of the learned scoring function. The proposed formulation has a nonsmooth penalty that induces sparsity. This problem is solved by addressing a dual formulation which is smooth and allows an efficient optimization. The proposed approach might be used as an effective and reliable tool for both scoring function learning and biomarker discovery, as demonstrated by identifying a stable set of genes related to influenza symptoms’ severity, which are enriched in immune-related processes.

  3. Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Gascón Adrià

    2017-10-01

    Full Text Available We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013, and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.

  4. An automatic iris occlusion estimation method based on high-dimensional density estimation.

    Science.gov (United States)

    Li, Yung-Hui; Savvides, Marios

    2013-04-01

    Iris masks play an important role in iris recognition. They indicate which part of the iris texture map is useful and which part is occluded or contaminated by noisy image artifacts such as eyelashes, eyelids, eyeglasses frames, and specular reflections. The accuracy of the iris mask is extremely important. The performance of the iris recognition system will decrease dramatically when the iris mask is inaccurate, even when the best recognition algorithm is used. Traditionally, people used the rule-based algorithms to estimate iris masks from iris images. However, the accuracy of the iris masks generated this way is questionable. In this work, we propose to use Figueiredo and Jain's Gaussian Mixture Models (FJ-GMMs) to model the underlying probabilistic distributions of both valid and invalid regions on iris images. We also explored possible features and found that Gabor Filter Bank (GFB) provides the most discriminative information for our goal. Finally, we applied Simulated Annealing (SA) technique to optimize the parameters of GFB in order to achieve the best recognition rate. Experimental results show that the masks generated by the proposed algorithm increase the iris recognition rate on both ICE2 and UBIRIS dataset, verifying the effectiveness and importance of our proposed method for iris occlusion estimation.

  5. Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes.

    Science.gov (United States)

    Ren, Jie; He, Tao; Li, Ye; Liu, Sai; Du, Yinhao; Jiang, Yu; Wu, Cen

    2017-05-16

    Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite large efforts devoted to better understand the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D heritability. Some of the existing approaches proposed for the high dimensional genetic data from the T2D case-control study are limited by analyzing a few number of SNPs at a time from a large pool of SNPs, by ignoring the correlations among SNPs and by adopting inefficient selection techniques. We propose a network constrained regularization method to select important SNPs by taking the linkage disequilibrium into account. To accomodate the case control study, an iteratively reweighted least square algorithm has been developed within the coordinate descent framework where optimization of the regularized logistic loss function is performed with respect to one parameter at a time and iteratively cycle through all the parameters until convergence. In this article, a novel approach is developed to identify important SNPs more effectively through incorporating the interconnections among them in the regularized selection. A coordinate descent based iteratively reweighed least squares (IRLS) algorithm has been proposed. Both the simulation study and the analysis of the Nurses's Health Study, a case-control study of type 2 diabetes data with high dimensional SNP measurements, demonstrate the advantage of the network based approach over the competing alternatives.

  6. Use of high-dimensional spectral data to evaluate organic matter, reflectance relationships in soils

    Science.gov (United States)

    Henderson, T. L.; Baumgardner, M. F.; Coster, D. C.; Franzmeier, D. P.; Stott, D. E.

    1990-01-01

    Recent breakthroughs in remote sensing technology have led to the development of a spaceborne high spectral resolution imaging sensor, HIRIS, to be launched in the mid-1990s for observation of earth surface features. The effects of organic carbon content on soil reflectance over the spectral range of HIRIS, and to examine the contributions of humic and fulvic acid fractions to soil reflectance was evaluated. Organic matter from four Indiana agricultural soils was extracted, fractionated, and purified, and six individual components of each soil were isolated and prepared for spectral analysis. The four soils, ranging in organic carbon content from 0.99 percent, represented various combinations of genetic parameters such as parent material, age, drainage, and native vegetation. An experimental procedure was developed to measure reflectance of very small soil and organic component samples in the laboratory, simulating the spectral coverage and resolution of the HIRIS sensor. Reflectance in 210 narrow (10 nm) bands was measured using the CARY 17D spectrophotometer over the 400 to 2500 nm wavelength range. Reflectance data were analyzed statistically to determine the regions of the reflective spectrum which provided useful information about soil organic matter content and composition. Wavebands providing significant information about soil organic carbon content were located in all three major regions of the reflective spectrum: visible, near infrared, and middle infrared. The purified humic acid fractions of the four soils were separable in six bands in the 1600 to 2400 nm range, suggesting that longwave middle infrared reflectance may be useful as a non-destructive laboratory technique for humic acid characterization.

  7. Robust inter-subject audiovisual decoding in functional magnetic resonance imaging using high-dimensional regression.

    Science.gov (United States)

    Raz, Gal; Svanera, Michele; Singer, Neomi; Gilam, Gadi; Cohen, Maya Bleich; Lin, Tamar; Admon, Roee; Gonen, Tal; Thaler, Avner; Granot, Roni Y; Goebel, Rainer; Benini, Sergio; Valente, Giancarlo

    2017-12-01

    Major methodological advancements have been recently made in the field of neural decoding, which is concerned with the reconstruction of mental content from neuroimaging measures. However, in the absence of a large-scale examination of the validity of the decoding models across subjects and content, the extent to which these models can be generalized is not clear. This study addresses the challenge of producing generalizable decoding models, which allow the reconstruction of perceived audiovisual features from human magnetic resonance imaging (fMRI) data without prior training of the algorithm on the decoded content. We applied an adapted version of kernel ridge regression combined with temporal optimization on data acquired during film viewing (234 runs) to generate standardized brain models for sound loudness, speech presence, perceived motion, face-to-frame ratio, lightness, and color brightness. The prediction accuracies were tested on data collected from different subjects watching other movies mainly in another scanner. Substantial and significant (Q FDR movies (R¯=0.62, R¯ = 0.60, R¯ = 0.60, respectively) with high reproducibility of the predictors across subjects. The face ratio model produced significant correlations in 7 out of 8 movies (R¯=0.56). The lightness and brightness models did not show robustness (R¯=0.23, R¯ = 0). Further analysis of additional data (95 runs) indicated that loudness reconstruction veridicality can consistently reveal relevant group differences in musical experience. The findings point to the validity and generalizability of our loudness, speech, motion, and face ratio models for complex cinematic stimuli (as well as for music in the case of loudness). While future research should further validate these models using controlled stimuli and explore the feasibility of extracting more complex models via this method, the reliability of our results indicates the potential usefulness of the approach and the resulting models

  8. Strategies to reduce the complexity of hydrologic data assimilation for high-dimensional models

    Science.gov (United States)

    Hernandez, F.; Liang, X.

    2017-12-01

    Probabilistic forecasts in the geosciences offer invaluable information by allowing to estimate the uncertainty of predicted conditions (including threats like floods and droughts). However, while forecast systems based on modern data assimilation algorithms are capable of producing multi-variate probability distributions of future conditions, the computational resources required to fully characterize the dependencies between the model's state variables render their applicability impractical for high-resolution cases. This occurs because of the quadratic space complexity of storing the covariance matrices that encode these dependencies and the cubic time complexity of performing inference operations with them. In this work we introduce two complementary strategies to reduce the size of the covariance matrices that are at the heart of Bayesian assimilation methods—like some variants of (ensemble) Kalman filters and of particle filters—and variational methods. The first strategy involves the optimized grouping of state variables by clustering individual cells of the model into "super-cells." A dynamic fuzzy clustering approach is used to take into account the states (e.g., soil moisture) and forcings (e.g., precipitation) of each cell at each time step. The second strategy consists in finding a compressed representation of the covariance matrix that still encodes the most relevant information but that can be more efficiently stored and processed. A learning and a belief-propagation inference algorithm are developed to take advantage of this modified low-rank representation. The two proposed strategies are incorporated into OPTIMISTS, a state-of-the-art hybrid Bayesian/variational data assimilation algorithm, and comparative streamflow forecasting tests are performed using two watersheds modeled with the Distributed Hydrology Soil Vegetation Model (DHSVM). Contrasts are made between the efficiency gains and forecast accuracy losses of each strategy used in

  9. High-Dimensional Analysis of Convex Optimization-Based Massive MIMO Decoders

    KAUST Repository

    Ben Atitallah, Ismail

    2017-04-01

    A wide range of modern large-scale systems relies on recovering a signal from noisy linear measurements. In many applications, the useful signal has inherent properties, such as sparsity, low-rankness, or boundedness, and making use of these properties and structures allow a more efficient recovery. Hence, a significant amount of work has been dedicated to developing and analyzing algorithms that can take advantage of the signal structure. Especially, since the advent of Compressed Sensing (CS) there has been significant progress towards this direction. Generally speaking, the signal structure can be harnessed by solving an appropriate regularized or constrained M-estimator. In modern Multi-input Multi-output (MIMO) communication systems, all transmitted signals are drawn from finite constellations and are thus bounded. Besides, most recent modulation schemes such as Generalized Space Shift Keying (GSSK) or Generalized Spatial Modulation (GSM) yield signals that are inherently sparse. In the recovery procedure, boundedness and sparsity can be promoted by using the ℓ1 norm regularization and by imposing an ℓ∞ norm constraint respectively. In this thesis, we propose novel optimization algorithms to recover certain classes of structured signals with emphasis on MIMO communication systems. The exact analysis permits a clear characterization of how well these systems perform. Also, it allows an automatic tuning of the parameters. In each context, we define the appropriate performance metrics and we analyze them exactly in the High Dimentional Regime (HDR). The framework we use for the analysis is based on Gaussian process inequalities; in particular, on a new strong and tight version of a classical comparison inequality (due to Gordon, 1988) in the presence of additional convexity assumptions. The new framework that emerged from this inequality is coined as Convex Gaussian Min-max Theorem (CGMT).

  10. Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.

    Science.gov (United States)

    Linn, Kristin A; Gaonkar, Bilwaj; Satterthwaite, Theodore D; Doshi, Jimit; Davatzikos, Christos; Shinohara, Russell T

    2016-05-15

    Normalization of feature vector values is a common practice in machine learning. Generally, each feature value is standardized to the unit hypercube or by normalizing to zero mean and unit variance. Classification decisions based on support vector machines (SVMs) or by other methods are sensitive to the specific normalization used on the features. In the context of multivariate pattern analysis using neuroimaging data, standardization effectively up- and down-weights features based on their individual variability. Since the standard approach uses the entire data set to guide the normalization, it utilizes the total variability of these features. This total variation is inevitably dependent on the amount of marginal separation between groups. Thus, such a normalization may attenuate the separability of the data in high dimensional space. In this work we propose an alternate approach that uses an estimate of the control-group standard deviation to normalize features before training. We study our proposed approach in the context of group classification using structural MRI data. We show that control-based normalization leads to better reproducibility of estimated multivariate disease patterns and improves the classifier performance in many cases. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. High dimensional ICA analysis detects within-network functional connectivity damage of default mode and sensory motor networks in Alzheimer's disease

    Directory of Open Access Journals (Sweden)

    Ottavia eDipasquale

    2015-02-01

    Full Text Available High dimensional independent component analysis (ICA, compared to low dimensional ICA, allows performing a detailed parcellation of the resting state networks. The purpose of this study was to give further insight into functional connectivity (FC in Alzheimer’s disease (AD using high dimensional ICA. For this reason, we performed both low and high dimensional ICA analyses of resting state fMRI (rfMRI data of 20 healthy controls and 21 AD patients, focusing on the primarily altered default mode network (DMN and exploring the sensory motor network (SMN. As expected, results obtained at low dimensionality were in line with previous literature. Moreover, high dimensional results allowed us to observe either the presence of within-network disconnections and FC damage confined to some of the resting state sub-networks. Due to the higher sensitivity of the high dimensional ICA analysis, our results suggest that high-dimensional decomposition in sub-networks is very promising to better localize FC alterations in AD and that FC damage is not confined to the default mode network.

  12. Cluster expression in fission and fusion in high-dimensional macroscopic-microscopic calculations

    International Nuclear Information System (INIS)

    Iwamoto, Akira; Ichikawa, Takatoshi; Moller, Peter; Sierk, Arnold J.

    2004-01-01

    We discuss the relation between the fission-fusion potential-energy surfaces of very heavy nuclei and the formation process of these nuclei in cold-fusion reactions. In the potential-energy surfaces, we find a pronounced valley structure, with one valley corresponding to the cold-fusion reaction, the other to fission. As the touching point is approached in the cold-fusion entrance channel, an instability towards dynamical deformation of the projectile occurs, which enhances the fusion cross section. These two 'cluster effects' enhance the production of superheavy nuclei in cold-fusion reactions, in addition to the effect of the low compound-system excitation energy in these reactions. Heavy-ion fusion reactions have been used extensively to synthesize heavy elements beyond actinide nuclei. In order to proceed further in this direction, we need to understand the formation process more precisely, not just the decay process. The dynamics of the formation process are considerably more complex than the dynamics necessary to interpret the spontaneous-fission decay of heavy elements. However, before implementing a full dynamical description it is useful to understand the basic properties of the potential-energy landscape encountered in the initial stages of the collision. The collision process and entrance-channel landscape can conveniently be separated into two parts, namely the early-stage separated system before touching and the late-stage composite system after touching. The transition between these two stages is particularly important, but not very well understood until now. To understand better the transition between the two stages we analyze here in detail the potential energy landscape or 'collision surface' of the system both outside and inside the touching configuration of the target and projectile. In Sec. 2, we discuss calculated five-dimensional potential-energy landscapes inside touching and identify major features. In Sec. 3, we present calculated

  13. Incorporating Data Link Features into a Multi-Function Display to Support Self-Separation and Spacing Tasks for General Aviation Pilots

    Science.gov (United States)

    Adams, Catherine A.; Murdoch, Jennifer L.; Consiglio, Maria C.; WIlliams, Daniel M.

    2005-01-01

    One objective of the Small Aircraft Transportation System (SATS) Higher Volume Operations (HVO) project is to increase the capacity and utilization of small non-towered, non-radar equipped airports by transferring traffic management activities to an automated Airport Management Module (AMM) and separation responsibilities to general aviation (GA) pilots. Implementation of this concept required the development of a research Multi-Function Display (MFD) to support the interactive communications between pilots and the AMM. The interface also had to accommodate traffic awareness, self-separation, and spacing tasks through dynamic messaging and symbology for flight path conformance and conflict detection and alerting (CDA). The display served as the mechanism to support the examination of the viability of executing instrument operations designed for SATS designated airports. Results of simulation and flight experiments conducted at the National Aeronautics and Space Administration's (NASA) Langley Research Center indicate that the concept, as facilitated by the research MFD, did not increase pilots subjective workload levels or reduce their situation awareness (SA). Post-test usability assessments revealed that pilots preferred using the enhanced MFD to execute flight procedures, reporting improved SA over conventional instrument flight rules (IFR) procedures.

  14. A hybrid fault diagnosis approach based on mixed-domain state features for rotating machinery.

    Science.gov (United States)

    Xue, Xiaoming; Zhou, Jianzhong

    2017-01-01

    To make further improvement in the diagnosis accuracy and efficiency, a mixed-domain state features data based hybrid fault diagnosis approach, which systematically blends both the statistical analysis approach and the artificial intelligence technology, is proposed in this work for rolling element bearings. For simplifying the fault diagnosis problems, the execution of the proposed method is divided into three steps, i.e., fault preliminary detection, fault type recognition and fault degree identification. In the first step, a preliminary judgment about the health status of the equipment can be evaluated by the statistical analysis method based on the permutation entropy theory. If fault exists, the following two processes based on the artificial intelligence approach are performed to further recognize the fault type and then identify the fault degree. For the two subsequent steps, mixed-domain state features containing time-domain, frequency-domain and multi-scale features are extracted to represent the fault peculiarity under different working conditions. As a powerful time-frequency analysis method, the fast EEMD method was employed to obtain multi-scale features. Furthermore, due to the information redundancy and the submergence of original feature space, a novel manifold learning method (modified LGPCA) is introduced to realize the low-dimensional representations for high-dimensional feature space. Finally, two cases with 12 working conditions respectively have been employed to evaluate the performance of the proposed method, where vibration signals were measured from an experimental bench of rolling element bearing. The analysis results showed the effectiveness and the superiority of the proposed method of which the diagnosis thought is more suitable for practical application. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  15. A selective overview of feature screening for ultrahigh-dimensional data.

    Science.gov (United States)

    JingYuan, Liu; Wei, Zhong; RunZe, L I

    2015-10-01

    High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of data may grow exponentially with the sample size. This has been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening on specific models and motivation for the need of model-free feature screening procedures.

  16. Thermodynamics of noncommutative high-dimensional AdS black holes with non-Gaussian smeared matter distributions

    Energy Technology Data Exchange (ETDEWEB)

    Miao, Yan-Gang [Nankai University, School of Physics, Tianjin (China); Chinese Academy of Sciences, State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, P.O. Box 2735, Beijing (China); CERN, PH-TH Division, Geneva 23 (Switzerland); Xu, Zhen-Ming [Nankai University, School of Physics, Tianjin (China)

    2016-04-15

    Considering non-Gaussian smeared matter distributions, we investigate the thermodynamic behaviors of the noncommutative high-dimensional Schwarzschild-Tangherlini anti-de Sitter black hole, and we obtain the condition for the existence of extreme black holes. We indicate that the Gaussian smeared matter distribution, which is a special case of non-Gaussian smeared matter distributions, is not applicable for the six- and higher-dimensional black holes due to the hoop conjecture. In particular, the phase transition is analyzed in detail. Moreover, we point out that the Maxwell equal area law holds for the noncommutative black hole whose Hawking temperature is within a specific range, but fails for one whose the Hawking temperature is beyond this range. (orig.)

  17. Thermodynamics of noncommutative high-dimensional AdS black holes with non-Gaussian smeared matter distributions

    CERN Document Server

    Miao, Yan-Gang

    2016-01-01

    Considering non-Gaussian smeared matter distributions, we investigate thermodynamic behaviors of the noncommutative high-dimensional Schwarzschild-Tangherlini anti-de Sitter black hole, and obtain the condition for the existence of extreme black holes. We indicate that the Gaussian smeared matter distribution, which is a special case of non-Gaussian smeared matter distributions, is not applicable for the 6- and higher-dimensional black holes due to the hoop conjecture. In particular, the phase transition is analyzed in detail. Moreover, we point out that the Maxwell equal area law maintains for the noncommutative black hole with the Hawking temperature within a specific range, but fails with the Hawking temperature beyond this range.

  18. Estimation of the local response to a forcing in a high dimensional system using the fluctuation-dissipation theorem

    Directory of Open Access Journals (Sweden)

    F. C. Cooper

    2013-04-01

    Full Text Available The fluctuation-dissipation theorem (FDT has been proposed as a method of calculating the response of the earth's atmosphere to a forcing. For this problem the high dimensionality of the relevant data sets makes truncation necessary. Here we propose a method of truncation based upon the assumption that the response to a localised forcing is spatially localised, as an alternative to the standard method of choosing a number of the leading empirical orthogonal functions. For systems where this assumption holds, the response to any sufficiently small non-localised forcing may be estimated using a set of truncations that are chosen algorithmically. We test our algorithm using 36 and 72 variable versions of a stochastic Lorenz 95 system of ordinary differential equations. We find that, for long integrations, the bias in the response estimated by the FDT is reduced from ~75% of the true response to ~30%.

  19. Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.

    Directory of Open Access Journals (Sweden)

    Ali Dashti

    Full Text Available This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU. The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible [Formula: see text]-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.

  20. Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data.

    Science.gov (United States)

    McParland, D; Phillips, C M; Brennan, L; Roche, H M; Gormley, I C

    2017-12-10

    The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  1. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

    Directory of Open Access Journals (Sweden)

    Himmelreich Uwe

    2009-07-01

    Full Text Available Abstract Background Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees which are based on orthogonal splits in feature space. Results We propose to combine the best of both approaches, and evaluated the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests' together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. Conclusion The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.

  2. A high-resolution atlas of the infrared spectrum of the Sun and the Earth atmosphere from space. Volume 3: Key to identification of solar features

    Science.gov (United States)

    Geller, Murray

    1992-01-01

    During the period April 29 through May 2, 1985, the Atmospheric Trace Molecule Spectroscopy (ATMOS) experiment was operated as part of the Spacelab-3 (SL-3) payload on the shuttle Challenger. The instrument, a Fourier transform spectrometer, recorded over 2000 infrared solar spectra from an altitude of 360 km. Although the majority of the spectra were taken through the limb of the Earth's atmosphere in order to better understand its composition, several hundred of the 'high-sun' spectra were completely free from telluric absorption. These high-sun spectra recorded from space are, at the present time, the only high-resolution infrared spectra ever taken of the Sun free from absorptions due to constituents in the Earth's atmosphere. Volumes 1 and 2 of this series provide a compilation of these spectra arranged in a format suitable for quick-look reference purposes and are the first record of the continuous high-resolution infrared spectrum of the Sun and the Earth's atmosphere from space. In the Table of Identifications, which constitutes the main body of this volume, each block of eight wavenumbers is given a separate heading and corresponds to a page of two panels in Volume 1 of this series. In addition, three separate blocks of data available from ATMOS from 622-630 cm(exp -1), 630-638 cm(exp -1) and 638-646 cm(exp -1), excluded from Volume 1 because of the low signal-to-noise ratio, have been included due to the certain identification of several OH and NH transitions. In the first column of the table, the corrected frequency is given. The second column identifies the molecular species. The third and fourth columns represent the assigned transition. The fifth column gives the depth of the molecular line in millimeters. Also included in this column is a notation to indicate whether the line is a blend or lies on the shoulder(s) of another line(s). The final column repeats a question mark if the line is unidentified.

  3. High dimension feature extraction based visualized SOM fault diagnosis method and its application in p-xylene oxidation process☆

    Institute of Scientific and Technical Information of China (English)

    Ying Tian; Wenli Du; Feng Qian

    2015-01-01

    Purified terephthalic acid (PTA) is an important chemical raw material. P-xylene (PX) is transformed to terephthalic acid (TA) through oxidation process and TA is refined to produce PTA. The PX oxidation reaction is a complex process involving three-phase reaction of gas, liquid and solid. To monitor the process and to im-prove the product quality, as wel as to visualize the fault type clearly, a fault diagnosis method based on self-organizing map (SOM) and high dimensional feature extraction method, local tangent space alignment (LTSA), is proposed. In this method, LTSA can reduce the dimension and keep the topology information simultaneously, and SOM distinguishes various states on the output map. Monitoring results of PX oxidation reaction process in-dicate that the LTSA–SOM can wel detect and visualize the fault type.

  4. Feature Extraction

    CERN Document Server

    CERN. Geneva

    2015-01-01

    Feature selection and reduction are key to robust multivariate analyses. In this talk I will focus on pros and cons of various variable selection methods and focus on those that are most relevant in the context of HEP.

  5. Solar Features

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Collection includes a variety of solar feature datasets contributed by a number of national and private solar observatories located worldwide.

  6. Site Features

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset consists of various site features from multiple Superfund sites in U.S. EPA Region 8. These data were acquired from multiple sources at different times...

  7. A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2014-11-01

    Full Text Available A high-dimensional feature selection having a very large number of features with an optimal feature subset is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique while utilizing regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as a criterion to evaluate classifier performance, and classification is accomplished through the use of k-nearest neighbour (KNN and Bayesian techniques. Various high dimensional data sets are used to evaluate the usefulness of the proposed approach. Results show that our approach gives better results when compared with other conventional feature selection algorithms.

  8. Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate

    Science.gov (United States)

    Padmanaban, Subash; Baker, Justin; Greger, Bradley

    2018-01-01

    Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded due to problems arising when dealing with high-dimensional data. The objective of feature selection algorithms is to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements—similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (AP) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon AP firing rates. We used the SVM classification to examine the functional parameters of (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron and multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy for multi-unit features and single-unit features using Mutual Information Maximization (MIM) across 47 sessions was 96.74 ± 3.5% and 97.65 ± 3.36% respectively. The reduction in decoding accuracy between using 100% of the features and 10% of features based on MIM was 45.56% (from 93.7 to 51.09%) and 4.75% (from 95.32 to 90.79%) for multi-unit and single-unit features respectively. MIM had best performance compared to other feature selection methods. Significance: These results suggest improved decoding performance can be achieved by using optimally selected features. The results based on clinically relevant performance metrics also suggest that the decoding

  9. Mapping the Indonesian territory, based on pollution, social demography and geographical data, using self organizing feature map

    Science.gov (United States)

    Hernawati, Kuswari; Insani, Nur; Bambang S. H., M.; Nur Hadi, W.; Sahid

    2017-08-01

    This research aims to mapping the 33 (thirty-three) provinces in Indonesia, based on the data on air, water and soil pollution, as well as social demography and geography data, into a clustered model. The method used in this study was unsupervised method that combines the basic concept of Kohonen or Self-Organizing Feature Maps (SOFM). The method is done by providing the design parameters for the model based on data related directly/ indirectly to pollution, which are the demographic and social data, pollution levels of air, water and soil, as well as the geographical situation of each province. The parameters used consists of 19 features/characteristics, including the human development index, the number of vehicles, the availability of the plant's water absorption and flood prevention, as well as geographic and demographic situation. The data used were secondary data from the Central Statistics Agency (BPS), Indonesia. The data are mapped into SOFM from a high-dimensional vector space into two-dimensional vector space according to the closeness of location in term of Euclidean distance. The resulting outputs are represented in clustered grouping. Thirty-three provinces are grouped into five clusters, where each cluster has different features/characteristics and level of pollution. The result can used to help the efforts on prevention and resolution of pollution problems on each cluster in an effective and efficient way.

  10. A prototype feature system for feature retrieval using relationships

    Science.gov (United States)

    Choi, J.; Usery, E.L.

    2009-01-01

    Using a feature data model, geographic phenomena can be represented effectively by integrating space, theme, and time. This paper extends and implements a feature data model that supports query and visualization of geographic features using their non-spatial and temporal relationships. A prototype feature-oriented geographic information system (FOGIS) is then developed and storage of features named Feature Database is designed. Buildings from the U.S. Marine Corps Base, Camp Lejeune, North Carolina and subways in Chicago, Illinois are used to test the developed system. The results of the applications show the strength of the feature data model and the developed system 'FOGIS' when they utilize non-spatial and temporal relationships in order to retrieve and visualize individual features.

  11. SkyFACT: high-dimensional modeling of gamma-ray emission with adaptive templates and penalized likelihoods

    Energy Technology Data Exchange (ETDEWEB)

    Storm, Emma; Weniger, Christoph [GRAPPA, Institute of Physics, University of Amsterdam, Science Park 904, 1090 GL Amsterdam (Netherlands); Calore, Francesca, E-mail: e.m.storm@uva.nl, E-mail: c.weniger@uva.nl, E-mail: francesca.calore@lapth.cnrs.fr [LAPTh, CNRS, 9 Chemin de Bellevue, BP-110, Annecy-le-Vieux, 74941, Annecy Cedex (France)

    2017-08-01

    We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (∼> 10{sup 5}) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |ℓ|<90{sup o} and | b |<20{sup o}, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.

  12. CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets [version 1; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Malgorzata Nowicka

    2017-05-01

    Full Text Available High dimensional mass and flow cytometry (HDCyto experiments have become a method of choice for high throughput interrogation and characterization of cell populations.Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots, reporting of clustering results (dimensionality reduction, heatmaps with dendrograms and differential analyses (e.g. plots of aggregated signals.

  13. Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points

    Science.gov (United States)

    Regis, Rommel G.

    2014-02-01

    This article develops two new algorithms for constrained expensive black-box optimization that use radial basis function surrogates for the objective and constraint functions. These algorithms are called COBRA and Extended ConstrLMSRBF and, unlike previous surrogate-based approaches, they can be used for high-dimensional problems where all initial points are infeasible. They both follow a two-phase approach where the first phase finds a feasible point while the second phase improves this feasible point. COBRA and Extended ConstrLMSRBF are compared with alternative methods on 20 test problems and on the MOPTA08 benchmark automotive problem (D.R. Jones, Presented at MOPTA 2008), which has 124 decision variables and 68 black-box inequality constraints. The alternatives include a sequential penalty derivative-free algorithm, a direct search method with kriging surrogates, and two multistart methods. Numerical results show that COBRA algorithms are competitive with Extended ConstrLMSRBF and they generally outperform the alternatives on the MOPTA08 problem and most of the test problems.

  14. Modeling genome-wide dynamic regulatory network in mouse lungs with influenza infection using high-dimensional ordinary differential equations.

    Science.gov (United States)

    Wu, Shuang; Liu, Zhi-Ping; Qiu, Xing; Wu, Hulin

    2014-01-01

    The immune response to viral infection is regulated by an intricate network of many genes and their products. The reverse engineering of gene regulatory networks (GRNs) using mathematical models from time course gene expression data collected after influenza infection is key to our understanding of the mechanisms involved in controlling influenza infection within a host. A five-step pipeline: detection of temporally differentially expressed genes, clustering genes into co-expressed modules, identification of network structure, parameter estimate refinement, and functional enrichment analysis, is developed for reconstructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. Applying the pipeline to the time course gene expression data from influenza-infected mouse lungs, we have identified 20 distinct temporal expression patterns in the differentially expressed genes and constructed a module-based dynamic network using a linear ODE model. Both intra-module and inter-module annotations and regulatory relationships of our inferred network show some interesting findings and are highly consistent with existing knowledge about the immune response in mice after influenza infection. The proposed method is a computationally efficient, data-driven pipeline bridging experimental data, mathematical modeling, and statistical analysis. The application to the influenza infection data elucidates the potentials of our pipeline in providing valuable insights into systematic modeling of complicated biological processes.

  15. Enhanced spectral resolution by high-dimensional NMR using the filter diagonalization method and “hidden” dimensions

    Science.gov (United States)

    Meng, Xi; Nguyen, Bao D.; Ridge, Clark; Shaka, A. J.

    2009-01-01

    High-dimensional (HD) NMR spectra have poorer digital resolution than low-dimensional (LD) spectra, for a fixed amount of experiment time. This has led to “reduced-dimensionality” strategies, in which several LD projections of the HD NMR spectrum are acquired, each with higher digital resolution; an approximate HD spectrum is then inferred by some means. We propose a strategy that moves in the opposite direction, by adding more time dimensions to increase the information content of the data set, even if only a very sparse time grid is used in each dimension. The full HD time-domain data can be analyzed by the Filter Diagonalization Method (FDM), yielding very narrow resonances along all of the frequency axes, even those with sparse sampling. Integrating over the added dimensions of HD FDM NMR spectra reconstitutes LD spectra with enhanced resolution, often more quickly than direct acquisition of the LD spectrum with a larger number of grid points in each of the fewer dimensions. If the extra dimensions do not appear in the final spectrum, and are used solely to boost information content, we propose the moniker hidden-dimension NMR. This work shows that HD peaks have unmistakable frequency signatures that can be detected as single HD objects by an appropriate algorithm, even though their patterns would be tricky for a human operator to visualize or recognize, and even if digital resolution in an HD FT spectrum is very coarse compared with natural line widths. PMID:18926747

  16. Big Data Challenges of High-Dimensional Continuous-Time Mean-Variance Portfolio Selection and a Remedy.

    Science.gov (United States)

    Chiu, Mei Choi; Pun, Chi Seng; Wong, Hoi Ying

    2017-08-01

    Investors interested in the global financial market must analyze financial securities internationally. Making an optimal global investment decision involves processing a huge amount of data for a high-dimensional portfolio. This article investigates the big data challenges of two mean-variance optimal portfolios: continuous-time precommitment and constant-rebalancing strategies. We show that both optimized portfolios implemented with the traditional sample estimates converge to the worst performing portfolio when the portfolio size becomes large. The crux of the problem is the estimation error accumulated from the huge dimension of stock data. We then propose a linear programming optimal (LPO) portfolio framework, which applies a constrained ℓ 1 minimization to the theoretical optimal control to mitigate the risk associated with the dimensionality issue. The resulting portfolio becomes a sparse portfolio that selects stocks with a data-driven procedure and hence offers a stable mean-variance portfolio in practice. When the number of observations becomes large, the LPO portfolio converges to the oracle optimal portfolio, which is free of estimation error, even though the number of stocks grows faster than the number of observations. Our numerical and empirical studies demonstrate the superiority of the proposed approach. © 2017 Society for Risk Analysis.

  17. Low-storage implicit/explicit Runge-Kutta schemes for the simulation of stiff high-dimensional ODE systems

    Science.gov (United States)

    Cavaglieri, Daniele; Bewley, Thomas

    2015-04-01

    Implicit/explicit (IMEX) Runge-Kutta (RK) schemes are effective for time-marching ODE systems with both stiff and nonstiff terms on the RHS; such schemes implement an (often A-stable or better) implicit RK scheme for the stiff part of the ODE, which is often linear, and, simultaneously, a (more convenient) explicit RK scheme for the nonstiff part of the ODE, which is often nonlinear. Low-storage RK schemes are especially effective for time-marching high-dimensional ODE discretizations of PDE systems on modern (cache-based) computational hardware, in which memory management is often the most significant computational bottleneck. In this paper, we develop and characterize eight new low-storage implicit/explicit RK schemes which have higher accuracy and better stability properties than the only low-storage implicit/explicit RK scheme available previously, the venerable second-order Crank-Nicolson/Runge-Kutta-Wray (CN/RKW3) algorithm that has dominated the DNS/LES literature for the last 25 years, while requiring similar storage (two, three, or four registers of length N) and comparable floating-point operations per timestep.

  18. Fault diagnosis of rotating machine by isometric feature mapping

    International Nuclear Information System (INIS)

    Zhang, Yun; Li, Benwei; Wang, Lin; Wang, Wen; Wang, Zibin

    2013-01-01

    Principal component analysis (PCA) and linear discriminate analysis (LDA) are well-known linear dimensionality reductions for fault classification. However, since they are linear methods, they perform not well for high-dimensional data that has the nonlinear geometric structure. As kernel extension of PCA, Kernel PCA is used for nonlinear fault classification. However, the performance of Kernel PCA largely depends on its kernel function which can only be empirically selected from finite candidates. Thus, a novel rotating machine fault diagnosis approach based on geometrically motivated nonlinear dimensionality reduction named isometric feature mapping (Isomap) is proposed. The approach can effectively extract the intrinsic nonlinear manifold features embedded in high-dimensional fault data sets. Experimental results with rotor and rolling bearing data show that the proposed approach overcomes the flaw of conventional fault pattern recognition approaches and obviously improves the fault classification performance.

  19. Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction

    Directory of Open Access Journals (Sweden)

    Boulesteix Anne-Laure

    2009-12-01

    Full Text Available Abstract Background In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data, since such analyses are particularly exposed to this kind of bias. Methods In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure. Results We assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case and the bias resulting from the choice of the classification method are examined both separately and jointly. Conclusions The median minimal error rate over the investigated classifiers was as low as 31% and 41% based on permuted uninformative predictors from studies on colon cancer and prostate cancer, respectively. We conclude that the strategy to present only the optimal result is not acceptable because it yields a substantial bias in error rate estimation, and suggest alternative approaches for properly reporting classification accuracy.

  20. An overview of techniques for linking high-dimensional molecular data to time-to-event endpoints by risk prediction models.

    Science.gov (United States)

    Binder, Harald; Porzelius, Christine; Schumacher, Martin

    2011-03-01

    Analysis of molecular data promises identification of biomarkers for improving prognostic models, thus potentially enabling better patient management. For identifying such biomarkers, risk prediction models can be employed that link high-dimensional molecular covariate data to a clinical endpoint. In low-dimensional settings, a multitude of statistical techniques already exists for building such models, e.g. allowing for variable selection or for quantifying the added value of a new biomarker. We provide an overview of techniques for regularized estimation that transfer this toward high-dimensional settings, with a focus on models for time-to-event endpoints. Techniques for incorporating specific covariate structure are discussed, as well as techniques for dealing with more complex endpoints. Employing gene expression data from patients with diffuse large B-cell lymphoma, some typical modeling issues from low-dimensional settings are illustrated in a high-dimensional application. First, the performance of classical stepwise regression is compared to stage-wise regression, as implemented by a component-wise likelihood-based boosting approach. A second issues arises, when artificially transforming the response into a binary variable. The effects of the resulting loss of efficiency and potential bias in a high-dimensional setting are illustrated, and a link to competing risks models is provided. Finally, we discuss conditions for adequately quantifying the added value of high-dimensional gene expression measurements, both at the stage of model fitting and when performing evaluation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. A Penalized Likelihood Framework For High-Dimensional Phylogenetic Comparative Methods And An Application To New-World Monkeys Brain Evolution.

    Science.gov (United States)

    Julien, Clavel; Leandro, Aristide; Hélène, Morlon

    2018-06-19

    Working with high-dimensional phylogenetic comparative datasets is challenging because likelihood-based multivariate methods suffer from low statistical performances as the number of traits p approaches the number of species n and because some computational complications occur when p exceeds n. Alternative phylogenetic comparative methods have recently been proposed to deal with the large p small n scenario but their use and performances are limited. Here we develop a penalized likelihood framework to deal with high-dimensional comparative datasets. We propose various penalizations and methods for selecting the intensity of the penalties. We apply this general framework to the estimation of parameters (the evolutionary trait covariance matrix and parameters of the evolutionary model) and model comparison for the high-dimensional multivariate Brownian (BM), Early-burst (EB), Ornstein-Uhlenbeck (OU) and Pagel's lambda models. We show using simulations that our penalized likelihood approach dramatically improves the estimation of evolutionary trait covariance matrices and model parameters when p approaches n, and allows for their accurate estimation when p equals or exceeds n. In addition, we show that penalized likelihood models can be efficiently compared using Generalized Information Criterion (GIC). We implement these methods, as well as the related estimation of ancestral states and the computation of phylogenetic PCA in the R package RPANDA and mvMORPH. Finally, we illustrate the utility of the new proposed framework by evaluating evolutionary models fit, analyzing integration patterns, and reconstructing evolutionary trajectories for a high-dimensional 3-D dataset of brain shape in the New World monkeys. We find a clear support for an Early-burst model suggesting an early diversification of brain morphology during the ecological radiation of the clade. Penalized likelihood offers an efficient way to deal with high-dimensional multivariate comparative data.

  2. MetaFIND: A feature analysis tool for metabolomics data

    Directory of Open Access Journals (Sweden)

    Cunningham Pádraig

    2008-11-01

    Full Text Available Abstract Background Metabolomics, or metabonomics, refers to the quantitative analysis of all metabolites present within a biological sample and is generally carried out using NMR spectroscopy or Mass Spectrometry. Such analysis produces a set of peaks, or features, indicative of the metabolic composition of the sample and may be used as a basis for sample classification. Feature selection may be employed to improve classification accuracy or aid model explanation by establishing a subset of class discriminating features. Factors such as experimental noise, choice of technique and threshold selection may adversely affect the set of selected features retrieved. Furthermore, the high dimensionality and multi-collinearity inherent within metabolomics data may exacerbate discrepancies between the set of features retrieved and those required to provide a complete explanation of metabolite signatures. Given these issues, the latter in particular, we present the MetaFIND application for 'post-feature selection' correlation analysis of metabolomics data. Results In our evaluation we show how MetaFIND may be used to elucidate metabolite signatures from the set of features selected by diverse techniques over two metabolomics datasets. Importantly, we also show how MetaFIND may augment standard feature selection and aid the discovery of additional significant features, including those which represent novel class discriminating metabolites. MetaFIND also supports the discovery of higher level metabolite correlations. Conclusion Standard feature selection techniques may fail to capture the full set of relevant features in the case of high dimensional, multi-collinear metabolomics data. We show that the MetaFIND 'post-feature selection' analysis tool may aid metabolite signature elucidation, feature discovery and inference of metabolic correlations.

  3. Featuring animacy

    Directory of Open Access Journals (Sweden)

    Elizabeth Ritter

    2015-01-01

    Full Text Available Algonquian languages are famous for their animacy-based grammatical properties—an animacy based noun classification system and direct/inverse system which gives rise to animacy hierarchy effects in the determination of verb agreement. In this paper I provide new evidence for the proposal that the distinctive properties of these languages is due to the use of participant-based features, rather than spatio-temporal ones, for both nominal and verbal functional categories (Ritter & Wiltschko 2009, 2014. Building on Wiltschko (2012, I develop a formal treatment of the Blackfoot aspectual system that assumes a category Inner Aspect (cf. MacDonald 2008, Travis 1991, 2010. Focusing on lexical aspect in Blackfoot, I demonstrate that the classification of both nouns (Seinsarten and verbs (Aktionsarten is based on animacy, rather than boundedness, resulting in a strikingly different aspectual system for both categories. 

  4. Image Classification Using Biomimetic Pattern Recognition with Convolutional Neural Networks Features

    Science.gov (United States)

    Huo, Guanying

    2017-01-01

    As a typical deep-learning model, Convolutional Neural Networks (CNNs) can be exploited to automatically extract features from images using the hierarchical structure inspired by mammalian visual system. For image classification tasks, traditional CNN models employ the softmax function for classification. However, owing to the limited capacity of the softmax function, there are some shortcomings of traditional CNN models in image classification. To deal with this problem, a new method combining Biomimetic Pattern Recognition (BPR) with CNNs is proposed for image classification. BPR performs class recognition by a union of geometrical cover sets in a high-dimensional feature space and therefore can overcome some disadvantages of traditional pattern recognition. The proposed method is evaluated on three famous image classification benchmarks, that is, MNIST, AR, and CIFAR-10. The classification accuracies of the proposed method for the three datasets are 99.01%, 98.40%, and 87.11%, respectively, which are much higher in comparison with the other four methods in most cases. PMID:28316614

  5. Underground spaces/cybernetic spaces

    Directory of Open Access Journals (Sweden)

    Tomaž Novljan

    2000-01-01

    Full Text Available A modern city space is a space where in the vertical and horizontal direction dynamic, non-linear processes exist, similar as in nature. Alongside the “common” city surface, cities have underground spaces as well that are increasingly affecting the functioning of the former. It is the space of material and cybernetic communication/transport. The psychophysical specifics of using underground places have an important role in their conceptualisation. The most evident facts being their limited volume and often limited connections to the surface and increased level of potential dangers of all kinds. An efficient mode for alleviating the effects of these specific features are artistic interventions, such as: shape, colour, lighting, all applications of the basic principles of fractal theory.

  6. Space polypropulsion

    Science.gov (United States)

    Kellett, B. J.; Griffin, D. K.; Bingham, R.; Campbell, R. N.; Forbes, A.; Michaelis, M. M.

    2008-05-01

    Hybrid space propulsion has been a feature of most space missions. Only the very early rocket propulsion experiments like the V2, employed a single form of propulsion. By the late fifties multi-staging was routine and the Space Shuttle employs three different kinds of fuel and rocket engines. During the development of chemical rockets, other forms of propulsion were being slowly tested, both theoretically and, relatively slowly, in practice. Rail and gas guns, ion engines, "slingshot" gravity assist, nuclear and solar power, tethers, solar sails have all seen some real applications. Yet the earliest type of non-chemical space propulsion to be thought of has never been attempted in space: laser and photon propulsion. The ideas of Eugen Saenger, Georgii Marx, Arthur Kantrowitz, Leik Myrabo, Claude Phipps and Robert Forward remain Earth-bound. In this paper we summarize the various forms of nonchemical propulsion and their results. We point out that missions beyond Saturn would benefit from a change of attitude to laser-propulsion as well as consideration of hybrid "polypropulsion" - which is to say using all the rocket "tools" available rather than possibly not the most appropriate. We conclude with three practical examples, two for the next decades and one for the next century; disposal of nuclear waste in space; a grand tour of the Jovian and Saturnian moons - with Huygens or Lunoxod type, landers; and eventually mankind's greatest space dream: robotic exploration of neighbouring planetary systems.

  7. Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data

    Directory of Open Access Journals (Sweden)

    Raftery Adrian E

    2009-02-01

    Full Text Available Abstract Background Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. Results We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test. Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p

  8. Resonance-Based Time-Frequency Manifold for Feature Extraction of Ship-Radiated Noise

    Science.gov (United States)

    Yan, Jiaquan; Sun, Haixin; Chen, Hailan; Junejo, Naveed Ur Rehman; Cheng, En

    2018-01-01

    In this paper, a novel time-frequency signature using resonance-based sparse signal decomposition (RSSD), phase space reconstruction (PSR), time-frequency distribution (TFD) and manifold learning is proposed for feature extraction of ship-radiated noise, which is called resonance-based time-frequency manifold (RTFM). This is suitable for analyzing signals with oscillatory, non-stationary and non-linear characteristics in a situation of serious noise pollution. Unlike the traditional methods which are sensitive to noise and just consider one side of oscillatory, non-stationary and non-linear characteristics, the proposed RTFM can provide the intact feature signature of all these characteristics in the form of a time-frequency signature by the following steps: first, RSSD is employed on the raw signal to extract the high-oscillatory component and abandon the low-oscillatory component. Second, PSR is performed on the high-oscillatory component to map the one-dimensional signal to the high-dimensional phase space. Third, TFD is employed to reveal non-stationary information in the phase space. Finally, manifold learning is applied to the TFDs to fetch the intrinsic non-linear manifold. A proportional addition of the top two RTFMs is adopted to produce the improved RTFM signature. All of the case studies are validated on real audio recordings of ship-radiated noise. Case studies of ship-radiated noise on different datasets and various degrees of noise pollution manifest the effectiveness and robustness of the proposed method. PMID:29565288

  9. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional feature...... and considering all CV groups, the methods selected 36 % of the original features available. The diagnosis evaluation reached a generalization area-under-the-ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis....

  10. Imaging features of thalassemia

    Energy Technology Data Exchange (ETDEWEB)

    Tunaci, M.; Tunaci, A.; Engin, G.; Oezkorkmaz, B.; Acunas, G.; Acunas, B. [Dept. of Radiology, Istanbul Univ. (Turkey); Dincol, G. [Dept. of Internal Medicine, Istanbul Univ. (Turkey)

    1999-07-01

    Thalassemia is a kind of chronic, inherited, microcytic anemia characterized by defective hemoglobin synthesis and ineffective erythropoiesis. In all thalassemias clinical features that result from anemia, transfusional, and absorptive iron overload are similar but vary in severity. The radiographic features of {beta}-thalassemia are due in large part to marrow hyperplasia. Markedly expanded marrow space lead to various skeletal manifestations including spine, skull, facial bones, and ribs. Extramedullary hematopoiesis (ExmH), hemosiderosis, and cholelithiasis are among the non-skeletal manifestations of thalassemia. The skeletal X-ray findings show characteristics of chronic overactivity of the marrow. In this article both skeletal and non-skeletal manifestations of thalassemia are discussed with an overview of X-ray findings, including MRI and CT findings. (orig.)

  11. Imaging features of thalassemia

    International Nuclear Information System (INIS)

    Tunaci, M.; Tunaci, A.; Engin, G.; Oezkorkmaz, B.; Acunas, G.; Acunas, B.; Dincol, G.

    1999-01-01

    Thalassemia is a kind of chronic, inherited, microcytic anemia characterized by defective hemoglobin synthesis and ineffective erythropoiesis. In all thalassemias clinical features that result from anemia, transfusional, and absorptive iron overload are similar but vary in severity. The radiographic features of β-thalassemia are due in large part to marrow hyperplasia. Markedly expanded marrow space lead to various skeletal manifestations including spine, skull, facial bones, and ribs. Extramedullary hematopoiesis (ExmH), hemosiderosis, and cholelithiasis are among the non-skeletal manifestations of thalassemia. The skeletal X-ray findings show characteristics of chronic overactivity of the marrow. In this article both skeletal and non-skeletal manifestations of thalassemia are discussed with an overview of X-ray findings, including MRI and CT findings. (orig.)

  12. High-Dimensional Modeling for Cytometry: Building Rock Solid Models Using GemStone™ and Verity Cen-se'™ High-Definition t-SNE Mapping.

    Science.gov (United States)

    Bruce Bagwell, C

    2018-01-01

    This chapter outlines how to approach the complex tasks associated with designing models for high-dimensional cytometry data. Unlike gating approaches, modeling lends itself to automation and accounts for measurement overlap among cellular populations. Designing these models is now easier because of a new technique called high-definition t-SNE mapping. Nontrivial examples are provided that serve as a guide to create models that are consistent with data.

  13. Sparse representation of multi parametric DCE-MRI features using K-SVD for classifying gene expression based breast cancer recurrence risk

    Science.gov (United States)

    Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina

    2014-03-01

    We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm on multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide sparse representation of the features and condense the information in a few coefficients but also we reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave one-out cross validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and maximum non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA based feature selection for the same prognostic features. The ROC results show that the AUC of the K-SVD based (K=4, L=2), the ANOVA based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71. and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information on a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.

  14. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents the work leading to the acoustic event detection system, which is designed to recognize two types of acoustic events (shot and breaking glass in urban environment. For this purpose, a huge front-end processing was performed for the effective parametric representation of an input sound. MFCC features and features computed during their extraction (MELSPEC and FBANK, then MPEG-7 audio descriptors and other temporal and spectral characteristics were extracted. High dimensional feature sets were created and in the next phase reduced by the mutual information based selection algorithms. Hidden Markov Model based classifier was applied and evaluated by the Viterbi decoding algorithm. Thus very effective feature sets were identified and also the less important features were found.

  15. Slow feature analysis: unsupervised learning of invariances.

    Science.gov (United States)

    Wiskott, Laurenz; Sejnowski, Terrence J

    2002-04-01

    Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.

  16. Convolutional Neural Networks as Feature Extractors for Data Scarce Visual Searches

    Science.gov (United States)

    2016-09-01

    created with the t-SNE technique, the fully connected layers FC6 and FC7 better represent the features’ compactness in high dimensional space . In the...This topology is called Fully Connected (FC) layers, and it is shown in Figure 2.2. A CNN model consists of several combinations of CONV and FC...to generate a new representation of the images. These representations are classified with K-Nearest Neighbors within a target space that has just a few

  17. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    Science.gov (United States)

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degenerate the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F 1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals proteins co-localization preferences in the cell. www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  18. An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

    Directory of Open Access Journals (Sweden)

    Zena M Hira

    Full Text Available Microarray databases are a large source of genetic data, which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to classify different types of cancer or distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise and this causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher dimensional space onto a lower dimension one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed the raw microarray data is projected onto it and clustering and classification can take place. In contrast to earlier fusion based methods, the prior knowledge from the KEGG databases is not used in, and does not bias the classification process--it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap.

  19. Mid-Infrared Emission Features in the ISM: Feature-to-Features Flux Ratios

    Science.gov (United States)

    Lu, N. Y.

    1998-01-01

    Using a limited, but representative sample of sources in the ISM of our Galaxy with published spectra from the Infrared Space Observatory, we analyze flux ratios between the major mid-IR emission features (EFs) centered around 6.2, 7.7, 8.6 and 11.3 mu, respectively.

  20. Multispectral Image Feature Points

    Directory of Open Access Journals (Sweden)

    Cristhian Aguilera

    2012-09-01

    Full Text Available This paper presents a novel feature point descriptor for the multispectral image case: Far-Infrared and Visible Spectrum images. It allows matching interest points on images of the same scene but acquired in different spectral bands. Initially, points of interest are detected on both images through a SIFT-like based scale space representation. Then, these points are characterized using an Edge Oriented Histogram (EOH descriptor. Finally, points of interest from multispectral images are matched by finding nearest couples using the information from the descriptor. The provided experimental results and comparisons with similar methods show both the validity of the proposed approach as well as the improvements it offers with respect to the current state-of-the-art.

  1. HSTLBO: A hybrid algorithm based on Harmony Search and Teaching-Learning-Based Optimization for complex high-dimensional optimization problems.

    Directory of Open Access Journals (Sweden)

    Shouheng Tuo

    Full Text Available Harmony Search (HS and Teaching-Learning-Based Optimization (TLBO as new swarm intelligent optimization algorithms have received much attention in recent years. Both of them have shown outstanding performance for solving NP-Hard optimization problems. However, they also suffer dramatic performance degradation for some complex high-dimensional optimization problems. Through a lot of experiments, we find that the HS and TLBO have strong complementarity each other. The HS has strong global exploration power but low convergence speed. Reversely, the TLBO has much fast convergence speed but it is easily trapped into local search. In this work, we propose a hybrid search algorithm named HSTLBO that merges the two algorithms together for synergistically solving complex optimization problems using a self-adaptive selection strategy. In the HSTLBO, both HS and TLBO are modified with the aim of balancing the global exploration and exploitation abilities, where the HS aims mainly to explore the unknown regions and the TLBO aims to rapidly exploit high-precision solutions in the known regions. Our experimental results demonstrate better performance and faster speed than five state-of-the-art HS variants and show better exploration power than five good TLBO variants with similar run time, which illustrates that our method is promising in solving complex high-dimensional optimization problems. The experiment on portfolio optimization problems also demonstrate that the HSTLBO is effective in solving complex read-world application.

  2. Identifying significant environmental features using feature recognition.

    Science.gov (United States)

    2015-10-01

    The Department of Environmental Analysis at the Kentucky Transportation Cabinet has expressed an interest in feature-recognition capability because it may help analysts identify environmentally sensitive features in the landscape, : including those r...

  3. Cosmological dynamics of spatially flat Einstein-Gauss-Bonnet models in various dimensions: high-dimensional Λ-term case

    Energy Technology Data Exchange (ETDEWEB)

    Pavluchenko, Sergey A. [Universidade Federal do Maranhao (UFMA), Programa de Pos-Graduacao em Fisica, Sao Luis, Maranhao (Brazil)

    2017-08-15

    In this paper we perform a systematic study of spatially flat [(3+D)+1]-dimensional Einstein-Gauss-Bonnet cosmological models with Λ-term. We consider models that topologically are the product of two flat isotropic subspaces with different scale factors. One of these subspaces is three-dimensional and represents our space and the other is D-dimensional and represents extra dimensions. We consider no ansatz of the scale factors, which makes our results quite general. With both Einstein-Hilbert and Gauss-Bonnet contributions in play, D = 3 and the general D ≥ 4 cases have slightly different dynamics due to the different structure of the equations of motion. We analytically study the equations of motion in both cases and describe all possible regimes with special interest on the realistic regimes. Our analysis suggests that the only realistic regime is the transition from high-energy (Gauss-Bonnet) Kasner regime, which is the standard cosmological singularity in that case, to the anisotropic exponential regime with expanding three and contracting extra dimensions. Availability of this regime allows us to put a constraint on the value of Gauss-Bonnet coupling α and the Λ-term - this regime appears in two regions on the (α, Λ) plane: α < 0, Λ > 0, αΛ ≤ -3/2 and α > 0, αΛ ≤ (3D{sup 2} - 7D + 6)/(4D(D-1)), including the entire Λ < 0 region. The obtained bounds are confronted with the restrictions on α and Λ from other considerations, like causality, entropy-to-viscosity ratio in AdS/CFT and others. Joint analysis constrains (α, Λ) even further: α > 0, D ≥ 2 with (3D{sup 2} - 7D + 6)/(4D(D-1)) ≥ αΛ ≥ -(D+2)(D+3)(D{sup 2} + 5D + 12)/(8(D{sup 2} + 3D + 6){sup 2}). (orig.)

  4. Key Frame Extraction in the Summary Space.

    Science.gov (United States)

    Li, Xuelong; Zhao, Bin; Lu, Xiaoqiang; Xuelong Li; Bin Zhao; Xiaoqiang Lu; Lu, Xiaoqiang; Li, Xuelong; Zhao, Bin

    2018-06-01

    Key frame extraction is an efficient way to create the video summary which helps users obtain a quick comprehension of the video content. Generally, the key frames should be representative of the video content, meanwhile, diverse to reduce the redundancy. Based on the assumption that the video data are near a subspace of a high-dimensional space, a new approach, named as key frame extraction in the summary space, is proposed for key frame extraction in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First of all, the video data are mapped to a high-dimensional space, named as summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation can reflect the representativeness of the frame, and is utilized to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by assigning the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is utilized to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.

  5. Robust estimation of the expected survival probabilities from high-dimensional Cox models with biomarker-by-treatment interactions in randomized clinical trials

    Directory of Open Access Journals (Sweden)

    Nils Ternès

    2017-05-01

    Full Text Available Abstract Background Thanks to the advances in genomics and targeted treatments, more and more prediction models based on biomarkers are being developed to predict potential benefit from treatments in a randomized clinical trial. Despite the methodological framework for the development and validation of prediction models in a high-dimensional setting is getting more and more established, no clear guidance exists yet on how to estimate expected survival probabilities in a penalized model with biomarker-by-treatment interactions. Methods Based on a parsimonious biomarker selection in a penalized high-dimensional Cox model (lasso or adaptive lasso, we propose a unified framework to: estimate internally the predictive accuracy metrics of the developed model (using double cross-validation; estimate the individual survival probabilities at a given timepoint; construct confidence intervals thereof (analytical or bootstrap; and visualize them graphically (pointwise or smoothed with spline. We compared these strategies through a simulation study covering scenarios with or without biomarker effects. We applied the strategies to a large randomized phase III clinical trial that evaluated the effect of adding trastuzumab to chemotherapy in 1574 early breast cancer patients, for which the expression of 462 genes was measured. Results In our simulations, penalized regression models using the adaptive lasso estimated the survival probability of new patients with low bias and standard error; bootstrapped confidence intervals had empirical coverage probability close to the nominal level across very different scenarios. The double cross-validation performed on the training data set closely mimicked the predictive accuracy of the selected models in external validation data. We also propose a useful visual representation of the expected survival probabilities using splines. In the breast cancer trial, the adaptive lasso penalty selected a prediction model with 4

  6. A consistency-based feature selection method allied with linear SVMs for HIV-1 protease cleavage site prediction.

    Directory of Open Access Journals (Sweden)

    Orkun Oztürk

    Full Text Available BACKGROUND: Predicting type-1 Human Immunodeficiency Virus (HIV-1 protease cleavage site in protein molecules and determining its specificity is an important task which has attracted considerable attention in the research community. Achievements in this area are expected to result in effective drug design (especially for HIV-1 protease inhibitors against this life-threatening virus. However, some drawbacks (like the shortage of the available training data and the high dimensionality of the feature space turn this task into a difficult classification problem. Thus, various machine learning techniques, and specifically several classification methods have been proposed in order to increase the accuracy of the classification model. In addition, for several classification problems, which are characterized by having few samples and many features, selecting the most relevant features is a major factor for increasing classification accuracy. RESULTS: We propose for HIV-1 data a consistency-based feature selection approach in conjunction with recursive feature elimination of support vector machines (SVMs. We used various classifiers for evaluating the results obtained from the feature selection process. We further demonstrated the effectiveness of our proposed method by comparing it with a state-of-the-art feature selection method applied on HIV-1 data, and we evaluated the reported results based on attributes which have been selected from different combinations. CONCLUSION: Applying feature selection on training data before realizing the classification task seems to be a reasonable data-mining process when working with types of data similar to HIV-1. On HIV-1 data, some feature selection or extraction operations in conjunction with different classifiers have been tested and noteworthy outcomes have been reported. These facts motivate for the work presented in this paper. SOFTWARE AVAILABILITY: The software is available at http

  7. Innovations in individual feature history management - The significance of feature-based temporal model

    Science.gov (United States)

    Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

    2008-01-01

    A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds new explicit temporal relationship structure that stores temporal topological relationship with the ISO's temporal primitives of a feature in order to keep track feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during query process. Further, a prototype system has been developed to test a proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of temporal query on individual feature history shows the efficiency of the explicit temporal relationship structure. ?? Springer Science+Business Media, LLC 2007.

  8. Data driven analysis of rain events: feature extraction, clustering, microphysical /macro physical relationship

    Science.gov (United States)

    Djallel Dilmi, Mohamed; Mallet, Cécile; Barthes, Laurent; Chazottes, Aymeric

    2017-04-01

    The study of rain time series records is mainly carried out using rainfall rate or rain accumulation parameters estimated on a fixed integration time (typically 1 min, 1 hour or 1 day). In this study we used the concept of rain event. In fact, the discrete and intermittent natures of rain processes make the definition of some features inadequate when defined on a fixed duration. Long integration times (hour, day) lead to mix rainy and clear air periods in the same sample. Small integration time (seconds, minutes) will lead to noisy data with a great sensibility to detector characteristics. The analysis on the whole rain event instead of individual short duration samples of a fixed duration allows to clarify relationships between features, in particular between macro physical and microphysical ones. This approach allows suppressing the intra-event variability partly due to measurement uncertainties and allows focusing on physical processes. An algorithm based on Genetic Algorithm (GA) and Self Organising Maps (SOM) is developed to obtain a parsimonious characterisation of rain events using a minimal set of variables. The use of self-organizing map (SOM) is justified by the fact that it allows to map a high dimensional data space in a two-dimensional space while preserving as much as possible the initial space topology in an unsupervised way. The obtained SOM allows providing the dependencies between variables and consequently removing redundant variables leading to a minimal subset of only five features (the event duration, the rain rate peak, the rain event depth, the event rain rate standard deviation and the absolute rain rate variation of order 0.5). To confirm relevance of the five selected features the corresponding SOM is analyzed. This analysis shows clearly the existence of relationships between features. It also shows the independence of the inter-event time (IETp) feature or the weak dependence of the Dry percentage in event (Dd%e) feature. This confirms

  9. Reliable Fault Classification of Induction Motors Using Texture Feature Extraction and a Multiclass Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Jia Uddin

    2014-01-01

    Full Text Available This paper proposes a method for the reliable fault detection and classification of induction motors using two-dimensional (2D texture features and a multiclass support vector machine (MCSVM. The proposed model first converts time-domain vibration signals to 2D gray images, resulting in texture patterns (or repetitive patterns, and extracts these texture features by generating the dominant neighborhood structure (DNS map. The principal component analysis (PCA is then used for the purpose of dimensionality reduction of the high-dimensional feature vector including the extracted texture features due to the fact that the high-dimensional feature vector can degrade classification performance, and this paper configures an effective feature vector including discriminative fault features for diagnosis. Finally, the proposed approach utilizes the one-against-all (OAA multiclass support vector machines (MCSVMs to identify induction motor failures. In this study, the Gaussian radial basis function kernel cooperates with OAA MCSVMs to deal with nonlinear fault features. Experimental results demonstrate that the proposed approach outperforms three state-of-the-art fault diagnosis algorithms in terms of fault classification accuracy, yielding an average classification accuracy of 100% even in noisy environments.

  10. Seizure-Onset Mapping Based on Time-Variant Multivariate Functional Connectivity Analysis of High-Dimensional Intracranial EEG: A Kalman Filter Approach.

    Science.gov (United States)

    Lie, Octavian V; van Mierlo, Pieter

    2017-01-01

    The visual interpretation of intracranial EEG (iEEG) is the standard method used in complex epilepsy surgery cases to map the regions of seizure onset targeted for resection. Still, visual iEEG analysis is labor-intensive and biased due to interpreter dependency. Multivariate parametric functional connectivity measures using adaptive autoregressive (AR) modeling of the iEEG signals based on the Kalman filter algorithm have been used successfully to localize the electrographic seizure onsets. Due to their high computational cost, these methods have been applied to a limited number of iEEG time-series (Kalman filter implementations, a well-known multivariate adaptive AR model (Arnold et al. 1998) and a simplified, computationally efficient derivation of it, for their potential application to connectivity analysis of high-dimensional (up to 192 channels) iEEG data. When used on simulated seizures together with a multivariate connectivity estimator, the partial directed coherence, the two AR models were compared for their ability to reconstitute the designed seizure signal connections from noisy data. Next, focal seizures from iEEG recordings (73-113 channels) in three patients rendered seizure-free after surgery were mapped with the outdegree, a graph-theory index of outward directed connectivity. Simulation results indicated high levels of mapping accuracy for the two models in the presence of low-to-moderate noise cross-correlation. Accordingly, both AR models correctly mapped the real seizure onset to the resection volume. This study supports the possibility of conducting fully data-driven multivariate connectivity estimations on high-dimensional iEEG datasets using the Kalman filter approach.

  11. Predicting CT Image From MRI Data Through Feature Matching With Learned Nonlinear Local Descriptors.

    Science.gov (United States)

    Yang, Wei; Zhong, Liming; Chen, Yang; Lin, Liyan; Lu, Zhentai; Liu, Shupeng; Wu, Yao; Feng, Qianjin; Chen, Wufan

    2018-04-01

    Attenuation correction for positron-emission tomography (PET)/magnetic resonance (MR) hybrid imaging systems and dose planning for MR-based radiation therapy remain challenging due to insufficient high-energy photon attenuation information. We present a novel approach that uses the learned nonlinear local descriptors and feature matching to predict pseudo computed tomography (pCT) images from T1-weighted and T2-weighted magnetic resonance imaging (MRI) data. The nonlinear local descriptors are obtained by projecting the linear descriptors into the nonlinear high-dimensional space using an explicit feature map and low-rank approximation with supervised manifold regularization. The nearest neighbors of each local descriptor in the input MR images are searched in a constrained spatial range of the MR images among the training dataset. Then the pCT patches are estimated through k-nearest neighbor regression. The proposed method for pCT prediction is quantitatively analyzed on a dataset consisting of paired brain MRI and CT images from 13 subjects. Our method generates pCT images with a mean absolute error (MAE) of 75.25 ± 18.05 Hounsfield units, a peak signal-to-noise ratio of 30.87 ± 1.15 dB, a relative MAE of 1.56 ± 0.5% in PET attenuation correction, and a dose relative structure volume difference of 0.055 ± 0.107% in , as compared with true CT. The experimental results also show that our method outperforms four state-of-the-art methods.

  12. Arabic Feature-Based Level Sentiment Analysis Using Lexicon ...

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... structured reviews being prior knowledge for mining unstructured reviews. ... FDSO has been introduced, which defines a space of product features ... polarity of a review using feature ontology and sentiment lexicons.

  13. Presence of multiple spondyloarthritis (SpA) features is important but not sufficient for a diagnosis of axial spondyloarthritis: data from the SPondyloArthritis Caught Early (SPACE) cohort

    NARCIS (Netherlands)

    Ez-Zaitouni, Z.; Bakker, P. A. C.; van Lunteren, M.; Berg, I. J.; Landewe, R.; van Oosterhout, M.; Lorenzin, M.; van der Heijde, D.; van Gaalen, F. A.

    2017-01-01

    Objectives Concerns have been raised about overdiagnosis of axial spondyloarthritis (axSpA). We investigated whether patients with chronic back pain (CBP) of short duration and multiple SpA features are always diagnosed with axSpA by the rheumatologist, and to what extent fulfilment of the

  14. RESEARCH ON FEATURE POINTS EXTRACTION METHOD FOR BINARY MULTISCALE AND ROTATION INVARIANT LOCAL FEATURE DESCRIPTOR

    Directory of Open Access Journals (Sweden)

    Hongwei Ying

    2014-08-01

    Full Text Available An extreme point of scale space extraction method for binary multiscale and rotation invariant local feature descriptor is studied in this paper in order to obtain a robust and fast method for local image feature descriptor. Classic local feature description algorithms often select neighborhood information of feature points which are extremes of image scale space, obtained by constructing the image pyramid using certain signal transform method. But build the image pyramid always consumes a large amount of computing and storage resources, is not conducive to the actual applications development. This paper presents a dual multiscale FAST algorithm, it does not need to build the image pyramid, but can extract feature points of scale extreme quickly. Feature points extracted by proposed method have the characteristic of multiscale and rotation Invariant and are fit to construct the local feature descriptor.

  15. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some ir...... irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily....

  16. Features for detecting smoke in laparoscopic videos

    Directory of Open Access Journals (Sweden)

    Jalal Nour Aldeen

    2017-09-01

    Full Text Available Video-based smoke detection in laparoscopic surgery has different potential applications, such as the automatic addressing of surgical events associated with the electrocauterization task and the development of automatic smoke removal. In the literature, video-based smoke detection has been studied widely for fire surveillance systems. Nevertheless, the proposed methods are insufficient for smoke detection in laparoscopic videos because they often depend on assumptions which rarely hold in laparoscopic surgery such as static camera. In this paper, ten visual features based on motion, texture and colour of smoke are proposed and evaluated for smoke detection in laparoscopic videos. These features are RGB channels, energy-based feature, texture features based on gray level co-occurrence matrix (GLCM, HSV colour space feature, features based on the detection of moving regions using optical flow and the smoke colour in HSV colour space. These features were tested on four laparoscopic cholecystectomy videos. Experimental observations show that each feature can provide valuable information in performing the smoke detection task. However, each feature has weaknesses to detect the presence of smoke in some cases. By combining all proposed features smoke with high and even low density can be identified robustly and the classification accuracy increases significantly.

  17. SPACE: Enhancing Life on Earth. Proceedings Report

    Science.gov (United States)

    Hobden, Alan (Editor); Hobden, Beverly (Editor); Bagley, Larry E. (Editor); Bolton, Ed (Editor); Campaigne, Len O. (Editor); Cole, Ron (Editor); France, Marty (Editor); Hand, Rich (Editor); McKinley, Cynthia (Editor); Zimkas, Chuck (Editor)

    1996-01-01

    The proceedings of the 12th National Space Symposium on Enhancing Life on Earth is presented. Technological areas discussed include: Space applications and cooperation; Earth sensing, communication, and navigation applications; Global security interests in space; and International space station and space launch capabilities. An appendices that include featured speakers, program participants, and abbreviation & acronyms glossary is also attached.

  18. An Autonomous Sensor Tasking Approach for Large Scale Space Object Cataloging

    Science.gov (United States)

    Linares, R.; Furfaro, R.

    The field of Space Situational Awareness (SSA) has progressed over the last few decades with new sensors coming online, the development of new approaches for making observations, and new algorithms for processing them. Although there has been success in the development of new approaches, a missing piece is the translation of SSA goals to sensors and resource allocation; otherwise known as the Sensor Management Problem (SMP). This work solves the SMP using an artificial intelligence approach called Deep Reinforcement Learning (DRL). Stable methods for training DRL approaches based on neural networks exist, but most of these approaches are not suitable for high dimensional systems. The Asynchronous Advantage Actor-Critic (A3C) method is a recently developed and effective approach for high dimensional systems, and this work leverages these results and applies this approach to decision making in SSA. The decision space for the SSA problems can be high dimensional, even for tasking of a single telescope. Since the number of SOs in space is relatively high, each sensor will have a large number of possible actions at a given time. Therefore, efficient DRL approaches are required when solving the SMP for SSA. This work develops a A3C based method for DRL applied to SSA sensor tasking. One of the key benefits of DRL approaches is the ability to handle high dimensional data. For example DRL methods have been applied to image processing for the autonomous car application. For example, a 256x256 RGB image has 196608 parameters (256*256*3=196608) which is very high dimensional, and deep learning approaches routinely take images like this as inputs. Therefore, when applied to the whole catalog the DRL approach offers the ability to solve this high dimensional problem. This work has the potential to, for the first time, solve the non-myopic sensor tasking problem for the whole SO catalog (over 22,000 objects) providing a truly revolutionary result.

  19. Online prediction of respiratory motion: multidimensional processing with low-dimensional feature learning

    International Nuclear Information System (INIS)

    Ruan, Dan; Keall, Paul

    2010-01-01

    Accurate real-time prediction of respiratory motion is desirable for effective motion management in radiotherapy for lung tumor targets. Recently, nonparametric methods have been developed and their efficacy in predicting one-dimensional respiratory-type motion has been demonstrated. To exploit the correlation among various coordinates of the moving target, it is natural to extend the 1D method to multidimensional processing. However, the amount of learning data required for such extension grows exponentially with the dimensionality of the problem, a phenomenon known as the 'curse of dimensionality'. In this study, we investigate a multidimensional prediction scheme based on kernel density estimation (KDE) in an augmented covariate-response space. To alleviate the 'curse of dimensionality', we explore the intrinsic lower dimensional manifold structure and utilize principal component analysis (PCA) to construct a proper low-dimensional feature space, where kernel density estimation is feasible with the limited training data. Interestingly, the construction of this lower dimensional representation reveals a useful decomposition of the variations in respiratory motion into the contribution from semiperiodic dynamics and that from the random noise, as it is only sensible to perform prediction with respect to the former. The dimension reduction idea proposed in this work is closely related to feature extraction used in machine learning, particularly support vector machines. This work points out a pathway in processing high-dimensional data with limited training instances, and this principle applies well beyond the problem of target-coordinate-based respiratory-based prediction. A natural extension is prediction based on image intensity directly, which we will investigate in the continuation of this work. We used 159 lung target motion traces obtained with a Synchrony respiratory tracking system. Prediction performance of the low-dimensional feature learning

  20. Safe Exploration of State and Action Spaces in Reinforcement Learning

    OpenAIRE

    Garcia, Javier; Fernandez, Fernando

    2014-01-01

    In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some sta...

  1. An opinion formation based binary optimization approach for feature selection

    Science.gov (United States)

    Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo

    2018-02-01

    This paper proposed a novel optimization method based on opinion formation in complex network systems. The proposed optimization technique mimics human-human interaction mechanism based on a mathematical model derived from social sciences. Our method encodes a subset of selected features to the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact using an underlying interaction network structure and get into consensus in their opinions, while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high dimensional datasets reveal outperformance of the proposed algorithm over others.

  2. Cancer microarray data feature selection using multi-objective binary particle swarm optimization algorithm

    Science.gov (United States)

    Annavarapu, Chandra Sekhara Rao; Dara, Suresh; Banka, Haider

    2016-01-01

    Cancer investigations in microarray data play a major role in cancer analysis and the treatment. Cancer microarray data consists of complex gene expressed patterns of cancer. In this article, a Multi-Objective Binary Particle Swarm Optimization (MOBPSO) algorithm is proposed for analyzing cancer gene expression data. Due to its high dimensionality, a fast heuristic based pre-processing technique is employed to reduce some of the crude domain features from the initial feature set. Since these pre-processed and reduced features are still high dimensional, the proposed MOBPSO algorithm is used for finding further feature subsets. The objective functions are suitably modeled by optimizing two conflicting objectives i.e., cardinality of feature subsets and distinctive capability of those selected subsets. As these two objective functions are conflicting in nature, they are more suitable for multi-objective modeling. The experiments are carried out on benchmark gene expression datasets, i.e., Colon, Lymphoma and Leukaemia available in literature. The performance of the selected feature subsets with their classification accuracy and validated using 10 fold cross validation techniques. A detailed comparative study is also made to show the betterment or competitiveness of the proposed algorithm. PMID:27822174

  3. A new method for automated high-dimensional lesion segmentation evaluated in vascular injury and applied to the human occipital lobe.

    Science.gov (United States)

    Mah, Yee-Haur; Jager, Rolf; Kennard, Christopher; Husain, Masud; Nachev, Parashkev

    2014-07-01

    Making robust inferences about the functional neuroanatomy of the brain is critically dependent on experimental techniques that examine the consequences of focal loss of brain function. Unfortunately, the use of the most comprehensive such technique-lesion-function mapping-is complicated by the need for time-consuming and subjective manual delineation of the lesions, greatly limiting the practicability of the approach. Here we exploit a recently-described general measure of statistical anomaly, zeta, to devise a fully-automated, high-dimensional algorithm for identifying the parameters of lesions within a brain image given a reference set of normal brain images. We proceed to evaluate such an algorithm in the context of diffusion-weighted imaging of the commonest type of lesion used in neuroanatomical research: ischaemic damage. Summary performance metrics exceed those previously published for diffusion-weighted imaging and approach the current gold standard-manual segmentation-sufficiently closely for fully-automated lesion-mapping studies to become a possibility. We apply the new method to 435 unselected images of patients with ischaemic stroke to derive a probabilistic map of the pattern of damage in lesions involving the occipital lobe, demonstrating the variation of anatomical resolvability of occipital areas so as to guide future lesion-function studies of the region. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Comparing the accuracy of high-dimensional neural network potentials and the systematic molecular fragmentation method: A benchmark study for all-trans alkanes

    International Nuclear Information System (INIS)

    Gastegger, Michael; Kauffmann, Clemens; Marquetand, Philipp; Behler, Jörg

    2016-01-01

    Many approaches, which have been developed to express the potential energy of large systems, exploit the locality of the atomic interactions. A prominent example is the fragmentation methods in which the quantum chemical calculations are carried out for overlapping small fragments of a given molecule that are then combined in a second step to yield the system’s total energy. Here we compare the accuracy of the systematic molecular fragmentation approach with the performance of high-dimensional neural network (HDNN) potentials introduced by Behler and Parrinello. HDNN potentials are similar in spirit to the fragmentation approach in that the total energy is constructed as a sum of environment-dependent atomic energies, which are derived indirectly from electronic structure calculations. As a benchmark set, we use all-trans alkanes containing up to eleven carbon atoms at the coupled cluster level of theory. These molecules have been chosen because they allow to extrapolate reliable reference energies for very long chains, enabling an assessment of the energies obtained by both methods for alkanes including up to 10 000 carbon atoms. We find that both methods predict high-quality energies with the HDNN potentials yielding smaller errors with respect to the coupled cluster reference.

  5. Comparing the accuracy of high-dimensional neural network potentials and the systematic molecular fragmentation method: A benchmark study for all-trans alkanes

    Energy Technology Data Exchange (ETDEWEB)

    Gastegger, Michael; Kauffmann, Clemens; Marquetand, Philipp, E-mail: philipp.marquetand@univie.ac.at [Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Straße 17, Vienna (Austria); Behler, Jörg [Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, Universitätsstraße 150, Bochum (Germany)

    2016-05-21

    Many approaches, which have been developed to express the potential energy of large systems, exploit the locality of the atomic interactions. A prominent example is the fragmentation methods in which the quantum chemical calculations are carried out for overlapping small fragments of a given molecule that are then combined in a second step to yield the system’s total energy. Here we compare the accuracy of the systematic molecular fragmentation approach with the performance of high-dimensional neural network (HDNN) potentials introduced by Behler and Parrinello. HDNN potentials are similar in spirit to the fragmentation approach in that the total energy is constructed as a sum of environment-dependent atomic energies, which are derived indirectly from electronic structure calculations. As a benchmark set, we use all-trans alkanes containing up to eleven carbon atoms at the coupled cluster level of theory. These molecules have been chosen because they allow to extrapolate reliable reference energies for very long chains, enabling an assessment of the energies obtained by both methods for alkanes including up to 10 000 carbon atoms. We find that both methods predict high-quality energies with the HDNN potentials yielding smaller errors with respect to the coupled cluster reference.

  6. Maximally resolved anharmonic OH vibrational spectrum of the water/ZnO(101 \\xAF 0) interface from a high-dimensional neural network potential

    Science.gov (United States)

    Quaranta, Vanessa; Hellström, Matti; Behler, Jörg; Kullgren, Jolla; Mitev, Pavlin D.; Hermansson, Kersti

    2018-06-01

    Unraveling the atomistic details of solid/liquid interfaces, e.g., by means of vibrational spectroscopy, is of vital importance in numerous applications, from electrochemistry to heterogeneous catalysis. Water-oxide interfaces represent a formidable challenge because a large variety of molecular and dissociated water species are present at the surface. Here, we present a comprehensive theoretical analysis of the anharmonic OH stretching vibrations at the water/ZnO(101 ¯ 0) interface as a prototypical case. Molecular dynamics simulations employing a reactive high-dimensional neural network potential based on density functional theory calculations have been used to sample the interfacial structures. In the second step, one-dimensional potential energy curves have been generated for a large number of configurations to solve the nuclear Schrödinger equation. We find that (i) the ZnO surface gives rise to OH frequency shifts up to a distance of about 4 Å from the surface; (ii) the spectrum contains a number of overlapping signals arising from different chemical species, with the frequencies decreasing in the order ν(adsorbed hydroxide) > ν(non-adsorbed water) > ν(surface hydroxide) > ν(adsorbed water); (iii) stretching frequencies are strongly influenced by the hydrogen bond pattern of these interfacial species. Finally, we have been able to identify substantial correlations between the stretching frequencies and hydrogen bond lengths for all species.

  7. High-dimensional analysis of the aging immune system: verification of age-associated differences in immune signaling responses in healthy donors.

    Science.gov (United States)

    Longo, Diane M; Louie, Brent; Ptacek, Jason; Friedland, Greg; Evensen, Erik; Putta, Santosh; Atallah, Michelle; Spellmeyer, David; Wang, Ena; Pos, Zoltan; Marincola, Francesco M; Schaeffer, Andrea; Lukac, Suzanne; Railkar, Radha; Beals, Chan R; Cesano, Alessandra; Carayannopoulos, Leonidas N; Hawtin, Rachael E

    2014-06-21

    Single-cell network profiling (SCNP) is a multiparametric flow cytometry-based approach that simultaneously measures evoked signaling in multiple cell subsets. Previously, using the SCNP approach, age-associated immune signaling responses were identified in a cohort of 60 healthy donors. In the current study, a high-dimensional analysis of intracellular signaling was performed by measuring 24 signaling nodes in 7 distinct immune cell subsets within PBMCs in an independent cohort of 174 healthy donors [144 elderly (>65 yrs); 30 young (25-40 yrs)]. Associations between age and 9 immune signaling responses identified in the previously published 60 donor cohort were confirmed in the current study. Furthermore, within the current study cohort, 48 additional immune signaling responses differed significantly between young and elderly donors. These associations spanned all profiled modulators and immune cell subsets. These results demonstrate that SCNP, a systems-based approach, can capture the complexity of the cellular mechanisms underlying immunological aging. Further, the confirmation of age associations in an independent donor cohort supports the use of SCNP as a tool for identifying reproducible predictive biomarkers in areas such as vaccine response and response to cancer immunotherapies.

  8. Feature Selection by Reordering

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2005-01-01

    Roč. 2, č. 1 (2005), s. 155-161 ISSN 1738-6438 Institutional research plan: CEZ:AV0Z10300504 Keywords : feature selection * data reduction * ordering of features Subject RIV: BA - General Mathematics

  9. Space nuclear reactor safety

    International Nuclear Information System (INIS)

    Damon, D.; Temme, M.; Brown, N.

    1990-01-01

    Definition of safety requirements and design features of the SP-100 space reactor power system has been guided by a mission risk analysis. The analysis quantifies risk from accidental radiological consequences for a reference mission. Results show that the radiological risk from a space reactor can be made very low. The total mission risk from radiological consequences for a shuttle-launched, earth orbit SP-100 mission is estimated to be 0.05 Person-REM (expected values) based on a 1 mREM/yr de Minimus dose. Results are given for each mission phase. The safety benefits of specific design features are evaluated through risk sensitivity analyses

  10. A redundancy-removing feature selection algorithm for nominal data

    Directory of Open Access Journals (Sweden)

    Zhihua Li

    2015-10-01

    Full Text Available No order correlation or similarity metric exists in nominal data, and there will always be more redundancy in a nominal dataset, which means that an efficient mutual information-based nominal-data feature selection method is relatively difficult to find. In this paper, a nominal-data feature selection method based on mutual information without data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related amount of nominal data directly. Furthermore, by creating a new evaluation function that considers both the relevance and the redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes commonly used MIFS-like forms, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of the feature selection and classification accuracy, which indicates that the proposed method has a promising performance.

  11. A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization.

    Science.gov (United States)

    He, Xiaofei; Ji, Ming; Zhang, Chiyuan; Bao, Hujun

    2011-10-01

    In many information processing tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find the meaningful feature subset of the original features which can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in unsupervised learning scenarios, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. Based on Laplacian regularized least squares, which finds a smooth function on the data manifold and minimizes the empirical loss, we propose two novel feature selection algorithms which aim to minimize the expected prediction error of the regularized regression model. Specifically, we select those features such that the size of the parameter covariance matrix of the regularized regression model is minimized. Motivated from experimental design, we use trace and determinant operators to measure the size of the covariance matrix. Efficient computational schemes are also introduced to solve the corresponding optimization problems. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithms.

  12. Screening for Plant Features

    NARCIS (Netherlands)

    Heijden, van der G.W.A.M.; Polder, G.

    2015-01-01

    In this chapter, an overview of different plant features is given, from (sub)cellular to canopy level. A myriad of methods is available to measure these features using image analysis, and often, multiple methods can be used to measure the same feature. Several criteria are listed for choosing a

  13. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Full Text Available Privacy-preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure and hence protecting individual data records and their privacy. There are various privacy preservation techniques like k-anonymity, l-diversity and t-closeness and data perturbation. In this paper k-anonymity privacy protection technique is applied to high dimensional datasets like adult and census. since, both the data sets are high dimensional, feature subset selection method like Gain Ratio is applied and the attributes of the datasets are ranked and low ranking attributes are filtered to form new reduced data subsets. K-anonymization privacy preservation technique is then applied on reduced datasets. The accuracy of the privacy preserved reduced datasets and the original datasets are compared for their accuracy on the two functionalities of data mining namely classification and clustering using naïve Bayesian and k-means algorithm respectively. Experimental results show that classification and clustering accuracy are comparatively the same for reduced k-anonym zed datasets and the original data sets.

  14. Space Charge Effects

    CERN Document Server

    Ferrario, M.; Palumbo, L.

    2014-12-19

    The space charge forces are those generated directly by the charge distribution, with the inclusion of the image charges and currents due to the interaction of the beam with a perfectly conducting smooth pipe. Space charge forces are responsible for several unwanted phenomena related to beam dynamics, such as energy loss, shift of the synchronous phase and frequency , shift of the betatron frequencies, and instabilities. We will discuss in this lecture the main feature of space charge effects in high-energy storage rings as well as in low-energy linacs and transport lines.

  15. Hyperspectral Image Classification Based on the Combination of Spatial-spectral Feature and Sparse Representation

    Directory of Open Access Journals (Sweden)

    YANG Zhaoxia

    2015-07-01

    Full Text Available In order to avoid the problem of being over-dependent on high-dimensional spectral feature in the traditional hyperspectral image classification, a novel approach based on the combination of spatial-spectral feature and sparse representation is proposed in this paper. Firstly, we extract the spatial-spectral feature by reorganizing the local image patch with the first d principal components(PCs into a vector representation, followed by a sorting scheme to make the vector invariant to local image rotation. Secondly, we learn the dictionary through a supervised method, and use it to code the features from test samples afterwards. Finally, we embed the resulting sparse feature coding into the support vector machine(SVM for hyperspectral image classification. Experiments using three hyperspectral data show that the proposed method can effectively improve the classification accuracy comparing with traditional classification methods.

  16. Analysis of the Thermo-Elastic Response of Space Reflectors to Simulated Space Environment

    Science.gov (United States)

    Allegri, G.; Ivagnes, M. M.; Marchetti, M.; Poscente, F.

    2002-01-01

    The evaluation of space environment effects on materials and structures is a key matter to develop a proper design of long duration missions: since a large part of satellites operating in the earth orbital environment are employed for telecommunications, the development of space antennas and reflectors featured by high dimensional stability versus space environment interactions represents a major challenge for designers. The structural layout of state of the art space antennas and reflectors is very complex, since several different sensible elements and materials are employed: particular care must be placed in evaluating the actual geometrical configuration of the reflectors operating in the space environment, since very limited distortions of the designed layout can produce severe effects on the quality of the signal both received and transmitted, especially for antennas operating at high frequencies. The effects of thermal loads due to direct sunlight exposition and to earth and moon albedo can be easily taken into account employing the standard methods of structural analysis: on the other hand the thermal cycling and the exposition to the vacuum environment produce a long term damage accumulation which affects the whole structure. The typical effects of the just mentioned exposition are the outgassing of polymeric materials and the contamination of the exposed surface, which can affect sensibly the thermo-mechanical properties of the materials themselves and, therefore, the structural global response. The main aim of the present paper is to evaluate the synergistic effects of thermal cycling and of the exposition to high vacuum environment on an innovative antenna developed by Alenia Spazio S.p.a.: to this purpose, both an experimental and numerical research activity has been developed. A complete prototype of the antenna has been exposed to the space environment simulated by the SAS facility: this latter is constituted by an high vacuum chamber, equipped by

  17. Features of Duration and Borders of the Bedding of Snow Cover in the Conditions of Climatic Changes in the Territory of Northern Kazakhstan According to Land and Space Monitoring

    Science.gov (United States)

    Salnikov, Vitaliy; Turulina, Galina; Polyakova, Svetlana; Muratova, Nadiya; Kauazov, Azamat; Abugalieva, Aigul; Tazhibayeva, Tamara

    2014-05-01

    Precipitation and air temperature datasets from 34 meteorological stations were analyzed to reveal the regional climate changes at the territory in North Kazakhstan over the last 58 years (i.e., 1950-2008). Peculiarities and conditions of snow cover formation and melting have been analyzed at territory of Northern Kazakhstan using surface and space monitoring data. Methods of both the geo-informational processing of remote probing data and statistical processing of databases on snow cover, air temperature and precipitations have been used. Analysis of snow cover observations data for territory of Northern Kazakhstan has shown that the stable snow cover might be observed since the middle of November till the beginning of April. In a few last decades the tendency is observed for longevity decrease of snow cover bedding that appears to be on the background air temperature increase and insignificant increase of cold period precipitations due to the later bedding of the snow cover and its earlier destruction. Peculiarities of atmospheric circulation in Atlantic-Eurasian sector of Northern Semi sphere and their influence of formation of snow cover at territory of Northern Kazakhstan. The higher longevity of the snow cover bedding is defined by the predominance of E form circulation and lower longevity - by the predominance of W+C circulation form. Analysis conducted of the highest height of snow cover bedding has shown that for period of 1936-2012 in the most cases the statistically reliable decreasing trends are observed with the linear trend coefficients of 0,50 - 0,60 cm/year. The method is offered for determination of probable characteristics of the snow cover decade height. Using data of space monitoring are allocated the frontiers of snow cover bedding for the period of snow melting 1982-2008 and the snow cover melting maps are developed. The results further confirm the proposition that snow cover availability is an important and limiting factor in the generation

  18. Effects of aggregation of drug and diagnostic codes on the performance of the high-dimensional propensity score algorithm: an empirical example.

    Science.gov (United States)

    Le, Hoa V; Poole, Charles; Brookhart, M Alan; Schoenbach, Victor J; Beach, Kathleen J; Layton, J Bradley; Stürmer, Til

    2013-11-19

    The High-Dimensional Propensity Score (hd-PS) algorithm can select and adjust for baseline confounders of treatment-outcome associations in pharmacoepidemiologic studies that use healthcare claims data. How hd-PS performance is affected by aggregating medications or medical diagnoses has not been assessed. We evaluated the effects of aggregating medications or diagnoses on hd-PS performance in an empirical example using resampled cohorts with small sample size, rare outcome incidence, or low exposure prevalence. In a cohort study comparing the risk of upper gastrointestinal complications in celecoxib or traditional NSAIDs (diclofenac, ibuprofen) initiators with rheumatoid arthritis and osteoarthritis, we (1) aggregated medications and International Classification of Diseases-9 (ICD-9) diagnoses into hierarchies of the Anatomical Therapeutic Chemical classification (ATC) and the Clinical Classification Software (CCS), respectively, and (2) sampled the full cohort using techniques validated by simulations to create 9,600 samples to compare 16 aggregation scenarios across 50% and 20% samples with varying outcome incidence and exposure prevalence. We applied hd-PS to estimate relative risks (RR) using 5 dimensions, predefined confounders, ≤ 500 hd-PS covariates, and propensity score deciles. For each scenario, we calculated: (1) the geometric mean RR; (2) the difference between the scenario mean ln(RR) and the ln(RR) from published randomized controlled trials (RCT); and (3) the proportional difference in the degree of estimated confounding between that scenario and the base scenario (no aggregation). Compared with the base scenario, aggregations of medications into ATC level 4 alone or in combination with aggregation of diagnoses into CCS level 1 improved the hd-PS confounding adjustment in most scenarios, reducing residual confounding compared with the RCT findings by up to 19%. Aggregation of codes using hierarchical coding systems may improve the performance of

  19. High-dimensional nested analysis of variance to assess the effect of production season, quality grade and steam pasteurization on the phenolic composition of fermented rooibos herbal tea.

    Science.gov (United States)

    Stanimirova, I; Kazura, M; de Beer, D; Joubert, E; Schulze, A E; Beelders, T; de Villiers, A; Walczak, B

    2013-10-15

    A nested analysis of variance combined with simultaneous component analysis, ASCA, was proposed to model high-dimensional chromatographic data. The data were obtained from an experiment designed to investigate the effect of production season, quality grade and post-production processing (steam pasteurization) on the phenolic content of the infusion of the popular herbal tea, rooibos, at 'cup-of-tea' strength. Specifically, a four-way analysis of variance where the experimental design involves nesting in two of the three crossed factors was considered. For the purpose of the study, batches of fermented rooibos plant material were sampled from each of four quality grades during three production seasons (2009, 2010 and 2011) and a sub-sample of each batch was steam-pasteurized. The phenolic content of each rooibos infusion was characterized by high performance liquid chromatography (HPLC)-diode array detection (DAD). In contrast to previous studies, the complete HPLC-DAD signals were used in the chemometric analysis in order to take into account the entire phenolic profile. All factors had a significant effect on the phenolic content of a 'cup-of-tea' strength rooibos infusion. In particular, infusions prepared from the grade A (highest quality) samples contained a higher content of almost all phenolic compounds than the lower quality plant material. The variations of the content of isoorientin and orientin in the different quality grade infusions over production seasons are larger than the variations in the content of aspalathin and quercetin-3-O-robinobioside. Ferulic acid can be used as an indicator of the quality of rooibos tea as its content generally decreases with increasing tea quality. Steam pasteurization decreased the content of the majority of phenolic compounds in a 'cup-of-tea' strength rooibos infusion. © 2013 Elsevier B.V. All rights reserved.

  20. Prostatic adenocarcinoma with glomeruloid features.

    Science.gov (United States)

    Pacelli, A; Lopez-Beltran, A; Egan, A J; Bostwick, D G

    1998-05-01

    A wide variety of architectural patterns of adenocarcinoma may be seen in the prostate. We have recently encountered a hitherto-undescribed pattern of growth characterized by intraluminal ball-like clusters of cancer cells reminiscent of renal glomeruli, which we refer to as prostatic adenocarcinoma with glomeruloid features. To define the architectural features, frequency, and distribution of prostatic adenocarcinoma with glomeruloid features, we reviewed 202 totally embedded radical prostatectomy specimens obtained between October 1992 and April 1994 from the files of the Mayo Clinic. This series was supplemented by 100 consecutive needle biopsies with prostatic cancer from January to February 1996. Prostatic adenocarcinoma with glomeruloid features was characterized by round to oval epithelial tufts growing within malignant acini, often supported by a fibrovascular core. The epithelial cells were sometimes arranged in semicircular concentric rows separated by clefted spaces. In the radical prostatectomy specimens, nine cases (4.5%) had glomeruloid features. The glomeruloid pattern constituted 5% to 20% of each cancer (mean, 8.33%) and was usually located at the apex or in the peripheral zone of the prostate. Seven cases were associated with a high Gleason score (7 or 8), one with a score of 6, and one with a score of 5. All cases were associated with high-grade prostatic intraepithelial neoplasia and extensive perineural invasion. Pathological stages included T2c (three cases), T3b (four cases), and T3c (two cases); one of the T3b cases had lymph node metastases (N1). Three (3%) of 100 consecutive routine needle biopsy specimens with cancer showed glomeruloid features, and this pattern constituted 5% to 10% of each cancer (mean, 6.7%). The Gleason score was 6 for two cases and 8 for one case. Two cases were associated with high-grade prostatic intraepithelial neoplasia, and one case had perineural invasion. Glomeruloid features were not observed in any benign or