Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
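As an illustration of how a rank correlation can be defined through a distance function, the sketch below uses the Kendall distance (the number of item pairs that two rankings order differently) and rescales it to a correlation. It is a minimal example and not the book's software; the function names are hypothetical, and complete rankings without ties are assumed.

```python
from itertools import combinations


def kendall_distance(rank_a, rank_b):
    """Count item pairs that the two rankings order differently.

    rank_a[i] and rank_b[i] are the ranks given to item i (no ties assumed).
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same items")
    discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # A pair is discordant when the rank differences have opposite signs.
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
            discordant += 1
    return discordant


def kendall_tau(rank_a, rank_b):
    """Rescale the distance to a correlation in [-1, 1]."""
    n = len(rank_a)
    max_distance = n * (n - 1) // 2  # distance between a ranking and its reverse
    return 1.0 - 2.0 * kendall_distance(rank_a, rank_b) / max_distance
```

Identical rankings are at distance 0 (correlation 1), and a ranking and its reverse are at the maximum distance (correlation -1).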
Mathur, Sunil; Sadana, Ajit
2015-12-01
We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and makes no assumption about the distribution of the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, the Wilcoxon signed rank test, and significance analysis of microarray (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set. © The Author(s) 2011.
Li, Xiangyu; Cai, Hao; Wang, Xianlong; Ao, Lu; Guo, You; He, Jun; Gu, Yunyan; Qi, Lishuang; Guan, Qingzhou; Lin, Xu; Guo, Zheng
2017-10-13
To detect differentially expressed genes (DEGs) in small-scale cell line experiments, usually with only two or three technical replicates for each state, the commonly used statistical methods such as significance analysis of microarrays (SAM), limma and RankProd (RP) lack statistical power, while the fold change method lacks any statistical control. In this study, we demonstrated that the within-sample relative expression orderings (REOs) of gene pairs were highly stable among technical replicates of a cell line but often widely disrupted after certain treatments such as gene knockdown, gene transfection and drug treatment. Based on this finding, we customized the RankComp algorithm, previously designed for individualized differential expression analysis through REO comparison, to identify DEGs with certain statistical control for small-scale cell line data. In both simulated and real data, the new algorithm, named CellComp, exhibited high precision with much higher sensitivity than the original RankComp, SAM, limma and RP methods. Therefore, CellComp provides an efficient tool for analyzing small-scale cell line data. © The Author 2017. Published by Oxford University Press.
Statistical Optimality in Multipartite Ranking and Ordinal Regression.
Uematsu, Kazuki; Lee, Yoonkyung
2015-05-01
Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with simulation study and real data analysis.
A Case-Based Reasoning Method with Rank Aggregation
Sun, Jinhua; Du, Jiao; Hu, Jian
2018-03-01
In order to improve the accuracy of case-based reasoning (CBR), this paper proposes a new CBR framework built on the principle of rank aggregation. First, a ranking method is put forward in each attribute subspace of a case: the ordering relation between cases is obtained on each attribute, which yields a ranking matrix. Second, the retrieval of similar cases from the ranking matrix is transformed into a rank aggregation optimization problem, which is solved using the Kemeny optimal aggregation. On this basis, a rank aggregation case-based reasoning algorithm, named RA-CBR, is designed. Experimental results on UCI data sets show that the case retrieval accuracy of the RA-CBR algorithm is higher than that of Euclidean distance CBR and Mahalanobis distance CBR. We therefore conclude that the RA-CBR method can increase the performance and efficiency of CBR.
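The Kemeny optimal aggregation mentioned in this abstract can be sketched as follows: among all orderings of the items, pick the one with the smallest total Kendall distance to the input orderings. This is a brute-force illustration only, feasible for a handful of items, and not the RA-CBR algorithm itself; the function names and the toy data are hypothetical.

```python
from itertools import combinations, permutations


def kendall_distance(order_a, order_b):
    """Number of item pairs placed in opposite order by the two orderings."""
    pos_a = {item: k for k, item in enumerate(order_a)}
    pos_b = {item: k for k, item in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_consensus(orderings):
    """Exhaustively search for the ordering with minimum total distance.

    The search visits every permutation, so it is only practical for
    a small number of items (assumption of this sketch).
    """
    items = sorted(orderings[0])
    best_order, best_cost = None, None
    for candidate in permutations(items):
        cost = sum(kendall_distance(candidate, o) for o in orderings)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = candidate, cost
    return list(best_order), best_cost


# Hypothetical example: three attribute-wise orderings of the cases a, b, c.
attribute_orderings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
consensus, cost = kemeny_consensus(attribute_orderings)
```

In the example, each input ordering would correspond to the ranking of cases on one attribute, and the consensus would be the aggregated ranking used for retrieval.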
A generalization of Friedman's rank statistic
Kroon, de J.; Laan, van der P.
1983-01-01
In this paper a very natural generalization of the two-way analysis of variance rank statistic of FRIEDMAN is given. The general distribution-free test procedure based on this statistic for the effect of J treatments in a random block design can be applied in general two-way layouts without
The exact probability distribution of the rank product statistics for replicated experiments.
Eisinga, Rob; Breitling, Rainer; Heskes, Tom
2013-03-18
The rank product method is a widely accepted technique for detecting differentially regulated genes in replicated microarray experiments. To approximate the sampling distribution of the rank product statistic, the original publication proposed a permutation approach, whereas recently an alternative approximation based on the continuous gamma distribution was suggested. However, both approximations are imperfect for estimating small tail probabilities. In this paper we relate the rank product statistic to number theory and provide a derivation of its exact probability distribution and the true tail probabilities. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
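For orientation, the rank product statistic itself can be computed as below: genes are ranked within each replicate, and the ranks of each gene are combined across replicates by a geometric mean. This sketch covers only the statistic in its commonly used geometric-mean form, not the exact distribution derived in the paper; the function names are hypothetical and ties are not handled.

```python
import math


def ranks_descending(values):
    """Rank 1 goes to the largest value; ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda g: values[g], reverse=True)
    ranks = [0] * len(values)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position
    return ranks


def rank_products(replicates):
    """Geometric mean of each gene's ranks across replicates.

    replicates: one list per replicate, each holding one fold change per gene.
    A small rank product marks a gene that is consistently near the top.
    """
    k = len(replicates)
    n_genes = len(replicates[0])
    per_replicate_ranks = [ranks_descending(rep) for rep in replicates]
    result = []
    for g in range(n_genes):
        product = math.prod(r[g] for r in per_replicate_ranks)
        result.append(product ** (1.0 / k))
    return result
```

Ranking in descending order targets upregulated genes; ranking in ascending order would target downregulated ones.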
A novel three-stage distance-based consensus ranking method
Aghayi, Nazila; Tavana, Madjid
2018-05-01
In this study, we propose a three-stage weighted sum method for identifying the group ranks of alternatives. In the first stage, a rank matrix, similar to the cross-efficiency matrix, is obtained by computing the individual rank position of each alternative based on importance weights. In the second stage, a secondary goal is defined to limit the vector of weights since the vector of weights obtained in the first stage is not unique. Finally, in the third stage, the group rank position of alternatives is obtained based on a distance of individual rank positions. The third stage determines a consensus solution for the group so that the ranks obtained have a minimum distance from the ranks acquired by each alternative in the previous stage. A numerical example is presented to demonstrate the applicability and exhibit the efficacy of the proposed method and algorithms.
Logic-based aggregation methods for ranking student applicants
Directory of Open Access Journals (Sweden)
Milošević Pavle
2017-01-01
In this paper, we present logic-based aggregation models used for ranking student applicants and we compare them with a number of existing aggregation methods, each more complex than the previous one. The proposed models aim to include dependencies in the data using logical aggregation (LA). LA is an aggregation method based on interpolative Boolean algebra (IBA), a consistent multi-valued realization of Boolean algebra. This technique is used for a Boolean consistent aggregation of attributes that are logically dependent. The comparison is performed in the case of student applicants for master programs at the University of Belgrade. We have shown that LA has some advantages over other presented aggregation methods. The software realization of all applied aggregation methods is also provided. This paper may be of interest not only for student ranking, but also for similar problems of ranking people, e.g. employees, team members, etc.
A cautionary note on the rank product statistic.
Koziol, James A
2016-06-01
The rank product method introduced by Breitling R et al. [2004, FEBS Letters 573, 83-92] has rapidly generated popularity in practical settings, in particular, detecting differential expression of genes in microarray experiments. The purpose of this note is to point out a particular property of the rank product method, namely, its differential sensitivity to over- and underexpression. It turns out that overexpression is less likely to be detected than underexpression with the rank product statistic. We have conducted both empirical and exact power studies that demonstrate this phenomenon, and summarize these findings in this note. © 2016 Federation of European Biochemical Societies.
Wilcoxon's signed-rank statistic: what null hypothesis and why it matters.
Li, Heng; Johnson, Terri
2014-01-01
In statistical literature, the term 'signed-rank test' (or 'Wilcoxon signed-rank test') has been used to refer to two distinct tests: a test for symmetry of distribution and a test for the median of a symmetric distribution, sharing a common test statistic. To avoid potential ambiguity, we propose to refer to those two tests by different names, as 'test for symmetry based on signed-rank statistic' and 'test for median based on signed-rank statistic', respectively. The utility of such terminological differentiation should become evident through our discussion of how those tests connect and contrast with sign test and one-sample t-test. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
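The common test statistic shared by the two tests can be computed as in the sketch below: the absolute differences are ranked, and the ranks belonging to positive differences are summed. The conventions used here (dropping zero differences, midranks for ties) are assumptions of this example rather than anything stated in the article, and the function name is hypothetical.

```python
def signed_rank_statistic(differences):
    """Return W+, the sum of the ranks of |d| over the positive differences."""
    nonzero = [d for d in differences if d != 0]  # zero differences are dropped
    ordered = sorted(nonzero, key=abs)
    # Assign midranks so that tied absolute values share an average rank.
    ranks = [0.0] * len(ordered)
    start = 0
    while start < len(ordered):
        end = start
        while end + 1 < len(ordered) and abs(ordered[end + 1]) == abs(ordered[start]):
            end += 1
        midrank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[k] = midrank
        start = end + 1
    return sum(r for d, r in zip(ordered, ranks) if d > 0)
```

The statistic is the same whether it is read as a test for symmetry or as a test for the median; what differs is the null hypothesis under which its distribution is evaluated.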
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Toward optimal feature selection using ranking methods and classification algorithms
Directory of Open Access Journals (Sweden)
Novaković Jasmina
2011-01-01
We presented a comparison between several feature ranking methods used on two real datasets. We considered six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms are adopted to build models, namely, IB1, Naive Bayes, C4.5 decision tree and the RBF network. We showed that the selection of ranking methods could be important for classification accuracy. In our experiments, ranking methods with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that a subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.
Exact distributions of two-sample rank statistics and block rank statistics using computer algebra
Wiel, van de M.A.
1998-01-01
We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for
Paired comparisons analysis: an axiomatic approach to ranking methods
Gonzalez-Diaz, J.; Hendrickx, Ruud; Lohmann, E.R.M.A.
2014-01-01
In this paper we present an axiomatic analysis of several ranking methods for general tournaments. We find that the ranking method obtained by applying maximum likelihood to the (Zermelo-)Bradley-Terry model, the most common method in statistics and psychology, is one of the ranking methods that
THE USE OF RANKING SAMPLING METHOD WITHIN MARKETING RESEARCH
Directory of Open Access Journals (Sweden)
CODRUŢA DURA
2011-01-01
Marketing and statistical literature available to practitioners provides a wide range of sampling methods that can be implemented in the context of marketing research. The ranking sampling method is based on dividing the general population into several strata, namely into several subdivisions which are relatively homogeneous regarding a certain characteristic. The sample is then composed by selecting, from each stratum, a certain number of components (which can be proportional or non-proportional to the size of the stratum) until the pre-established volume of the sample is reached. Using ranking sampling within marketing research requires the determination of some relevant statistical indicators: average, dispersion, sampling error, etc. To that end, the paper contains a case study which illustrates the actual approach used to apply the ranking sampling method within marketing research carried out by a company which provides Internet connection services, on a particular category of customers: small and medium enterprises.
Citation graph based ranking in Invenio
Marian, Ludmila; Rajman, Martin; Vesely, Martin
2010-01-01
Invenio is the web-based integrated digital library system developed at CERN. Within this framework, we present four types of ranking models based on the citation graph that complement the simple approach based on citation counts: time-dependent citation counts, a relevancy ranking which extends the PageRank model, a time-dependent ranking which combines the freshness of citations with PageRank and a ranking that takes into consideration the external citations. We present our analysis and results obtained on two main data sets: Inspire and CERN Document Server. Our main contributions are: (i) a study of the currently available ranking methods based on the citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods such as treating all citations of equal importance, not taking time into account or considering the citation graph complete; (iii) a detailed study of the key parameters for these ranking methods. (The original publication is ava...
Poisson statistics of PageRank probabilities of Twitter and Wikipedia networks
Frahm, Klaus M.; Shepelyansky, Dima L.
2014-04-01
We use the methods of quantum chaos and Random Matrix Theory for analysis of statistical fluctuations of PageRank probabilities in directed networks. In this approach the effective energy levels are given by a logarithm of PageRank probability at a given node. After the standard energy level unfolding procedure we establish that the nearest spacing distribution of PageRank probabilities is described by the Poisson law typical for integrable quantum systems. Our studies are done for the Twitter network and three networks of Wikipedia editions in English, French and German. We argue that due to absence of level repulsion the PageRank order of nearby nodes can be easily interchanged. The obtained Poisson law implies that the nearby PageRank probabilities fluctuate as random independent variables.
Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking
Huang, Huang
2017-07-16
This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as
SibRank: Signed bipartite network analysis for neighbor-based collaborative ranking
Shams, Bita; Haratizadeh, Saman
2016-09-01
Collaborative ranking is an emerging field of recommender systems that utilizes users' preference data rather than rating values. Unfortunately, neighbor-based collaborative ranking has gained little attention despite its greater flexibility and justifiability. This paper proposes a novel framework, called SibRank, that seeks to improve the state-of-the-art neighbor-based collaborative ranking methods. SibRank represents users' preferences as a signed bipartite network, and finds similar users through a novel personalized ranking algorithm in signed networks.
Chan, Kwun Chuen Gary; Qin, Jing
2015-10-01
Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information is available from backward recurrence times and is frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy.
Tian, Yuling; Zhang, Hongxian
2016-01-01
For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on an input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor and a hot research topic; there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others with respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions.
Time Series Analysis Based on Running Mann Whitney Z Statistics
A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
PageRank as a method to rank biomedical literature by importance.
Yates, Elliot J; Dixon, Louise C
2015-01-01
Optimal ranking of literature importance is vital in overcoming article overload. Existing ranking methods are typically based on raw citation counts, giving a sum of 'inbound' links with no consideration of citation importance. PageRank, an algorithm originally developed for ranking webpages at the search engine, Google, could potentially be adapted to bibliometrics to quantify the relative importance weightings of a citation network. This article seeks to validate such an approach on the freely available, PubMed Central open access subset (PMC-OAS) of biomedical literature. On-demand cloud computing infrastructure was used to extract a citation network from over 600,000 full-text PMC-OAS articles. PageRanks and citation counts were calculated for each node in this network. PageRank is highly correlated with citation count (R = 0.905, P PageRank can be trivially computed on commodity cluster hardware and is linearly correlated with citation count. Given its putative benefits in quantifying relative importance, we suggest it may enrich the citation network, thereby overcoming the existing inadequacy of citation counts alone. We thus suggest PageRank as a feasible supplement to, or replacement of, existing bibliometric ranking methods.
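To make the contrast with raw citation counts concrete, the sketch below computes PageRank on a tiny citation network by power iteration. It is a generic textbook formulation, not the authors' cloud pipeline; the damping factor of 0.85, the uniform treatment of articles without references, and all names and toy data are assumptions of this example.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    out_links maps every node to the list of nodes it cites; each cited
    node must also appear as a key. Nodes that cite nothing spread their
    weight uniformly over all nodes (an assumption of this sketch).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in out_links[v]:
                # Each citing article splits its weight over its references.
                new_rank[target] += damping * rank[v] / len(out_links[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical citation network: A and B cite C, D cites A, C cites nothing.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A"]}
scores = pagerank(citations)
```

In this toy network A and C differ by one citation in raw counts, while PageRank additionally rewards C for being cited by an article that is itself cited.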
Network-based ranking methods for prediction of novel disease associated microRNAs.
Le, Duc-Hau
2015-10-01
Many studies have shown roles of microRNAs in human disease, and a number of computational methods have been proposed to predict such associations by ranking candidate microRNAs according to their relevance to a disease. Among them, machine learning-based methods usually have a limitation in specifying non-disease microRNAs as negative training samples. Meanwhile, network-based methods are becoming dominant since they exploit a "disease module" principle in microRNA functional similarity networks well. Among these, the random walk with restart (RWR) algorithm-based method is currently state of the art. The use of this algorithm was inspired by its success in predicting disease genes, because the "disease module" principle also exists in protein interaction networks. Besides, many algorithms designed for webpage ranking have been successfully applied in ranking disease candidate genes because web networks share topological properties with protein interaction networks. However, these algorithms have not yet been utilized for disease microRNA prediction. We constructed microRNA functional similarity networks based on shared targets of microRNAs, and then we integrated them with a microRNA functional synergistic network, which was recently identified. After analyzing topological properties of these networks, in addition to RWR, we assessed the performance of (i) PRINCE (PRIoritizatioN and Complex Elucidation), which was proposed for disease gene prediction; (ii) PageRank with Priors (PRP) and K-Step Markov (KSM), which were used for studying web networks; and (iii) a neighborhood-based algorithm. Analyses on topological properties showed that all microRNA functional similarity networks are small-world and scale-free. The performance of each algorithm was assessed based on average AUC values on 35 disease phenotypes and average rankings of newly discovered disease microRNAs. As a result, the performance on the integrated network was better than that on individual ones. In
Register-based statistics statistical methods for administrative data
Wallgren, Anders
2014-01-01
This book provides a comprehensive and up to date treatment of theory and practical implementation in Register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi
The optimized expansion based low-rank method for wavefield extrapolation
Wu, Zedong
2014-03-01
Spectral methods are fast becoming an indispensable tool for wavefield extrapolation, especially in anisotropic media, because they tend to be dispersion and artifact free as well as highly accurate when solving the wave equation. However, for inhomogeneous media, we face difficulties in dealing with the mixed space-wavenumber domain extrapolation operator efficiently. To solve this problem, we evaluated an optimized expansion method that can approximate this operator with a low-rank variable separation representation. The rank defines the number of inverse Fourier transforms for each time extrapolation step, and thus, the lower the rank, the faster the extrapolation. The method uses optimization instead of matrix decomposition to find the optimal wavenumbers and velocities needed to approximate the full operator with its explicit low-rank representation. As a result, we obtain lower rank representations compared with the standard low-rank method within reasonable accuracy and thus cheaper extrapolations. Additional bounds set on the range of propagated wavenumbers to adhere to the physical wave limits yield unconditionally stable extrapolations regardless of the time step. An application on the BP model provided superior results compared to those obtained using the decomposition approach. For transversely isotropic media, because we used the pure P-wave dispersion relation, we obtained solutions that were free of the shear wave artifacts, and the algorithm does not require that n > 0. In addition, the required rank for the optimization approach to obtain high accuracy in anisotropic media was lower than that obtained by the decomposition approach, and thus, it was more efficient. A reverse time migration result for the BP tilted transverse isotropy model using this method as a wave propagator demonstrated the ability of the algorithm.
Litvinenko, Alexander
2018-03-12
Part 1: Parallel H-matrices in spatial statistics 1. Motivation: improve statistical model 2. Tools: Hierarchical matrices 3. Matern covariance function and joint Gaussian likelihood 4. Identification of unknown parameters via maximizing Gaussian log-likelihood 5. Implementation with HLIBPro. Part 2: Low-rank Tucker tensor methods in spatial statistics
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints. Our experiments show a comparable performance for methods
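The concordance index used above as a performance measure can be sketched as follows: over all comparable pairs of patients, it is the fraction for which the model orders the predicted survival the same way as the observed times. The convention that a higher score means longer predicted survival, the half credit for tied scores, and the function name are assumptions of this example, which is not the paper's svm-based model.

```python
def concordance_index(times, events, scores):
    """Fraction of comparable pairs ordered correctly by the scores.

    times[i]  : observed time for subject i
    events[i] : 1 if the event was observed, 0 if the subject was censored
    scores[i] : model output, higher meaning longer predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the earlier time is an observed event.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A pair whose earlier time is censored is skipped, because it is unknown which of the two subjects actually survived longer.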
Monte Carlo methods of PageRank computation
Litvak, Nelli
2004-01-01
We describe and analyze an on-line Monte Carlo method of PageRank computation. PageRank is estimated from the results of a large number of short independent simulation runs initiated from each page that contains outgoing hyperlinks. The method does not require any storage of the hyperlink
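A simulation-based estimate of this kind can be sketched as follows: several short random walks are started from every page, each walk follows a random outgoing link with probability equal to the damping factor and stops otherwise, and PageRank is estimated from the normalized visit counts. This is one plausible variant written for illustration, not necessarily the estimator analyzed in the paper; the stopping rule at pages without outgoing links, the parameter values, and all names are assumptions.

```python
import random


def monte_carlo_pagerank(out_links, runs_per_page=1000, damping=0.85, seed=0):
    """Estimate PageRank from visit counts of short random walks.

    out_links maps every page to the list of pages it links to; each
    linked page must also appear as a key.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    pages = sorted(out_links)
    visits = {p: 0 for p in pages}
    for start in pages:
        for _ in range(runs_per_page):
            current = start
            while True:
                visits[current] += 1
                # Stop with probability 1 - damping, or when there is no way out.
                if not out_links[current] or rng.random() > damping:
                    break
                current = rng.choice(out_links[current])
    total = sum(visits.values())
    return {p: visits[p] / total for p in pages}


# Hypothetical web graph: a, b and c link to hub, and hub links back to a.
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
estimate = monte_carlo_pagerank(web, runs_per_page=2000)
```

Only the visit counters are kept in memory, and each walk needs the outgoing links of the current page alone, which is what makes an approach of this kind attractive for on-line computation.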
CNN-based ranking for biomedical entity normalization.
Li, Haodi; Chen, Qingcai; Tang, Buzhou; Wang, Xiaolong; Xu, Hua; Wang, Baohua; Huang, Dong
2017-10-03
Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical entity normalization as a ranking problem and benefits from semantic information of biomedical entities. The CNN-based ranking method first generates candidates using handcrafted rules, and then ranks the candidates according to their semantic information modeled by CNN as well as their morphological information. Experiments on two benchmark datasets for biomedical entity normalization show that our proposed CNN-based ranking method outperforms traditional rule-based method with state-of-the-art performance. We propose a CNN architecture that regards biomedical entity normalization as a ranking problem. Comparison results show that semantic information is beneficial to biomedical entity normalization and can be well combined with morphological information in our CNN architecture for further improvement.
A tilting approach to ranking influence
Genton, Marc G.
2014-12-01
We suggest a new approach, which is applicable for general statistics computed from random samples of univariate or vector-valued or functional data, to assessing the influence that individual data have on the value of a statistic, and to ranking the data in terms of that influence. Our method is based on, first, perturbing the value of the statistic by ‘tilting’, or reweighting, each data value, where the total amount of tilt is constrained to be the least possible, subject to achieving a given small perturbation of the statistic, and, then, taking the ranking of the influence of data values to be that which corresponds to ranking the changes in data weights. It is shown, both theoretically and numerically, that this ranking does not depend on the size of the perturbation, provided that the perturbation is sufficiently small. That simple result leads directly to an elegant geometric interpretation of the ranks; they are the ranks of the lengths of projections of the weights onto a ‘line’ determined by the first empirical principal component function in a generalized measure of covariance. To illustrate the generality of the method we introduce and explore it in the case of functional data, where (for example) it leads to generalized boxplots. The method has the advantage of providing an interpretable ranking that depends on the statistic under consideration. For example, the ranking of data, in terms of their influence on the value of a statistic, is different for a measure of location and for a measure of scale. This is as it should be; a ranking of data in terms of their influence should depend on the manner in which the data are used. Additionally, the ranking recognizes, rather than ignores, sign, and in particular can identify left- and right-hand ‘tails’ of the distribution of a random function or vector.
A cross-benchmark comparison of 87 learning to rank methods
Tax, N.; Bockting, S.; Hiemstra, D.
2015-01-01
Learning to rank is an increasingly important scientific field that comprises the use of machine learning for the ranking task. New learning to rank methods are generally evaluated on benchmark test collections. However, comparison of learning to rank methods based on evaluation results is hindered
RankExplorer: Visualization of Ranking Changes in Large Time Series Data.
Shi, Conglei; Cui, Weiwei; Liu, Shixia; Xu, Panpan; Chen, Wei; Qu, Huamin
2012-12-01
For many applications involving time series data, people are often interested in the changes of item values over time as well as their ranking changes. For example, people search many words via search engines like Google and Bing every day. Analysts are interested in both the absolute searching number for each word as well as their relative rankings. Both sets of statistics may change over time. For very large time series data with thousands of items, how to visually present ranking changes is an interesting challenge. In this paper, we propose RankExplorer, a novel visualization method based on ThemeRiver to reveal the ranking changes. Our method consists of four major components: 1) a segmentation method which partitions a large set of time series curves into a manageable number of ranking categories; 2) an extended ThemeRiver view with embedded color bars and changing glyphs to show the evolution of aggregation values related to each ranking category over time as well as the content changes in each ranking category; 3) a trend curve to show the degree of ranking changes over time; 4) rich user interactions to support interactive exploration of ranking changes. We have applied our method to some real time series data and the case studies demonstrate that our method can reveal the underlying patterns related to ranking changes which might otherwise be obscured in traditional visualizations.
Distant Supervision for Relation Extraction with Ranking-Based Methods
Directory of Open Access Journals (Sweden)
Yang Xiang
2016-05-01
Relation extraction has benefited from distant supervision in recent years with the development of natural language processing techniques and the explosion of data. However, distant supervision is still greatly limited by the quality of its training data, because it was designed to avoid the heavy cost of manual data annotation. In this paper, we construct an architecture called MIML-sort (Multi-instance Multi-label Learning with Sorting Strategies), which is built on the well-known MIML framework. Based on MIML-sort, we propose three ranking-based methods for sample selection, with which we identify relation extractors from a subset of the training data. Experiments are set up on the KBP (Knowledge Base Propagation) corpus, one of the benchmark datasets for distant supervision, which is large and noisy. Compared with previous work, the proposed methods produce considerably better results. Furthermore, the three methods together achieve the best F1 on the official testing set, with the largest improvement in F1 being from 27.3% to 29.98%.
The effect of uncertainties in distance-based ranking methods for multi-criteria decision making
Jaini, Nor I.; Utyuzhnikov, Sergei V.
2017-08-01
Data in multi-criteria decision making are often imprecise and changeable. Therefore, it is important to carry out a sensitivity analysis for the multi-criteria decision making problem. The paper presents a sensitivity analysis for some ranking techniques based on distance measures in multi-criteria decision making. Two types of uncertainty are considered. The first is related to the input data, while the second concerns the Decision Maker's preferences (weights). The ranking techniques considered in this study are TOPSIS, the relative distance method and the trade-off ranking method. TOPSIS and the relative distance method measure the distance from an alternative to the ideal and anti-ideal solutions. In turn, the trade-off ranking calculates the distance from an alternative to the extreme solutions and to the other alternatives. Several test cases are considered to study the performance of each ranking technique under both types of uncertainty.
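The distance-to-ideal idea behind TOPSIS can be sketched in a few lines. This is a minimal illustration of the textbook TOPSIS procedure, not the code used in the paper; the function name `topsis`, its parameters, and the choice of vector normalization are assumptions made for the example. It also assumes that no criterion column is all zeros and that the alternatives are not all identical.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (higher is better).

    matrix  : list of rows, one row per alternative, one column per criterion
    weights : criterion weights (assumed to sum to 1)
    benefit : True where a larger value is better, False where smaller is better
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points, criterion by criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)   # distance to the ideal solution
        d_minus = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_minus / (d_plus + d_minus))
    return scores

# Hypothetical data: three alternatives scored on cost (lower is better)
# and quality (higher is better).
alternatives = [[250.0, 6.0], [200.0, 5.0], [300.0, 9.0]]
scores = topsis(alternatives, weights=[0.5, 0.5], benefit=[False, True])
```

A simple sensitivity check in the spirit of the abstract is to call `topsis` again with perturbed `weights` or perturbed entries of `matrix` and compare the resulting orderings.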
Prototyping a Distributed Information Retrieval System That Uses Statistical Ranking.
Harman, Donna; And Others
1991-01-01
Built using a distributed architecture, this prototype distributed information retrieval system uses statistical ranking techniques to provide better service to the end user. Distributed architecture was shown to be a feasible alternative to centralized or CD-ROM information retrieval, and user testing of the ranking methodology showed both…
[Rank distributions in community ecology from the statistical viewpoint].
Maksimov, V N
2004-01-01
Traditional statistical methods for definition of empirical functions of abundance distribution (population, biomass, production, etc.) of species in a community are applicable for processing of multivariate data contained in the above quantitative indices of the communities. In particular, evaluation of moments of distribution suffices for convolution of the data contained in a list of species and their abundance. At the same time, the species should be ranked in the list in ascending rather than descending population and the distribution models should be analyzed on the basis of the data on abundant species only.
A Multiobjective Programming Method for Ranking All Units Based on Compensatory DEA Model
Directory of Open Access Journals (Sweden)
Haifang Cheng
2014-01-01
In order to rank all decision making units (DMUs) on the same basis, this paper proposes a multiobjective programming (MOP) model based on a compensatory data envelopment analysis (DEA) model to derive a common set of weights that can be used for the full ranking of all DMUs. We first revisit a compensatory DEA model for ranking all units, point out the existing problem in solving the model, and present an improved algorithm with which an approximate global optimal solution of the model can be obtained by solving a sequence of linear programs. Then, we apply the key idea of the compensatory DEA model to develop the MOP model, in which the objectives are to simultaneously maximize all common weights under the constraints that the sum of the efficiency values of all DMUs is equal to unity and the sum of all common weights is also equal to unity. In order to solve the MOP model, we transform it into a single objective programming (SOP) model using a fuzzy programming method and solve the SOP model using the proposed approximation algorithm. To illustrate the proposed ranking method, two numerical examples are solved.
The choice of statistical methods for comparisons of dosimetric data in radiotherapy.
Chaikh, Abdulhamid; Giraud, Jean-Yves; Perrin, Emmanuel; Bresciani, Jean-Pierre; Balosso, Jacques
2014-09-18
Novel irradiation techniques are continuously introduced in radiotherapy to optimize the accuracy, the security and the clinical outcome of treatments. These changes could raise the question of discontinuity in dosimetric presentation and the subsequent need for practice adjustments in case of significant modifications. This study proposes a comprehensive approach to compare different techniques and tests whether their respective dose calculation algorithms give rise to statistically significant differences in the treatment doses for the patient. Statistical investigation principles are presented in the framework of a clinical example based on 62 fields of radiotherapy for lung cancer. The delivered doses in monitor units were calculated using three different dose calculation methods: the reference method calculates the dose without tissue density corrections using the Pencil Beam Convolution (PBC) algorithm, whereas the new methods calculate the dose with tissue density correction in 1D and 3D using the Modified Batho (MB) method and the Equivalent Tissue Air Ratio (ETAR) method, respectively. The normality of the data and the homogeneity of variance between groups were tested using the Shapiro-Wilk and Levene tests, respectively, then non-parametric statistical tests were performed. Specifically, the dose means estimated by the different calculation methods were compared using Friedman's test and the Wilcoxon signed-rank test. In addition, the correlation between the doses calculated by the three methods was assessed using Spearman's rank and Kendall's rank tests. Friedman's test showed a significant effect of the calculation method on the delivered dose of lung cancer patients (p < 0.001). The density correction methods yielded lower doses as compared to PBC, by on average (−5 ± 4.4 SD) for MB and (−4.7 ± 5 SD) for ETAR. Post-hoc Wilcoxon signed-rank tests of paired comparisons indicated that the delivered dose was significantly reduced using density-corrected methods as compared to the reference method. Spearman's and Kendall's rank tests indicated a positive correlation between the doses calculated with the different methods
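As an illustration of the rank correlation step mentioned above, Spearman's coefficient can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below uses only the standard library; the function names and the dose values are hypothetical and are not taken from the study.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank correlation of two paired samples."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical monitor-unit values for the same fields under two algorithms.
dose_reference = [102.0, 98.5, 110.2, 95.1, 120.4]
dose_corrected = [97.3, 94.0, 104.8, 91.2, 113.9]
rho = spearman_rho(dose_reference, dose_corrected)
```

A high positive `rho` only says that the two algorithms order the fields similarly; it does not rule out a systematic shift in dose, which is what the paired Friedman and Wilcoxon tests address.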
Critical review of methods for risk ranking of food related hazards, based on risks for human health
DEFF Research Database (Denmark)
van der Fels-Klerx, H. J.; van Asselt, E. D.; Raley, M.
2018-01-01
This study aimed to critically review methods for ranking risks related to food safety and dietary hazards on the basis of their anticipated human health impacts. A literature review was performed to identify and characterize methods for risk ranking from the fields of food, environmental science......, and the risk ranking method characterized. The methods were then clustered - based on their characteristics - into eleven method categories. These categories included: risk assessment, comparative risk assessment, risk ratio method, scoring method, cost of illness, health adjusted life years, multi......-criteria decision analysis, risk matrix, flow charts/decision trees, stated preference techniques and expert synthesis. Method categories were described by their characteristics, weaknesses and strengths, data resources, and fields of applications. It was concluded there is no single best method for risk ranking...
Directory of Open Access Journals (Sweden)
Diana Purwitasari
2008-01-01
The ranking module is an important component of the search process, sorting through the relevant pages. Since a collection of Web pages has additional information inherent in the hyperlink structure of the Web, this information can be represented as a link score and then combined with the content score from the usual information retrieval techniques. In this paper we report our studies of a ranking score for Web pages that combines link analysis (PageRank Scoring) with content analysis (Fourier Domain Scoring). Our experiments use a collection of Wikipedia pages related to the subject of statistics, with the objectives of checking the correctness and evaluating the performance of the combined ranking method. Evaluation of PageRank Scoring shows that the highest score does not always relate to statistics. Since links within Wikipedia articles exist so that users are always one click away from more information on any point that has a link attached, it is possible that topics unrelated to statistics are frequently mentioned in the collection. The combination method, in which the link score is given a proportional weight relative to the content score of Web pages, is shown to affect the retrieval results.
Image Re-Ranking Based on Topic Diversity.
Qian, Xueming; Lu, Dan; Wang, Yaxiong; Zhu, Li; Tang, Yuan Yan; Wang, Meng
2017-08-01
Social media sharing Websites allow users to annotate images with free tags, which significantly contribute to the development of the web image retrieval. Tag-based image search is an important method to find images shared by users in social networks. However, how to make the top ranked result relevant and with diversity is challenging. In this paper, we propose a topic diverse ranking approach for tag-based image retrieval with the consideration of promoting the topic coverage performance. First, we construct a tag graph based on the similarity between each tag. Then, the community detection method is conducted to mine the topic community of each tag. After that, inter-community and intra-community ranking are introduced to obtain the final retrieved results. In the inter-community ranking process, an adaptive random walk model is employed to rank the community based on the multi-information of each topic community. Besides, we build an inverted index structure for images to accelerate the searching process. Experimental results on Flickr data set and NUS-Wide data sets show the effectiveness of the proposed approach.
Ranking Journals Using Social Choice Theory Methods: A Novel Approach in Bibliometrics
Energy Technology Data Exchange (ETDEWEB)
Aleskerov, F.T.; Pislyakov, V.; Subochev, A.N.
2016-07-01
We use data on economic, management and political science journals to produce quantitative estimates of the (in)consistency of evaluations based on seven popular bibliometric indicators (impact factor, 5-year impact factor, immediacy index, article influence score, h-index, SNIP and SJR). We propose a new approach to aggregating journal rankings: since rank aggregation is a multicriteria decision problem, ordinal ranking methods from social choice theory may solve it. We apply either a direct ranking method based on majority rule (the Copeland rule, the Markovian method) or a sorting procedure based on a tournament solution, such as the uncovered set and the minimal externally stable set. We demonstrate that aggregate rankings reduce the number of contradictions and represent the set of single-indicator-based rankings better than any of the seven rankings themselves.
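To make the majority-rule idea concrete, here is a minimal sketch of one common variant of the Copeland rule (pairwise majority wins minus losses). It is an illustration only, not the authors' implementation; the function name, the input format and the journal labels are assumptions, and every ranking is assumed to be a strict ordering of the same items.

```python
from itertools import combinations

def copeland_scores(rankings):
    """Copeland score (majority wins minus losses) for each item.

    rankings: list of lists, each ordering the same items from best to worst,
              e.g. one list per bibliometric indicator.
    """
    items = rankings[0]
    # Position of every item within each individual ranking.
    position = [{item: pos for pos, item in enumerate(r)} for r in rankings]
    score = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        a_wins = sum(1 for p in position if p[a] < p[b])
        b_wins = len(rankings) - a_wins
        if a_wins > b_wins:
            score[a] += 1
            score[b] -= 1
        elif b_wins > a_wins:
            score[b] += 1
            score[a] -= 1
        # An exact split leaves both scores unchanged.
    return score

# Hypothetical rankings of three journals by three indicators.
by_indicator = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
scores = copeland_scores(by_indicator)
aggregate = sorted(scores, key=lambda j: -scores[j])
```

Because only the direction of each pairwise majority matters, the result depends on the ordinal information alone and not on the numerical values of the indicators.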
Augmenting the Deliberative Method for Ranking Risks.
Susel, Irving; Lasley, Trace; Montezemolo, Mark; Piper, Joel
2016-01-01
The Department of Homeland Security (DHS) characterized and prioritized the physical cross-border threats and hazards to the nation stemming from terrorism, market-driven illicit flows of people and goods (illegal immigration, narcotics, funds, counterfeits, and weaponry), and other nonmarket concerns (movement of diseases, pests, and invasive species). These threats and hazards pose a wide diversity of consequences with very different combinations of magnitudes and likelihoods, making it very challenging to prioritize them. This article presents the approach that was used at DHS to arrive at a consensus regarding the threats and hazards that stand out from the rest based on the overall risk they pose. Due to time constraints for the decision analysis, it was not feasible to apply multiattribute methodologies like multiattribute utility theory or the analytic hierarchy process. Using a holistic approach was considered, such as the deliberative method for ranking risks first published in this journal. However, an ordinal ranking alone does not indicate relative or absolute magnitude differences among the risks. Therefore, the use of the deliberative method for ranking risks is not sufficient for deciding whether there is a material difference between the top-ranked and bottom-ranked risks, let alone deciding what the stand-out risks are. To address this limitation of ordinal rankings, the deliberative method for ranking risks was augmented by adding an additional step to transform the ordinal ranking into a ratio scale ranking. This additional step enabled the selection of stand-out risks to help prioritize further analysis. © 2015 Society for Risk Analysis.
The choice of statistical methods for comparisons of dosimetric data in radiotherapy
International Nuclear Information System (INIS)
Chaikh, Abdulhamid; Giraud, Jean-Yves; Perrin, Emmanuel; Bresciani, Jean-Pierre; Balosso, Jacques
2014-01-01
Novel irradiation techniques are continuously introduced in radiotherapy to optimize the accuracy, the security and the clinical outcome of treatments. These changes could raise the question of discontinuity in dosimetric presentation and the subsequent need for practice adjustments in case of significant modifications. This study proposes a comprehensive approach to compare different techniques and tests whether their respective dose calculation algorithms give rise to statistically significant differences in the treatment doses for the patient. Statistical investigation principles are presented in the framework of a clinical example based on 62 fields of radiotherapy for lung cancer. The delivered doses in monitor units were calculated using three different dose calculation methods: the reference method calculates the dose without tissue density corrections using the Pencil Beam Convolution (PBC) algorithm, whereas the new methods calculate the dose with tissue density correction in 1D and 3D using the Modified Batho (MB) method and the Equivalent Tissue Air Ratio (ETAR) method, respectively. The normality of the data and the homogeneity of variance between groups were tested using the Shapiro-Wilk and Levene tests, respectively, then non-parametric statistical tests were performed. Specifically, the dose means estimated by the different calculation methods were compared using Friedman’s test and the Wilcoxon signed-rank test. In addition, the correlation between the doses calculated by the three methods was assessed using Spearman’s rank and Kendall’s rank tests. Friedman’s test showed a significant effect of the calculation method on the delivered dose of lung cancer patients (p < 0.001). The density correction methods yielded lower doses as compared to PBC, by on average (−5 ± 4.4 SD) for MB and (−4.7 ± 5 SD) for ETAR. Post-hoc Wilcoxon signed-rank tests of paired comparisons indicated that the delivered dose was significantly reduced using density
Evaluating ranking methods on heterogeneous digital library collections
Canévet, Olivier; Marian, Ludmila; Chonavel, Thierry
In the context of research in particle physics, CERN has been developing its own web-based software, Invenio, to run the digital library of all the documents related to CERN and fundamental physics. The documents (articles, photos, news, theses, ...) can be retrieved through a search engine. The results matching the user's query can be displayed in several ways: sorted by latest first, by author or by title, and also ranked by word similarity. The purpose of this project is to study and implement a new ranking method in Invenio: distributed-ranking (D-Rank). This method aims at aggregating several ranking scores coming from different ranking methods into a new score. In addition to query-related scores such as word similarity, the goal of the work is to take into account non-query-related scores such as citations, journal impact factor and in particular scores related to the document access frequency in the database. The idea is that for two equally query-relevant documents, if one has been more downloaded for inst...
An Improved Rank Correlation Effect Size Statistic for Single-Case Designs: Baseline Corrected Tau.
Tarlow, Kevin R
2017-07-01
Measuring treatment effects when an individual's pretreatment performance is improving poses a challenge for single-case experimental designs. It may be difficult to determine whether improvement is due to the treatment or due to the preexisting baseline trend. Tau-U is a popular single-case effect size statistic that purports to control for baseline trend. However, despite its strengths, Tau-U has substantial limitations: its values are inflated and not bounded between -1 and +1, it cannot be visually graphed, and its relatively weak method of trend control leads to unacceptable levels of Type I error wherein ineffective treatments appear effective. An improved effect size statistic based on rank correlation and robust regression, Baseline Corrected Tau, is proposed and field-tested with both published and simulated single-case time series. A web-based calculator for Baseline Corrected Tau is also introduced for use by single-case investigators.
Heskes, Tom; Eisinga, Rob; Breitling, Rainer
2014-11-21
The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Both exact calculation and permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy over existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issue. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .
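For orientation, the statistic whose p-value is being bounded can be sketched as follows. This shows the rank product in its geometric-mean form only, not the bounds or the approximation derived in the paper (for those, see the authors' R code); the function name, the input layout and the gene labels are hypothetical, and ties within a replicate are not handled.

```python
import math

def rank_products(fold_changes):
    """Geometric mean of each gene's ranks across replicates.

    fold_changes: dict mapping gene -> list of values, one per replicate.
    Rank 1 in a replicate goes to the largest value, so a small rank
    product marks a gene that is consistently near the top.
    """
    genes = list(fold_changes)
    k = len(next(iter(fold_changes.values())))  # number of replicates
    log_sum = {g: 0.0 for g in genes}
    for rep in range(k):
        ordered = sorted(genes, key=lambda g: fold_changes[g][rep], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_sum[g] += math.log(rank)  # sum logs to avoid huge products
    return {g: math.exp(log_sum[g] / k) for g in genes}

# Hypothetical fold changes for three genes in two replicates.
data = {"g1": [3.0, 1.5], "g2": [1.0, 2.0], "g3": [0.5, 0.1]}
rp = rank_products(data)
```

Assessing significance then requires the null distribution of this statistic under random rankings, which is the part addressed by the exact, permutation, gamma and bound-based approaches discussed in the abstract.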
An Improved Fuzzy Based Missing Value Estimation in DNA Microarray Validated by Gene Ranking
Directory of Open Access Journals (Sweden)
Sujay Saha
2016-01-01
Most gene expression data analysis algorithms require the entire gene expression matrix without any missing values. Hence, it is necessary to devise methods that impute missing data values accurately. There exist a number of imputation algorithms to estimate those missing values. This work starts with a microarray dataset containing multiple missing values. We first apply a modified version of the existing fuzzy theory based method LRFDVImpute to impute multiple missing values of time series gene expression data, and then validate the result of imputation by a genetic algorithm (GA) based gene ranking methodology along with some regular statistical validation techniques, such as the RMSE method. Gene ranking, to the best of our knowledge, has not yet been used to validate the result of missing value estimation. Firstly, the proposed method has been tested on the very popular Spellman dataset, and the results show that error margins have been drastically reduced compared to some previous works, which indirectly validates the statistical significance of the proposed method. Then it has been applied to four other 2-class benchmark datasets, namely the Colorectal Cancer tumours dataset (GDS4382), the Breast Cancer dataset (GSE349-350), the Prostate Cancer dataset, and DLBCL-FL (Leukaemia), for both missing value estimation and ranking the genes, and the results show that the proposed method can reach 100% classification accuracy with very few dominant genes, which indirectly validates the biological significance of the proposed method.
Ranking Scientific Publications Based on Their Citation Graph
Marian, L; Rajman, M
2009-01-01
CDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods based on the bibliographic citation graph extracted from the CDS Invenio database. As a first step, we implemented the Citation Count as a baseline ranking method. The major disadvantage of this method is that all citations are treated equally, disregarding their importance and their publication date. To overcome this drawback, we consider two different approaches: a link-based approach which extends the PageRank model to the bibliographic citation graph and a time-dependent approach which takes into account time in the citation counts. In addition, we also combined these two approaches in a hybrid model based on a time-dependent PageRank. In the present document, we describe the conceptual background behind our new...
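The link-based approach can be illustrated with a plain PageRank power iteration over a citation graph. This is a generic sketch of standard PageRank, not the CDS Invenio implementation and not the time-dependent or hybrid variants; the function name, the damping value 0.85, the fixed iteration count and the paper labels are assumptions chosen for the example.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """PageRank scores for papers in a citation graph.

    citations: dict mapping a paper to the list of papers it cites.
    Papers that cite nothing (dangling nodes) spread their score evenly
    over all papers, so the scores always sum to 1.
    """
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in nodes}
        for p in nodes:
            targets = citations.get(p, [])
            for q in targets:
                # A paper passes its score in equal shares to what it cites.
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank

# Hypothetical graph: A and B both cite C, and C cites nothing.
graph = {"A": ["C"], "B": ["C"], "C": []}
scores = pagerank(graph)
```

Unlike a raw citation count, a citation from a highly ranked paper contributes more here than one from an obscure paper, which addresses the "all citations are treated equally" drawback described above.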
Virtual drug screen schema based on multiview similarity integration and ranking aggregation.
Kang, Hong; Sheng, Zhen; Zhu, Ruixin; Huang, Qi; Liu, Qi; Cao, Zhiwei
2012-03-26
Current drug virtual screen (VS) methods mainly fall into two categories, i.e., ligand/target structure-based virtual screen and methods utilizing protein-ligand interaction fingerprint information based on the large number of complex structures. Since the former focuses on one-sided information while the latter focuses on the whole complex structure, they are complementary and can boost each other. However, a common problem faced here is how to present a comprehensive understanding and evaluation of the various virtual screen results derived from various VS methods. Furthermore, there is still an urgent need for developing an efficient approach to fully integrate various VS methods from a comprehensive multiview perspective. In this study, our virtual screen schema based on multiview similarity integration and ranking aggregation was tested comprehensively with statistical evaluations, providing several novel and useful clues on how to perform drug VS from multiple heterogeneous data sources. (1) 18 complex structures of HIV-1 protease with ligands from the PDB were curated as a test data set and the VS was performed with five different drug representations. Ritonavir (1HXW) was selected as the query in VS and the weighted ranks of the query results were aggregated from multiple views through four similarity integration approaches. (2) Further, one of the ranking aggregation methods was used to integrate the similarity ranks calculated by gene ontology (GO) fingerprint and structural fingerprint on the data set from the connectivity map, and two typical HDAC and HSP90 inhibitors were chosen as the queries. The results show that rank aggregation can enhance the result of similarity searching in VS when two or more descriptions are involved and provide a more reasonable similarity rank result. Our study shows that integrated VS based on multiple data fusion can achieve remarkably better performance compared to that from individual ones and
A Hybrid Distance-Based Ideal-Seeking Consensus Ranking Model
Directory of Open Access Journals (Sweden)
Madjid Tavana
2007-01-01
Ordinal consensus ranking problems have received much attention in the management science literature. A problem arises in situations where a group of k decision makers (DMs) is asked to rank order n alternatives. The question is how to combine the DM rankings into one consensus ranking. Several different approaches have been suggested to aggregate DM responses into a compromise or consensus ranking; however, the similarity of consensus rankings generated by the different algorithms is largely unknown. In this paper, we propose a new hybrid distance-based ideal-seeking consensus ranking model (DCM). The proposed hybrid model combines parts of the two commonly used consensus ranking techniques of Beck and Lin (1983) and Cook and Kress (1985) into an intuitive and computationally simple model. We illustrate our method and then run a Monte Carlo simulation across a range of k and n to compare the similarity of the consensus rankings generated by our method with the best-known method of Borda and Kendall (Kendall 1962) and the two methods proposed by Beck and Lin (1983) and Cook and Kress (1985). DCM and Beck and Lin's method yielded the most similar consensus rankings, whereas the Cook-Kress method and the Borda-Kendall method yielded the least similar consensus rankings.
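The Borda-Kendall baseline used for comparison is simple enough to sketch: each alternative's ranks are summed over the decision makers, and a smaller total means a better consensus position. The sketch below illustrates that baseline only, not the proposed DCM model; the function name, the input format, the alphabetical tie-breaking and the example data are assumptions.

```python
def borda_kendall(rankings):
    """Consensus ordering by summed ranks (smallest total first).

    rankings: list of dicts, one per decision maker, each mapping
              alternative -> rank, with 1 as the best rank.
    Ties in the totals are broken alphabetically so the output is
    deterministic; that tie rule is a choice made for this sketch.
    """
    totals = {}
    for ranking in rankings:
        for alternative, rank in ranking.items():
            totals[alternative] = totals.get(alternative, 0) + rank
    return sorted(totals, key=lambda a: (totals[a], a))

# Hypothetical input: k = 3 decision makers ranking n = 3 alternatives.
dm_rankings = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 2, "y": 1, "z": 3},
    {"x": 1, "y": 3, "z": 2},
]
consensus = borda_kendall(dm_rankings)
```

A Monte Carlo comparison like the one described above would generate many random sets of `dm_rankings` for each k and n and measure how often two methods produce the same or similar consensus orderings.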
Quantum probability ranking principle for ligand-based virtual screening
Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal
2017-04-01
Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. Virtual screening tools are widely used to improve the cost effectiveness of lead drug discovery programs by ranking chemical compound databases in decreasing probability of biological activity, based on the probability ranking principle (PRP). In this paper, we develop a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria draw an analogy between the physical experiment and the molecular structure ranking process for 2D fingerprints in ligand-based virtual screening (LBVS). The development of the QPRP criteria in LBVS employs quantum concepts at three levels. First, at the representation level, the model develops a new framework of molecular representation by connecting the molecular compounds with a mathematical quantum space. Second, the similarity between chemical libraries and references is estimated with a quantum-based similarity searching method. Finally, the molecules are ranked using the QPRP approach. Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that QPRP outperformed the classical probability ranking principle (PRP) for molecular chemical compounds.
Rank-based permutation approaches for non-parametric factorial designs.
Umlauft, Maria; Konietschke, Frank; Pauly, Markus
2017-11-01
Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up Wald-type statistics and ANOVA-type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal-Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability. © 2017 The British Psychological Society.
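The permutation principle is easiest to see in the one-way layout that the proposed approach generalizes. The sketch below is a plain permutation version of the Kruskal-Wallis test, not the authors' procedure for factorial designs; the function names, the number of permutations, the seed and the data are assumptions. The tie correction factor is omitted because it is the same for every permutation of the pooled ranks and therefore does not change the permutation p-value.

```python
import random

def midranks(values):
    """1-based ranks with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_h(ranks, sizes):
    """Kruskal-Wallis H from pooled ranks laid out group after group."""
    n = len(ranks)
    total, start = 0.0, 0
    for size in sizes:
        rank_sum = sum(ranks[start:start + size])
        total += rank_sum ** 2 / size
        start += size
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def permutation_kruskal(groups, n_perm=2000, seed=0):
    """Return (observed H, permutation p-value) for independent groups."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    ranks = midranks([v for g in groups for v in g])
    observed = kruskal_h(ranks, sizes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)  # reassign the pooled ranks to groups at random
        if kruskal_h(ranks, sizes) >= observed - 1e-12:
            hits += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: three independent groups of three observations.
h_stat, p_value = permutation_kruskal([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Shuffling the pooled ranks is valid when the observations are exchangeable under the null hypothesis, which is the condition under which the abstract states the finite-sample exactness property.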
Inverted rank distributions: Macroscopic statistics, universality classes, and critical exponents
Eliazar, Iddo; Cohen, Morrel H.
2014-01-01
An inverted rank distribution is an infinite sequence of positive sizes ordered in a monotone increasing fashion. Interlacing together Lorenzian and oligarchic asymptotic analyses, we establish a macroscopic classification of inverted rank distributions into five “socioeconomic” universality classes: communism, socialism, criticality, feudalism, and absolute monarchy. We further establish that: (i) communism and socialism are analogous to a “disordered phase”, feudalism and absolute monarchy are analogous to an “ordered phase”, and criticality is the “phase transition” between order and disorder; (ii) the universality classes are characterized by two critical exponents, one governing the ordered phase, and the other governing the disordered phase; (iii) communism, criticality, and absolute monarchy are characterized by sharp exponent values, and are inherently deterministic; (iv) socialism is characterized by a continuous exponent range, is inherently stochastic, and is universally governed by continuous power-law statistics; (v) feudalism is characterized by a continuous exponent range, is inherently stochastic, and is universally governed by discrete exponential statistics. The results presented in this paper yield a universal macroscopic socioeconophysical perspective of inverted rank distributions.
Diagrammatic perturbation methods in networks and sports ranking combinatorics
International Nuclear Information System (INIS)
Park, Juyong
2010-01-01
Analytic and computational tools developed in statistical physics are being increasingly applied to the study of complex networks. Here we present recent developments in the diagrammatic perturbation methods for the exponential random graph models, and apply them to the combinatoric problem of determining the ranking of nodes in directed networks that represent pairwise competitions.
Solutions of interval type-2 fuzzy polynomials using a new ranking method
Rahman, Nurhakimah Ab.; Abdullah, Lazim; Ghani, Ahmad Termimi Ab.; Ahmad, Noor'Ani
2015-10-01
A few years ago, a ranking method was introduced for fuzzy polynomial equations. The ranking method was proposed to find the actual roots of fuzzy polynomials (if they exist). Fuzzy polynomials are transformed into a system of crisp polynomials using a ranking method based on three parameters, namely Value, Ambiguity and Fuzziness. However, it was found that solutions based on these three parameters are quite inefficient at producing answers. Therefore, in this study a new ranking method has been developed with the aim of overcoming this inherent weakness. The new ranking method, which has four parameters, is then applied to interval type-2 fuzzy polynomials, covering the interval type-2 fuzzy polynomial equation, dual fuzzy polynomial equations and systems of fuzzy polynomials. The efficiency of the new ranking method is then considered numerically for triangular fuzzy numbers and trapezoidal fuzzy numbers. Finally, the approximate solutions produced from the numerical examples indicate that the new ranking method successfully produced actual roots for the interval type-2 fuzzy polynomials.
Recurrent fuzzy ranking methods
Hajjari, Tayebeh
2012-11-01
With the increasing development of fuzzy set theory in various scientific fields comes the need to compare fuzzy numbers in different areas. Ranking of fuzzy numbers therefore plays a very important role in linguistic decision-making, engineering, business and some other fuzzy application systems. Several strategies have been proposed for ranking fuzzy numbers. Each of these techniques has been shown to produce non-intuitive results in certain cases. In this paper, we review some recent ranking methods, which will be useful for researchers who are interested in this area.
MCDM based evaluation and ranking of commercial off-the-shelf using fuzzy based matrix method
Directory of Open Access Journals (Sweden)
Rakesh Garg
2017-04-01
In today’s scenario, software has become an essential component in all kinds of systems. The size and the complexity of software increase with a corresponding increase in its functionality, which leads to the development of modular software systems. Software developers emphasize the concept of component based software engineering (CBSE) for the development of modular software systems. The CBSE concept consists of dividing the software into a number of modules; selecting Commercial Off-the-Shelf (COTS) components for each module; and finally integrating the modules to develop the final software system. The selection of COTS for any module plays a vital role in software development. To address the problem of COTS selection, this research paper proposes a framework for ranking and selecting COTS components for any software system based on expert opinion elicitation and a fuzzy-based matrix methodology. The selection problem is modeled as a multi-criteria decision making (MCDM) problem. The evaluation criteria are identified through an extensive literature study, and the COTS components are ranked on these criteria using the proposed methods according to the value of a permanent function of their criteria matrices. The methodology is explained through an example and is validated by comparison with an existing method.
Ranking the Online Documents Based on Relative Credibility Measures
Directory of Open Access Journals (Sweden)
Ahmad Dahlan
2013-09-01
Information searching is the most popular activity on the Internet. Usually the search engine provides the search results ranked by relevance. However, for purposes where information credibility matters, particularly citing information for scientific works, another approach to ranking the search engine results is required. This paper presents a study on developing a new ranking method based on the credibility of information. The method is built upon two well-known algorithms, PageRank and Citation Analysis. The experiment used the Spearman Rank Correlation Coefficient to compare the proposed rank (generated by the method) with the standard rank (generated manually by a group of experts), and showed that the average Spearman coefficient satisfied 0 < rS < critical value. This means that a positive correlation was observed but it was not significant. Hence the proposed rank does not satisfy the standard, but the performance could be improved.
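The comparison described in this abstract can be illustrated with a minimal sketch of the Spearman coefficient for tie-free rankings, rS = 1 - 6·Σd² / (n(n² - 1)). The two rank lists below are hypothetical and are not data from the study; handling of ties and the lookup of the critical value are left out.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ranks: one list from an automated method, one from experts.
proposed = [1, 2, 3, 4, 5]
expert = [2, 1, 4, 3, 5]
print(spearman(proposed, expert))
```

A value between 0 and the critical value for the given n would correspond to the "positive but not significant" outcome reported above.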
Ranking the Online Documents Based on Relative Credibility Measures
Directory of Open Access Journals (Sweden)
Ahmad Dahlan
2009-05-01
Information searching is the most popular activity on the Internet. Usually the search engine provides the search results ranked by relevance. However, for purposes where information credibility matters, particularly citing information for scientific works, another approach to ranking the search engine results is required. This paper presents a study on developing a new ranking method based on the credibility of information. The method is built upon two well-known algorithms, PageRank and Citation Analysis. The experiment used the Spearman Rank Correlation Coefficient to compare the proposed rank (generated by the method) with the standard rank (generated manually by a group of experts), and showed that the average Spearman coefficient satisfied 0 < rS < critical value. This means that a positive correlation was observed but it was not significant. Hence the proposed rank does not satisfy the standard, but the performance could be improved.
Yager’s ranking method for solving the trapezoidal fuzzy number linear programming
Karyati; Wutsqa, D. U.; Insani, N.
2018-03-01
In previous research, the authors studied the fuzzy simplex method for trapezoidal fuzzy number linear programming based on Maleki’s ranking function. We found some theories related to the conditions for the optimum solution of the fuzzy simplex method, the fuzzy Big-M method, the fuzzy two-phase method, and the sensitivity analysis. In this research, we study the fuzzy simplex method based on another ranking function, called Yager’s ranking function, and investigate the optimality conditions in this case. The results show that Yager’s ranking function does not behave like Maleki’s ranking function: with Yager’s function, the simplex method does not work as well as it does with Maleki’s function. With Yager’s function, the subtraction of two equal fuzzy numbers does not give zero. As a consequence, the optimum table of the fuzzy simplex method cannot be detected, so the fuzzy simplex procedure stops without reaching the optimum solution.
Van der Fels-Klerx, H J; Van Asselt, E D; Raley, M; Poulsen, M; Korsgaard, H; Bredsdorff, L; Nauta, M; D'agostino, M; Coles, D; Marvin, H J P; Frewer, L J
2018-01-22
This study aimed to critically review methods for ranking risks related to food safety and dietary hazards on the basis of their anticipated human health impacts. A literature review was performed to identify and characterize methods for risk ranking from the fields of food, environmental science and socio-economic sciences. The review used a predefined search protocol, and covered the bibliographic databases Scopus, CAB Abstracts, Web of Science, and PubMed over the period 1993-2013. All references deemed relevant, on the basis of predefined evaluation criteria, were included in the review, and the risk ranking method characterized. The methods were then clustered, based on their characteristics, into eleven method categories. These categories included: risk assessment, comparative risk assessment, risk ratio method, scoring method, cost of illness, health adjusted life years (HALY), multi-criteria decision analysis, risk matrix, flow charts/decision trees, stated preference techniques and expert synthesis. Method categories were described by their characteristics, weaknesses and strengths, data resources, and fields of applications. It was concluded there is no single best method for risk ranking. The method to be used should be selected on the basis of risk manager/assessor requirements, data availability, and the characteristics of the method. Recommendations for future use and application are provided.
International Nuclear Information System (INIS)
Chou, Jui-Sheng; Ongkowijoyo, Citra Satria
2015-01-01
Corporate competitiveness is heavily influenced by the information acquired, processed, utilized and transferred by professional staff involved in the supply chain. This paper develops a decision aid for selecting on-site ready-mix concrete (RMC) unloading type in decision making situations involving multiple stakeholders and evaluation criteria. The uncertainty of criteria weights set by expert judgment can be transformed in random ways based on the probabilistic virtual-scale method within a prioritization matrix. The ranking is performed by grey relational grade systems considering stochastic criteria weight based on individual preference. Application of the decision aiding model in actual RMC case confirms that the method provides a robust and effective tool for facilitating decision making under uncertainty. - Highlights: • This study models decision aiding method to assess ready-mix concrete unloading type. • Applying Monte Carlo simulation to virtual-scale method achieves a reliable process. • Individual preference ranking method enhances the quality of global decision making. • Robust stochastic superiority and inferiority ranking obtains reasonable results
International Nuclear Information System (INIS)
Wilson, G.E.
1992-01-01
The Analytic Hierarchy Process (AHP) has been used to help determine the importance of components and phenomena in thermal-hydraulic safety analyses of nuclear reactors. The AHP results are based, in part, on expert opinion. Therefore, it is prudent to evaluate the uncertainty of the AHP ranks of importance. Prior applications have addressed uncertainty with experimental data comparisons and bounding sensitivity calculations. These methods work well when a sufficient experimental data base exists to justify the comparisons. However, in the case of limited or no experimental data, the size of the uncertainty is normally made conservatively large. Accordingly, the author has taken another approach, that of performing a statistically based uncertainty analysis. The new work is based on prior evaluations of the importance of components and phenomena in the thermal-hydraulic safety analysis of the Advanced Neutron Source Reactor (ANSR), a new facility now in the design phase. The uncertainty during large break loss of coolant and decay heat removal scenarios is estimated by assigning a probability distribution function (pdf) to the potential error in the initial expert estimates of pair-wise importance between the components. Using a Monte Carlo sampling technique, the error pdfs are propagated through the AHP software solutions to determine a pdf of uncertainty in the system-wide importance of each component. To enhance the generality of the results, a study of one other problem having a different number of elements is reported, as are the effects of a larger assumed pdf error in the expert ranks. Validation of the Monte Carlo sample size and repeatability are also documented.
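The propagation of judgment errors through AHP can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the pairwise matrix is hypothetical, the error pdf is assumed to be lognormal, and priorities are approximated by normalized row geometric means rather than by whatever solution the AHP software in the study used.

```python
import random
from math import prod


def priority_weights(matrix):
    """Approximate AHP priorities by normalized row geometric means."""
    n = len(matrix)
    means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def perturbed(matrix, sigma, rng):
    """Apply lognormal noise to upper-triangle judgments, keeping reciprocity."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            out[i][j] = matrix[i][j] * rng.lognormvariate(0.0, sigma)
            out[j][i] = 1.0 / out[i][j]
    return out


def top_rank_share(matrix, sigma=0.1, samples=2000, seed=0):
    """Fraction of Monte Carlo samples in which each element ranks first."""
    rng = random.Random(seed)
    counts = [0] * len(matrix)
    for _ in range(samples):
        weights = priority_weights(perturbed(matrix, sigma, rng))
        counts[weights.index(max(weights))] += 1
    return [c / samples for c in counts]


# Hypothetical pairwise importance judgments for three components.
judgments = [[1.0, 2.0, 4.0],
             [0.5, 1.0, 2.0],
             [0.25, 0.5, 1.0]]
print(priority_weights(judgments))
print(top_rank_share(judgments))
```

Raising `sigma` plays the role of the "larger assumed pdf error" mentioned in the abstract, and rerunning with different `samples` and `seed` values is one way to check sample size and repeatability.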
How Many Alternatives Can Be Ranked? A Comparison of the Paired Comparison and Ranking Methods.
Ock, Minsu; Yi, Nari; Ahn, Jeonghoon; Jo, Min-Woo
2016-01-01
To determine the feasibility of converting ranking data into paired comparison (PC) data and suggest the number of alternatives that can be ranked by comparing a PC and a ranking method. Using a total of 222 health states, a household survey was conducted in a sample of 300 individuals from the general population. Each respondent performed a PC 15 times and a ranking method 6 times (two attempts of ranking three, four, and five health states, respectively). The health states of the PC and the ranking method were constructed to overlap each other. We converted the ranked data into PC data and examined the consistency of the response rate. Applying probit regression, we obtained the predicted probability of each method. Pearson correlation coefficients were determined between the predicted probabilities of those methods. The mean absolute error was also assessed between the observed and the predicted values. The overall consistency of the response rate was 82.8%. The Pearson correlation coefficients were 0.789, 0.852, and 0.893 for ranking three, four, and five health states, respectively. The lowest mean absolute error was 0.082 (95% confidence interval [CI] 0.074-0.090) in ranking five health states, followed by 0.123 (95% CI 0.111-0.135) in ranking four health states and 0.126 (95% CI 0.113-0.138) in ranking three health states. After empirically examining the consistency of the response rate between a PC and a ranking method, we suggest that using five alternatives in the ranking method may be superior to using three or four alternatives. Copyright © 2016 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
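Converting ranked data into paired comparison data, as described above, can be sketched in a few lines. The health-state labels and responses are hypothetical, and `consistency_rate` is only one plausible reading of the "consistency of the response rate"; the study's exact definition is not given in the abstract.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Expand a ranking (best first) into (preferred, other) pairs."""
    return list(combinations(ranking, 2))


def consistency_rate(implied_pairs, direct_choices):
    """Share of direct paired-comparison answers that agree with the ranking."""
    implied = set(implied_pairs)
    agree = sum(1 for pair in direct_choices if pair in implied)
    return agree / len(direct_choices)


# Hypothetical respondent: ranks three health states, then answers three PCs.
ranking = ["A", "B", "C"]
direct = [("A", "B"), ("C", "B"), ("A", "C")]
pairs = ranking_to_pairs(ranking)
print(pairs)
print(consistency_rate(pairs, direct))
```

A ranking of k alternatives yields k(k - 1)/2 implied pairs, so ranking five health states produces ten paired comparisons from a single task.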
A Ranking Method for Neutral Pion and Eta Selection in Hadronic Events
International Nuclear Information System (INIS)
Bingoel, A.
2004-01-01
The selection of neutral pions and etas with a high purity while also maintaining a high efficiency can be important in the formation of statistically significant mass spectra in the reconstruction of short-lived particles such as the omega meson (ω → π⁺ + π⁻ + π⁰). In this study a Ranking method has been optimized for data from the ALEPH Experiment, CERN. The results show that the Ranking method, when applied to high multiplicity events, yields significant improvements in the purity of selected pion candidates and facilitates the relaxation of standard cuts, thereby avoiding some systematic uncertainties.
Directory of Open Access Journals (Sweden)
Samah Ibrahim Abdel Aal
2018-03-01
The neutrosophic set generalizes the fuzzy set and the intuitionistic fuzzy set, which makes it well suited to representing indeterminacy and uncertainty. Single Valued Triangular Neutrosophic numbers (SVTrN-numbers) are a special case of the neutrosophic set that can handle ill-known quantities in very difficult problems. This work introduces a framework with two types of ranking methods. The results indicate that each ranking method has its own advantage. The weighted value and ambiguity based method gives more attention to uncertainty in ranking and evaluating ISQ, and it takes into account cut sets of SVTrN-numbers, which reflect the information on the truth-membership degree, falsity-membership degree and indeterminacy-membership degree. The value index and ambiguity index method can reflect the decision maker's subjective attitude to the SVTrN-numbers.
Weighted Discriminative Dictionary Learning based on Low-rank Representation
International Nuclear Information System (INIS)
Chang, Heyou; Zheng, Hao
2017-01-01
Low-rank representation has been widely used in the field of pattern classification, especially when both training and testing images are corrupted with large noise. The dictionary plays an important role in low-rank representation. With respect to the semantic dictionary, the optimal representation matrix should be block-diagonal. However, traditional dictionary learning methods based on low-rank representation cannot effectively exploit the discriminative information between data and dictionary. To address this problem, this paper proposes weighted discriminative dictionary learning based on low-rank representation, in which a weighted representation regularization term is constructed. The regularization associates the label information of both training samples and dictionary atoms, and encourages a discriminative representation with class-wise block-diagonal structure, which can further improve the classification performance when both training and testing images are corrupted with large noise. Experimental results demonstrate the advantages of the proposed method over state-of-the-art methods. (paper)
DEFF Research Database (Denmark)
Cavaliere, Giuseppe; Angelis, Luca De; Rahbek, Anders
2015-01-01
In this article, we investigate the behaviour of a number of methods for estimating the co-integration rank in VAR systems characterized by heteroskedastic innovation processes. In particular, we compare the efficacy of the most widely used information criteria, such as Akaike Information Criterion....... The relative finite-sample properties of the different methods are investigated by means of a Monte Carlo simulation study. For the simulation DGPs considered in the analysis, we find that the BIC-based procedure and the bootstrap sequential test procedure deliver the best overall performance in terms......-based method to over-estimate the co-integration rank in relatively small sample sizes....
Probabilistic real-time contingency ranking method
International Nuclear Information System (INIS)
Mijuskovic, N.A.; Stojnic, D.
2000-01-01
This paper describes a real-time contingency ranking method based on a probabilistic index, the expected energy not supplied. In this way it is possible to take into account the stochastic nature of electric power system equipment outages. This approach enables a more comprehensive ranking of contingencies, and it makes it possible to form reliability cost values that can serve as the basis for hourly spot price calculations. The electric power system of Serbia is used as an example for the proposed method. (author)
Statistical approach for selection of regression model during validation of bioanalytical method
Directory of Open Access Journals (Sweden)
Natalija Nakov
2014-06-01
The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: the one-sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one-sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because only slight differences between the errors (presented through the relative residuals, RR) were obtained. The significance of the differences in the RR was estimated using the Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.
Co-integration Rank Testing under Conditional Heteroskedasticity
DEFF Research Database (Denmark)
Cavaliere, Giuseppe; Rahbæk, Anders; Taylor, A.M. Robert
null distributions of the rank statistics coincide with those derived by previous authors who assume either i.i.d. or (strict and covariance) stationary martingale difference innovations. We then propose wild bootstrap implementations of the co-integrating rank tests and demonstrate that the associated...... bootstrap rank statistics replicate the first-order asymptotic null distributions of the rank statistics. We show the same is also true of the corresponding rank tests based on the i.i.d. bootstrap of Swensen (2006). The wild bootstrap, however, has the important property that, unlike the i.i.d. bootstrap......, it preserves in the re-sampled data the pattern of heteroskedasticity present in the original shocks. Consistent with this, numerical evidence suggests that, relative to tests based on the asymptotic critical values or the i.i.d. bootstrap, the wild bootstrap rank tests perform very well in small samples un...
Sailaukhanuly, Yerbolat; Zhakupbekova, Arai; Amutova, Farida; Carlsen, Lars
2013-01-01
Knowledge of the environmental behavior of chemicals is a fundamental part of the risk assessment process. The present paper discusses various methods of ranking a series of persistent organic pollutants (POPs) according to their persistence, bioaccumulation and toxicity (PBT) characteristics. Traditionally, ranking has been done as an absolute (total) ranking by applying various multicriteria data analysis methods such as simple additive ranking (SAR) or rankings based on various utility functions (UFs). An attractive alternative to these ranking methodologies appears to be partial order ranking (POR). The present paper compares different ranking methods, namely SAR, UF and POR. Significant discrepancies between the rankings are noted, and it is concluded that partial order ranking, as a method without any pre-assumptions concerning possible relations between the single parameters, appears to be the most attractive ranking methodology. In addition to the initial ranking, partial order methodology offers a wide variety of analytical tools to elucidate the interplay between the objects to be ranked and the ranking parameters. The present study includes an analysis of the relative importance of the single P, B and T parameters. Copyright © 2012 Elsevier Ltd. All rights reserved.
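The contrast between a total ranking and a partial order can be made concrete with a small sketch. The three compounds and their P, B, T scores are hypothetical, and the unweighted sum used for the additive ranking is an assumption; the paper's actual SAR and UF definitions are not given in the abstract.

```python
def dominates(a, b):
    """True if a is at least as high as b on every parameter and higher on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def additive_ranking(scores):
    """Total ranking by the unweighted sum of parameter scores (highest first)."""
    return sorted(scores, key=lambda name: sum(scores[name]), reverse=True)


def incomparable_pairs(scores):
    """Pairs that a partial order leaves unranked (neither dominates the other)."""
    names = sorted(scores)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if scores[a] != scores[b]
            and not dominates(scores[a], scores[b])
            and not dominates(scores[b], scores[a])]


# Hypothetical (P, B, T) scores for three compounds.
pbt = {"A": (3, 3, 3), "B": (2, 2, 1), "C": (1, 3, 2)}
print(additive_ranking(pbt))
print(incomparable_pairs(pbt))
```

The additive ranking forces B and C into an order, while the partial order reports them as incomparable because each exceeds the other on at least one parameter.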
A stable systemic risk ranking in China's banking sector: Based on principal component analysis
Fang, Libing; Xiao, Binqing; Yu, Honghai; You, Qixing
2018-02-01
In this paper, we compare five popular systemic risk rankings, and apply a principal component analysis (PCA) model to provide a stable systemic risk ranking for the Chinese banking sector. Our empirical results indicate that the five methods suggest vastly different systemic risk rankings for the same bank, while the combined systemic risk measure based on PCA provides a reliable ranking. Furthermore, according to the factor loadings of the first component, the PCA combined ranking is mainly based on fundamentals instead of market price data. We clearly find that price-based rankings are not as practical as fundamentals-based ones. The PCA combined ranking directly shows the systemic risk contribution of each bank for banking supervision purposes and reminds banks to prevent and cope with financial crises in advance.
Mallik, Saurav; Maulik, Ujjwal
2015-10-01
Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription factors (TFs) and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data, using a (statistical) eigenvector centrality based approach. First, genes that are both differentially expressed and methylated are identified using the Limma statistical test. A network comprising these genes, the corresponding TFs from the TRANSFAC and ITFP databases, and targeter miRNAs from the miRWalk database is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classifications than other methods. Furthermore, pre-ranked Gene Set Enrichment Analysis is applied to the pathway database as well as the GO-term databases of the Molecular Signatures Database, supplying a pre-ranked gene list based on different centrality values in order to compare the ranking methods. Finally, top novel potential gene markers for uterine leiomyoma are provided. Copyright © 2015 Elsevier Inc. All rights reserved.
An Adaptive Reordered Method for Computing PageRank
Directory of Open Access Journals (Sweden)
Yi-Ming Bu
2013-01-01
We propose an adaptive reordered method to deal with the PageRank problem. It has been shown that one can reorder the hyperlink matrix of the PageRank problem to calculate a reduced system and get the full PageRank vector through forward substitutions. This method can provide a speedup for calculating the PageRank vector. We observe that in the existing reordered method, the cost of the recursive reordering procedure could offset the computational reduction brought by minimizing the dimension of the linear system. With this observation, we introduce an adaptive reordered method to accelerate the total calculation, in which we terminate the reordering procedure appropriately instead of reordering to the end. Numerical experiments show the effectiveness of this adaptive reordered method.
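For orientation, the PageRank vector that the reordered methods accelerate can be computed with plain power iteration. The sketch below is that baseline only, not the reordered or adaptive method of the paper; the damping factor 0.85, the uniform treatment of dangling nodes, and the tiny link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; links maps each node to the list of nodes it links to.

    Every link target must also appear as a key of links.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in links[v]:
                new[target] += damping * rank[v] / len(links[v])
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical three-page web: a and b link to each other, c links to a.
web = {"a": ["b"], "b": ["a"], "c": ["a"]}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))
```

Dangling nodes are exactly the rows that reordering schemes move to the end of the hyperlink matrix, which is why they are singled out even in this simple version.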
Solving the interval type-2 fuzzy polynomial equation using the ranking method
Rahman, Nurhakimah Ab.; Abdullah, Lazim
2014-07-01
Polynomial equations with trapezoidal and triangular fuzzy numbers have attracted some interest among researchers in mathematics, engineering and social sciences. Several methods have been developed to solve these equations. In this study we introduce the interval type-2 fuzzy polynomial equation and solve it using the ranking method of fuzzy numbers. The ranking method concept was first proposed to find real roots of fuzzy polynomial equations. Here, the ranking method is applied to find real roots of the interval type-2 fuzzy polynomial equation. We transform the interval type-2 fuzzy polynomial equation into a system of crisp interval type-2 fuzzy polynomial equations. This transformation is performed using the ranking method of fuzzy numbers based on three parameters, namely value, ambiguity and fuzziness. Finally, we illustrate our approach with a numerical example.
Ranking mutual funds using Sortino method
Directory of Open Access Journals (Sweden)
Khosro Faghani Makrani
2014-04-01
One of the primary concerns in most business activities is to determine an efficient method for ranking mutual funds. This paper performs an empirical investigation to rank 42 mutual funds listed on the Tehran Stock Exchange using the Sortino method over the period 2011-2012. The results of the survey were compared with the market return, and they confirmed that there were positive and meaningful relationships between the Sortino return and the market return. In addition, there was a positive and meaningful relationship between the two Sortino methods.
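A minimal sketch of ranking funds by the Sortino ratio follows, using the common definition (mean excess return over a target, divided by downside deviation). The fund names and return series are hypothetical, the target return of zero is an assumption, and the abstract does not say which two Sortino variants the paper compares.

```python
from math import sqrt


def sortino_ratio(returns, target=0.0):
    """Mean excess return over the target divided by downside deviation."""
    excess = [r - target for r in returns]
    downside = sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return float("inf")  # no return fell below the target
    return (sum(excess) / len(excess)) / downside


def rank_funds(fund_returns, target=0.0):
    """Fund names sorted from highest to lowest Sortino ratio."""
    return sorted(fund_returns,
                  key=lambda name: sortino_ratio(fund_returns[name], target),
                  reverse=True)


# Hypothetical periodic returns for three funds.
funds = {
    "Fund X": [0.04, -0.02, 0.03, -0.01],
    "Fund Y": [0.01, 0.01, -0.03, 0.00],
    "Fund Z": [0.02, 0.01, 0.03, 0.02],
}
print(rank_funds(funds))
```

Unlike a ratio based on total volatility, only returns below the target enter the denominator, so a fund is not penalized for upside variation.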
A Rational Method for Ranking Engineering Programs.
Glower, Donald D.
1980-01-01
Compares two methods for ranking academic programs, the opinion poll v examination of career successes of the program's alumni. For the latter, "Who's Who in Engineering" and levels of research funding provided data. Tables display resulting data and compare rankings by the two methods for chemical engineering and civil engineering. (CS)
Del Carratore, Francesco; Jankevics, Andris; Eisinga, Rob; Heskes, Tom; Hong, Fangxin; Breitling, Rainer
2017-09-01
The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable. We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P-value estimation methods have been replaced by exact methods, providing faster and more accurate results. RankProd 2.0 is available at Bioconductor (https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html) and as part of the mzMatch pipeline (http://www.mzmatch.sourceforge.net). Contact: rainer.breitling@manchester.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
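The statistic itself is simple to state: rank the features within each replicate, then take the geometric mean of each feature's ranks. The sketch below illustrates only that step in Python and is not the RankProd package or its API; the log fold changes are hypothetical, rank 1 is assumed to mean the strongest up-regulation, ties are not handled, and no P-values are computed.

```python
from math import prod


def ranks_descending(values):
    """Rank 1 for the largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_products(fold_changes):
    """fold_changes[g][i] is the log fold change of feature g in replicate i."""
    n_features = len(fold_changes)
    n_reps = len(fold_changes[0])
    per_rep = [ranks_descending([fold_changes[g][i] for g in range(n_features)])
               for i in range(n_reps)]
    return [prod(per_rep[i][g] for i in range(n_reps)) ** (1.0 / n_reps)
            for g in range(n_features)]


# Hypothetical log fold changes: three features, two replicates.
data = [[2.0, 1.5],
        [0.5, 1.8],
        [-1.0, -0.3]]
print(rank_products(data))
```

Small rank products flag features that are consistently near the top of every replicate; the Rank Sum variant replaces the product with a sum of the same ranks.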
Population based ranking of frameless CT-MRI registration methods
Energy Technology Data Exchange (ETDEWEB)
Opposits, Gabor; Kis, Sandor A.; Tron, Lajos; Emri, Miklos [Debrecen Univ. (Hungary). Dept. of Nuclear Medicine; Berenyi, Ervin [Debrecen Univ. (Hungary). Dept. of Biomedical Laboratory and Imaging Science; Takacs, Endre [Rotating Gamma Ltd., Debrecen (Hungary); Dobai, Jozsef G.; Bognar, Laszlo [Debrecen Univ., Medical Center (Hungary). Dept. of Neurosurgery; Szuecs, Bernadett [ScanoMed Ltd., Debrecen (Hungary)
2015-07-01
Clinical practice often requires simultaneous information obtained by two different imaging modalities. Registration algorithms are commonly used for this purpose. Automated procedures are very helpful in cases when the same kind of registration has to be performed on images of a high number of subjects. Radiotherapists would prefer to use the best automated method to assist therapy planning; however, there are no accepted procedures for ranking the different registration algorithms. We were interested in developing a method to measure the population level performance of CT-MRI registration algorithms by a parameter with values in the [0,1] interval. Pairs of CT and MRI images were collected from 1051 subjects. Results of an automated registration were corrected manually until a radiologist and a neurosurgeon expert both accepted the result as good. In this way 1051 registered MRI images were produced by the same pair of experts to be used as gold standards for the evaluation of the performance of other registration algorithms. Pearson correlation coefficient, mutual information, normalized mutual information, Kullback-Leibler divergence, L1 norm and squared L2 norm (dis)similarity measures were tested for sensitivity to indicate the extent of (dis)similarity of a pair of individual mismatched images. The squared Hellinger distance proved suitable to grade the performance of registration algorithms at population level, providing developers with a valuable tool to rank algorithms. The developed procedure provides an objective method to find the registration algorithm performing best at the population level among newly constructed or available preselected ones.
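The squared Hellinger distance is a natural candidate for a score in [0,1] because, for two discrete distributions p and q, it equals 1 - Σ√(pᵢqᵢ). The sketch below computes it for two histograms with matching bins; how the study builds the distributions from the registered images is not stated in the abstract, so the histograms here are purely hypothetical.

```python
from math import sqrt


def squared_hellinger(hist_p, hist_q):
    """Squared Hellinger distance between two histograms with matching bins."""
    total_p, total_q = sum(hist_p), sum(hist_q)
    overlap = sum(sqrt((p / total_p) * (q / total_q))
                  for p, q in zip(hist_p, hist_q))
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, 1.0 - overlap))


# Hypothetical bin counts from a gold-standard and a test registration.
gold = [10, 30, 40, 20]
test = [15, 25, 35, 25]
print(squared_hellinger(gold, test))
```

Identical histograms give 0 and histograms with no shared bins give 1, so scores from different algorithms can be compared on a common scale.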
Statistical regularities in the rank-citation profile of scientists.
Petersen, Alexander M; Stanley, H Eugene; Succi, Sauro
2011-01-01
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c(i)(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c(i)(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c(i)(r) profiles, our results demonstrate the utility of the β(i) scaling parameter in conjunction with h(i) for quantifying individual publication impact. We show that the total number of citations C(i) tallied from a scientist's N(i) papers scales as [Formula: see text]. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
Rank-based model selection for multiple ions quantum tomography
International Nuclear Information System (INIS)
Guţă, Mădălin; Kypraios, Theodore; Dryden, Ian
2012-01-01
The statistical analysis of measurement data has become a key component of many quantum engineering experiments. As standard full state tomography becomes unfeasible for large dimensional quantum systems, one needs to exploit prior information and the ‘sparsity’ properties of the experimental state in order to reduce the dimensionality of the estimation problem. In this paper we propose model selection as a general principle for finding the simplest, or most parsimonious explanation of the data, by fitting different models and choosing the estimator with the best trade-off between likelihood fit and model complexity. We apply two well established model selection methods, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), to models consisting of states of fixed rank and datasets such as are currently produced in multiple ions experiments. We test the performance of AIC and BIC on randomly chosen low rank states of four ions, and study the dependence of the selected rank on the number of measurement repetitions for one ion states. We then apply the methods to real data from a four ions experiment aimed at creating a Smolin state of rank 4. By applying the two methods together with the Pearson χ² test we conclude that the data can be suitably described with a model whose rank is between 7 and 9. Additionally we find that the mean square error of the maximum likelihood estimator for pure states is close to that of the optimal over all possible measurements.
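The trade-off between likelihood fit and model complexity described above can be sketched with the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The parameter count 2dr - r² - 1 for a rank-r density matrix in dimension d is the usual real-parameter count and is assumed here; the abstract does not state the count the authors use, nor what they take as the sample size n. All function names are hypothetical.

```python
import math


def n_params_rank_r(dim, rank):
    """Real parameters of a rank-`rank` density matrix on a `dim`-dimensional space."""
    return 2 * dim * rank - rank ** 2 - 1


def aic(log_lik, k):
    """Akaike information criterion for maximised log-likelihood `log_lik`."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    """Bayesian information criterion; `n_obs` is the assumed sample size."""
    return k * math.log(n_obs) - 2 * log_lik


def select_rank(log_liks, dim, n_obs, criterion="aic"):
    """Return the candidate rank with the smallest criterion value.

    `log_liks` maps each candidate rank to its maximised log-likelihood.
    """
    def score(rank):
        k = n_params_rank_r(dim, rank)
        ll = log_liks[rank]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n_obs)

    return min(log_liks, key=score)
```

With made-up log-likelihoods such as `{1: -1000.0, 2: -900.0, 3: -895.0}` for a 16-dimensional (four-qubit) system and `n_obs=10000`, AIC prefers rank 2 while BIC, with its heavier penalty, prefers rank 1. These numbers are illustrative only and are not taken from the paper.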
Feature selection for splice site prediction: A new method using EDA-based feature ranking
Directory of Open Access Journals (Sweden)
Rouzé Pierre
2004-05-01
Background: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight in the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.
A ranking method for the concurrent learning of compounds with various activity profiles.
Dörr, Alexander; Rosenbaum, Lars; Zell, Andreas
2015-01-01
In this study, we present a SVM-based ranking algorithm for the concurrent learning of compounds with different activity profiles and their varying prioritization. To this end, a specific labeling of each compound was elaborated in order to infer virtual screening models against multiple targets. We compared the method with several state-of-the-art SVM classification techniques that are capable of inferring multi-target screening models on three chemical data sets (cytochrome P450s, dehydrogenases, and a trypsin-like protease data set) containing three different biological targets each. The experiments show that ranking-based algorithms show an increased performance for single- and multi-target virtual screening. Moreover, compounds that do not completely fulfill the desired activity profile are still ranked higher than decoys or compounds with an entirely undesired profile, compared to other multi-target SVM methods. SVM-based ranking methods constitute a valuable approach for virtual screening in multi-target drug design. The utilization of such methods is most helpful when dealing with compounds with various activity profiles and the finding of many ligands with an already perfectly matching activity profile is not to be expected.
Noma, Hisashi; Matsui, Shigeyuki
2013-05-20
The main purpose of microarray studies is screening of differentially expressed genes as candidates for further investigation. Because of limited resources in this stage, prioritizing genes is a relevant statistical task in microarray studies. For effective gene selection, parametric empirical Bayes methods for ranking and selection of genes with largest effect sizes have been proposed (Noma et al., 2010; Biostatistics 11: 281-289). The hierarchical mixture model incorporates the differential and non-differential components and allows information borrowing across differential genes with separation from nuisance, non-differential genes. In this article, we develop empirical Bayes ranking methods via a semiparametric hierarchical mixture model. A nonparametric prior distribution, rather than parametric prior distributions, for effect sizes is specified and estimated using the "smoothing by roughening" approach of Laird and Louis (1991; Computational statistics and data analysis 12: 27-37). We present applications to childhood and infant leukemia clinical studies with microarrays for exploring genes related to prognosis or disease progression. Copyright © 2012 John Wiley & Sons, Ltd.
Intuitive introductory statistics
Wolfe, Douglas A
2017-01-01
This textbook is designed to give an engaging introduction to statistics and the art of data analysis. The unique scope includes, but also goes beyond, classical methodology associated with the normal distribution. What if the normal model is not valid for a particular data set? This cutting-edge approach provides the alternatives. It is an introduction to the world and possibilities of statistics that uses exercises, computer analyses, and simulations throughout the core lessons. These elementary statistical methods are intuitive. Counting and ranking features prominently in the text. Nonparametric methods, for instance, are often based on counts and ranks and are very easy to integrate into an introductory course. The ease of computation with advanced calculators and statistical software, both of which factor into this text, allows important techniques to be introduced earlier in the study of statistics. This book's novel scope also includes measuring symmetry with Walsh averages, finding a nonp...
Statistical trend analysis methods for temporal phenomena
Energy Technology Data Exchange (ETDEWEB)
Lehtinen, E.; Pulkkinen, U. [VTT Automation, (Finland)]; Poern, K. [Poern Consulting, Nykoeping (Sweden)]
1997-04-01
We consider point events occurring in a random way in time. In many applications the pattern of occurrence is of intrinsic interest as indicating a trend or some other systematic feature in the rate of occurrence. The purpose of this report is to survey briefly different statistical trend analysis methods and illustrate their applicability to temporal phenomena in particular. The trend testing of point events is usually seen as the testing of the hypotheses concerning the intensity of the occurrence of events. When the intensity function is parametrized, the testing of trend is a typical parametric testing problem. In industrial applications the operational experience generally does not suggest any specified model and method in advance. Therefore, and particularly, if the Poisson process assumption is very questionable, it is desirable to apply tests that are valid for a wide variety of possible processes. The alternative approach for trend testing is to use some non-parametric procedure. In this report we have presented four non-parametric tests: The Cox-Stuart test, the Wilcoxon signed ranks test, the Mann test, and the exponential ordered scores test. In addition to the classical parametric and non-parametric approaches we have also considered the Bayesian trend analysis. First we discuss a Bayesian model, which is based on a power law intensity model. The Bayesian statistical inferences are based on the analysis of the posterior distribution of the trend parameters, and the probability of trend is immediately seen from these distributions. We applied some of the methods discussed in an example case. It should be noted, that this report is a feasibility study rather than a scientific evaluation of statistical methods, and the examples can only be seen as demonstrations of the methods. 14 refs, 10 figs.
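Of the four non-parametric tests listed above, the Cox-Stuart test is the simplest to write down: the first half of the series is paired with the second half, and the number of increases is compared with a Binomial(m, 1/2) distribution. The sketch below assumes the input is an ordered sequence of numbers (for example inter-event times) and uses exact binomial tail probabilities. It is an illustration of the textbook test, not the report's own procedure, and the function name `cox_stuart` is hypothetical.

```python
from math import comb


def cox_stuart(xs):
    """Cox-Stuart sign test for trend in an ordered sequence.

    Returns (n_increases, n_pairs, p_upward, p_downward, p_two_sided),
    where n_pairs counts only pairs with a non-zero difference.
    """
    n = len(xs)
    half = n // 2
    first = xs[:half]
    second = xs[n - half:]  # for odd n this skips the middle observation
    diffs = [b - a for a, b in zip(first, second) if b != a]
    m = len(diffs)
    if m == 0:
        raise ValueError("no informative pairs (all paired values are equal)")
    increases = sum(1 for d in diffs if d > 0)

    def upper_tail(k):
        # P(X >= k) for X ~ Binomial(m, 1/2)
        return sum(comb(m, j) for j in range(k, m + 1)) / 2 ** m

    p_up = upper_tail(increases)        # evidence for an increasing trend
    p_down = upper_tail(m - increases)  # evidence for a decreasing trend
    p_two = min(1.0, 2.0 * min(p_up, p_down))
    return increases, m, p_up, p_down, p_two
```

For a strictly increasing series of ten values, all five pairs increase, so the one-sided p-value for an upward trend is 1/32.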
TWO MEASURES OF THE DEPENDENCE OF PREFERENTIAL RANKINGS ON CATEGORICAL VARIABLES
Directory of Open Access Journals (Sweden)
Lissowski Grzegorz
2017-06-01
The aim of this paper is to apply a general methodology for constructing statistical methods, which is based on decision theory, to give a statistical description of preferential rankings, with a focus on the rankings’ dependence on categorical variables. In the paper, I use functions of description errors that are based on the Kemeny and Hamming distances between preferential orderings, but the proposed methodology can also be applied to other methods of estimating description errors.
Directory of Open Access Journals (Sweden)
Darjan Karabasevic
2016-05-01
The corporate sector and companies have recognized the importance of implementing a strategy of corporate social responsibility in order to improve the company's image and responsibility towards society and the communities where they operate. Multinational companies in their everyday activities and operations pay more attention to sustainable models of corporate social responsibility. The focus of this paper is to identify the indicators of corporate social responsibility and to rank companies according to the indicators. The proposed framework for evaluation and ranking is based on the SWARA and the ARAS methods. The usability and efficiency of the proposed framework are shown on an illustrative example.
A framework for the economic analysis of data collection methods for vital statistics.
Jimenez-Soto, Eliana; Hodge, Andrew; Nguyen, Kim-Huong; Dettrick, Zoe; Lopez, Alan D
2014-01-01
Over recent years there has been a strong movement towards the improvement of vital statistics and other types of health data that inform evidence-based policies. Collecting such data is not cost free. To date there is no systematic framework to guide investment decisions on methods of data collection for vital statistics or health information in general. We developed a framework to systematically assess the comparative costs and outcomes/benefits of the various data methods for collecting vital statistics. The proposed framework is four-pronged and utilises two major economic approaches to systematically assess the available data collection methods: cost-effectiveness analysis and efficiency analysis. We built a stylised example of a hypothetical low-income country to perform a simulation exercise in order to illustrate an application of the framework. Using simulated data, the results from the stylised example show that the rankings of the data collection methods are not affected by the use of either cost-effectiveness or efficiency analysis. However, the rankings are affected by how quantities are measured. There have been several calls for global improvements in collecting useable data, including vital statistics, from health information systems to inform public health policies. Ours is the first study that proposes a systematic framework to assist countries in undertaking an economic evaluation of data collection methods (DCMs). Despite numerous challenges, we demonstrate that a systematic assessment of outputs and costs of DCMs is not only necessary, but also feasible. The proposed framework is general enough to be easily extended to other areas of health information.
CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition
Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe
2013-01-01
Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A new point spread function (PSF) for the Wiener filter is employed to efficiently remove blur in the sparse component; Wiener filtering with a Gaussian PSF is used to recover the average image of the low-rank component. We then obtain the recovered CT image sequence by combining the recovered low-rank image with the recovered sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principal Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for CT images with low noise, whereas the GoDec model was the best for CT images with high noise. PMID:24023764
Identification of reliable gridded reference data for statistical downscaling methods in Alberta
Eum, H. I.; Gupta, A.
2017-12-01
Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have been applied to prepare climate model data for various applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data mainly depend on the reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables which are main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths and data accuracy varying with physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in the public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), an inverse distance method (Alberta Townships), a numerical climate model (North American Regional Reanalysis) and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation, minimum and maximum temperature over the province of Alberta. Based on the performance of climate products at AHCCD stations, we ranked the reliability of these publicly available climate products corresponding to the elevations of stations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system
Fast and precise method of contingency ranking in modern power system
DEFF Research Database (Denmark)
Rather, Zakir Hussain; Chen, Zhe; Thøgersen, Paul
2011-01-01
Contingency Analysis is one of the most important aspects of Power System Security Analysis. This paper presents a fast and precise method of contingency ranking for effective power system security analysis. The method proposed in this research work takes due consideration of both apparent power o... is based on a realistic approach taking practical situations into account. Besides taking real situations into consideration, the proposed method is fast enough to be considered for on-line security analysis.
The Playground Game: Inquiry‐Based Learning About Research Methods and Statistics
Westera, Wim; Slootmaker, Aad; Kurvers, Hub
2014-01-01
The Playground Game is a web-based game that was developed for teaching research methods and statistics to nursing and social sciences students in higher education and vocational training. The complexity and abstract nature of research methods and statistics pose many challenges for students. The
Efficient nonrigid registration using ranked order statistics
DEFF Research Database (Denmark)
Tennakoon, Ruwan B.; Bab-Hadiashar, Alireza; de Bruijne, Marleen
2013-01-01
Non-rigid image registration techniques are widely used in medical imaging applications. Due to high computational complexities of these techniques, finding appropriate registration method to both reduce the computation burden and increase the registration accuracy has become an intense area of research. In this paper we propose a fast and accurate non-rigid registration method for intra-modality volumetric images. Our approach exploits the information provided by an order statistics based segmentation method, to find the important regions for registration and use an appropriate sampling scheme to target those areas and reduce the registration computation time. A unique advantage of the proposed method is its ability to identify the point of diminishing returns and stop the registration process. Our experiments on registration of real lung CT images, with expert annotated landmarks, show...
Iris Template Protection Based on Local Ranking
Directory of Open Access Journals (Sweden)
Dongdong Zhao
2018-01-01
Biometrics have been widely studied in recent years, and they are increasingly employed in real-world applications. Meanwhile, a number of potential threats to the privacy of biometric data arise. Iris template protection demands that the privacy of iris data should be protected when performing iris recognition. According to the international standard ISO/IEC 24745, iris template protection should satisfy irreversibility, revocability, and unlinkability. However, existing works about iris template protection demonstrate that it is difficult to satisfy the three privacy requirements simultaneously while supporting effective iris recognition. In this paper, we propose an iris template protection method based on local ranking. Specifically, the iris data are first XORed (Exclusive OR operation) with an application-specific string; next, we divide the results into blocks and then partition the blocks into groups. The blocks in each group are ranked according to their decimal values, and original blocks are transformed to their rank values for storage. We also extend the basic method to support the shifting strategy and masking strategy, which are two important strategies for iris recognition. We demonstrate that the proposed method satisfies irreversibility, revocability, and unlinkability. Experimental results on typical iris datasets (i.e., CASIA-IrisV3-Interval, CASIA-IrisV4-Lamp, UBIRIS-V1-S1, and MMU-V1) show that the proposed method could maintain the recognition performance while protecting the privacy of iris data.
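The basic transform described above (XOR with an application-specific string, split into blocks, group the blocks, replace each block by its rank within the group) can be sketched as follows. Block size, group size, the handling of leftover bits and the treatment of ties are not given in the abstract; here trailing bits that do not fill a whole group are dropped and equal blocks share a rank, both of which are assumptions. The function name `local_ranking_template` is hypothetical, and the shifting and masking extensions are not covered.

```python
def local_ranking_template(iris_bits, app_key, block_size, group_size):
    """Transform a binary iris code into per-group rank values.

    `iris_bits` and `app_key` are equal-length lists of 0/1 integers.
    Returns a list of groups, each a list of ranks (0 = smallest block value).
    """
    if len(iris_bits) != len(app_key):
        raise ValueError("iris code and application key must have equal length")

    # Step 1: XOR with the application-specific string.
    mixed = [a ^ b for a, b in zip(iris_bits, app_key)]

    # Step 2: cut into blocks and read each block as an unsigned integer.
    bits_per_group = block_size * group_size
    usable = len(mixed) - len(mixed) % bits_per_group  # assumption: drop the tail
    values = [
        int("".join(str(bit) for bit in mixed[i:i + block_size]), 2)
        for i in range(0, usable, block_size)
    ]

    # Step 3: within each group, replace every block by its rank.
    template = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        rank_of = {v: r for r, v in enumerate(sorted(set(group)))}  # ties share a rank
        template.append([rank_of[v] for v in group])
    return template
```

Only the ranks are stored, so the exact block values cannot be read back from the template, and issuing a new `app_key` yields a different template from the same iris code.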
Diffusion of scientific credits and the ranking of scientists
Radicchi, Filippo; Fortunato, Santo; Markines, Benjamin; Vespignani, Alessandro
2009-11-01
Recently, the abundance of digital data is enabling the implementation of graph-based ranking algorithms that provide system level analysis for ranking publications and authors. Here, we take advantage of the entire Physical Review publication archive (1893-2006) to construct authors’ networks where weighted edges, as measured from opportunely normalized citation counts, define a proxy for the mechanism of scientific credit transfer. On this network, we define a ranking method based on a diffusion algorithm that mimics the spreading of scientific credits on the network. We compare the results obtained with our algorithm with those obtained by local measures such as the citation count and provide a statistical analysis of the assignment of major career awards in the area of physics. A website where the algorithm is made available to perform customized rank analysis can be found at the address http://www.physauthorsrank.org.
Rank-based Tests of the Cointegrating Rank in Semiparametric Error Correction Models
Hallin, M.; van den Akker, R.; Werker, B.J.M.
2012-01-01
Abstract: This paper introduces rank-based tests for the cointegrating rank in an Error Correction Model with i.i.d. elliptical innovations. The tests are asymptotically distribution-free, and their validity does not depend on the actual distribution of the innovations. This result holds despite the
Hybrid statistics-simulations based method for atom-counting from ADF STEM images.
De Wael, Annelies; De Backer, Annick; Jones, Lewys; Nellist, Peter D; Van Aert, Sandra
2017-06-01
A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. Copyright © 2017 Elsevier B.V. All rights reserved.
Algebraic and computational aspects of real tensor ranks
Sakata, Toshio; Miyazaki, Mitsuhiro
2016-01-01
This book provides comprehensive summaries of theoretical (algebraic) and computational aspects of tensor ranks, maximal ranks, and typical ranks, over the real number field. Although tensor ranks have often been discussed over the complex number field, it should be emphasized that this book treats real tensor ranks, which have direct applications in statistics. The book provides several interesting ideas, including determinant polynomials, determinantal ideals, absolutely nonsingular tensors, absolutely full column rank tensors, and their connection to bilinear maps and Hurwitz-Radon numbers. In addition to reviews of methods to determine real tensor ranks in detail, global theories such as the Jacobian method are also reviewed in detail. The book also includes an accessible and comprehensive introduction to the mathematical background, with basics of positive polynomials and calculations using the Groebner basis. Furthermore, this book provides insights into numerical methods of finding tensor ranks through...
Bánky, Dániel; Iván, Gábor; Grolmusz, Vince
2013-01-01
Biological network data, such as metabolic-, signaling- or physical interaction graphs of proteins are increasingly available in public repositories for important species. Tools for the quantitative analysis of these networks are being developed today. Protein network-based drug target identification methods usually return protein hubs with large degrees in the networks as potentially important targets. Some known, important protein targets, however, are not hubs at all, and perturbing protein hubs in these networks may have several unwanted physiological effects, due to their interaction with numerous partners. Here, we show a novel method applicable in networks with directed edges (such as metabolic networks) that compensates for the low degree (non-hub) vertices in the network, and identifies important nodes, regardless of their hub properties. Our method computes the PageRank for the nodes of the network, and divides the PageRank by the in-degree (i.e., the number of incoming edges) of the node. This quotient is the same in all nodes in an undirected graph (even for large- and low-degree nodes, that is, for hubs and non-hubs as well), but may differ significantly from node to node in directed graphs. We suggest to assign importance to non-hub nodes with large PageRank/in-degree quotient. Consequently, our method gives high scores to nodes with large PageRank, relative to their degrees: therefore non-hub important nodes can easily be identified in large networks. We demonstrate that these relatively high PageRank scores have biological relevance: the method correctly finds numerous already validated drug targets in distinct organisms (Mycobacterium tuberculosis, Plasmodium falciparum and MRSA Staphylococcus aureus), and consequently, it may suggest new possible protein targets as well. Additionally, our scoring method was not chosen arbitrarily: its value for all nodes of all undirected graphs is constant; therefore its high value captures importance in the
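The score described above, PageRank divided by in-degree, can be sketched with a plain power iteration. The damping factor, the uniform treatment of dangling nodes and the function names below are assumptions; the abstract does not specify them. In this sketch the quotient is exactly constant on an undirected graph only when `damping=1.0` and the graph is connected and non-bipartite, in which case PageRank is proportional to degree.

```python
def pagerank(out_edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; `out_edges` maps node -> list of successors."""
    nodes = sorted(set(out_edges) | {v for vs in out_edges.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_edges.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node: spread its rank uniformly (an assumption).
                share = damping * rank[u] / n
                for v in nodes:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def pagerank_per_in_degree(out_edges, damping=0.85):
    """PageRank / in-degree for every node with at least one incoming edge."""
    rank = pagerank(out_edges, damping=damping)
    in_degree = {u: 0 for u in rank}
    for targets in out_edges.values():
        for v in targets:
            in_degree[v] += 1
    return {u: rank[u] / in_degree[u] for u in rank if in_degree[u] > 0}
```

Nodes with no incoming edges are left out of the result because the quotient is undefined for them. An undirected graph is passed by listing each edge in both directions.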
Khan, Haseeb Ahmad
2005-01-28
Owing to their versatile diagnostic and prognostic fidelity, molecular signatures or fingerprints are anticipated to be the most powerful tools for cancer management in the near future. Notwithstanding the experimental advancements in microarray technology, methods for analyzing either whole arrays or gene signatures have not been firmly established. Recently, an algorithm, ArraySolver, has been reported by Khan for two-group comparison of microarray gene expression data using the two-tailed Wilcoxon signed-rank test. Most of the molecular signatures are composed of two sets of genes (hybrid signatures) wherein up-regulation of one set and down-regulation of the other set collectively define the purpose of a gene signature. Since the direction of a selected gene's expression (positive or negative) with respect to a particular disease condition is known, application of one-tailed statistics could be a more relevant choice. A novel method, ArrayVigil, is described for comparing hybrid signatures using the segregated-one-tailed (SOT) Wilcoxon signed-rank test, and the results are compared with integrated-two-tailed (ITT) procedures (SPSS and ArraySolver). ArrayVigil resulted in lower P values than those obtained from ITT statistics while comparing real data from four signatures.
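The difference between one-tailed and two-tailed Wilcoxon signed-rank P values mentioned above can be illustrated with a small exact test that enumerates all sign patterns. This is a sketch of the textbook test for small paired samples, not ArrayVigil or ArraySolver themselves; how the SOT procedure combines the up-regulated and down-regulated gene sets is not described in the abstract. The function name and the limit of 20 pairs are assumptions.

```python
from itertools import product


def signed_rank_exact(x, y):
    """Exact Wilcoxon signed-rank test for paired samples x and y.

    Returns (w_plus, p_greater, p_less, p_two_sided), where w_plus is the
    sum of ranks of the positive differences x - y. Zero differences are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > 20:
        raise ValueError("exact enumeration is only practical for n <= 20")

    # Average ranks of |d|, so tied absolute differences share a rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Null distribution: every difference is positive or negative with equal chance.
    totals = [
        sum(r for r, s in zip(ranks, signs) if s)
        for signs in product((0, 1), repeat=n)
    ]
    p_greater = sum(1 for t in totals if t >= w_plus) / len(totals)
    p_less = sum(1 for t in totals if t <= w_plus) / len(totals)
    p_two = min(1.0, 2.0 * min(p_greater, p_less))
    return w_plus, p_greater, p_less, p_two
```

When the direction of change is known in advance and the data agree with it, the one-tailed P value is half the two-tailed one, which is consistent with the lower P values reported for the one-tailed procedure.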
Survey of sampling-based methods for uncertainty and sensitivity analysis
International Nuclear Information System (INIS)
Helton, J.C.; Johnson, J.D.; Sallaberry, C.J.; Storlie, C.B.
2006-01-01
Sampling-based methods for uncertainty and sensitivity analysis are reviewed. The following topics are considered: (i) definition of probability distributions to characterize epistemic uncertainty in analysis inputs (ii) generation of samples from uncertain analysis inputs (iii) propagation of sampled inputs through an analysis (iv) presentation of uncertainty analysis results, and (v) determination of sensitivity analysis results. Special attention is given to the determination of sensitivity analysis results, with brief descriptions and illustrations given for the following procedures/techniques: examination of scatterplots, correlation analysis, regression analysis, partial correlation analysis, rank transformations, statistical tests for patterns based on gridding, entropy tests for patterns based on gridding, nonparametric regression analysis, squared rank differences/rank correlation coefficient test, two-dimensional Kolmogorov-Smirnov test, tests for patterns based on distance measures, top down coefficient of concordance, and variance decomposition
Survey of sampling-based methods for uncertainty and sensitivity analysis.
Energy Technology Data Exchange (ETDEWEB)
Johnson, Jay Dean; Helton, Jon Craig; Sallaberry, Cedric J.; Storlie, Curt B. (Colorado State University, Fort Collins, CO)
2006-06-01
Sampling-based methods for uncertainty and sensitivity analysis are reviewed. The following topics are considered: (1) Definition of probability distributions to characterize epistemic uncertainty in analysis inputs, (2) Generation of samples from uncertain analysis inputs, (3) Propagation of sampled inputs through an analysis, (4) Presentation of uncertainty analysis results, and (5) Determination of sensitivity analysis results. Special attention is given to the determination of sensitivity analysis results, with brief descriptions and illustrations given for the following procedures/techniques: examination of scatterplots, correlation analysis, regression analysis, partial correlation analysis, rank transformations, statistical tests for patterns based on gridding, entropy tests for patterns based on gridding, nonparametric regression analysis, squared rank differences/rank correlation coefficient test, two dimensional Kolmogorov-Smirnov test, tests for patterns based on distance measures, top down coefficient of concordance, and variance decomposition.
Deviation-based spam-filtering method via stochastic approach
Lee, Daekyung; Lee, Mi Jin; Kim, Beom Jun
2018-03-01
In the presence of a huge number of possible purchase choices, ranks or ratings of items by others often play very important roles for a buyer to make a final purchase decision. Perfectly objective rating is an impossible task to achieve, and we often use an average rating built on how previous buyers estimated the quality of the product. The problem of using a simple average rating is that it can easily be polluted by careless users whose evaluation of products cannot be trusted, and by malicious spammers who try to bias the rating result on purpose. In this letter we suggest how the trustworthiness of individual users can be systematically and quantitatively reflected to build a more reliable rating system. We compute the suitably defined reliability of each user based on the user's rating pattern for all products she evaluated. We call our proposed method deviation-based ranking, since the statistical significance of each user's rating pattern with respect to the average rating pattern is the key ingredient. We find that our deviation-based ranking method outperforms existing methods in filtering out careless random evaluators as well as malicious spammers.
BridgeRank: A novel fast centrality measure based on local structure of the network
Salavati, Chiman; Abdollahpouri, Alireza; Manbari, Zhaleh
2018-04-01
Ranking nodes in complex networks has become an important task in many application domains. In a complex network, influential nodes are those that have the most spreading ability. Thus, identifying influential nodes based on their spreading ability is a fundamental task in different applications such as viral marketing. One of the most important centrality measures for ranking nodes is closeness centrality, which is effective but suffers from high computational complexity, O(n^3). This paper tries to improve closeness centrality by utilizing the local structure of nodes and presents a new ranking algorithm, called BridgeRank centrality. The proposed method computes a local centrality value for each node. For this purpose, communities are first detected and the relationships between communities are completely ignored. Then, by applying a centrality measure in each community, the single most critical node of each community is extracted. Finally, the nodes are ranked by the sum of their shortest-path lengths to the extracted critical nodes. We have also modified the proposed method by weighting the original BridgeRank and selecting several nodes from each community based on the density of that community. Our method can find the best nodes with high spreading ability and low time complexity, which makes it applicable to large-scale networks. To evaluate the performance of the proposed method, we use the SIR diffusion model. Finally, experiments on real and artificial networks show that our method identifies influential nodes efficiently and achieves better performance than other recent methods.
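The final ranking step of this abstract (scoring each node by its summed shortest-path length to a set of critical nodes) can be sketched as follows. This is only an illustration under stated assumptions: community detection and the selection of critical nodes are omitted, the critical nodes are passed in as a hypothetical input, the graph is assumed connected and unweighted, and ranking smaller sums first is an assumption made here by analogy with closeness centrality.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source in an unweighted graph (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rank_by_distance_to_critical(adjacency, critical_nodes):
    """Order nodes by summed distance to the critical nodes, smallest first."""
    total = {v: 0 for v in adjacency}
    for c in critical_nodes:
        dist = bfs_distances(adjacency, c)
        for v in adjacency:
            total[v] += dist[v]  # assumes every node is reachable from c
    # Ties are broken by node name so the result is deterministic.
    return sorted(adjacency, key=lambda v: (total[v], v))


# Hypothetical toy graph: hub h linked to a, b, c, with d hanging off c.
adjacency = {"h": ["a", "b", "c"], "a": ["h"], "b": ["h"],
             "c": ["h", "d"], "d": ["c"]}
ranking = rank_by_distance_to_critical(adjacency, ["a", "b", "d"])
```

One breadth-first search per critical node keeps the cost proportional to the number of communities times the graph size, which is the kind of saving over all-pairs shortest paths that the abstract alludes to.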
Testing for Statistical Discrimination based on Gender
DEFF Research Database (Denmark)
Lesner, Rune Vammen
This paper develops a model which incorporates the two most commonly cited strands of the literature on statistical discrimination, namely screening discrimination and stereotyping. The model is used to provide empirical evidence of statistical discrimination based on gender in the labour market. It is shown that the implications of both screening discrimination and stereotyping are consistent with observable wage dynamics. In addition, it is found that the gender wage gap decreases in tenure but increases in job transitions and that the fraction of women in high-ranking positions within a firm does not affect the level of statistical discrimination by gender.
Ranking of input parameters importance for BWR stability based on Ringhals-1
International Nuclear Information System (INIS)
Gajev, Ivan; Kozlowski, Tomasz; Xu, Yunlin; Downar, Thomas
2011-01-01
Unstable behavior of Boiling Water Reactors (BWRs) is known to occur during operation at certain power and flow conditions. Uncertainty calculations for BWR stability, based on Wilks' formula, have already been done for the Ringhals-1 benchmark. In this work, these calculations have been used to identify and rank the most important parameters affecting the stability of the Ringhals-1 plant. The ranking has been done in two different ways, and the two methods are compared. Results show that the methods provide different but meaningful evaluations of the ranking. (author)
Population models and simulation methods: The case of the Spearman rank correlation.
Astivia, Oscar L Olvera; Zumbo, Bruno D
2017-11-01
The purpose of this paper is to highlight the importance of a population model in guiding the design and interpretation of simulation studies used to investigate the Spearman rank correlation. The Spearman rank correlation has been known for over a hundred years to applied researchers and methodologists alike and is one of the most widely used non-parametric statistics. Still, certain misconceptions can be found, either explicitly or implicitly, in the published literature because a population definition for this statistic is rarely discussed within the social and behavioural sciences. By relying on copula distribution theory, a population model is presented for the Spearman rank correlation, and its properties are explored both theoretically and in a simulation study. Through the use of the Iman-Conover algorithm (which allows the user to specify the rank correlation as a population parameter), simulation studies from previously published articles are explored, and it is found that many of the conclusions purported in them regarding the nature of the Spearman correlation would change if the data-generation mechanism better matched the simulation design. More specifically, issues such as small sample bias and lack of power of the t-test and r-to-z Fisher transformation disappear when the rank correlation is calculated from data sampled where the rank correlation is the population parameter. A proof for the consistency of the sample estimate of the rank correlation is shown as well as the flexibility of the copula model to encompass results previously published in the mathematical literature. © 2017 The British Psychological Society.
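For reference, the sample statistic discussed in this abstract is the Pearson correlation of the ranks. The sketch below computes it with average ranks for ties; the function names are chosen here for illustration, and the population model and the Iman-Conover algorithm mentioned in the abstract are not implemented.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Sample Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical data: two adjacent pairs swapped out of five observations.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Without ties this agrees with the textbook formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which gives 0.8 for the data above.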
A robust statistical method for association-based eQTL analysis.
Directory of Open Access Journals (Sweden)
Ning Jiang
It has been well established that the theoretical kernel of the recently surging genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for the influence either through predicting the structure parameters or correcting inflation in the test statistic due to the stratification, these may not be feasible or may impose further statistical problems in practical implementation. We propose here a novel statistical method to control spurious LD in GWAS from population structure by incorporating a control marker into testing for significance of genetic association of a polymorphic marker with phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations. The analyses show that the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure when compared with two other popularly implemented methods in the GWAS literature.
A model-based approach to operational event groups ranking
Energy Technology Data Exchange (ETDEWEB)
Simic, Zdenko [European Commission Joint Research Centre, Petten (Netherlands). Inst. for Energy and Transport; Maqua, Michael [Gesellschaft fuer Anlagen- und Reaktorsicherheit mbH (GRS), Koeln (Germany); Wattrelos, Didier [Institut de Radioprotection et de Surete Nucleaire (IRSN), Fontenay-aux-Roses (France)
2014-04-15
The operational experience (OE) feedback provides improvements in all industrial activities. Identification of the most important and valuable groups of events within accumulated experience is important in order to focus on a detailed investigation of events. The paper describes the new ranking method and compares it with three others. Methods have been described and applied to OE events utilised by nuclear power plants in France and Germany for twenty years. The results show that different ranking methods only roughly agree on which of the event groups are the most important ones. In the new ranking method the analytical hierarchy process is applied in order to assure consistent and comprehensive weighting determination for ranking indexes. The proposed method allows a transparent and flexible event groups ranking and identification of the most important OE for further more detailed investigation in order to complete the feedback. (orig.)
Hybrid statistics-simulations based method for atom-counting from ADF STEM images
Energy Technology Data Exchange (ETDEWEB)
De wael, Annelies, E-mail: annelies.dewael@uantwerpen.be [Electron Microscopy for Materials Science (EMAT), University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp (Belgium); De Backer, Annick [Electron Microscopy for Materials Science (EMAT), University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp (Belgium); Jones, Lewys; Nellist, Peter D. [Department of Materials, University of Oxford, Parks Road, OX1 3PH Oxford (United Kingdom); Van Aert, Sandra, E-mail: sandra.vanaert@uantwerpen.be [Electron Microscopy for Materials Science (EMAT), University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp (Belgium)
2017-06-15
A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. - Highlights: • A hybrid method for atom-counting from ADF STEM images is introduced. • Image simulations are incorporated into a statistical framework in a reliable manner. • Limits of the existing methods for atom-counting are far exceeded. • Reliable counting results from an experimental low dose image are obtained. • Progress towards reliable quantitative analysis of beam-sensitive materials is made.
AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs.
Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael
2017-06-15
Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank . gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. 
An Efficient Graph-based Method for Long-term Land-use Change Statistics
Directory of Open Access Journals (Sweden)
Yipeng Zhang
2015-12-01
Statistical analysis of land-use change plays an important role in sustainable land management and has received increasing attention from scholars and administrative departments. However, the statistical process involving spatial overlay analysis remains difficult and needs improvement to deal with mass land-use data. In this paper, we introduce a spatio-temporal flow network model to reveal the hidden relational information among spatio-temporal entities. Based on graph theory, the constant condition of saturated multi-commodity flow is derived. A new method based on a network partition technique for the spatio-temporal flow network is proposed to optimize the transition statistical process. The effectiveness and efficiency of the proposed method are verified through experiments using land-use data in Hunan from 2009 to 2014. In the comparison among three different land-use change statistical methods, the proposed method exhibits remarkable superiority in efficiency.
Variants of the Borda count method for combining ranked classifier hypotheses
van Erp, Merijn; Schomaker, Lambert; Vuurpijl, Louis
2000-01-01
The Borda count is a simple yet effective method of combining rankings. In pattern recognition, classifiers are often able to return a ranked set of results. Several experiments have been conducted to test the ability of the Borda count and two variant methods to combine these ranked classifier
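As a brief illustration of the basic Borda count named in this abstract (not of the two variants, which the truncated text does not describe), each ranking awards n-1 points to its top candidate, n-2 to the next, and so on, and candidates are ordered by total points. The function name, the alphabetical tie-break and the example labels are assumptions made here.

```python
def borda_count(rankings):
    """Combine rankings (best first) by summing Borda points per candidate."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # Top of an n-item ranking earns n-1 points, the last earns 0.
            scores[label] = scores.get(label, 0) + (n - 1 - position)
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical output of three classifiers ranking the class labels a, b, c.
combined = borda_count([["a", "b", "c"],
                        ["b", "a", "c"],
                        ["b", "c", "a"]])
```

Here b collects 5 points, a collects 3 and c collects 1, so the combined ranking is b, a, c even though one classifier preferred a.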
Web-based tool for subjective observer ranking of compressed medical images
Langer, Steven G.; Stewart, Brent K.; Andrew, Rex K.
1999-05-01
In the course of evaluating various compression schemes for ultrasound teleradiology applications, it became obvious that paper-based methods of data collection were time consuming and error prone. A method was sought which allowed participating radiologists to view the ultrasound video clips (compressed to varying degrees) at their desks. Furthermore, the method should allow observers to enter their evaluations and, when finished, automatically submit the data to our statistical analysis engine. We have found that the World Wide Web offered a ready solution. A web page was constructed that contains 18 embedded AVI video clips. The 18 clips represent 6 distinct anatomical areas, compressed by various methods and amounts, and then randomly distributed through the web page. To the right of each video, a series of questions is presented which ask the observer to rank (1 - 5) his/her ability to answer diagnostically relevant questions. When completed, the observer presses 'Submit' and a file of tab-delimited text is created which can then be imported into an Excel workbook. Kappa analysis is then performed and the resulting plots demonstrate observer preferences.
Multi-energy CT based on a prior rank, intensity and sparsity model (PRISM)
International Nuclear Information System (INIS)
Gao, Hao; Osher, Stanley; Yu, Hengyong; Wang, Ge
2011-01-01
We propose a compressive sensing approach for multi-energy computed tomography (CT), namely the prior rank, intensity and sparsity model (PRISM). To further compress the multi-energy image for allowing the reconstruction with fewer CT data and less radiation dose, the PRISM models a multi-energy image as the superposition of a low-rank matrix and a sparse matrix (with row dimension in space and column dimension in energy), where the low-rank matrix corresponds to the stationary background over energy that has a low matrix rank, and the sparse matrix represents the rest of distinct spectral features that are often sparse. Distinct from previous methods, the PRISM utilizes the generalized rank, e.g., the matrix rank of tight-frame transform of a multi-energy image, which offers a way to characterize the multi-level and multi-filtered image coherence across the energy spectrum. Besides, the energy-dependent intensity information can be incorporated into the PRISM in terms of the spectral curves for base materials, with which the restoration of the multi-energy image becomes the reconstruction of the energy-independent material composition matrix. In other words, the PRISM utilizes prior knowledge on the generalized rank and sparsity of a multi-energy image, and intensity/spectral characteristics of base materials. Furthermore, we develop an accurate and fast split Bregman method for the PRISM and demonstrate the superior performance of the PRISM relative to several competing methods in simulations. (papers)
An R package for analyzing and modeling ranking data.
Lee, Paul H; Yu, Philip L H
2013-05-14
In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and a significant difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as "internal/external"), and the second dimension can be interpreted as their overall variance of (labeled as "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying a thought
Directory of Open Access Journals (Sweden)
P. Phani Bushan Rao
2011-01-01
Ranking fuzzy numbers is an important aspect of decision making in a fuzzy environment. Since their inception in 1965, many authors have proposed different methods for ranking fuzzy numbers. However, there is no method which gives a satisfactory result in all situations. Most of the methods proposed so far are nondiscriminating and counterintuitive. This paper proposes a new method for ranking fuzzy numbers based on the Circumcenter of Centroids and uses an index of optimism to reflect the decision maker's optimistic attitude and also an index of modality that represents the neutrality of the decision maker. This method ranks various types of fuzzy numbers, which include normal, generalized trapezoidal, and triangular fuzzy numbers along with crisp numbers, with the particularity that crisp numbers are considered particular cases of fuzzy numbers.
THE FLUORBOARD A STATISTICALLY BASED DASHBOARD METHOD FOR IMPROVING SAFETY
International Nuclear Information System (INIS)
PREVETTE, S.S.
2005-01-01
The FluorBoard is a statistically based dashboard method for improving safety. Fluor Hanford has achieved significant safety improvements, including more than an 80% reduction in OSHA cases per 200,000 hours, during its work at the US Department of Energy's Hanford Site in Washington state. The massive project on the former nuclear materials production site is considered one of the largest environmental cleanup projects in the world. Fluor Hanford's safety improvements were achieved by a committed partnering of workers, managers, and statistical methodology. Safety achievements at the site have been due to a systematic approach to safety. This includes excellent cooperation between the field workers, the safety professionals, and management through OSHA Voluntary Protection Program principles. Fluor corporate values are centered around safety, and safety excellence is important for every manager in every project. In addition, Fluor Hanford has utilized a rigorous approach to using its safety statistics, based upon Dr. Shewhart's control charts and Dr. Deming's management and quality methods.
Multi-Label Classification Based on Low Rank Representation for Image Annotation
Directory of Open Access Journals (Sweden)
Qiaoyu Tan
2017-01-01
Annotating remote sensing images is a challenging task because of its labor-demanding annotation process and requirement of expert knowledge, especially when images can be annotated with multiple semantic concepts (or labels). To automatically annotate these multi-label images, we introduce an approach called Multi-Label Classification based on Low Rank Representation (MLC-LRR). MLC-LRR first utilizes low rank representation in the feature space of images to compute the low rank constrained coefficient matrix, then it adapts the coefficient matrix to define a feature-based graph and to capture the global relationships between images. Next, it utilizes low rank representation in the label space of labeled images to construct a semantic graph. Finally, these two graphs are exploited to train a graph-based multi-label classifier. To validate the performance of MLC-LRR against other related graph-based multi-label methods in annotating images, we conduct experiments on a publicly available multi-label remote sensing image dataset (Land Cover). We perform additional experiments on five real-world multi-label image datasets to further investigate the performance of MLC-LRR. The empirical study demonstrates that MLC-LRR achieves better performance on annotating images than the compared methods across various evaluation criteria; it can also effectively exploit the global structure and label correlations of multi-label images.
Adaptive designs for the one-sample log-rank test.
Schmidt, Rene; Faldum, Andreas; Kwiecien, Robert
2017-09-22
Traditional designs in phase IIa cancer trials are single-arm designs with a binary outcome, for example, tumor response. In some settings, however, a time-to-event endpoint might appear more appropriate, particularly in the presence of loss to follow-up. Then the one-sample log-rank test might be the method of choice. It allows the survival curve of the patients under treatment to be compared with a prespecified reference survival curve. The reference curve usually represents the expected survival under the standard of care. In this work, convergence of the one-sample log-rank statistic to Brownian motion is proven using Rebolledo's martingale central limit theorem while accounting for staggered entry times of the patients. On this basis, a confirmatory adaptive one-sample log-rank test is proposed where provision is made for data-dependent sample size reassessment. The focus is on applying the inverse normal method. This is done in two different directions. The first strategy exploits the independent increments property of the one-sample log-rank statistic. The second strategy is based on the patient-wise separation principle. It is shown by simulation that the proposed adaptive test might help to rescue an underpowered trial and at the same time lowers the average sample number (ASN) under the null hypothesis as compared to a single-stage fixed sample design. © 2017, The International Biometric Society.
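As background to this abstract, a commonly used fixed-sample form of the one-sample log-rank statistic compares the observed number of events O with the number E expected under the reference curve, where E is the sum of the reference cumulative hazard evaluated at each patient's follow-up time, and Z = (O - E) / sqrt(E). The sketch below covers only this basic statistic, not the adaptive design proposed in the paper; the function name, the exponential reference hazard and the data are assumptions made here.

```python
import math


def one_sample_logrank(times, events, cumulative_hazard):
    """Return (O, E, Z) for the basic one-sample log-rank statistic.

    times: follow-up time of each patient (to event or censoring)
    events: 1 if the event was observed, 0 if censored
    cumulative_hazard: reference cumulative hazard function of time
    """
    observed = sum(events)
    expected = sum(cumulative_hazard(t) for t in times)
    z = (observed - expected) / math.sqrt(expected)
    return observed, expected, z


# Hypothetical reference curve: exponential survival with rate 0.1 per month,
# so the cumulative hazard at time t is 0.1 * t.
observed, expected, z = one_sample_logrank(
    times=[1.0, 2.0, 3.0, 4.0],
    events=[1, 0, 1, 0],
    cumulative_hazard=lambda t: 0.1 * t,
)
```

With this sign convention, a negative Z indicates fewer events than expected under the reference curve, that is, better survival than the standard.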
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The success of the query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
Adaptive Game Level Creation through Rank-based Interactive Evolution
DEFF Research Database (Denmark)
Liapis, Antonios; Martínez, Héctor Pérez; Togelius, Julian
2013-01-01
as fitness functions for the optimization of the generated content. The preference models are built via ranking-based preference learning, while the content is generated via evolutionary search. The proposed method is evaluated on the creation of strategy game maps, and its performance is tested using...
Dominance-based ranking functions for interval-valued intuitionistic fuzzy sets.
Chen, Liang-Hsuan; Tu, Chien-Cheng
2014-08-01
The ranking of interval-valued intuitionistic fuzzy sets (IvIFSs) is difficult since they include the interval values of membership and nonmembership. This paper proposes ranking functions for IvIFSs based on the dominance concept. The proposed ranking functions consider the degree to which an IvIFS dominates and is not dominated by other IvIFSs. Based on the bivariate framework and the dominance concept, the functions incorporate not only the boundary values of membership and nonmembership, but also the relative relations among IvIFSs in comparisons. The dominance-based ranking functions include bipolar evaluations with a parameter that allows the decision-maker to reflect his actual attitude in allocating the various kinds of dominance. The relationship for two IvIFSs that satisfy the dual couple is defined based on four proposed ranking functions. Importantly, the proposed ranking functions can achieve a full ranking for all IvIFSs. Two examples are used to demonstrate the applicability and distinctiveness of the proposed ranking functions.
Doerr, Timothy; Alves, Gelio; Yu, Yi-Kuo
2006-03-01
Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time. This suggests a way to efficiently find approximate solutions: find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data.
Physics-based statistical model and simulation method of RF propagation in urban environments
Pao, Hsueh-Yuan; Dvorak, Steven L.
2010-09-14
A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.
The Extrapolation-Accelerated Multilevel Aggregation Method in PageRank Computation
Directory of Open Access Journals (Sweden)
Bing-Yuan Pu
2013-01-01
An accelerated multilevel aggregation method is presented for calculating the stationary probability vector of an irreducible stochastic matrix in PageRank computation, where the vector extrapolation method is its accelerator. We show how to periodically combine the extrapolation method together with the multilevel aggregation method on the finest level for speeding up the PageRank computation. Detailed numerical results are given to illustrate the behavior of this method, and comparisons with the typical methods are also made.
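For orientation, the stationary probability vector that this record refers to can be computed with plain power iteration, the baseline that acceleration schemes are usually compared against. The sketch below is only that baseline, not the multilevel aggregation or extrapolation method of the paper; the function name, the damping value 0.85, and the toy graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Plain power iteration for PageRank (baseline sketch, not the paper's method).

    links maps each node to the list of nodes it links to; every link
    target is assumed to appear as a key of `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in links[v]:
                new[target] += damping * rank[v] / len(links[v])
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical three-page graph used only for illustration.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_graph)
print(sorted(scores, key=scores.get, reverse=True))
```

Each sweep costs one pass over the edges, and the error shrinks roughly by the damping factor per sweep, which is why acceleration is attractive for large graphs.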
Generalized Reduced Rank Tests using the Singular Value Decomposition
F.R. Kleibergen (Frank); R. Paap (Richard)
2003-01-01
We propose a novel statistic to test the rank of a matrix. The rank statistic overcomes deficiencies of existing rank statistics, like: necessity of a Kronecker covariance matrix for the canonical correlation rank statistic of Anderson (1951), sensitivity to the ordering of the variables
Eum, H. I.; Cannon, A. J.
2015-12-01
Climate models are a key tool for investigating the impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch in spatial resolution between GCMs and regional applications, particularly in regions characterized by complex terrain such as the Korean peninsula. Therefore, a downscaling procedure is essential for assessing regional impacts of climate change. Numerous statistical downscaling methods have been used, mainly because of their computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis (CFSR) data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. Using a split-sampling scheme, all methods are calibrated with observational station data for the 19 years from 1973 to 1991 and tested on the most recent 19 years, from 1992 to 2010. To assess the skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure the ability to reproduce temporal correlation, distribution, spatial correlation, and extreme events. In addition, we employ the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR and that all methods lead to large improvements in all performance metrics. According to the seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that this result is derived from CFSR output, which is recognized as near-perfect climate data in climate studies. Therefore, the
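The TOPSIS step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of the standard procedure (vector normalisation, weighting, distance to the ideal and anti-ideal points), assuming equal weights and made-up metric values; the alternatives "A" to "D" and the two criteria are hypothetical and do not reproduce the study's data or results.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for each alternative (row); higher is better.

    matrix  : one row per alternative, one column per criterion
    weights : criterion weights (assumed to sum to 1)
    benefit : True where larger values are better, False where smaller are better
    Assumes no column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Best and worst achievable value per criterion.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical metrics: column 0 = correlation (benefit), column 1 = error (cost).
names = ["A", "B", "C", "D"]
metrics = [[0.80, 2.0], [0.85, 1.5], [0.90, 1.0], [0.70, 2.5]]
scores = topsis(metrics, weights=[0.5, 0.5], benefit=[True, False])
print(sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True))
```

In a seasonal analysis such as the one described, the function would be called once per season on that season's metric table.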
Generalized reduced rank tests using the singular value decomposition
Kleibergen, F.R.; Paap, R.
2002-01-01
We propose a novel statistic to test the rank of a matrix. The rank statistic overcomes deficiencies of existing rank statistics, like: necessity of a Kronecker covariance matrix for the canonical correlation rank statistic of Anderson (1951), sensitivity to the ordering of the variables for the LDU
DEFF Research Database (Denmark)
Schneider, Jesper Wiborg
2012-01-01
In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Rankin...
The application of non-parametric statistical method for an ALARA implementation
International Nuclear Information System (INIS)
Cho, Young Ho; Herr, Young Hoi
2003-01-01
The cost-effective reduction of Occupational Radiation Dose (ORD) at a nuclear power plant cannot be achieved without an extensive analysis of accumulated ORD data from existing plants. Through the data analysis, one must identify which jobs repeatedly incur high ORD at the nuclear power plant. In this study, the Percentile Rank Sum Method (PRSM), which is based on non-parametric statistical theory, is proposed to identify repetitive high-ORD jobs. As a case study, the method is applied to ORD data of maintenance and repair jobs at Kori units 3 and 4 in Korea, which are pressurized water reactors with 950 MWe capacity and have been in operation since 1986 and 1987, respectively. The results were verified and validated, and PRSM has been demonstrated to be an efficient method of analyzing the data.
Tomei, Krystal L; Nahass, Meghan M; Husain, Qasim; Agarwal, Nitin; Patel, Smruti K; Svider, Peter F; Eloy, Jean Anderson; Liu, James K
2014-07-01
The number of women pursuing training opportunities in neurological surgery has increased, although they are still underrepresented at senior positions relative to junior academic ranks. Research productivity is an important component of the academic advancement process. We sought to use the h-index, a bibliometric previously analyzed among neurological surgeons, to evaluate whether there are gender differences in academic rank and research productivity among academic neurological surgeons. The h-index was calculated for 1052 academic neurological surgeons from 84 institutions, and organized by gender and academic rank. In the overall sample, men had statistically higher research productivity (mean 13.3) than their female colleagues (mean 9.5), as measured by the h-index. Within individual ranks, however, gender-based differences in h-index were not significant (p > 0.05) at the assistant professor (mean 7.2 male, 6.3 female), associate professor (11.2 male, 10.8 female), and professor (20.0 male, 18.0 female) levels. There was insufficient data to determine significance at the chairperson rank, as there was only one female chairperson. Although overall gender differences in scholarly productivity were detected, these differences did not reach statistical significance upon controlling for academic rank. Women were grossly underrepresented at the level of chairpersons in this sample of 1052 academic neurological surgeons, likely a result of the low proportion of females in this specialty. Future studies may be needed to investigate gender-specific research trends for neurosurgical residents, a cohort that in recent years has seen increased representation by women. Copyright © 2013 Elsevier Ltd. All rights reserved.
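As a reminder of how the bibliometric used here is defined, the h-index is the largest h such that an author has h papers with at least h citations each. A minimal sketch follows; the function name and the citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```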
Multiple graph regularized protein domain ranking.
Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin
2012-11-19
Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
A Ranking Approach to Genomic Selection.
Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
2015-01-01
Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
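To make the evaluation measure concrete, here is a minimal sketch of NDCG at cutoff k. It assumes non-negative trait values used directly as gains and a logarithmic position discount; the paper may use a different gain function, and the function names and example numbers are illustrative only.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 0-based position i is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(true_values, predicted_scores, k):
    """NDCG@k of the ranking induced by predicted_scores (higher score = ranked first).

    true_values are assumed non-negative (e.g. shifted breeding values).
    """
    order = sorted(range(len(true_values)),
                   key=lambda i: predicted_scores[i], reverse=True)
    gains = [true_values[i] for i in order[:k]]
    ideal = sorted(true_values, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical observed trait values and two sets of model predictions.
observed = [3.0, 2.0, 1.0, 0.0]
good_model = [0.9, 0.8, 0.1, 0.2]   # puts the two best individuals on top
poor_model = [0.1, 0.2, 0.3, 0.4]   # reverses the true order
print(ndcg_at_k(observed, good_model, k=2), ndcg_at_k(observed, poor_model, k=2))
```

Because only the top k positions contribute, a model can score well on NDCG while having a modest Pearson correlation, which is consistent with the weak agreement between the two measures reported above.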
rpsftm: An R Package for Rank Preserving Structural Failure Time Models.
Allison, Annabel; White, Ian R; Bond, Simon
2017-12-04
Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ , is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z ( ψ ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm.
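The balancing idea can be illustrated with a toy sketch. It assumes the usual RPSFTM form of the counterfactual time, U(ψ) = T_off + exp(ψ) · T_on, and it replaces the log-rank statistic Z(ψ) with a much simpler stand-in (the difference in mean counterfactual time), ignoring censoring and recensoring entirely. It does not use or mirror the rpsftm package API; all names and data are hypothetical.

```python
import math


def counterfactual_time(t_off, t_on, psi):
    """Event time that would have been observed with no treatment (assumed RPSFTM form)."""
    return t_off + math.exp(psi) * t_on


def estimate_psi(arm0, arm1, grid):
    """Grid value of psi that best balances mean counterfactual time between arms.

    arm0, arm1 : lists of (time off treatment, time on treatment) per participant
    The mean difference is a toy stand-in for the test statistic Z(psi);
    there is no censoring in this sketch.
    """
    def imbalance(psi):
        m0 = sum(counterfactual_time(a, b, psi) for a, b in arm0) / len(arm0)
        m1 = sum(counterfactual_time(a, b, psi) for a, b in arm1) / len(arm1)
        return abs(m1 - m0)
    return min(grid, key=imbalance)


# Synthetic data built with psi = -0.5 (treatment stretches survival time).
stretch = math.exp(0.5)
untreated = [2.0, 4.0, 6.0, 8.0]
experimental = [(0.0, u * stretch) for u in untreated]           # always on treatment
control = [(2.0, 0.0), (4.0, 0.0), (3.0, 3.0 * stretch), (8.0, 0.0)]  # one switcher
grid = [i / 100 for i in range(-100, 101)]
print(estimate_psi(control, experimental, grid))
```

With real trial data the imbalance function would be a proper test statistic evaluated on recensored counterfactual times, and the root would typically be found by interval search rather than an exhaustive grid.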
Research of Subgraph Estimation Page Rank Algorithm for Web Page Rank
Directory of Open Access Journals (Sweden)
LI Lan-yin
2017-04-01
The traditional PageRank algorithm cannot efficiently solve the Web page ranking problem on large data. This paper proposes an accelerated algorithm named topK-Rank, which is based on PageRank on the MapReduce platform. It can find the top k nodes efficiently for a given graph without sacrificing accuracy. In order to identify the top k nodes, the topK-Rank algorithm prunes unnecessary nodes and edges in each iteration to dynamically construct subgraphs, and iteratively estimates lower/upper bounds of PageRank scores through the subgraphs. Theoretical analysis shows that this method guarantees result exactness. Experiments show that the topK-Rank algorithm can find the top k nodes much faster than the existing approaches.
Multiple graph regularized protein domain ranking
Wang, Jim Jing-Yan
2012-11-19
Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. © 2012 Wang et al; licensee BioMed Central Ltd.
Multiple graph regularized protein domain ranking
Directory of Open Access Journals (Sweden)
Wang Jim
2012-11-01
Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
Rank Dynamics of Word Usage at Multiple Scales
Directory of Open Access Journals (Sweden)
José A. Morales
2018-05-01
The recent dramatic increase in online data availability has allowed researchers to explore human culture with unprecedented detail, such as the growth and diversification of language. In particular, it provides statistical tools to explore whether word use is similar across languages, and if so, whether these generic features appear at different scales of language structure. Here we use the Google Books N-grams dataset to analyze the temporal evolution of word usage in several languages. We apply measures proposed recently to study rank dynamics, such as the diversity of N-grams in a given rank, the probability that an N-gram changes rank between successive time intervals, the rank entropy, and the rank complexity. Using different methods, results show that there are generic properties for different languages at different scales, such as a core of words necessary to minimally understand a language. We also propose a null model to explore the relevance of linguistic structure across multiple scales, concluding that N-gram statistics cannot be reduced to word statistics. We expect our results to be useful in improving text prediction algorithms, as well as in shedding light on the large-scale features of language use, beyond linguistic and cultural differences across human populations.
Ranking Support Vector Machine with Kernel Approximation.
Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and so forth. The ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been widely used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms.
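The two ingredients named in this abstract, random Fourier features and the pairwise squared hinge loss, can be sketched as follows. The feature map is the standard construction for the RBF kernel exp(-gamma * ||x - y||^2); the loss is written for a linear model on the mapped features. The optimizer (truncated Newton) is omitted, and all function names, parameters, and data are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Return a map z(x) with z(x).z(y) approximating exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma)
    freqs = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(freq, x)) + phase)
                for freq, phase in zip(freqs, phases)]
    return transform


def pairwise_sq_hinge(weights, feats, relevance):
    """Sum over pairs with relevance[i] > relevance[j] of max(0, 1 - w.(z_i - z_j))^2."""
    loss = 0.0
    for i in range(len(feats)):
        for j in range(len(feats)):
            if relevance[i] > relevance[j]:
                margin = sum(w * (a - b) for w, a, b in zip(weights, feats[i], feats[j]))
                loss += max(0.0, 1.0 - margin) ** 2
    return loss


# Compare the approximation with the exact kernel on two hypothetical points.
x, y, gamma = [0.2, -0.1], [0.5, 0.3], 0.5
z = make_rff(dim=2, n_features=4000, gamma=gamma)
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
print(approx, exact)
```

Once the data are mapped through `z`, training reduces to a linear ranking problem, so the full kernel matrix is never formed; the approximation error shrinks as the number of random features grows.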
Ranking Support Vector Machine with Kernel Approximation
Directory of Open Access Journals (Sweden)
Kai Chen
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and so forth. The ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been widely used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms.
Monte Carlo methods in PageRank computation: When one iteration is sufficient
Avrachenkov, K.; Litvak, Nelli; Nemirovsky, D.; Osipova, N.
2005-01-01
PageRank is one of the principal criteria according to which Google ranks Web pages. PageRank can be interpreted as a frequency of visiting a Web page by a random surfer and thus it reflects the popularity of a Web page. Google computes the PageRank using the power iteration method which requires
Monte Carlo methods in PageRank computation: When one iteration is sufficient
Avrachenkov, K.; Litvak, Nelli; Nemirovsky, D.; Osipova, N.
PageRank is one of the principal criteria according to which Google ranks Web pages. PageRank can be interpreted as a frequency of visiting a Web page by a random surfer, and thus it reflects the popularity of a Web page. Google computes the PageRank using the power iteration method, which requires
Ma, Xu; Cheng, Yongmei; Hao, Shuai
2016-12-10
Automatic classification of terrain surfaces from an aerial image is essential for an autonomous unmanned aerial vehicle (UAV) landing at an unprepared site by using vision. Diverse terrain surfaces may show similar spectral properties due to the illumination and noise that easily cause poor classification performance. To address this issue, a multi-stage classification algorithm based on low-rank recovery and multi-feature fusion sparse representation is proposed. First, color moments and Gabor texture feature are extracted from training data and stacked as column vectors of a dictionary. Then we perform low-rank matrix recovery for the dictionary by using augmented Lagrange multipliers and construct a multi-stage terrain classifier. Experimental results on an aerial map database that we prepared verify the classification accuracy and robustness of the proposed method.
Introducing trimming and function ranking to Solid Works based on function analysis
Chechurin, Leonid S.; Wits, Wessel Willems; Bakker, Hans M.; Cascini, G.; Vaneker, Thomas H.J.
2011-01-01
TRIZ based Function Analysis models existing products based on functional interactions between product parts. Such a function model description is the ideal starting point for product innovation. Design engineers can apply (TRIZ) methods such as trimming and function ranking to this function model
Introducing Trimming and Function Ranking to SolidWorks based on Function Analysis
Chechurin, L.S.; Wits, Wessel Willems; Bakker, Hans M.; Vaneker, Thomas H.J.
2015-01-01
TRIZ based Function Analysis models existing products based on functional interactions between product parts. Such a function model description is the ideal starting point for product innovation. Design engineers can apply (TRIZ) methods such as trimming and function ranking to this function model
Caster, Ola; Juhlin, Kristina; Watson, Sarah; Norén, G Niklas
2014-08-01
Detection of unknown risks with marketed medicines is key to securing the optimal care of individual patients and to reducing the societal burden from adverse drug reactions. Large collections of individual case reports remain the primary source of information and require effective analytics to guide clinical assessors towards likely drug safety signals. Disproportionality analysis is based solely on aggregate numbers of reports and naively disregards report quality and content. However, these latter features are the very foundation of the ensuing clinical assessment. Our objective was to develop and evaluate a data-driven screening algorithm for emerging drug safety signals that accounts for report quality and content. vigiRank is a predictive model for emerging safety signals, here implemented with shrinkage logistic regression to identify predictive variables and estimate their respective contributions. The variables considered for inclusion capture different aspects of strength of evidence, including quality and clinical content of individual reports, as well as trends in time and geographic spread. A reference set of 264 positive controls (historical safety signals from 2003 to 2007) and 5,280 negative controls (pairs of drugs and adverse events not listed in the Summary of Product Characteristics of that drug in 2012) was used for model fitting and evaluation; the latter used fivefold cross-validation to protect against over-fitting. All analyses were performed on a reconstructed version of VigiBase® as of 31 December 2004, at around which time most safety signals in our reference set were emerging. The following aspects of strength of evidence were selected for inclusion into vigiRank: the numbers of informative and recent reports, respectively; disproportional reporting; the number of reports with free-text descriptions of the case; and the geographic spread of reporting. vigiRank offered a statistically significant improvement in area under the receiver
Pathway Relevance Ranking for Tumor Samples through Network-Based Data Integration.
Directory of Open Access Journals (Sweden)
Lieven P C Verbeke
The study of cancer, a highly heterogeneous disease with different causes and clinical outcomes, requires a multi-angle approach and the collection of large multi-omics datasets that, ideally, should be analyzed simultaneously. We present a new pathway relevance ranking method that is able to prioritize pathways according to the information contained in any combination of tumor related omics datasets. Key to the method is the conversion of all available data into a single comprehensive network representation containing not only genes but also individual patient samples. Additionally, all data are linked through a network of previously identified molecular interactions. We demonstrate the performance of the new method by applying it to breast and ovarian cancer datasets from The Cancer Genome Atlas. By integrating gene expression, copy number, mutation and methylation data, the method's potential to identify key pathways involved in breast cancer development shared by different molecular subtypes is illustrated. Interestingly, certain pathways were ranked equally important for different subtypes, even when the underlying (epi-)genetic disturbances were diverse. Next to prioritizing universally high-scoring pathways, the pathway ranking method was able to identify subtype-specific pathways. Often the score of a pathway could not be motivated by a single mutation, copy number or methylation alteration, but rather by a combination of genetic and epi-genetic disturbances, stressing the need for a network-based data integration approach. The analysis of ovarian tumors, as a function of survival-based subtypes, demonstrated the method's ability to correctly identify key pathways, irrespective of tumor subtype. A differential analysis of survival-based subtypes revealed several pathways with higher importance for the bad-outcome patient group than for the good-outcome patient group. Many of the pathways exhibiting higher importance for the bad
Statistical inference methods for two crossing survival curves: a comparison of methods.
Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng
2015-01-01
A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.
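For reference, the two-sample log-rank statistic that serves as the benchmark in this comparison can be sketched as below. It is the standard observed-minus-expected form with the hypergeometric variance, compared against a chi-square distribution with one degree of freedom; the function name and the toy data are assumptions, and none of the alternative tests studied in the paper are implemented here.

```python
def logrank_statistic(times, events, groups):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1 for each subject
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, with a flag for an event at t.
        at_risk = [(g, bool(e) and tt == t)
                   for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == 1)
        d = sum(1 for _, died in at_risk if died)
        d1 = sum(1 for g, died in at_risk if died and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


# Hypothetical uncensored data in which group 1 fails before group 0.
stat = logrank_statistic([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0])
print(stat > 3.84)  # 3.84 is the 5% critical value of chi-square with 1 d.o.f.
```

Because the statistic sums signed differences over time, early and late differences of opposite sign cancel when the curves cross, which is the loss of power that motivates the alternative tests compared in this study.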
The Typicality Ranking Task: A New Method to Derive Typicality Judgments from Children
Ameel, Eef; Storms, Gert
2016-01-01
An alternative method for deriving typicality judgments, applicable in young children who are not yet familiar with numerical values, is introduced, allowing researchers to study gradedness at younger ages in concept development. Contrary to the long tradition of using rating-based procedures to derive typicality judgments, we propose a method that is based on typicality ranking rather than rating, in which items are gradually sorted according to their typicality, and that requires a minimum of linguistic knowledge. The validity of the method is investigated and the method is compared to the traditional typicality rating measurement in a large empirical study with eight different semantic concepts. The results show that the typicality ranking task can be used to assess children’s category knowledge and to evaluate how this knowledge evolves over time. Contrary to earlier held assumptions in studies on typicality in young children, our results also show that preference is not so much a confounding variable to be avoided, but that both variables are often significantly correlated in older children and even in adults. PMID:27322371
Heimann, G; Neuhaus, G
1998-03-01
In the random censorship model, the log-rank test is often used for comparing a control group with different dose groups. If the number of tumors is small, so-called exact methods are often applied for computing critical values from a permutational distribution. Two of these exact methods are discussed and shown to be incorrect. The correct permutational distribution is derived and studied with respect to its behavior under unequal censoring in the light of recent results proving that the permutational version and the unconditional version of the log-rank test are asymptotically equivalent even under unequal censoring. The log-rank test is studied by simulations of a realistic scenario from a bioassay with small numbers of tumors.
Ren, W. X.; Lin, Y. Q.; Fang, S. E.
2011-11-01
One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.
Enhancements to Graph based methods for Multi Document Summarization
Directory of Open Access Journals (Sweden)
Rengaramanujam Srinivasan
2009-01-01
This paper focuses its attention on extractive summarization using popular graph based approaches. Graph based methods can be broadly classified into two categories: non-PageRank type and PageRank type methods. Of the methods already proposed, the Centrality Degree method belongs to the former category while LexRank and Continuous LexRank methods belong to the latter category. The paper goes on to suggest two enhancements to both PageRank type and non-PageRank type methods. The first modification is that of recursively discounting the selected sentences, i.e. if a sentence is selected it is removed from further consideration and the next sentence is selected based upon the contributions of the remaining sentences only. Next the paper suggests a method of incorporating position weight to these schemes. In all, 14 methods (six of non-PageRank type and eight of PageRank type) have been investigated. To clearly distinguish between various schemes, we call the methods of incorporating discounting and position weight enhancements over Lexical Rank schemes as Sentence Rank (SR) methods. Intrinsic evaluation of all the 14 graph based methods was done using conventional Precision metric and metrics earlier proposed by us: Effectiveness1 (E1) and Effectiveness2 (E2). Experimental study brings out that the proposed SR methods are superior to all the other methods.
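The recursive discounting idea can be sketched for the simplest (degree centrality) case as follows. This is an illustrative reading of the description above, not the authors' implementation. The function name, the similarity threshold and the lowest-index tie-breaking rule are assumptions.

```python
def select_with_discounting(similarity, k, threshold=0.1):
    """Pick k sentences by degree centrality with recursive discounting.

    similarity is a square matrix (list of lists) of pairwise sentence
    similarities. After each pick, the chosen sentence is removed, so the
    next pick is scored only against the sentences that remain.
    Hypothetical helper; ties are broken by lowest sentence index.
    """
    remaining = set(range(len(similarity)))
    selected = []

    def degree(i):
        # Number of remaining sentences sufficiently similar to sentence i.
        return sum(
            1 for j in remaining
            if j != i and similarity[i][j] >= threshold
        )

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=degree)
        selected.append(best)
        remaining.remove(best)  # discount: no further contribution
    return selected
```

Without the removal step, the degrees would be computed once on the full graph, and sentences similar to an already selected one would keep their inflated scores.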
A Citation-Based Ranking of Strategic Management Journals
Azar, Ofer H.; Brock, David M.
2007-01-01
Rankings of strategy journals are important for authors, readers, and promotion and tenure committees. We present several rankings, based either on the number of articles that cited the journal or the per-article impact. Our analyses cover various periods between 1991 and 2006, for most of which the Strategic Management Journal was in first place and Journal of Economics & Management Strategy (JEMS) second, although JEMS ranked first in certain instances. Long Range Planning and Technology An...
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most
Jackknife Variance Estimator for Two Sample Linear Rank Statistics
1988-11-01
Keywords: strong consistency; linear rank test; influence function.
Sparse/Low Rank Constrained Reconstruction for Dynamic PET Imaging.
Directory of Open Access Journals (Sweden)
Xingjian Yu
In dynamic Positron Emission Tomography (PET), an estimate of the radioactivity concentration is obtained from a series of frames of sinogram data, with frames ranging in duration from 10 seconds to minutes under some criteria. So far, all the well-known reconstruction algorithms require known statistical properties of the data. This limits the speed of data acquisition; moreover, these algorithms cannot provide separate information about the structure and about the variation of shape and rate of metabolism, which play a major role in improving the visualization of contrast for some diagnostic requirements. This paper presents a novel low rank-based activity map reconstruction scheme from emission sinograms of dynamic PET, termed SLCR (Sparse/Low Rank Constrained Reconstruction for Dynamic PET Imaging). In this method, the stationary background is formulated as a low rank component while variations between successive frames are modeled as the sparse component. The resulting nuclear norm and l1 norm related minimization problem can be efficiently solved by many recently developed numerical methods. In this paper, the linearized alternating direction method is applied. The effectiveness of the proposed scheme is illustrated on three data sets.
A nonparametric spatial scan statistic for continuous data.
Jung, Inkyung; Cho, Ho Jin
2015-10-20
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compare the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
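The building block named above, the Wilcoxon rank-sum statistic, can be sketched in plain Python as follows. This covers only the two-sample statistic and its normal approximation, not the scan over candidate windows. The function name is hypothetical, and the variance deliberately omits the tie correction.

```python
def rank_sum(sample_x, sample_y):
    """Wilcoxon rank-sum statistic W for sample_x and its z-score.

    Tied values receive the average (mid) rank. The z-score uses the
    untied null variance, so it is only approximate when ties are present.
    Hypothetical helper for illustration.
    """
    pooled = sorted(list(sample_x) + list(sample_y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold equal values with ranks i+1..j; use their mean.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j

    w = sum(ranks[v] for v in sample_x)
    n, m = len(sample_x), len(sample_y)
    mean_w = n * (n + m + 1) / 2
    var_w = n * m * (n + m + 1) / 12
    z = (w - mean_w) / var_w ** 0.5
    return w, z
```

In a scan setting, `sample_x` would plausibly hold the values inside a candidate window and `sample_y` the values outside it, with the window giving the most extreme statistic reported as the likely cluster; that mapping is an assumption about usage rather than a detail given in the abstract.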
Ranking of bank branches with undesirable and fuzzy data: A DEA-based approach
Directory of Open Access Journals (Sweden)
Sohrab Kordrostami
2016-07-01
Banks are among the most important financial sectors for the economic development of each country. Certainly, efficiency scores and ranks of banks are significant and effective aspects towards future planning. Sometimes the performance of banks must be measured in the presence of undesirable and vague factors. For these reasons, in the current paper a procedure based on data envelopment analysis (DEA) is introduced for evaluating the efficiency and complete ranking of decision making units (DMUs) where undesirable and fuzzy measures exist. Specifically, in the presence of undesirable and fuzzy measures, DMUs are evaluated by using a fuzzy expected value approach, and DMUs with similar efficiency scores are ranked by using constraints and the Maximal Balance Index based on the optimal shadow prices. Afterwards, the efficiency scores of 25 branches of an Iranian commercial bank are evaluated using the proposed method. Also, a complete ranking of bank branches is presented to discriminate branches.
Hybrid perturbation methods based on statistical time series models
San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario
2016-04-01
In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination improves the precision of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem and first-order and second-order analytical theories, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel
2017-07-01
Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generates a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km²) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
Analysis of high-throughput biological data using their rank values.
Dembélé, Doulaye
2018-01-01
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra, the Perron-Frobenius theorem and also extend a method presented earlier for searching differentially expressed genes for the detection of recurrent copy number aberration. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros .
PageRank for low frequency earthquake detection
Aguiar, A. C.; Beroza, G. C.
2013-12-01
We have analyzed Hi-Net seismic waveform data during the April 2006 tremor episode in the Nankai Trough in SW Japan using the autocorrelation approach of Brown et al. (2008), which detects low frequency earthquakes (LFEs) based on pair-wise waveform matching. We have generalized this to exploit the fact that waveforms may repeat multiple times, on more than just a pair-wise basis. We are working towards developing a sound statistical basis for event detection, but that is complicated by two factors. First, the statistical behavior of the autocorrelations varies between stations. Analyzing one station at a time assures that the detection threshold will only depend on the station being analyzed. Second, the positive detections do not satisfy "closure." That is, if window A correlates with window B, and window B correlates with window C, then window A and window C do not necessarily correlate with one another. We want to evaluate whether or not a linked set of windows are correlated due to chance. To do this, we map our problem on to one that has previously been solved for web search, and apply Google's PageRank algorithm. PageRank is the probability that a 'random surfer' visits a particular web page; it assigns a ranking to a webpage based on the number of links associated with that page. For windows of seismic data instead of webpages, the windows with high probabilities suggest likely LFE signals. Once identified, we stack the matched windows to improve the signal-to-noise ratio (SNR) and use these stacks as template signals to find other LFEs within continuous data. We compare the results among stations and declare a detection if they are found in a statistically significant number of stations, based on multinomial statistics. We compare our detections using the single-station method to detections found by Shelly et al. (2007) for the April 2006 tremor sequence in Shikoku, Japan. We find strong similarity between the results, as well as many new detections that were not found using
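The 'random surfer' probability described in this abstract can be sketched with a plain power iteration. This is a generic PageRank illustration, not the authors' detection pipeline. The damping factor of 0.85, the function name and the dictionary-of-links input are assumptions; here a node stands for a waveform window and a link for a correlated pair.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph.

    links maps each node to the list of nodes it links to. For correlated
    window pairs, list each pair in both directions. Nodes without
    outgoing links spread their rank uniformly over all nodes.
    Hypothetical helper for illustration.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank mass held by nodes with no outgoing links.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Teleportation term plus the redistributed dangling mass.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For the non-closure example in the abstract (A correlates with B, B with C, but not A with C), `pagerank({"A": ["B"], "B": ["A", "C"], "C": ["B"]})` gives window B the largest score, since it is linked to both of the others.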
Ranking metrics in gene set enrichment analysis: do they matter?
Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna
2017-05-12
There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner
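One of the ranking metrics named in this abstract, the absolute signal-to-noise ratio, can be sketched as follows. The formula used is the common difference of group means divided by the sum of group standard deviations. The sample standard deviation, the 0/1 label convention and both function names are assumptions, and any adjustment for very small standard deviations is omitted.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Difference of group means divided by the sum of group std devs."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def rank_genes(expression, labels):
    """Order gene names by decreasing absolute signal-to-noise ratio.

    expression maps a gene name to its values across samples; labels holds
    0 or 1 per sample to mark the two conditions. Hypothetical helper that
    assumes each group has at least two samples and nonzero spread.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(signal_to_noise(a, b))
    return sorted(scores, key=scores.get, reverse=True)
```

The resulting ordered gene list is the kind of input a gene set enrichment procedure walks through, which is why the choice of metric can change which sets appear enriched.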
González-Galván, María del Carmen; Mosqueda-Taylor, Adalberto; Bologna-Molina, Ronell; Setien-Olarra, Amaia; Marichalar-Mendia, Xabier; Aguirre-Urizar, José-Manuel
2018-01-01
Background: Odontogenic myxoma (OM) is a benign intraosseous neoplasm that exhibits local aggressiveness and high recurrence rates. Osteoclastogenesis is an important phenomenon in the tumor growth of maxillary neoplasms. RANK (Receptor Activator of Nuclear Factor kappa B) is the signaling receptor of RANK-L (Receptor Activator of Nuclear Factor kappa-B Ligand) that activates the osteoclasts. OPG (osteoprotegerin) is a decoy receptor for RANK-L that inhibits pro-osteoclastogenesis. The RANK / RANKL / OPG system participates in the regulation of osteolytic activity under normal conditions, and its alteration has been associated with greater bone destruction, and also with tumor growth. Objectives: To analyze the immunohistochemical expression of OPG, RANK and RANK-L proteins in odontogenic myxomas (OMs) and their relationship with the tumor size. Material and Methods: Eighteen OMs, 4 small ( 3cm) and 18 dental follicles (DF) that were included as control were studied by means of standard immunohistochemical procedure with RANK, RANKL and OPG antibodies. For the evaluation, 5 fields (40x) of representative areas of OM and DF were selected where the expression of each antibody was determined. Descriptive and comparative statistical analyses were performed with the obtained data. Results: There are significant differences in the expression of RANK in OM samples as compared to DF (p = 0.022) and among the OMSs and OMLs (p = 0.032). Also a strong association is recognized in the expression of RANK-L and OPG in OM samples. Conclusions: Activation of the RANK / RANK-L / OPG triad seems to be involved in the mechanisms of bone balance and destruction, as well as associated with tumor growth in odontogenic myxomas. Key words: Odontogenic myxoma, dental follicle, RANK, RANK-L, OPG, osteoclastogenesis. PMID:29680857
Statistical methods in quality assurance
International Nuclear Information System (INIS)
Eckhard, W.
1980-01-01
During the different phases of a production process (planning, development and design, manufacturing, assembling, etc.), most decisions rest on a base of statistics: the collection, analysis and interpretation of data. Statistical methods can be thought of as a kit of tools that help to solve problems in the quality functions of the quality loop in order to produce quality products and to reduce quality costs. Various statistical methods are presented, and typical examples of their practical application are demonstrated. (RW)
International Nuclear Information System (INIS)
Gong, Wenyin; Cai, Zhihua
2013-01-01
Parameter identification of PEM (proton exchange membrane) fuel cell model is a very active area of research. Generally, it can be treated as a numerical optimization problem with complex nonlinear and multi-variable features. DE (differential evolution), which has been successfully used in various fields, is a simple yet efficient evolutionary algorithm for global numerical optimization. In this paper, with the objective of accelerating the process of parameter identification of PEM fuel cell models and reducing the necessary computational efforts, we firstly present a generic and simple ranking-based mutation operator for the DE algorithm. Then, the ranking-based mutation operator is incorporated into five highly-competitive DE variants to solve the PEM fuel cell model parameter identification problems. The main contributions of this work are the proposed ranking-based DE variants and their application to the parameter identification problems of PEM fuel cell models. Experiments have been conducted by using both the simulated voltage–current data and the data obtained from the literature to validate the performance of our approach. The results indicate that the ranking-based DE methods provide better results with respect to the solution quality, the convergence rate, and the success rate compared with their corresponding original DE methods. In addition, the voltage–current characteristics obtained by our approach are in good agreement with the original voltage–current curves in all cases. - Highlights: • A simple and generic ranking-based mutation operator is presented in this paper. • Several DE (differential evolution) variants are used to solve the parameter identification of PEMFC (proton exchange membrane fuel cells) model. • Results show that our method accelerates the process of parameter identification. • The V–I characteristics are in very good agreement with experimental data
Use of the dry-weight-rank method of botanical analysis in the ...
African Journals Online (AJOL)
The dry-weight-rank method of botanical analysis was tested in the highveld of the Eastern Transvaal and was found to be an efficient and accurate means of determining the botanical composition of veld herbage. Accuracy was increased by weighting ranks on the basis of quadrat yield, and by allocation of equal ranks to ...
Khan, Haseeb Ahmad
2004-01-01
The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann-Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-groups comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform.
Ranking Tehran’s Stock Exchange Top Fifty Stocks Using Fundamental Indexes and Fuzzy TOPSIS
Directory of Open Access Journals (Sweden)
E. S. Saleh
2017-08-01
Investment through the purchase of securities constitutes an important part of a country's economic exchange. Therefore, making decisions about investing in a particular stock has become one of the most controversial areas of economic and financial research, and various institutions have begun to rank companies' stocks and determine stock purchase priorities for investment. The current research, by determining the important indexes required for ranking companies based on their share value on the Tehran Stock Exchange, can greatly help the accurate ranking of the top fifty listed companies. Initial ranking indicators are extracted, and then a decision-making group (exchange experts) determines the final indexes using the Delphi method and non-parametric statistical methods. Then, criteria weights are obtained using Fuzzy ANP, taking into account the interactions among criteria. Finally, using fuzzy TOPSIS and information on the top fifty listed companies of the Tehran Stock Exchange in 2014, the companies are ranked with the software "Rahavard Novin". A sensitivity analysis of the criteria weights, with a presentation of the relevant analysis, was conducted at the end of the study.
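The final ranking step can be illustrated with the classical (crisp) TOPSIS procedure. Note that this is a simplification: the study uses fuzzy TOPSIS with weights from Fuzzy ANP, whereas the sketch below takes ordinary numbers and user-supplied weights. The function name and argument layout are assumptions.

```python
def topsis(matrix, weights, benefit):
    """Closeness-to-ideal score per alternative (crisp TOPSIS).

    matrix[i][j] is the value of alternative i on criterion j, weights[j]
    is the criterion weight, and benefit[j] is True when larger is better.
    Hypothetical helper; assumes no criterion column is all zeros and that
    the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [sum(row[j] ** 2 for row in matrix) ** 0.5 for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]

    columns = [[row[j] for row in v] for j in range(n_crit)]
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in v:
        d_pos = sum((x - p) ** 2 for x, p in zip(row, ideal)) ** 0.5
        d_neg = sum((x - q) ** 2 for x, q in zip(row, anti)) ** 0.5
        # 1.0 means the alternative coincides with the ideal solution.
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

Sorting alternatives by decreasing score gives the ranking; a sensitivity analysis of the kind mentioned above would rerun the function with perturbed `weights` and compare the resulting orders.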
Comparison of Statistical Post-Processing Methods for Probabilistic Wind Speed Forecasting
Han, Keunhee; Choi, JunTae; Kim, Chansoo
2018-02-01
In this study, the statistical post-processing methods that include bias-corrected and probabilistic forecasts of wind speed measured in PyeongChang, which is scheduled to host the 2018 Winter Olympics, are compared and analyzed to provide more accurate weather information. The six post-processing methods used in this study are as follows: mean bias-corrected forecast, mean and variance bias-corrected forecast, decaying averaging forecast, mean absolute bias-corrected forecast, and the alternative implementations of ensemble model output statistics (EMOS) and Bayesian model averaging (BMA) models, namely the EMOS and BMA exchangeable models, which assume exchangeable ensemble members, and simplified versions of the EMOS and BMA models. Observations for wind speed were obtained from the 26 stations in PyeongChang and 51 ensemble member forecasts derived from the European Centre for Medium-Range Weather Forecasts (ECMWF Directorate, 2012) that were obtained between 1 May 2013 and 18 March 2016. Prior to applying the post-processing methods, reliability analysis was conducted by using rank histograms to identify the statistical consistency of ensemble forecast and corresponding observations. Based on the results of our study, we found that the prediction skills of probabilistic forecasts of EMOS and BMA models were superior to the bias-corrected forecasts in terms of deterministic prediction, whereas in probabilistic prediction, BMA models showed better prediction skill than EMOS. Even though the simplified version of the BMA model exhibited the best prediction skill among the mentioned six methods, the results showed that the differences of prediction skills between the versions of EMOS and BMA were negligible.
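The rank histogram used for the reliability analysis can be sketched as follows: for each forecast case, count how many ensemble members fall below the observation and tally that rank. The function name and input layout are assumptions, and ties between members and the observation are simply counted as "not below", which a careful implementation would randomise.

```python
def rank_histogram(ensembles, observations):
    """Counts of the observation's rank within each ensemble forecast.

    ensembles is a list of forecasts, each a list of member values of the
    same length m; observations holds the matching verifying values. The
    result has m + 1 bins. Hypothetical helper for illustration.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        # Rank = number of members strictly below the observation.
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts
```

A roughly flat histogram suggests that observations are statistically indistinguishable from ensemble members, whereas a U shape (mass in the first and last bins) points to an under-dispersive ensemble, which is the usual motivation for post-processing.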
Andries, Jan P M; Vander Heyden, Yvan; Buydens, Lutgarde M C
2011-10-31
The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters, and which may change during the variable reduction process. In these methods variable reduction is made on the variables ranked in descending order for a given variable property. The methods start with full spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated; a new PLS model is calculated, followed by a renewed ranking of the variables. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted as SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which a possibility for decreasing the PLS model complexity during the variable reduction process is built in. Therefore we denote our methods as PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared sources (NIR) and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test. The three newly developed PPRVR-CAM methods were able to retain
Directory of Open Access Journals (Sweden)
Zeeshan Ali Siddiqui
2016-01-01
Component-based software system (CBSS) development is an emerging discipline that promises to take software development into a new era. Just as hardware systems are presently constructed from kits of parts, software systems may also be assembled from components. Reusing software is more reliable than creating it anew. The reliability of the glue code and of the individual components determines the reliability of the overall system. Every component contributes to overall system reliability according to the number of times it is used, known as the usage frequency of the component; some components are of critical usage. The usage frequency decides the weight of each component, and each component contributes to the overall reliability of the system according to its weight. Therefore, a ranking of components may be obtained by analyzing their reliability impact on the overall application. In this paper, we propose the application of fuzzy multi-objective optimization on the basis of ratio analysis (Fuzzy-MOORA). The method helps find the most suitable alternative (software component) from a set of feasible alternatives. It is an accurate and easy-to-understand tool for solving multi-criteria decision-making problems that have imprecise and vague evaluation data. Using ratio analysis, the proposed method determines the most suitable alternative among all possible alternatives, and dimensionless measurement accomplishes the ranking of components for estimating CBSS reliability in a non-subjective way. Finally, three case studies are shown to illustrate the use of the proposed technique.
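The ratio analysis behind MOORA can be sketched in a few lines. This is a minimal sketch of the crisp (non-fuzzy) ratio system only, not the fuzzy variant the paper proposes: each criterion column is divided by its Euclidean norm to make it dimensionless, then benefit criteria are added and cost criteria subtracted. The function name `moora_scores`, the three criteria, and all numbers are hypothetical and chosen purely for illustration.

```python
import math


def moora_scores(matrix, benefit, weights=None):
    """Crisp MOORA ratio system.

    matrix  -- one row per alternative, one column per criterion
    benefit -- True for criteria to maximise, False for criteria to minimise
    weights -- optional criterion weights (defaults to equal weights of 1)
    """
    n_crit = len(matrix[0])
    weights = weights or [1.0] * n_crit
    # Vector normalisation: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    scores = []
    for row in matrix:
        y = 0.0
        for j in range(n_crit):
            ratio = weights[j] * row[j] / norms[j]
            y += ratio if benefit[j] else -ratio
        scores.append(y)
    return scores


# Hypothetical components: (reliability, usage frequency, cost).
components = [
    [0.95, 0.6, 30],
    [0.90, 0.8, 20],
    [0.99, 0.3, 50],
]
scores = moora_scores(components, benefit=[True, True, False])
best = max(range(len(scores)), key=lambda i: scores[i])
```

A fuzzy version would replace each crisp entry with a fuzzy number (for example a triangular one) and defuzzify before ranking; that step is left out here.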
International Nuclear Information System (INIS)
McColl, S.; Gower, S.; Hicks, J.; Shortreed, J.; Craig, L.
2004-01-01
This paper presents the concept and methodologies behind the development of a health effects priority ranking tool for the reduction of air emissions from oil refineries. The Health Effects Indicators Decision Index, Version 2 (Heidi II) was designed to assist policy makers in prioritizing air emissions reductions on the basis of estimated risk to human health. Inputs include facility level rankings of potential health impacts associated with carcinogenic air toxics, non-carcinogenic air toxics and criteria air contaminants for each of the 20 refineries in Canada. Rankings of estimated health impacts are presented on predicted incidence of health effects. Heidi II considers site-specific annual pollutant emission data, ambient air concentrations associated with releases and concentration response functions for various types of health effects. Additional data include location specific background air concentrations, site-specific population densities, and the baseline incidence of different health effects endpoints, such as cancer, non-cancer illnesses and cardiorespiratory illnesses and death. Air pollutants include the 29 air toxics reported annually in Environment Canada's National Pollutant Release Inventory. Three health impact ranking outputs are provided for each facility: ranking of pollutants based on predicted number of annual cases of health effects; ranking of pollutants based on simplified Disability Adjusted Life Years (DALYs); and ranking of pollutants based on more complex DALYs that consider types of cancer, systemic disease or types of cardiopulmonary health effects. Rankings rely on rough statistical estimates of predicted incidence rates for health endpoints. The models used to calculate rankings can provide useful guidance by comparing estimated health impacts. Heidi II has demonstrated that it is possible to develop a consistent and objective approach for ranking priority reductions of air emissions. Heidi II requires numerous types and
Semiparametric Gaussian copula models : Geometry and efficient rank-based estimation
Segers, J.; van den Akker, R.; Werker, B.J.M.
2014-01-01
We propose, for multivariate Gaussian copula models with unknown margins and structured correlation matrices, a rank-based, semiparametrically efficient estimator for the Euclidean copula parameter. This estimator is defined as a one-step update of a rank-based pilot estimator in the direction of
Monte Carlo based statistical power analysis for mediation models: methods and software.
Zhang, Zhiyong
2014-12-01
The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
Karlitasari, L.; Suhartini, D.; Nurrosikawati, L.
2018-03-01
Selection of Student Achievement is conducted every year, starting at the level of Study Program, then Faculty, then University; the first-ranked student is then sent on to the Kopertis level. The selection criteria are Academic and Rich Scientific, Organizational, Personality, and English. For the selection to be objective, the jury is expected to be supported by methods that make the decision more optimal. One such method is Promethee. The Preference Ranking Organization Method for Enrichment Evaluation (Promethee) is a ranking method in Multi Criteria Decision Making (MCDM). PROMETHEE has the advantage that a preference type is assigned to each criterion, so that alternatives can be compared with one another on the same criterion. The conjecture of alternate dominance over a criterion used in PROMETHEE is the use of values in the relationships between alternative ranking values. Based on the calculation results for 7 applicants, ranks 1, 2, and 3 did not change between the Manual and Promethee matrices; only positions 4 to 7 changed. However, in the sensitivity test, almost all criteria showed a high level of sensitivity. Although this does not affect which students will be sent to the next level, it can have a psychological impact on prospective student’s achievement
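To make the pairwise comparison idea concrete, here is a minimal sketch of PROMETHEE II net outranking flows. It assumes the simplest ("usual") preference type, in which an alternative earns a criterion's full weight whenever it scores strictly higher on that criterion; the study may have used other preference types. The function name `promethee_net_flows`, the three applicants, the four criteria, and the weights are all made up for illustration.

```python
def promethee_net_flows(scores, weights):
    """Return PROMETHEE II net outranking flows, one per alternative.

    scores  -- one row per alternative, one value per criterion
               (higher is assumed better on every criterion)
    weights -- criterion weights summing to 1
    """
    n = len(scores)

    def pi(a, b):
        # Aggregated preference of a over b with the "usual" preference type:
        # a criterion contributes its weight when a strictly beats b on it.
        return sum(w for w, xa, xb in zip(weights, scores[a], scores[b]) if xa > xb)

    flows = []
    for a in range(n):
        others = [b for b in range(n) if b != a]
        phi_plus = sum(pi(a, b) for b in others) / (n - 1)   # how strongly a dominates
        phi_minus = sum(pi(b, a) for b in others) / (n - 1)  # how strongly a is dominated
        flows.append(phi_plus - phi_minus)
    return flows


# Hypothetical applicants scored on four criteria (illustrative numbers only).
scores = [
    [80, 70, 90, 60],
    [85, 75, 70, 75],
    [60, 65, 65, 55],
]
weights = [0.4, 0.2, 0.2, 0.2]
flows = promethee_net_flows(scores, weights)
ranking = sorted(range(len(scores)), key=lambda i: flows[i], reverse=True)
```

Net flows always sum to zero, so a sensitivity test of the kind mentioned in the abstract can be approximated by perturbing `weights` and checking whether `ranking` changes.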
A network-based dynamical ranking system for competitive sports
Motegi, Shun; Masuda, Naoki
2012-12-01
From the viewpoint of networks, a ranking system for players or teams in sports is equivalent to a centrality measure for sports networks, whereby a directed link represents the result of a single game. Previously proposed network-based ranking systems are derived from static networks, i.e., aggregation of the results of games over time. However, the score of a player (or team) fluctuates over time. Defeating a renowned player at peak performance is intuitively more rewarding than defeating the same player in other periods. To account for this factor, we propose a dynamic variant of such a network-based ranking system and apply it to professional men's tennis data. We derive a set of linear online update equations for the score of each player. The proposed ranking system predicts the outcome of future games with higher accuracy than its static counterparts.
Szulc, Stefan
1965-01-01
Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field. Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then
Nanotechnology strength indicators: international rankings based on US patents
Marinova, Dora; McAleer, Michael
2003-01-01
Technological strength indicators (TSIs) based on patent statistics for 1975-2000 are used to analyse patenting of nanotechnology in the USA, and to compile international rankings for the top 12 foreign patenting countries (namely Australia, Canada, France, Germany, Great Britain, Italy, Japan, Korea, the Netherlands, Sweden, Switzerland and Taiwan). As the indicators are not directly observable, various proxy variables are used, namely the technological specialization index for national priorities, patent shares for international presence, citation rate for the contribution of patents to knowledge development and rate of assigned patents for potential commercial benefits. The best performing country is France, followed by Japan and Canada. It is shown that expertise and strength in nanotechnology are not evenly distributed among the technologically advanced countries, with the TSIs revealing different emphases in the development of nanotechnology.
Sensitivity ranking for freshwater invertebrates towards hydrocarbon contaminants.
Gerner, Nadine V; Cailleaud, Kevin; Bassères, Anne; Liess, Matthias; Beketov, Mikhail A
2017-11-01
Hydrocarbons are of great economic importance but may also cause substantial ecological impacts due to accidents or inadequate transportation and use. Currently, freshwater biomonitoring methods lack an indicator that can unequivocally reflect the impacts caused by hydrocarbons while being independent of the effects of other stressors. The aim of the present study was to develop a sensitivity ranking for freshwater invertebrates towards hydrocarbon contaminants, which can be used in hydrocarbon-specific bioindicators. We employed the Relative Sensitivity method and developed the sensitivity ranking S_hydrocarbons based on literature ecotoxicological data supplemented with rapid and mesocosm test results. A first validation of the sensitivity ranking based on an earlier field study has been conducted and revealed the S_hydrocarbons ranking to be promising for application in sensitivity based indicators. Thus, the first results indicate that the ranking can serve as the core component of future hydrocarbon-specific and sensitivity trait based bioindicators.
Measuring streetscape complexity based on the statistics of local contrast and spatial frequency.
Directory of Open Access Journals (Sweden)
André Cavalcante
Streetscapes are basic urban elements which play a major role in the livability of a city. The visual complexity of streetscapes is known to influence how people behave in such built spaces. However, how and which characteristics of a visual scene influence our perception of complexity have yet to be fully understood. This study proposes a method to evaluate the complexity perceived in streetscapes based on the statistics of local contrast and spatial frequency. Here, 74 streetscape images from four cities, including daytime and nighttime scenes, were ranked for complexity by 40 participants. Image processing was then used to locally segment contrast and spatial frequency in the streetscapes. The statistics of these characteristics were extracted and later combined to form a single objective measure. The direct use of statistics revealed structural or morphological patterns in streetscapes related to the perception of complexity. Furthermore, in comparison to conventional measures of visual complexity, the proposed objective measure exhibits a higher correlation with the opinion of the participants. Also, the performance of this method is more robust regarding different time scenarios.
Time evolution of Wikipedia network ranking
Eom, Young-Ho; Frahm, Klaus M.; Benczúr, András; Shepelyansky, Dima L.
2013-12-01
We study the time evolution of ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003-2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to be stabilized for 2007-2011. Special emphasis is placed on the ranking of Wikipedia personalities and universities. We show that PageRank selection is dominated by politicians, while 2DRank, which combines PageRank and CheiRank, places more emphasis on personalities from the arts. The Wikipedia PageRank of universities recovers 80% of the top universities of the Shanghai ranking during the considered time period.
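As a reminder of what the two probabilities measure, the sketch below computes PageRank by power iteration and obtains CheiRank by running the same routine on the network with every link reversed. It is a toy illustration under stated assumptions: the damping factor 0.85 is the conventional default rather than a value taken from the study, the four-node graph is invented, and 2DRank is not implemented.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [link targets]}."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Probability mass on dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += alpha * rank[v] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def reverse(links):
    """Invert every link; PageRank of the reversed network is the CheiRank."""
    rev = {v: [] for v in links}
    for v, targets in links.items():
        for t in targets:
            rev.setdefault(t, []).append(v)
    return rev


# Invented four-article hyperlink network.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = pagerank(toy)            # rewards articles that receive many links
chei = pagerank(reverse(toy)) # rewards articles that send many links
```

In this toy network article C collects the most incoming links and therefore gets the highest PageRank.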
Statistical modelling of citation exchange between statistics journals.
Varin, Cristiano; Cattelan, Manuela; Firth, David
2016-01-01
Rankings of scholarly journals based on citation data are often met with scepticism by the scientific community. Part of the scepticism is due to disparity between the common perception of journals' prestige and their ranking based on citation counts. A more serious concern is the inappropriate use of journal rankings to evaluate the scientific influence of researchers. The paper focuses on analysis of the table of cross-citations among a selection of statistics journals. Data are collected from the Web of Science database published by Thomson Reuters. Our results suggest that modelling the exchange of citations between journals is useful to highlight the most prestigious journals, but also that journal citation data are characterized by considerable heterogeneity, which needs to be properly summarized. Inferential conclusions require care to avoid potential overinterpretation of insignificant differences between journal ratings. Comparison with published ratings of institutions from the UK's research assessment exercise shows strong correlation at aggregate level between assessed research quality and journal citation 'export scores' within the discipline of statistics.
Directory of Open Access Journals (Sweden)
Prasenjit Chatterjee
2012-04-01
Evaluating suppliers for manufacturing organizations is one of the most challenging problems in a real-time manufacturing environment because of the wide variety of customer demands. It has become more and more complicated to meet the challenges of international competitiveness, as decision makers need to assess a wide range of alternative suppliers based on a set of conflicting criteria. Thus, the main objective of supplier selection is to select a high-potential supplier through which all the set goals regarding purchasing and manufacturing activity can be achieved. For these reasons, supplier selection has received considerable attention from academicians and researchers. This paper presents a combined multi-criteria decision making methodology for supplier evaluation for given industrial applications. The proposed methodology is based on a compromise ranking method combined with Grey Interval Numbers, considering different cardinal and ordinal criteria and their relative importance. A ‘supplier selection index’ is also proposed to help evaluate and rank the alternative suppliers. Two examples are given to demonstrate the potential and applicability of the proposed method.
DEA ranking of municipalities of the Republic of Serbia based on efficiency of SMEs in agribusiness
Directory of Open Access Journals (Sweden)
Maletić Radojka
2015-01-01
The most important aspect of any business is efficiency. The goal is to achieve greater output using fewer inputs, i.e. to maximize the use of available inputs. Numerous mathematical and statistical procedures, such as the DEA technique (Data Envelopment Analysis), take an important place in the process of the effective management of a company and its business activities. This paper illustrates the application of the DEA technique in assessing the business efficiency of SMEs in agribusiness in Vojvodina. Measuring the efficiency of business operations of SMEs is based on the values of the following indicators: fixed assets, working capital, number of companies, number of employees, total income, profit and loss. The data used to calculate the values of the indicators of business efficiency were obtained from the Statistical Office of the Republic of Serbia, based on the annual accounts of SMEs in agribusiness as a four-year average (2008-2011). The aim of this paper is a statistical assessment of the business efficiency of SMEs in agribusiness using the DEA technique; then, based on the results obtained, a ranking of the Vojvodina municipalities in which the observed SMEs were located; and finally, based on 4 models, a demonstration of the sensitivity of the DEA technique to different combinations of input/output indicators, which shows that caution is needed when this method is used. The better the combination of parameters in the model, the more realistic the results, since wrong decisions could be made if a key parameter is omitted.
Pearson's chi-square test and rank correlation inferences for clustered data.
Shih, Joanna H; Fay, Michael P
2017-09-01
Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
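The idea of within-cluster resampling can be illustrated with a short Monte Carlo sketch: draw one (x, y) pair at random from each cluster so that the drawn pairs are independent, compute Kendall's tau on that draw, and average over many draws. This is only an illustration of the resampling idea; the article works with U-statistics and derives large-sample properties, which this sketch does not attempt. The names `kendall_tau_a` and `wcr_kendall` and the example clusters are hypothetical.

```python
import random


def kendall_tau_a(pairs):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs contribute 0 to the numerator.
    """
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def wcr_kendall(clusters, n_resamples=500, seed=0):
    """Average tau over resamples that draw one (x, y) pair per cluster."""
    rng = random.Random(seed)
    taus = []
    for _ in range(n_resamples):
        sample = [rng.choice(cluster) for cluster in clusters]
        taus.append(kendall_tau_a(sample))
    return sum(taus) / len(taus)


# Made-up clustered data: each inner list holds the paired responses of one unit.
clusters = [
    [(1.0, 10.0), (1.5, 11.0)],
    [(2.0, 20.0)],
    [(3.0, 30.0), (3.5, 31.0), (3.2, 30.5)],
]
tau_wcr = wcr_kendall(clusters)
```

Because every draw uses exactly one pair per cluster, large clusters do not dominate the estimate.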
A Statistic-Based Calibration Method for TIADC System
Directory of Open Access Journals (Sweden)
Kuojun Yang
2015-01-01
The time-interleaved technique is widely used to increase the sampling rate of analog-to-digital converters (ADCs). However, channel mismatches degrade the performance of a time-interleaved ADC (TIADC). Therefore, a statistic-based calibration method for TIADC is proposed in this paper. The average value of the sampling points is used to calculate the offset error, and the summation of the sampling points is used to calculate the gain error. After the offset and gain errors are obtained, they are calibrated by the offset and gain adjustment elements in the ADC. Timing skew is calibrated by an iterative method. The product of the sampling points of two adjacent subchannels is used as a metric for calibration. The proposed method is employed to calibrate mismatches in a four-channel 5 GS/s TIADC system. Simulation results show that the proposed method can estimate mismatches accurately over a wide frequency range. It is also shown that an accurate estimate can be obtained even if the signal-to-noise ratio (SNR) of the input signal is 20 dB. Furthermore, the results obtained from a real four-channel 5 GS/s TIADC system demonstrate the effectiveness of the proposed method. The spectral spurs due to mismatches are effectively eliminated after calibration.
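The offset and gain steps lend themselves to a small numerical sketch. The abstract does not spell out the exact gain statistic, so the version below makes an assumption: after removing the per-channel mean, the sum of absolute sample values in each sub-channel is taken as proportional to that channel's gain and compared with channel 0. Timing-skew calibration is not covered. The mismatch values, the test tone, and the function names are hypothetical.

```python
import math


def estimate_offset_gain(samples, n_channels):
    """Estimate per-channel offset and gain (gain relative to channel 0)."""
    channels = [samples[k::n_channels] for k in range(n_channels)]
    # Offset: average value of each sub-channel's sampling points.
    offsets = [sum(ch) / len(ch) for ch in channels]
    # Gain (assumed statistic): sum of absolute offset-free samples.
    sums = [sum(abs(x - o) for x in ch) for ch, o in zip(channels, offsets)]
    gains = [s / sums[0] for s in sums]
    return offsets, gains


def calibrate(samples, n_channels, offsets, gains):
    """Remove the estimated offset and gain mismatch from every sample."""
    return [(x - offsets[i % n_channels]) / gains[i % n_channels]
            for i, x in enumerate(samples)]


# Simulated four-channel TIADC output for a sine input (illustrative values).
true_offsets = [0.00, 0.02, -0.01, 0.03]
true_gains = [1.00, 1.05, 0.97, 1.02]
freq = 0.0371  # input frequency in cycles per sample
raw = [true_gains[i % 4] * math.sin(2 * math.pi * freq * i) + true_offsets[i % 4]
       for i in range(40000)]

est_offsets, est_gains = estimate_offset_gain(raw, 4)
corrected = calibrate(raw, 4, est_offsets, est_gains)
```

The estimate relies on every sub-channel seeing the same amplitude statistics, which holds when the input is not synchronous with the sub-channel sampling rate.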
Nahar, Jannatun; Johnson, Fiona; Sharma, Ashish
2017-07-01
Use of General Circulation Model (GCM) precipitation and evapotranspiration sequences for hydrologic modelling can result in unrealistic simulations due to the coarse scales at which GCMs operate and the systematic biases they contain. The Bias Correction Spatial Disaggregation (BCSD) method is a popular statistical downscaling and bias correction method developed to address this issue. The advantage of BCSD is its ability to reduce biases in the distribution of precipitation totals at the GCM scale and then introduce more realistic variability at finer scales than simpler spatial interpolation schemes. Although BCSD corrects biases at the GCM scale before disaggregation, at finer spatial scales biases are re-introduced by the assumptions made in the spatial disaggregation process. Our study focuses on this limitation of BCSD and proposes a rank-based approach that aims to reduce the spatial disaggregation bias, especially for both low and high precipitation extremes. BCSD requires the specification of a multiplicative bias correction anomaly field that represents the ratio of the fine scale precipitation to the disaggregated precipitation. It is shown that there is significant temporal variation in the anomalies, which is masked when a mean anomaly field is used. This can be improved by modelling the anomalies in rank-space. Results from the application of the rank-BCSD procedure improve the match between the distributions of observed and downscaled precipitation at the fine scale compared to the original BCSD approach. Further improvements in the distribution are identified when a scaling correction to preserve mass in the disaggregation process is implemented. An assessment of the approach using a single GCM over Australia shows clear advantages, especially in the simulation of particularly low and high downscaled precipitation amounts.
Feature selection model based on clustering and ranking in pipeline for microarray data
Directory of Open Access Journals (Sweden)
Barnali Sahu
2017-01-01
Most of the feature selection techniques available in the literature are classifier bound, meaning that a group of features is tied to the performance of a specific classifier, as in wrapper and hybrid approaches. Our objective in this study is to select a set of generic features not tied to any classifier, based on the proposed framework. This framework uses attribute clustering and feature ranking techniques in a pipeline in order to remove redundant features. On each uncovered cluster, signal-to-noise ratio, t-statistics and significance analysis of microarray are independently applied to select the top ranked features. Both filter and evolutionary wrapper approaches have been considered for feature selection, and the data set with the selected features is given to an ensemble of predefined statistically different classifiers. The class labels of the test data are determined using the majority voting technique. Moreover, with the aforesaid objectives, this paper focuses on obtaining a stable result out of various classification models. Further, a comparative analysis has been performed to study the classification accuracy and computational time of the current approach and evolutionary wrapper techniques. It gives better insight into the features and further enhances the classification accuracy with less computational time.
GeoSearcher: Location-Based Ranking of Search Engine Results.
Watters, Carolyn; Amoudi, Ghada
2003-01-01
Discussion of Web queries with geospatial dimensions focuses on an algorithm that assigns location coordinates dynamically to Web sites based on the URL. Describes a prototype search system that uses the algorithm to re-rank search engine results for queries with a geospatial dimension, thus providing an alternative ranking order for search engine…
Directory of Open Access Journals (Sweden)
Abul Kalam Azad
2014-05-01
The best Weibull distribution methods for the assessment of wind energy potential at different altitudes in desired locations are statistically diagnosed in this study. Seven different methods, namely the graphical method (GM), method of moments (MOM), standard deviation method (STDM), maximum likelihood method (MLM), power density method (PDM), modified maximum likelihood method (MMLM) and equivalent energy method (EEM), were used to estimate the Weibull parameters, and six statistical tools, namely relative percentage of error, root mean square error (RMSE), mean percentage of error, mean absolute percentage of error, chi-square error and analysis of variance, were used to precisely rank the methods. The statistical fittings of the measured and calculated wind speed data are assessed to justify the performance of the methods. The capacity factor and total energy generated by a small model wind turbine are calculated by numerical integration using Trapezoidal sums and Simpson’s rules. The results show that MOM and MLM are the most efficient methods for determining the values of k and c to fit Weibull distribution curves.
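Of the seven estimators listed, the standard deviation method has a simple closed form and makes a convenient illustration: the shape is approximated as k = (σ / v̄)^(-1.086) and the scale follows from the mean as c = v̄ / Γ(1 + 1/k). The sketch below implements that method together with RMSE, one of the six ranking statistics. The wind speeds are invented, and the function names are hypothetical; this is not the study's code, and the study found MOM and MLM to perform best.

```python
import math


def weibull_std_method(speeds):
    """Estimate Weibull shape k and scale c with the standard deviation method."""
    n = len(speeds)
    mean = sum(speeds) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in speeds) / (n - 1))
    k = (std / mean) ** -1.086        # empirical approximation for the shape
    c = mean / math.gamma(1 + 1 / k)  # scale chosen so the Weibull mean matches
    return k, c


def weibull_pdf(v, k, c):
    """Two-parameter Weibull probability density at wind speed v."""
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


# Invented wind speed sample in m/s.
speeds = [3.2, 4.1, 5.6, 6.3, 4.8, 7.9, 5.1, 2.7, 6.8, 5.5]
k, c = weibull_std_method(speeds)
density_at_5 = weibull_pdf(5.0, k, c)
```

Ranking several estimators then amounts to computing `rmse` (and the other error statistics) between observed and fitted frequencies for each method and ordering the methods by error.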
Fabric defect detection based on visual saliency using deep feature and low-rank recovery
Liu, Zhoufeng; Wang, Baorui; Li, Chunlei; Li, Bicao; Dong, Yan
2018-04-01
Fabric defect detection plays an important role in improving the quality of fabric product. In this paper, a novel fabric defect detection method based on visual saliency using deep feature and low-rank recovery was proposed. First, unsupervised training is carried out by the initial network parameters based on MNIST large datasets. The supervised fine-tuning of fabric image library based on Convolutional Neural Networks (CNNs) is implemented, and then more accurate deep neural network model is generated. Second, the fabric images are uniformly divided into the image block with the same size, then we extract their multi-layer deep features using the trained deep network. Thereafter, all the extracted features are concentrated into a feature matrix. Third, low-rank matrix recovery is adopted to divide the feature matrix into the low-rank matrix which indicates the background and the sparse matrix which indicates the salient defect. In the end, the iterative optimal threshold segmentation algorithm is utilized to segment the saliency maps generated by the sparse matrix to locate the fabric defect area. Experimental results demonstrate that the feature extracted by CNN is more suitable for characterizing the fabric texture than the traditional LBP, HOG and other hand-crafted features extraction method, and the proposed method can accurately detect the defect regions of various fabric defects, even for the image with complex texture.
Bibliometric Rankings of Journals Based on the Thomson Reuters Citations Database
C-L. Chang (Chia-Lin); M.J. McAleer (Michael)
2015-01-01
Virtually all rankings of journals are based on citations, including self citations by journals and individual academics. The gold standard for bibliometric rankings based on citations data is the widely-used Thomson Reuters Web of Science (2014) citations database,
Naro, Daniel; Rummel, Christian; Schindler, Kaspar; Andrzejak, Ralph G
2014-09-01
The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again appears of higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).
Neural modelling of ranking data with an application to stated preference data
Directory of Open Access Journals (Sweden)
Catherine Krier
2013-05-01
Although neural networks are commonly used to solve classification problems, ranking data have specific features that require adapting the model. Based on a latent utility function defined on the characteristics of the objects to be ranked, the approach suggested in this paper leads to a perceptron-based algorithm for a highly nonlinear model. Data on stated preferences obtained through a survey by face-to-face interviews, in the field of freight transport, are used to illustrate the method. Numerical difficulties are pinpointed and a Pocket type algorithm is shown to provide an efficient heuristic to minimize the discrete error criterion. A substantial merit of this approach is to provide a workable estimation of contextually interpretable parameters along with a statistical evaluation of the goodness of fit.
A result-driven minimum blocking method for PageRank parallel computing
Tao, Wan; Liu, Tao; Yu, Wei; Huang, Gan
2017-01-01
Matrix blocking is a common method for improving the computational efficiency of PageRank, but the blocking rules are hard to determine, and the subsequent calculation is complicated. To tackle these problems, we propose a minimum blocking method driven by the needs of the result to accomplish a parallel implementation of the PageRank algorithm. The minimum blocking stores only the elements that are necessary for the result matrix. In return, the subsequent calculation becomes simple and the cost of I/O transmission is cut down. We run experiments on several matrices of different data sizes and different degrees of sparsity. The results show that the proposed method has better computational efficiency than traditional blocking methods.
An adaptive ES with a ranking based constraint handling strategy
Directory of Open Access Journals (Sweden)
Kusakci Ali Osman
2014-01-01
To solve a constrained optimization problem, equality constraints can be used to eliminate a problem variable. If this is not feasible, the relations imposed implicitly by the constraints can still be exploited. Most conventional constraint handling methods in Evolutionary Algorithms (EAs) do not consider the correlations between problem variables imposed by the constraints. This paper relies on the idea that a proper search operator, which captures the mentioned implicit correlations, can improve the performance of evolutionary constrained optimization algorithms. To realize this, an Evolution Strategy (ES) along with a simplified Covariance Matrix Adaptation (CMA) based mutation operator is used with a ranking based constraint-handling method. The proposed algorithm is tested on 13 benchmark problems as well as on a real life design problem. The algorithm significantly outperforms conventional ES-based methods.
Bradshaw, Corey J A; Brook, Barry W
2016-01-01
There are now many methods available to assess the relative citation performance of peer-reviewed journals. Regardless of their individual faults and advantages, citation-based metrics are used by researchers to maximize the citation potential of their articles, and by employers to rank academic track records. The absolute value of any particular index is arguably meaningless unless compared to other journals, and different metrics result in divergent rankings. To provide a simple yet more objective way to rank journals within and among disciplines, we developed a κ-resampled composite journal rank incorporating five popular citation indices: Impact Factor, Immediacy Index, Source-Normalized Impact Per Paper, SCImago Journal Rank and Google 5-year h-index; this approach provides an index of relative rank uncertainty. We applied the approach to six sample sets of scientific journals from Ecology (n = 100 journals), Medicine (n = 100), Multidisciplinary (n = 50); Ecology + Multidisciplinary (n = 25), Obstetrics & Gynaecology (n = 25) and Marine Biology & Fisheries (n = 25). We then cross-compared the κ-resampled ranking for the Ecology + Multidisciplinary journal set to the results of a survey of 188 publishing ecologists who were asked to rank the same journals, and found a 0.68-0.84 Spearman's ρ correlation between the two rankings datasets. Our composite index approach therefore approximates relative journal reputation, at least for that discipline. Agglomerative and divisive clustering and multi-dimensional scaling techniques applied to the Ecology + Multidisciplinary journal set identified specific clusters of similarly ranked journals, with only Nature & Science separating out from the others. When comparing a selection of journals within or among disciplines, we recommend collecting multiple citation-based metrics for a sample of relevant and realistic journals to calculate the composite rankings and their relative uncertainty windows.
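The core idea of combining several citation metrics into one rank with an uncertainty window can be sketched as follows. This is a simplified stand-in, not the authors' κ-resampling procedure: it ranks journals under each metric, averages the ranks, and bootstraps over the set of metrics to obtain a rough 95% interval for each journal's mean rank. The function names and the metric values for four unnamed journals are invented.

```python
import random


def ranks(values):
    """Rank 1 = highest value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = avg
        i = j + 1
    return result


def composite_rank(metrics, n_boot=1000, seed=0):
    """metrics: one score list per metric, journals in the same order in each."""
    per_metric = [ranks(m) for m in metrics]
    n_journals = len(metrics[0])
    mean_rank = [sum(r[j] for r in per_metric) / len(per_metric)
                 for j in range(n_journals)]
    rng = random.Random(seed)
    boots = [[] for _ in range(n_journals)]
    for _ in range(n_boot):
        # Resample the metrics with replacement and recompute the mean rank.
        draw = [rng.choice(per_metric) for _ in per_metric]
        for j in range(n_journals):
            boots[j].append(sum(r[j] for r in draw) / len(draw))
    intervals = []
    for b in boots:
        b.sort()
        intervals.append((b[int(0.025 * n_boot)], b[int(0.975 * n_boot) - 1]))
    return mean_rank, intervals


# Invented scores for four journals under three metrics.
metrics = [
    [9.1, 4.2, 2.0, 1.1],
    [3.0, 1.5, 1.8, 0.4],
    [120, 80, 60, 90],
]
mean_rank, intervals = composite_rank(metrics)
```

A journal ranked first by every metric gets a zero-width interval, while journals on which the metrics disagree get wider ones.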
Ranking and selection of commercial off-the-shelf using fuzzy distance based approach
Directory of Open Access Journals (Sweden)
Rakesh Garg
2015-06-01
There is a tremendous growth in the use of the component-based software engineering (CBSE) approach for the development of software systems. The selection of the best-suited COTS components that fulfil the necessary requirements for the development of software has become a major challenge for software developers. The complexity of the optimal selection problem increases with an increase in alternative potential COTS components and the corresponding selection criteria. In this research paper, the problem of ranking and selection of Data Base Management Systems (DBMS) components is modeled as a multi-criteria decision making problem. A ‘Fuzzy Distance Based Approach (FDBA)’ method is proposed for the optimal ranking and selection of DBMS COTS components of an e-payment system based on 14 selection criteria grouped under three major categories, i.e. ‘Vendor Capabilities’, ‘Business Issues’ and ‘Cost’. The results of this method are compared with those of the Analytical Hierarchy Process (AHP), a typical multi-criteria decision making approach. The proposed methodology is explained with an illustrated example.
Structure-Based Low-Rank Model With Graph Nuclear Norm Regularization for Noise Removal.
Ge, Qi; Jing, Xiao-Yuan; Wu, Fei; Wei, Zhi-Hui; Xiao, Liang; Shao, Wen-Ze; Yue, Dong; Li, Hai-Bo
2017-07-01
Nonlocal image representation methods, including group-based sparse coding and block-matching 3-D filtering, have shown their great performance in application to low-level tasks. The nonlocal prior is extracted from each group consisting of patches with similar intensities. Grouping patches based on intensity similarity, however, gives rise to disturbance and inaccuracy in estimation of the true images. To address this problem, we propose a structure-based low-rank model with graph nuclear norm regularization. We exploit the local manifold structure inside a patch and group the patches by the distance metric of manifold structure. With the manifold structure information, a graph nuclear norm regularization is established and incorporated into a low-rank approximation model. We then prove that the graph-based regularization is equivalent to a weighted nuclear norm and the proposed model can be solved by a weighted singular-value thresholding algorithm. Extensive experiments on additive white Gaussian noise removal and mixed noise removal demonstrate that the proposed method achieves a better performance than several state-of-the-art algorithms.
Tips and Tricks for Successful Application of Statistical Methods to Biological Data.
Schlenker, Evelyn
2016-01-01
This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data, allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
Texture Repairing by Unified Low Rank Optimization
Institute of Scientific and Technical Information of China (English)
Xiao Liang; Xiang Ren; Zhengdong Zhang; Yi Ma
2016-01-01
In this paper, we show how to harness both low-rank and sparse structures in regular or near-regular textures for image completion. Our method is based on a unified formulation for both random and contiguous corruption. In addition to the low rank property of texture, the algorithm also uses the sparse assumption of the natural image: because the natural image is piecewise smooth, it is sparse in certain transformed domain (such as Fourier or wavelet transform). We combine low-rank and sparsity properties of the texture image together in the proposed algorithm. Our algorithm based on convex optimization can automatically and correctly repair the global structure of a corrupted texture, even without precise information about the regions to be completed. This algorithm integrates texture rectification and repairing into one optimization problem. Through extensive simulations, we show our method can complete and repair textures corrupted by errors with both random and contiguous supports better than existing low-rank matrix recovery methods. Our method demonstrates significant advantage over local patch based texture synthesis techniques in dealing with large corruption, non-uniform texture, and large perspective deformation.
Power-law and exponential rank distributions: A panoramic Gibbsian perspective
International Nuclear Information System (INIS)
Eliazar, Iddo
2015-01-01
Rank distributions are collections of positive sizes ordered either increasingly or decreasingly. Many decreasing rank distributions, formed by the collective collaboration of human actions, follow an inverse power-law relation between ranks and sizes. This remarkable empirical fact is termed Zipf’s law, and one of its quintessential manifestations is the demography of human settlements, which exhibits a harmonic relation between ranks and sizes. In this paper we present a comprehensive statistical-physics analysis of rank distributions, establish that power-law and exponential rank distributions stand out as optimal in various entropy-based senses, and unveil the special role of the harmonic relation between ranks and sizes. Our results extend the contemporary entropy-maximization view of Zipf’s law to a broader, panoramic, Gibbsian perspective of increasing and decreasing power-law and exponential rank distributions, of which Zipf’s law is one out of four pillars.
Rank diversity of languages: generic behavior in computational linguistics.
Cocho, Germinal; Flores, Jorge; Gershenson, Carlos; Pineda, Carlos; Sánchez, Sergio
2015-01-01
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: "heads" consist of words which almost do not change their rank in time, "bodies" are words of general use, while "tails" are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
Adaptive linear rank tests for eQTL studies.
Szymczak, Silke; Scheinhardt, Markus O; Zeller, Tanja; Wild, Philipp S; Blankenberg, Stefan; Ziegler, Andreas
2013-02-10
Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal-Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies. Copyright © 2012 John Wiley & Sons, Ltd.
Ranking scientific publications: the effect of nonlinearity
Yao, Liyang; Wei, Tian; Zeng, An; Fan, Ying; Di, Zengru
2014-10-01
Ranking the significance of scientific publications is a long-standing challenge. The network-based analysis is a natural and common approach for evaluating the scientific credit of papers. Although the number of citations has been widely used as a metric to rank papers, recently some iterative processes such as the well-known PageRank algorithm have been applied to the citation networks to address this problem. In this paper, we introduce nonlinearity to the PageRank algorithm when aggregating resources from different nodes to further enhance the effect of important papers. The validation of our method is performed on the data of American Physical Society (APS) journals. The results indicate that the nonlinearity improves the performance of the PageRank algorithm in terms of ranking effectiveness, as well as robustness against malicious manipulations. Although the nonlinearity analysis is based on the PageRank algorithm, it can be easily extended to other iterative ranking algorithms and similar improvements are expected.
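As a rough illustration of the idea in this abstract, the sketch below runs a PageRank-style power iteration on a toy citation graph. The abstract does not give the authors' nonlinear aggregation rule, so the exponent `theta` applied to each citing paper's score is purely an assumption made for this example; with `theta = 1.0` the code reduces to standard PageRank. The function name, the damping value and the toy graph are likewise hypothetical.

```python
def pagerank(links, damping=0.85, theta=1.0, iters=100):
    """Power iteration on a citation graph given as {paper: [cited papers]}.

    theta is an assumed nonlinearity: each paper passes on score**theta
    instead of score. theta=1.0 gives the standard linear PageRank.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            weight = score[v] ** theta
            if targets:
                share = damping * weight / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A paper citing nothing spreads its weight over all nodes.
                for t in nodes:
                    new[t] += damping * weight / n
        # Renormalise so scores sum to 1 (needed when theta != 1).
        total = sum(new.values())
        score = {v: s / total for v, s in new.items()}
    return score


# Toy graph: papers A and B both cite C; C cites nothing.
citations = {"A": ["C"], "B": ["C"], "C": []}
linear = pagerank(citations)
nonlinear = pagerank(citations, theta=1.5)
print(max(linear, key=linear.get), max(nonlinear, key=nonlinear.get))
```

The renormalisation step is a design choice of this sketch: it keeps scores comparable across different values of `theta` rather than reproducing any published variant.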
The Marketing of Canadian University Rankings: A Misadventure Now 24 Years Old
Cramer, Kenneth M.; Page, Stewart; Burrows, Vanessa; Lamoureux, Chastine; Mackay, Sarah; Pedri, Victoria; Pschibul, Rebecca
2016-01-01
Based on analyses of Maclean's ranking data pertaining to Canadian universities published over the last 24 years, we present a summary of statistical findings of annual ranking exercises, as well as discussion about their current status and the effects upon student welfare. Some illustrative tables are also presented. Using correlational and…
HIV quality report cards: impact of case-mix adjustment and statistical methods.
Ohl, Michael E; Richardson, Kelly K; Goto, Michihiko; Vaughan-Sarrazin, Mary; Schweizer, Marin L; Perencevich, Eli N
2014-10-15
There will be increasing pressure to publicly report and rank the performance of healthcare systems on human immunodeficiency virus (HIV) quality measures. To inform discussion of public reporting, we evaluated the influence of case-mix adjustment when ranking individual care systems on the viral control quality measure. We used data from the Veterans Health Administration (VHA) HIV Clinical Case Registry and administrative databases to estimate case-mix adjusted viral control for 91 local systems caring for 12 368 patients. We compared results using 2 adjustment methods, the observed-to-expected estimator and the risk-standardized ratio. Overall, 10 913 patients (88.2%) achieved viral control (viral load ≤400 copies/mL). Prior to case-mix adjustment, system-level viral control ranged from 51% to 100%. Seventeen (19%) systems were labeled as low outliers (performance significantly below the overall mean) and 11 (12%) as high outliers. Adjustment for case mix (patient demographics, comorbidity, CD4 nadir, time on therapy, and income from VHA administrative databases) reduced the number of low outliers by approximately one-third, but results differed by method. The adjustment model had moderate discrimination (c statistic = 0.66), suggesting potential for unadjusted risk when using administrative data to measure case mix. Case-mix adjustment affects rankings of care systems on the viral control quality measure. Given the sensitivity of rankings to selection of case-mix adjustment methods-and potential for unadjusted risk when using variables limited to current administrative databases-the HIV care community should explore optimal methods for case-mix adjustment before moving forward with public reporting. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
A Fast Algorithm for Generating Permutation Distribution of Ranks in ...
African Journals Online (AJOL)
... function of the distribution of the ranks. This further gives insight into the permutation distribution of a rank statistics. The algorithm is implemented with the aid of the computer algebra system Mathematica. Key words: Combinatorics, generating function, permutation distribution, rank statistics, partitions, computer algebra.
An approach to build knowledge base for reactor accident diagnostic system using statistical method
International Nuclear Information System (INIS)
Kohsaka, Atsuo; Yokobayashi, Masao; Matsumoto, Kiyoshi; Fujii, Minoru
1988-01-01
In the development of a rule-based expert system, one of the key issues is how to build a knowledge base (KB). A systematic approach has been attempted for building an objective KB efficiently. The approach is based on the concept that a prototype KB should first be generated in a systematic way and then modified and/or improved by experts for practical use. The statistical method, Factor Analysis, was applied to build a prototype KB for the JAERI expert system DISKET using source information obtained from a PWR simulator. The prototype KB was obtained and the inference with this KB was performed against several types of transients. In each diagnosis, the transient type was well identified. From this study, it is concluded that the statistical method used is useful for building a prototype knowledge base. (author)
Prioritizing sewer rehabilitation projects using AHP-PROMETHEE II ranking method.
Kessili, Abdelhak; Benmamar, Saadia
2016-01-01
The aim of this paper is to develop a methodology for the prioritization of sewer rehabilitation projects for Algiers (Algeria) sewer networks to support the National Sanitation Office in its challenge to make decisions on prioritization of sewer rehabilitation projects. The methodology applies multiple-criteria decision making. The study includes 47 projects (collectors) and 12 criteria to evaluate them. These criteria represent the different issues considered in the prioritization of the projects, which are structural, hydraulic, environmental, financial, social and technical. The analytic hierarchy process (AHP) is used to determine weights of the criteria and the Preference Ranking Organization Method for Enrichment Evaluations (PROMETHEE II) method is used to obtain the final ranking of the projects. The model was verified using the sewer data of Algiers. The results have shown that the method can be used for prioritizing sewer rehabilitation projects.
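The final ranking step named in this abstract can be sketched as a PROMETHEE II net-flow calculation. This is a minimal sketch under stated assumptions, not the authors' model: the criterion weights are taken as given (in the paper they come from AHP), every criterion is treated as "higher is better", and the simplest "usual" preference function is used (full preference for any positive difference). The project names, scores and weights are invented for illustration.

```python
def promethee_ii(scores, weights):
    """Rank alternatives by PROMETHEE II net outranking flow.

    scores:  {alternative: [value per criterion]}, higher is better (assumed)
    weights: criterion weights summing to 1 (assumed to come from AHP)
    """
    names = list(scores)
    n = len(names)

    def preference(a, b):
        # Usual criterion: a criterion's weight counts whenever a beats b on it.
        return sum(w for w, fa, fb in zip(weights, scores[a], scores[b]) if fa > fb)

    flows = {}
    for a in names:
        positive = sum(preference(a, b) for b in names if b != a) / (n - 1)
        negative = sum(preference(b, a) for b in names if b != a) / (n - 1)
        flows[a] = positive - negative
    return sorted(flows.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rehabilitation projects scored on three criteria.
projects = {"P1": [3, 2, 5], "P2": [1, 4, 2], "P3": [2, 3, 4]}
criterion_weights = [0.5, 0.3, 0.2]
ranking = promethee_ii(projects, criterion_weights)
print([name for name, _ in ranking])
```

Net flows always sum to zero across alternatives, which gives a quick sanity check on any implementation.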
Wikipedia ranking of world universities
Lages, José; Patt, Antoine; Shepelyansky, Dima L.
2016-03-01
We use the directed networks between articles of 24 Wikipedia language editions to produce the Wikipedia Ranking of World Universities (WRWU) using the PageRank, 2DRank and CheiRank algorithms. This approach allows the incorporation of various cultural views on world universities using a mathematical statistical analysis independent of cultural preferences. The Wikipedia ranking of the top 100 universities provides about 60% overlap with the Shanghai university ranking, demonstrating the reliable features of this approach. At the same time, WRWU incorporates all knowledge accumulated in 24 Wikipedia editions, giving stronger highlights for historically important universities and leading to a different estimation of the efficiency of world countries in university education. The historical development of university ranking is analyzed during ten centuries of their history.
Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition
Directory of Open Access Journals (Sweden)
yuan Shuai
2017-01-01
In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we first structure the corrupted data as a Toeplitz matrix and estimate its effective rank for the underlying human speech signal. Then the low-rank and sparse decomposition is performed with the guidance of the speech rank value to remove the noise. Extensive experiments have been carried out under white Gaussian noise conditions, and experimental results show that the proposed method performs better than conventional speech enhancement methods, in terms of yielding less residual noise and lower speech distortion.
Fuzzy ranking based non-dominated sorting genetic algorithm-II for network overload alleviation
Directory of Open Access Journals (Sweden)
Pandiarajan K.
2014-09-01
This paper presents an effective method of network overload management in power systems. The three competing objectives, (1) generation cost, (2) transmission line overload and (3) real power loss, are optimized to provide Pareto-optimal solutions. A fuzzy ranking based non-dominated sorting genetic algorithm-II (NSGA-II) is used to solve this complex nonlinear optimization problem. The minimization of competing objectives is done by generation rescheduling. The fuzzy ranking method is employed to extract the best compromise solution out of the available non-dominated solutions depending upon its highest rank. N-1 contingency analysis is carried out to identify the most severe lines and those lines are selected for outage. The effectiveness of the proposed approach is demonstrated for different contingency cases in IEEE 30 and IEEE 118 bus systems with smooth cost functions and their results are compared with other single objective evolutionary algorithms like particle swarm optimization (PSO) and differential evolution (DE). Simulation results show the effectiveness of the proposed approach in generating well-distributed Pareto-optimal non-dominated solutions of the multi-objective problem.
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.
Li, Yaohang; Rata, Ionel; Chiu, See-wing; Jakobsson, Eric
2010-07-20
Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops 4 to 12 residues in length using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
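Step 1 of the procedure described above, finding the models on the Pareto optimal front with respect to several scoring functions, can be sketched as follows. The sketch assumes that lower scores are better for every scoring function (as with energy-like scores) and uses invented decoy names and values; it does not implement step 2, the fuzzy dominance ranking, whose definition is not given in the abstract.

```python
def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere.

    Lower is assumed to be better for every scoring function.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(models):
    """Return the names of models that no other model dominates."""
    front = []
    for name, scores in models.items():
        dominated = any(
            dominates(other_scores, scores)
            for other, other_scores in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical decoys scored by two scoring functions (lower is better).
decoys = {
    "decoy_1": (1.0, 5.0),
    "decoy_2": (2.0, 2.0),
    "decoy_3": (3.0, 3.0),  # worse than decoy_2 on both scores
    "decoy_4": (5.0, 1.0),
}
print(pareto_front(decoys))
```

The pairwise check is quadratic in the number of models, which is usually acceptable for decoy sets of a few hundred to a few thousand structures.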
A Rank-Constrained Matrix Representation for Hypergraph-Based Subspace Clustering
Directory of Open Access Journals (Sweden)
Yubao Sun
2015-01-01
This paper presents a novel, rank-constrained matrix representation combined with hypergraph spectral analysis to enable the recovery of the original subspace structures of corrupted data. Real-world data are frequently corrupted with both sparse error and noise. Our matrix decomposition model separates the low-rank, sparse error, and noise components from the data in order to enhance robustness to the corruption. In order to obtain the desired rank representation of the data within a dictionary, our model directly utilizes rank constraints by restricting the upper bound of the rank range. An alternative projection algorithm is proposed to estimate the low-rank representation and separate the sparse error from the data matrix. To further capture the complex relationship between data distributed in multiple subspaces, we use hypergraph to represent the data by encapsulating multiple related samples into one hyperedge. The final clustering result is obtained by spectral decomposition of the hypergraph Laplacian matrix. Validation experiments on the Extended Yale Face Database B, AR, and Hopkins 155 datasets show that the proposed method is a promising tool for subspace clustering.
Spectral-based features ranking for gamelan instruments identification using filter techniques
Directory of Open Access Journals (Sweden)
Diah P Wulandari
2013-03-01
In this paper, we describe an approach of spectral-based features ranking for Javanese gamelan instruments identification using filter techniques. The model extracted a spectral-based feature set of the signal using the Short Time Fourier Transform (STFT). The rank of the features was determined using five algorithms, namely ReliefF, Chi-Squared, Information Gain, Gain Ratio, and Symmetric Uncertainty. Then, we tested the ranked features by cross validation using a Support Vector Machine (SVM). The experiment showed that the Gain Ratio algorithm gave the best result, yielding an accuracy of 98.93%.
Yun, Yong-Huan; Deng, Bai-Chuan; Cao, Dong-Sheng; Wang, Wei-Ting; Liang, Yi-Zeng
2016-03-10
Biomarker discovery is one important goal in metabolomics, which is typically modeled as selecting the most discriminating metabolites for classification and often referred to as variable importance analysis or variable selection. Until now, a number of variable importance analysis methods to discover biomarkers in metabolomics studies have been proposed. However, different methods are most likely to generate different variable ranking results due to their different principles. Each method generates a variable ranking list just as an expert presents an opinion. The problem of inconsistency between different variable ranking methods is often ignored. To address this problem, a simple and ideal solution is that every ranking should be taken into account. In this study, a strategy called rank aggregation was employed. It is an indispensable tool for merging individual ranking lists into a single "super"-list reflective of the overall preference or importance within the population. This "super"-list is regarded as the final ranking for biomarker discovery. Finally, it was used for biomarker discovery and for selecting the best variable subset with the highest predictive classification accuracy. Nine methods were used, including three univariate filtering and six multivariate methods. When applied to two metabolic datasets (Childhood overweight dataset and Tubulointerstitial lesions dataset), the results show that the performance of rank aggregation has improved greatly, with higher prediction accuracy compared with using all variables. Moreover, it is also better than the penalized method, least absolute shrinkage and selection operator (LASSO), with higher prediction accuracy or a smaller number of selected variables, which are more interpretable. Copyright © 2016 Elsevier B.V. All rights reserved.
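Merging several ranking lists into one "super"-list can be sketched with a Borda count. The abstract does not say which aggregation rule the authors used, so Borda is swapped in here only as a common and simple choice; the metabolite names and the three input rankings are invented, and the sketch assumes every list ranks the same set of variables.

```python
def borda_aggregate(rankings):
    """Merge ranked lists (best first) into one list using Borda points.

    Each list gives n points to its top item, n - 1 to the next, and so on.
    Ties in total points are broken alphabetically for reproducibility.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (n - position)
    return sorted(points, key=lambda item: (-points[item], item))


# Hypothetical variable rankings produced by three different methods.
method_rankings = [
    ["glucose", "lactate", "citrate"],
    ["lactate", "glucose", "citrate"],
    ["glucose", "citrate", "lactate"],
]
super_list = borda_aggregate(method_rankings)
print(super_list)
```

Borda counting treats every method as an equally trusted expert; weighting the lists by each method's cross-validated accuracy would be one possible refinement.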
Prewhitening for Rank-Deficient Noise in Subspace Methods for Noise Reduction
DEFF Research Database (Denmark)
Hansen, Per Christian; Jensen, Søren Holdt
2005-01-01
A fundamental issue in connection with subspace methods for noise reduction is that the covariance matrix for the noise is required to have full rank, in order for the prewhitening step to be defined. However, there are important cases where this requirement is not fulfilled, e.g., when the noise has narrow-band characteristics, or in the case of tonal noise. We extend the concept of prewhitening to include the case when the noise covariance matrix is rank deficient, using a weighted pseudoinverse and the quotient SVD, and we show how to formulate a general rank-reduction algorithm that works also for rank deficient noise. We also demonstrate how to formulate this algorithm by means of a quotient ULV decomposition, which allows for faster computation and updating. Finally we apply our algorithm to a problem involving a speech signal contaminated by narrow-band noise.
PageRank tracker: from ranking to tracking.
Gong, Chen; Fu, Keren; Loza, Artur; Wu, Qiang; Liu, Jia; Yang, Jie
2014-06-01
Video object tracking is widely used in many real-world applications, and it has been extensively studied for over two decades. However, tracking robustness is still an issue in most existing methods, due to the difficulties with adaptation to environmental or target changes. In order to improve adaptability, this paper formulates the tracking process as a ranking problem, and the PageRank algorithm, which is a well-known webpage ranking algorithm used by Google, is applied. Labeled and unlabeled samples in tracking application are analogous to query webpages and the webpages to be ranked, respectively. Therefore, determining the target is equivalent to finding the unlabeled sample that is the most associated with existing labeled set. We modify the conventional PageRank algorithm in three aspects for tracking application, including graph construction, PageRank vector acquisition and target filtering. Our simulations with the use of various challenging public-domain video sequences reveal that the proposed PageRank tracker outperforms mean-shift tracker, co-tracker, semiboosting and beyond semiboosting trackers in terms of accuracy, robustness and stability.
Li, Chaoxing; Liu, Li; Dinu, Valentin
2018-01-01
Complex diseases such as cancer are usually the result of a combination of environmental factors and one or several biological pathways consisting of sets of genes. Each biological pathway exerts its function by delivering signaling through the gene network. Theoretically, a pathway is supposed to have a robust topological structure under normal physiological conditions. However, the pathway's topological structure could be altered under some pathological condition. It is well known that a normal biological network includes a small number of well-connected hub nodes and a large number of nodes that are non-hubs. In addition, it is reported that the loss of connectivity is a common topological trait of cancer networks, which is an assumption of our method. Hence, from normal to cancer, the process of the network losing connectivity might be the process of disrupting the structure of the network, namely, the number of hub genes might be altered in cancer compared to that in normal or the distribution of topological ranks of genes might be altered. Based on this, we propose a new PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) to detect pathways involved in cancer. We use PageRank to measure the relative topological ranks of genes in each biological pathway, then select hub genes for each pathway, and use Fisher's exact test to test if the number of hub genes in each pathway is altered from normal to cancer. Alternatively, if the distribution of topological ranks of gene in a pathway is altered between normal and cancer, this pathway might also be involved in cancer. Hence, we use the Kolmogorov-Smirnov test to detect pathways that have an altered distribution of topological ranks of genes between two phenotypes. We apply PoTRA to study hepatocellular carcinoma (HCC) and several subtypes of HCC. Very interestingly, we discover that all significant pathways in HCC are cancer-associated generally, while several significant pathways in subtypes
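The hub-count comparison described in this abstract relies on Fisher's exact test. The sketch below computes a two-sided p-value for a 2 × 2 table using only the standard library. How the table is laid out (hub versus non-hub genes, in normal versus cancer samples) is an assumption made for this example, and the counts are invented; the abstract does not specify either.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so that equally probable tables are always included.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical pathway: 8 hub and 2 non-hub genes in normal tissue,
# 1 hub and 9 non-hub genes in cancer tissue.
p_value = fisher_exact_two_sided(8, 2, 1, 9)
print(p_value < 0.05)
```

A small p-value here would suggest that the number of hub genes in the pathway differs between the two conditions, which is the kind of signal the abstract describes testing for.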
Extreme learning machine for ranking: generalization analysis and applications.
Chen, Hong; Peng, Jiangtao; Zhou, Yicong; Li, Luoqing; Pan, Zhibin
2014-05-01
The extreme learning machine (ELM) has attracted increasing attention recently with its successful applications in classification and regression. In this paper, we investigate the generalization performance of ELM-based ranking. A new regularized ranking algorithm is proposed based on the combinations of activation functions in ELM. The generalization analysis is established for the ELM-based ranking (ELMRank) in terms of the covering numbers of hypothesis space. Empirical results on the benchmark datasets show the competitive performance of the ELMRank over the state-of-the-art ranking methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Pourgol-Mohammad, Mohammad, E-mail: pourgolmohammad@sut.ac.ir [Department of Mechanical Engineering, Sahand University of Technology, Tabriz (Iran, Islamic Republic of); Hoseyni, Seyed Mohsen [Department of Basic Sciences, East Tehran Branch, Islamic Azad University, Tehran (Iran, Islamic Republic of); Hoseyni, Seyed Mojtaba [Building & Housing Research Center, Tehran (Iran, Islamic Republic of); Sepanloo, Kamran [Nuclear Science and Technology Research Institute, Tehran (Iran, Islamic Republic of)
2016-08-15
Highlights: • Existing uncertainty ranking methods prove inconsistent for TH applications. • Introduction of a new method for ranking sources of uncertainty in TH codes. • Modified PIRT qualitatively identifies and ranks uncertainty sources more precisely. • The importance of parameters is calculated by a limited number of TH code executions. • Methodology is applied successfully on LOFT-LB1 test facility. - Abstract: In application to thermal–hydraulic calculations by system codes, sensitivity analysis plays an important role for managing the uncertainties of code output and risk analysis. Sensitivity analysis is also used to confirm the results of qualitative Phenomena Identification and Ranking Table (PIRT). Several methodologies have been developed to address uncertainty importance assessment. Generally, uncertainty importance measures, mainly devised for the Probabilistic Risk Assessment (PRA) applications, are not affordable for computationally demanding calculations of the complex thermal–hydraulics (TH) system codes. In other words, for effective quantification of the degree of the contribution of each phenomenon to the total uncertainty of the output, a practical approach is needed by considering high computational burden of TH calculations. This study aims primarily to show the inefficiency of the existing approaches and then introduces a solution to cope with the challenges in this area by modification of variance-based uncertainty importance method. Important parameters are identified by the modified PIRT approach qualitatively then their uncertainty importance is quantified by a local derivative index. The proposed index is attractive from its practicality point of view on TH applications. It is capable of calculating the importance of parameters by a limited number of TH code executions. Application of the proposed methodology is demonstrated on LOFT-LB1 test facility.
Directory of Open Access Journals (Sweden)
Matthew Maestri
2014-03-01
For scientific, clinical, and machine learning purposes alike, it is desirable to quantify the verbal reports of high-level visual percepts. Methods to do this simply do not exist at present. Here we propose a novel methodological principle to help fill this gap, and provide empirical evidence designed to serve as the initial ‘proof’ of this principle. In the proposed method, subjects view images of real-world scenes and describe, in their own words, what they saw. The verbal description is independently evaluated by several evaluators. Each evaluator assigns a rank score to the subject’s description of each visual object in each image using a novel ranking principle, which takes advantage of the well-known fact that semantic descriptions of real-life objects and scenes can usually be rank-ordered. Thus, for instance, ‘animal’, ‘dog’, and ‘retriever’ can be regarded as increasingly finer-level, and therefore higher-ranking, descriptions of a given object. These numeric scores can preserve the richness of the original verbal description, and can be subsequently evaluated using conventional statistical procedures. We describe an exemplar implementation of this method and empirical data that show its feasibility. With appropriate future standardization and validation, this novel method can serve as an important tool to help quantify the subjective experience of the visual world. In addition to being a novel, potentially powerful testing tool, our method also represents, to our knowledge, the only available method for numerically representing verbal accounts of real-world experience. Given its minimal requirements, i.e., a verbal description and the ground truth that elicited the description, our method has a wide variety of potential real-world applications.
Zhang, Yun; Baheti, Saurabh; Sun, Zhifu
2018-05-01
High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or by the numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. The ratio data have the flexibility of fitting many linear models, but the raw count data take coverage information into account. There is an array of options in each data type for DMC detection; however, it is not clear which is the optimal statistical method. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data and compared their performance with regard to type I error control, sensitivity and specificity of DMC detection, and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.
A meta-analysis based method for prioritizing candidate genes involved in a pre-specific function
Directory of Open Access Journals (Sweden)
Jingjing Zhai
2016-12-01
The identification of genes associated with a given biological function in plants remains a challenge, although network-based gene prioritization algorithms have been developed for Arabidopsis thaliana and many non-model plant species. Nevertheless, these network-based gene prioritization algorithms have encountered several problems; one in particular is that of unsatisfactory prediction accuracy due to limited network coverage, varying link quality, and/or uncertain network connectivity. Thus a model that integrates complementary biological data may be expected to increase the prediction accuracy of gene prioritization. Towards this goal, we developed a novel gene prioritization method named RafSee, to rank candidate genes using a random forest algorithm that integrates sequence, evolutionary, and epigenetic features of plants. Subsequently, we proposed an integrative approach named RAP (Rank Aggregation-based data fusion for gene Prioritization), in which an order statistics-based meta-analysis was used to aggregate the ranks of the network-based gene prioritization method and RafSee, for accurately prioritizing candidate genes involved in a pre-specific biological function. Finally, we showcased the utility of RAP by prioritizing 380 flowering-time genes in Arabidopsis. The ‘leave-one-out’ cross-validation experiment showed that RafSee could work as a complement to a current state-of-the-art network-based gene prioritization system (AraNet v2). Moreover, RAP ranked 53.68% (204/380) of flowering-time genes higher than AraNet v2, resulting in a 39.46% improvement in terms of the first quartile rank. Further evaluations also showed that RAP was effective in prioritizing genes related to different abiotic stresses. To enhance the usability of RAP for Arabidopsis and non-model plant species, an R package implementing the method is freely available at http://bioinfo.nwafu.edu.cn/software.
International Nuclear Information System (INIS)
Cho, Y.H.; Ko, H.S.; Kim, S.H.; Kang, C.S.; Moon, J.H.; Kim, K.D.
2004-01-01
The cost-effective reduction of occupational radiation dose (ORD) at a nuclear power plant cannot be achieved without an extensive analysis of the accumulated ORD data of existing plants. Through this data analysis, the jobs with repetitive high ORD at the nuclear power plant must be identified. The commonly used point value method generally over-estimates the role of mean and median values in identifying the high ORD jobs, which can lead to misjudgment. In this study, the Percentile Rank Sum Method (PRSM), which is based on non-parametric statistical theory, is proposed to identify repetitive high ORD jobs. As a case study, the method is applied to ORD data of maintenance and repair jobs at Kori units 3 and 4, which are pressurized water reactors with 950 MWe capacity that have been operated in Korea since 1986 and 1987, respectively. The results were verified and validated, and PRSM has been demonstrated to be an efficient method of analyzing the data. (authors)
Rank Diversity of Languages: Generic Behavior in Computational Linguistics
Cocho, Germinal; Flores, Jorge; Gershenson, Carlos; Pineda, Carlos; Sánchez, Sergio
2015-01-01
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which hardly change their rank in time, “bodies” are words of general use, while “tails” are composed of context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied. PMID:25849150
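As a rough illustration of a measure of how word ranks change in time, the sketch below counts, for each rank position, how many distinct words occupy that position across time slices, divided by the number of slices. This normalization is an assumption (the abstract does not state the formula), and the word lists are made-up toy data, not results from the study.

```python
def rank_diversity(rankings):
    """rankings: one ranked word list per time slice, most frequent word first.

    Returns, for each rank position k, the number of distinct words seen at
    that position divided by the number of time slices (assumed definition).
    """
    slices = len(rankings)
    depth = min(len(r) for r in rankings)
    return [len({r[k] for r in rankings}) / slices for k in range(depth)]


# Toy data: four time slices, three rank positions each.
toy = [
    ["the", "of", "cat"],
    ["the", "of", "dog"],
    ["the", "and", "war"],
    ["the", "of", "sea"],
]
print(rank_diversity(toy))
```

In the toy data the first position never changes (low diversity, "head"-like), while the third position holds a different word in every slice (high diversity, "tail"-like).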
Generalization Performance of Regularized Ranking With Multiscale Kernels.
Zhou, Yicong; Chen, Hong; Lan, Rushi; Pan, Zhibin
2016-05-01
The regularized kernel method for the ranking problem has attracted increasing attention in machine learning. Previous regularized ranking algorithms are usually based on reproducing kernel Hilbert spaces with a single kernel. In this paper, we go beyond this framework by investigating the generalization performance of regularized ranking with multiscale kernels. A novel ranking algorithm with multiscale kernels is proposed and its representer theorem is proved. We establish an upper bound on the generalization error in terms of the complexity of the hypothesis spaces. It shows that the multiscale ranking algorithm can achieve satisfactory learning rates under mild conditions. Experiments demonstrate the effectiveness of the proposed method for drug discovery and recommendation tasks.
Directory of Open Access Journals (Sweden)
Ebrahim Ghorbani-Kalhor
2015-04-01
In the current work, a new version of rank annihilation factor analysis was developed to circumvent the rank deficiency problem in multivariate data measurements. Simultaneous determination of the dissociation constant and concentration of monoprotic acids was performed by applying model-based rank annihilation factor analysis to variation matrices of spectrophotometric acid-base titration data. Variation matrices can be obtained by subtracting the first row of the data matrix from all rows of the main data matrix. This method uses variation matrices instead of multivariate spectrophotometric acid-base titration matrices to circumvent the rank deficiency problem in the rank quantitation step. The applicability of this approach was first evaluated with simulated data; then binary mixtures of ascorbic and sorbic acids as model compounds were investigated by the proposed method. Finally, the proposed method was successfully applied to resolving ascorbic and sorbic acid in a real orange juice sample. Unique results were thus achieved by applying rank annihilation factor analysis to the variation matrix and using the advantage of combining hard and soft models, without any difficulty in rank determination.
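The construction of a variation matrix stated in this abstract (subtracting the first row of the data matrix from every row) can be sketched as follows. The numbers are arbitrary placeholders rather than titration data, and the function name is hypothetical.

```python
def variation_matrix(data):
    """Subtract the first row of `data` from every row (including itself)."""
    first = data[0]
    return [[x - f for x, f in zip(row, first)] for row in data]


# Placeholder matrix: rows could stand for titration steps, columns for wavelengths.
spectra = [
    [1.0, 2.0],
    [1.5, 2.5],
    [3.0, 2.0],
]
print(variation_matrix(spectra))
```

The first row of the result is all zeros by construction, so the remaining rows describe changes relative to the starting spectrum.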
Ranking nodes in growing networks: When PageRank fails.
Mariani, Manuel Sebastian; Medo, Matúš; Zhang, Yi-Cheng
2015-11-10
PageRank is arguably the most popular ranking algorithm, applied in real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm's efficacy and the properties of the network on which it acts has not yet been fully understood. We study here PageRank's performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail in identifying the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms based on the temporal linking patterns of these systems are needed to better rank the nodes.
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
With the fast growth of the Chinese economy, more and more capital will be invested in environmental projects. How to select the environmental investment projects (alternatives) that give the best environmental quality and economic benefits is an important problem for decision makers. The purpose of this paper is to develop a decision-making model to rank a finite number of alternatives with several, sometimes conflicting, criteria. A model for ranking municipal sewage treatment plant projects is proposed using experts' information and data from real projects, and the ranking result is obtained with the PROMETHEE method. Furthermore, by means of the concept of weight stability intervals (WSI), the sensitivity of the ranking results to the size of the criteria values and to changes in the criteria weights is discussed. The results show that some criteria, such as "proportion of benefit to project cost", influence the ranking of alternatives very strongly while others do not. The influence comes not only from the value of a criterion but also from changes in its weight. Criteria such as "proportion of benefit to project cost" are therefore key criteria for ranking the projects, and decision makers must treat them with caution.
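A minimal sketch of a PROMETHEE II style net-flow ranking may help show why criteria weights matter so much. It assumes the simplest ("usual") preference function, where an alternative is fully preferred on a criterion whenever its value is larger; the paper's actual preference functions, criteria, and data are not given in the abstract, so the alternatives P1 to P3 and both weight vectors are purely illustrative.

```python
def promethee_ii(scores, weights):
    """Rank alternatives by net outranking flow.

    scores:  {alternative: [criterion values]}, larger is better (assumption).
    weights: criterion weights summing to 1.
    """
    alts = list(scores)
    n = len(alts)

    def preference(a, b):
        # Weighted share of criteria on which a strictly beats b.
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    net = {}
    for a in alts:
        others = [b for b in alts if b != a]
        leaving = sum(preference(a, b) for b in others) / (n - 1)
        entering = sum(preference(b, a) for b in others) / (n - 1)
        net[a] = leaving - entering
    order = sorted(alts, key=net.get, reverse=True)
    return order, net


projects = {"P1": [3, 2, 1], "P2": [2, 3, 3], "P3": [1, 1, 2]}
order_a, flows_a = promethee_ii(projects, [0.5, 0.3, 0.2])
order_b, flows_b = promethee_ii(projects, [0.7, 0.2, 0.1])
print(order_a, order_b)
```

With the first weight vector P2 ranks first; shifting weight toward the first criterion moves P1 to the top, which is the kind of sensitivity that weight stability intervals are meant to quantify.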
Directory of Open Access Journals (Sweden)
Lei Guo
2017-02-01
Point-of-interest (POI) recommendation has been well studied in recent years. However, most of the existing methods focus on recommendation scenarios where users can provide explicit feedback. In most cases, however, the feedback is not explicit, but implicit. For example, we can only get a user’s check-in behaviors from the history of what POIs she/he has visited, but never know how much she/he likes them or why she/he does not like them. Recently, some researchers have noticed this problem and begun to learn user preferences from the partial order of POIs. However, these works give equal weight to each POI pair and cannot distinguish the contributions of different POI pairs. Intuitively, for the two POIs in a POI pair, the larger the difference in their visit frequencies and the farther the geographical distance between them, the higher the contribution of this POI pair to the ranking function. Based on the above observations, we propose a weighted ranking method for POI recommendation. Specifically, we first introduce a Bayesian personalized ranking criterion designed for implicit feedback to POI recommendation. To fully utilize the partial order of POIs, we then treat the cost function in a weighted way; that is, we give each POI pair a different weight according to their frequency of being visited and the geographical distance between them. Data analysis and experimental results on two real-world datasets demonstrate the existence of user preference on different POI pairs and the effectiveness of our weighted ranking method.
A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
Wei, Wei; Gao, Bin; Liu, Tie-Yan; Wang, Taifeng; Li, Guohui; Li, Hang
2016-04-01
Graph-based ranking has been extensively studied and frequently applied in many applications, such as webpage ranking. It aims at mining potentially valuable information from the raw graph-structured data. Recently, with the proliferation of rich heterogeneous information (e.g., node/edge features and prior knowledge) available in many real-world graphs, how to effectively and efficiently leverage all information to improve the ranking performance becomes a new challenging problem. Previous methods only utilize part of such information and attempt to rank graph nodes according to link-based methods, of which the ranking performances are severely affected by several well-known issues, e.g., over-fitting or high computational complexity, especially when the scale of graph is very large. In this paper, we address the large-scale graph-based ranking problem and focus on how to effectively exploit rich heterogeneous information of the graph to improve the ranking performance. Specifically, we propose an innovative and effective semi-supervised PageRank (SSP) approach to parameterize the derived information within a unified semi-supervised learning framework (SSLF-GR), then simultaneously optimize the parameters and the ranking scores of graph nodes. Experiments on the real-world large-scale graphs demonstrate that our method significantly outperforms the algorithms that consider such graph information only partially.
Directory of Open Access Journals (Sweden)
Renata Maciel de Melo
2015-03-01
The quality of the construction production process may be improved using several different methods such as Lean Construction, ISO 9001, ISO 14001 or ISO 18001. Construction companies need a preliminary study and systematic implementation of changes to become more competitive and efficient. This paper presents a multicriteria decision model for the selection and ranking of such alternatives for improvement approaches regarding the aspects of quality, sustainability and safety, based on the PROMETHEE II method. The adoption of this model provides more confidence and visibility for decision makers. One of the differentiators of this model is the use of a fragmented set of improvement alternatives. These alternatives were combined with some restrictions to create a global set of alternatives. An application to three scenarios, considering realistic data, was developed. The results of the application show that the model should be incorporated into the strategic planning process of organizations.
Beyond Low Rank: A Data-Adaptive Tensor Completion Method
Zhang, Lei; Wei, Wei; Shi, Qinfeng; Shen, Chunhua; Hengel, Anton van den; Zhang, Yanning
2017-01-01
Low rank tensor representation underpins much of recent progress in tensor completion. In real applications, however, this approach is confronted with two challenging problems, namely (1) tensor rank determination; (2) handling real tensor data which only approximately fulfils the low-rank requirement. To address these two issues, we develop a data-adaptive tensor completion model which explicitly represents both the low-rank and non-low-rank structures in a latent tensor. Representing the no...
Ranking as parameter estimation
Czech Academy of Sciences Publication Activity Database
Kárný, Miroslav; Guy, Tatiana Valentine
2009-01-01
Vol. 4, No. 2 (2009), pp. 142-158. ISSN 1745-7645. R&D Projects: GA MŠk 2C06001; GA AV ČR 1ET100750401; GA MŠk 1M0572. Institutional research plan: CEZ:AV0Z10750506. Keywords: ranking; Bayesian estimation; negotiation; modelling. Subject RIV: BB - Applied Statistics, Operational Research. http://library.utia.cas.cz/separaty/2009/AS/karny- ranking as parameter estimation.pdf
Accuracy Evaluation of C4.5 and Naive Bayes Classifiers Using Attribute Ranking Method
Directory of Open Access Journals (Sweden)
S. Sivakumari
2009-03-01
This paper intends to classify the Ljubljana Breast Cancer dataset using C4.5 Decision Tree and Naive Bayes classifiers. In this work, classification is carried out using two methods. In the first method, the dataset is analysed using all of its attributes. In the second method, attributes are ranked using the information gain ranking technique and only the high-ranked attributes are used to build the classification model. We evaluate the results of the C4.5 Decision Tree and Naive Bayes classifiers in terms of classifier accuracy for various folds of cross validation. Our results show that both classifiers achieve good accuracy on the dataset.
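Information gain ranking of attributes can be sketched in a few lines. The example below uses the standard entropy-based definition of information gain on a tiny made-up table; the attribute names `a` and `b` and the labels are placeholders and have nothing to do with the actual columns of the Ljubljana dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Attributes sorted from most to least informative."""
    attrs = list(rows[0].keys())
    return sorted(attrs, key=lambda a: information_gain(rows, labels, a), reverse=True)


# Placeholder data: attribute "a" determines the label, "b" carries no information.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["yes", "yes", "no", "no"]
print(rank_attributes(rows, labels))
```

A classifier built on only the top-ranked attributes would here keep `a` and drop `b`, mirroring the second method described in the abstract.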
An Efficient Normalized Rank Based SVM for Room Level Indoor WiFi Localization with Diverse Devices
Directory of Open Access Journals (Sweden)
Yasmine Rezgui
2017-01-01
This paper proposes an efficient and effective WiFi fingerprinting-based indoor localization algorithm, which uses the Received Signal Strength Indicator (RSSI) of WiFi signals. In practical harsh indoor environments, RSSI variation and hardware variance can significantly degrade the performance of fingerprinting-based localization methods. To address the problem of hardware variance and signal fluctuation in WiFi fingerprinting-based localization, we propose a novel normalized rank based Support Vector Machine classifier (NR-SVM). Moving from RSSI value based analysis to normalized rank transformation based analysis, the principal features are prioritized and the dimensionalities of signature vectors are taken into account. The proposed method has been tested using sixteen different devices in a shopping mall with 88 shops. The experimental results demonstrate its robustness, with no less than 98.75% correct estimation in 93.75% of the tested cases and a 100% correct rate in 56.25% of cases. In the experiments, the new method shows better performance than the KNN, Naïve Bayes, Random Forest, and Neural Network algorithms. Furthermore, we have compared the proposed approach with three popular calibration-free transformation based methods, including the difference method (DIFF), Signal Strength Difference (SSD), and the Hyperbolic Location Fingerprinting (HLF) based SVM. The results show that the NR-SVM outperforms these popular methods.
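To see why a rank transformation can reduce the effect of hardware variance, consider the following sketch. It replaces each RSSI value by its rank within the fingerprint, divided by the number of access points. The exact normalization used by NR-SVM is not given in the abstract, so this scaling, the tie handling, and the RSSI readings are assumptions for illustration only.

```python
def normalized_ranks(rssi):
    """Map an RSSI vector to ranks scaled into (0, 1].

    The strongest access point gets 1.0; tied values share their average rank.
    """
    n = len(rssi)
    order = sorted(range(n), key=lambda i: rssi[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < n and rssi[order[j + 1]] == rssi[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / n
        i = j + 1
    return ranks


# Hypothetical readings (dBm) from two devices at the same spot;
# device B reports every access point 8 dB weaker than device A.
device_a = [-40, -70, -55, -90]
device_b = [-48, -78, -63, -98]
print(normalized_ranks(device_a), normalized_ranks(device_b))
```

Because ranks depend only on the ordering of the signal strengths, a constant offset between devices leaves the transformed fingerprint unchanged, and such vectors could then be passed to a classifier such as an SVM.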
Zhang, Kejiang; Kluck, Cheryl; Achari, Gopal
2009-11-01
A ranking system for contaminated sites based on comparative risk methodology using fuzzy Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) was developed in this article. It combines the concepts of fuzzy sets to represent uncertain site information with the PROMETHEE, a subgroup of Multi-Criteria Decision Making (MCDM) methods. Criteria are identified based on a combination of the attributes (toxicity, exposure, and receptors) associated with the potential human health and ecological risks posed by contaminated sites, chemical properties, site geology and hydrogeology and contaminant transport phenomena. Original site data are directly used avoiding the subjective assignment of scores to site attributes. When the input data are numeric and crisp the PROMETHEE method can be used. The Fuzzy PROMETHEE method is preferred when substantial uncertainties and subjectivities exist in site information. The PROMETHEE and fuzzy PROMETHEE methods are both used in this research to compare the sites. The case study shows that this methodology provides reasonable results.
Fuzzy-set based contingency ranking
International Nuclear Information System (INIS)
Hsu, Y.Y.; Kuo, H.C.
1992-01-01
In this paper, a new approach based on fuzzy set theory is developed for contingency ranking of the Taiwan power system. To examine whether a power system can remain in a secure and reliable operating state under contingency conditions, those contingency cases that will result in loss-of-load, loss-of-generation, or islanding are first identified. Then a 1P-1Q iteration of fast decoupled load flow is performed to estimate post-contingent quantities (line flows, bus voltages) for other contingency cases. Based on system operators' past experience, each post-contingent quantity is assigned a degree of severity according to the potential damage that the quantity could impose on the power system, should the contingency occur. An approach based on fuzzy set theory is developed to deal with the imprecision of linguistic terms.
Scheduling for Multiuser MIMO Downlink Channels with Ranking-Based Feedback
Kountouris, Marios; Sälzer, Thomas; Gesbert, David
2008-12-01
We consider a multi-antenna broadcast channel with more single-antenna receivers than transmit antennas and partial channel state information at the transmitter (CSIT). We propose a novel type of CSIT representation for the purpose of user selection, coined as ranking-based feedback. Each user calculates and feeds back the rank, an integer between 1 and W + 1, of its instantaneous channel quality information (CQI) among a set of W past CQI measurements. Apart from reducing significantly the required feedback load, ranking-based feedback enables the transmitter to select users that are on the highest peak (quantile) with respect to their own channel distribution, independently of the distribution of other users. It can also be shown that this feedback metric can restore temporal fairness in heterogeneous networks, in which users' channels are not identically distributed and mobile terminals experience different average signal-to-noise ratio (SNR). The performance of a system that performs user selection using ranking-based CSIT in the context of random opportunistic beamforming is analyzed, and we provide design guidelines on the number of required past CSIT samples and the impact of finite W on average throughput. Simulation results show that feedback reduction of order of 40-50% can be achieved with negligible decrease in system throughput.
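The feedback quantity described in this abstract, an integer between 1 and W + 1 giving the rank of the current CQI among W past measurements, can be sketched as follows. The convention that rank 1 means "better than every past value", the selection rule (pick the user with the smallest rank), and the CQI numbers are assumptions made for illustration.

```python
def cqi_rank(current, past):
    """Rank of the current CQI among W past values.

    Returns 1 if the current value exceeds all past values and
    W + 1 if it is below all of them (assumed orientation).
    """
    return 1 + sum(1 for p in past if p > current)


def select_user(feedback):
    """Pick the user whose channel is closest to its own peak (smallest rank)."""
    return min(feedback, key=feedback.get)


# Hypothetical users: u2 has the stronger channel in absolute terms,
# but u1 is at a peak relative to its own history.
history = {
    "u1": (5.0, [1.0, 2.0, 3.0, 4.0]),
    "u2": (20.0, [30.0, 25.0, 40.0, 22.0]),
}
feedback = {user: cqi_rank(now, past) for user, (now, past) in history.items()}
print(feedback, select_user(feedback))
```

Only a small integer per user is fed back, and the user with the weaker average channel can still be scheduled when it is at its own peak, which is the fairness property the abstract points to.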
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-08
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise. It is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and removes the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, the good SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method.
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-01
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to extract weak fault features under background noise. It uses statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and removes the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, the SPs that are highly sensitive for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006
Manifold Based Low-rank Regularization for Image Restoration and Semi-supervised Learning
Lai, Rongjie; Li, Jia
2017-01-01
Low-rank structures play an important role in recent advances on many problems in image science and data science. As a natural extension of low-rank structures for data with nonlinear structures, the concept of the low-dimensional manifold structure has been considered in many data processing problems. Inspired by this concept, we consider a manifold based low-rank regularization as a linear approximation of manifold dimension. This regularization is less restricted than the global low-rank regu...
Advanced statistical methods in data science
Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao
2016-01-01
This book gathers invited presentations from the 2nd Symposium of the ICSA-CANADA Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences, to allow researchers to exchange ideas on statistics and data science, and to embrace the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...
Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles
2011-01-01
Background: Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under curve of the interpolated precision/recall (AUC iP/R) score because the order of identifiers in the output list is important for ease of curation. Results: Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% by using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. Statistical significance t-test results show that our INT/IPT system with re-ranking outperforms that without re-ranking by a statistically significant difference. Conclusions: In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to find associations among interactors, the proposed method can boost the IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches. PMID:21342534
Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth
2015-10-01
Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations
Fuzzy Group Decision Making Approach for Ranking Work Stations Based on Physical Pressure
Directory of Open Access Journals (Sweden)
Hamed Salmanzadeh
2014-06-01
This paper proposes a Fuzzy Group Decision Making approach for ranking work stations based on physical pressure. The fuzzy group decision making approach allows experts to evaluate different ergonomic factors using linguistic terms such as very high, high, medium, low, and very low, rather than precise numerical values. In this way, there is no need to measure parameters, and the evaluation can easily be made in a group. Because ergonomic assessment covers many kinds of work content and situations, each involving multiple parameters and uncertainties, fuzzy group decision making is the best way to evaluate such a changeable concept. A case study was carried out to apply the approach and illustrate its use in ergonomic assessment and in ranking work stations by work pressure; it found that the approach provides flexibility, practicality, and efficiency in decision making in ergonomics. The normalized defuzzified numbers resulting from this method are compared with the results of the quantitative Automotive Assembly Work Sheet assessment, showing that the proposed method's results are approximately 10% lower than those of the Automotive Assembly Work Sheet.
Data depth and rank-based tests for covariance and spectral density matrices
Chau, Joris
2017-06-26
In multivariate time series analysis, objects of primary interest to study cross-dependences in the time series are the autocovariance or spectral density matrices. Non-degenerate covariance and spectral density matrices are necessarily Hermitian and positive definite, and our primary goal is to develop new methods to analyze samples of such matrices. The main contribution of this paper is the generalization of the concept of statistical data depth for collections of covariance or spectral density matrices by exploiting the geometric properties of the space of Hermitian positive definite matrices as a Riemannian manifold. This allows one to naturally characterize most central or outlying matrices, but also provides a practical framework for rank-based hypothesis testing in the context of samples of covariance or spectral density matrices. First, the desired properties of a data depth function acting on the space of Hermitian positive definite matrices are presented. Second, we propose two computationally efficient pointwise and integrated data depth functions that satisfy each of these requirements. Several applications of the developed methodology are illustrated by the analysis of collections of spectral matrices in multivariate brain signal time series datasets.
Data depth and rank-based tests for covariance and spectral density matrices
Chau, Joris; Ombao, Hernando; Sachs, Rainer von
2017-01-01
In multivariate time series analysis, objects of primary interest to study cross-dependences in the time series are the autocovariance or spectral density matrices. Non-degenerate covariance and spectral density matrices are necessarily Hermitian and positive definite, and our primary goal is to develop new methods to analyze samples of such matrices. The main contribution of this paper is the generalization of the concept of statistical data depth for collections of covariance or spectral density matrices by exploiting the geometric properties of the space of Hermitian positive definite matrices as a Riemannian manifold. This allows one to naturally characterize most central or outlying matrices, but also provides a practical framework for rank-based hypothesis testing in the context of samples of covariance or spectral density matrices. First, the desired properties of a data depth function acting on the space of Hermitian positive definite matrices are presented. Second, we propose two computationally efficient pointwise and integrated data depth functions that satisfy each of these requirements. Several applications of the developed methodology are illustrated by the analysis of collections of spectral matrices in multivariate brain signal time series datasets.
Directory of Open Access Journals (Sweden)
Chaoxing Li
2018-04-01
Complex diseases such as cancer are usually the result of a combination of environmental factors and one or several biological pathways consisting of sets of genes. Each biological pathway exerts its function by delivering signaling through the gene network. Theoretically, a pathway is supposed to have a robust topological structure under normal physiological conditions. However, the pathway’s topological structure could be altered under some pathological condition. It is well known that a normal biological network includes a small number of well-connected hub nodes and a large number of nodes that are non-hubs. In addition, it is reported that the loss of connectivity is a common topological trait of cancer networks, which is an assumption of our method. Hence, from normal to cancer, the process of the network losing connectivity might be the process of disrupting the structure of the network, namely, the number of hub genes might be altered in cancer compared to that in normal or the distribution of topological ranks of genes might be altered. Based on this, we propose a new PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) to detect pathways involved in cancer. We use PageRank to measure the relative topological ranks of genes in each biological pathway, then select hub genes for each pathway, and use Fisher’s exact test to test if the number of hub genes in each pathway is altered from normal to cancer. Alternatively, if the distribution of topological ranks of genes in a pathway is altered between normal and cancer, this pathway might also be involved in cancer. Hence, we use the Kolmogorov–Smirnov test to detect pathways that have an altered distribution of topological ranks of genes between two phenotypes. We apply PoTRA to study hepatocellular carcinoma (HCC) and several subtypes of HCC. Very interestingly, we discover that all significant pathways in HCC are cancer-associated generally, while several
Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths.
Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T
How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter, and the sizes of these fractions differ randomly from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to the lengths of their genes, and then find associations between the genome rank and various properties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test the effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring the overall sortedness of the data. We have demonstrated that all considered methods give consistent results and that Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method.
A Ranking Method for Evaluating Constructed Responses
Attali, Yigal
2014-01-01
This article presents a comparative judgment approach for holistically scored constructed response tasks. In this approach, the grader rank orders (rather than rates) the quality of a small set of responses. A prior automated evaluation of responses guides both set formation and scaling of rankings. Sets are formed to have similar prior scores and…
Ranking structures and rank-rank correlations of countries: The FIFA and UEFA cases
Ausloos, Marcel; Cloots, Rudi; Gadomski, Adam; Vitanov, Nikolay K.
2014-04-01
Ranking of agents competing with each other in complex systems may lead to paradoxes according to the pre-chosen different measures. A discussion is presented on such rank-rank, similar or not, correlations based on the case of European countries ranked by UEFA and FIFA from different soccer competitions. The first question to be answered is whether an empirical and simple law is obtained for such (self-) organizations of complex sociological systems with such different measuring schemes. It is found that the power law form is not the best description contrary to many modern expectations. The stretched exponential is much more adequate. Moreover, it is found that the measuring rules lead to some inner structures in both cases.
Directory of Open Access Journals (Sweden)
Motoki Yokoyama
2017-07-01
The spread of smartphones and wireless broadband networks is creating a new railway information environment. With the spread of such devices and information technology, various types of information can be obtained from databases connected to the Internet. One scenario for obtaining such a wide variety of information resources is while the user is in transit. This paper proposes an information provision system, named the Station Concierge System, that matches the situation and intention of passengers. The purpose of this system is to estimate the needs of passengers, as station staff or a hotel concierge would, and to dynamically provide information resources that satisfy the user’s expectations. The most important module of the system is built on a new information ranking method for passenger intention prediction and service recommendation. This method has three main features: (1) projecting a user into a semantic vector space using her current context, (2) predicting the intention of a user by selecting a semantic vector subspace, and (3) ranking the services in descending order of their relevance scores to the user’s intention. Experimental studies comparing the predictions of our method with those of two straightforward computation methods show the effectiveness and efficiency of the proposed method. Using this system, users can obtain transit information and a service map that dynamically match their context.
Adaptive distributional extensions to DFR ranking
DEFF Research Database (Denmark)
Petersen, Casper; Simonsen, Jakob Grue; Järvelin, Kalervo
2016-01-01
-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood...
Directory of Open Access Journals (Sweden)
M. N. Ivliev
2016-01-01
The work is devoted to methods for analyzing a company's financial condition, including aggregated ratings. It is proposed to use a generalized solvency and liquidity indicator and a composite capital structure index. Mathematically, the generalized index is a weighted sum of characteristic variables, with weighting factors that express the relative importance of the individual characteristics. The significant features are selected from a set of standard financial ratios calculated from the balance sheets of enterprises. To obtain the values of the weighting factors, it is proposed to use one of the expert statistical approaches, the analytic hierarchy process. The method is as follows: the most important characteristic is chosen, and the experts then determine the degree of preference for this main feature on a linguistic scale. Next, a matrix of pairwise comparisons is compiled from the assigned ranks; it characterizes the relative importance of the attributes. The required coefficients are the elements of the priority vector, which is the first (principal) eigenvector of the matrix of pairwise comparisons. The paper proposes a mechanism for finding the fields for rating number analysis. In addition, the paper proposes a method for the statistical evaluation of the balance sheets of various companies by calculating mutual correlation matrices. Based on the mathematical methods considered for determining quantitative characteristics of the financial and economic activities of technical objects, algorithms, information support, and software were developed that make it possible to carry out economic analysis of different systems.
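The weighting step described in this abstract can be illustrated with a small sketch. In the analytic hierarchy process, the priority vector is conventionally taken as the principal eigenvector of the pairwise comparison matrix, normalized to sum to one. The function name `ahp_priorities` and the three-indicator example are hypothetical and are not taken from the paper; NumPy is assumed.

```python
import numpy as np

def ahp_priorities(pairwise):
    """Return the normalized principal eigenvector of a pairwise comparison matrix."""
    a = np.asarray(pairwise, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(a)
    k = np.argmax(eigenvalues.real)        # index of the largest (Perron) eigenvalue
    v = np.abs(eigenvectors[:, k].real)    # Perron vector entries share one sign
    return v / v.sum()

# Hypothetical example: a perfectly consistent comparison matrix built from
# assumed weights 0.6, 0.3, 0.1, so entry (i, j) is w[i] / w[j].
w = np.array([0.6, 0.3, 0.1])
A = w[:, None] / w[None, :]
priorities = ahp_priorities(A)
print(priorities)
```

For a perfectly consistent matrix such as this one the recovered priorities equal the weights used to build it; with real expert judgments the matrix is only approximately consistent, and the eigenvector gives a compromise weighting.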
RANWAR: rank-based weighted association rule mining from gene expression and methylation data.
Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal
2015-01-01
Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules of items (or genes) generated by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique (RANWAR, or rank-based weighted association rule mining) that ranks the rules using two novel rule-interestingness measures, viz., rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc), to bypass the problem. These measures depend on the rank of items (genes). Using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than the state-of-the-art association rule mining algorithms and thus reduces the execution time of the algorithm. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules generated by RANWAR that are not in Apriori are reported.
Weakly intrusive low-rank approximation method for nonlinear parameter-dependent equations
Giraldi, Loic; Nouy, Anthony
2017-01-01
This paper presents a weakly intrusive strategy for computing a low-rank approximation of the solution of a system of nonlinear parameter-dependent equations. The proposed strategy relies on a Newton-like iterative solver which only requires evaluations of the residual of the parameter-dependent equation and of a preconditioner (such as the differential of the residual) for instances of the parameters independently. The algorithm provides an approximation of the set of solutions associated with a possibly large number of instances of the parameters, with a computational complexity which can be orders of magnitude lower than when using the same Newton-like solver for all instances of the parameters. The reduction of complexity requires efficient strategies for obtaining low-rank approximations of the residual, of the preconditioner, and of the increment at each iteration of the algorithm. For the approximation of the residual and the preconditioner, weakly intrusive variants of the empirical interpolation method are introduced, which require evaluations of entries of the residual and the preconditioner. Then, an approximation of the increment is obtained by using a greedy algorithm for low-rank approximation, and a low-rank approximation of the iterate is finally obtained by using a truncated singular value decomposition. When the preconditioner is the differential of the residual, the proposed algorithm is interpreted as an inexact Newton solver for which a detailed convergence analysis is provided. Numerical examples illustrate the efficiency of the method.
Weakly intrusive low-rank approximation method for nonlinear parameter-dependent equations
Giraldi, Loic
2017-06-30
This paper presents a weakly intrusive strategy for computing a low-rank approximation of the solution of a system of nonlinear parameter-dependent equations. The proposed strategy relies on a Newton-like iterative solver which only requires evaluations of the residual of the parameter-dependent equation and of a preconditioner (such as the differential of the residual) for instances of the parameters independently. The algorithm provides an approximation of the set of solutions associated with a possibly large number of instances of the parameters, with a computational complexity which can be orders of magnitude lower than when using the same Newton-like solver for all instances of the parameters. The reduction of complexity requires efficient strategies for obtaining low-rank approximations of the residual, of the preconditioner, and of the increment at each iteration of the algorithm. For the approximation of the residual and the preconditioner, weakly intrusive variants of the empirical interpolation method are introduced, which require evaluations of entries of the residual and the preconditioner. Then, an approximation of the increment is obtained by using a greedy algorithm for low-rank approximation, and a low-rank approximation of the iterate is finally obtained by using a truncated singular value decomposition. When the preconditioner is the differential of the residual, the proposed algorithm is interpreted as an inexact Newton solver for which a detailed convergence analysis is provided. Numerical examples illustrate the efficiency of the method.
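The last step named in this abstract, compressing an iterate with a truncated singular value decomposition, can be shown in isolation. This is a minimal sketch assuming NumPy; the function name `truncated_svd` and the random test matrix are illustrative only, and the sketch does not reproduce the paper's Newton-like solver or its empirical interpolation variants.

```python
import numpy as np

def truncated_svd(matrix, rank):
    """Best rank-`rank` approximation of `matrix` in the spectral and Frobenius norms."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the leading `rank` singular triplets and rebuild the matrix.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

# Illustrative data: a 6 x 5 matrix of exact rank 2.
rng = np.random.default_rng(0)
exact_rank2 = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

approx2 = truncated_svd(exact_rank2, 2)   # reproduces the matrix up to round-off
approx1 = truncated_svd(exact_rank2, 1)   # loses the second singular direction

singular_values = np.linalg.svd(exact_rank2, compute_uv=False)
error1 = np.linalg.norm(exact_rank2 - approx1, 2)   # spectral-norm error
```

By the Eckart-Young theorem the spectral-norm error of the rank-1 truncation equals the second singular value, which is what makes the truncation rank a direct accuracy control.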
Many-Objective Optimization Using Adaptive Differential Evolution with a New Ranking Method
Directory of Open Access Journals (Sweden)
Xiaoguang He
2014-01-01
Pareto dominance is an important concept and is usually used in multiobjective evolutionary algorithms (MOEAs) to determine the nondominated solutions. However, for many-objective problems, when Pareto dominance is used to rank the solutions, most of the solutions obtained are nondominated even in early generations, which leaves MOEAs with little selection pressure toward the optimal solutions. In this paper, a new ranking method is proposed for many-objective optimization problems to identify a relatively small number of representative nondominated solutions with a uniform and wide distribution and to improve the selection pressure of MOEAs. After that, a many-objective differential evolution with the new ranking method (MODER) is designed for handling many-objective optimization problems. Finally, experiments are conducted and the proposed algorithm is compared with several well-known algorithms. The experimental results show that the proposed algorithm can guide the search to converge to the true Pareto front (PF) and maintain the diversity of solutions for many-objective problems.
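The Pareto dominance relation that this abstract starts from can be written down in a few lines. The sketch below assumes every objective is minimized; the function names `dominates` and `nondominated` and the sample points are hypothetical, and the code shows only plain dominance filtering, not the ranking method proposed in the paper.

```python
def dominates(a, b):
    """True if a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Return the points that no other point in the collection dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Two-objective illustration: (3, 3) is dominated by (2, 2), the rest are trade-offs.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = nondominated(candidates)
print(front)
```

As the number of objectives grows, it becomes increasingly unlikely that one solution is no worse than another in every objective, so a filter like this keeps almost everything. That is the loss of selection pressure the abstract describes.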
Research on Statistical Flow of the Complex Background Based on Image Method
Directory of Open Access Journals (Sweden)
Yang Huanhai
2014-06-01
As urbanization in our country continues to accelerate, pressure on urban road traffic systems is increasing. Therefore, the importance of intelligent transportation systems based on computer vision technology is becoming more and more significant. Using image processing technology for vehicle detection has become a hot research topic. Vehicles can be recognized and tracked only if they are accurately segmented from the background. Therefore, applying video vehicle detection and image processing technology to identify the number, types, and motion characteristics of the vehicles in a scene can provide a real-time basis for intelligent traffic control. This paper first introduces the concept and importance of intelligent transportation systems and the use of image processing technology in vehicle recognition and counting, then gives an overview of video vehicle detection methods and, by comparing video detection with other detection technologies, sets out the advantages of video detection. Finally, we design a real-time, reliable background subtraction method and an area-based vehicle recognition method built on an information fusion algorithm, implemented with the MATLAB/GUI development tool on the Windows platform. The algorithm is applied to frames of traffic flow images. The experimental results show that the algorithm performs very well at recognizing vehicles and counting traffic flow.
Ranking nodes in growing networks: When PageRank fails
Mariani, Manuel Sebastian; Medo, Matúš; Zhang, Yi-Cheng
2015-11-01
PageRank is arguably the most popular ranking algorithm which is being applied in real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm’s efficacy and properties of the network on which it acts has not yet been fully understood. We study here PageRank’s performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail in identifying the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms that are based on the temporal linking patterns of these systems are needed to better rank the nodes.
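For reference, the static PageRank score discussed in this abstract can be sketched by power iteration. This is a minimal illustration assuming NumPy, a damping factor of 0.85, and uniform teleportation; the function name `pagerank` and the four-node graph are hypothetical. It does not model the temporal effects studied in the paper.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Static PageRank by power iteration; adj[i, j] = 1 means node i links to node j."""
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    out = a.sum(axis=1, keepdims=True)            # out-degree of each node
    safe_out = np.where(out > 0, out, 1.0)        # avoid division by zero
    # Row-stochastic transition matrix; nodes without out-links jump uniformly.
    p = np.where(out > 0, a / safe_out, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1.0 - damping) / n + damping * (r @ p)
        if np.abs(new - r).sum() < tol:
            return new
        r = new
    return r

# Hypothetical graph: nodes 1, 2, 3 all link to node 0, and node 0 links to node 1.
adjacency = [[0, 1, 0, 0],
             [1, 0, 0, 0],
             [1, 0, 0, 0],
             [1, 0, 0, 0]]
scores = pagerank(adjacency)
print(scores)
```

The scores depend only on the link structure at one instant, which is the static approach the abstract argues is inappropriate for growing networks where older nodes have had more time to collect links.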
Research Notes Use of the dry-weight-rank method of botanical ...
African Journals Online (AJOL)
When used in combination with the double sampling (or comparative yield) method of yield estimation, the dry-weight-rank method of botanical analysis provides a rapid non-destructive means of estimating botanical composition. The composition is expressed in terms of the contribution of individual species to total herbage ...
Health systems around the world - a comparison of existing health system rankings.
Schütte, Stefanie; Acevedo, Paula N Marin; Flahault, Antoine
2018-06-01
Existing health systems all over the world differ because of the different combinations of components that can be considered in establishing them. The ranking of health systems, and especially the issue of performance, has been a focal point for many years. In 2000 the World Health Organization (WHO) performed a ranking to compare the performance of the health systems of its member countries. Since then other health system rankings have been performed, and the topic has become an issue of public discussion. A point of contention regarding these rankings is the methodology employed by each of them, since no gold standard exists. Therefore, this review focuses on evaluating the methodologies of each existing health system performance ranking to assess their reproducibility and transparency. A search was conducted to identify existing health system rankings, and a questionnaire was developed for the comparison of the methodologies based on the following indicators: (1) General information, (2) Statistical methods, (3) Data, (4) Indicators. Overall, nine rankings were identified, of which six focused on the measurement of population health without any financial component and were therefore excluded. Finally, three health system rankings were selected for this review: "Health Systems: Improving Performance" by the WHO, "Mirror, Mirror on the wall: How the Performance of the US Health Care System Compares Internationally" by the Commonwealth Fund and "the Most efficient Health Care" by Bloomberg. After the rankings were compared by scoring them against the indicators, the ranking performed by the WHO was considered the most complete with regard to the reproducibility and transparency of its methodology. This review and comparison could help in establishing consensus in the field of health system research. It may also help in giving recommendations for future health rankings and in evaluating the current gap in the literature.
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha
2012-12-01
The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
Feature ranking and rank aggregation for automatic sleep stage classification: a comparative study.
Najdi, Shirin; Gharbali, Ali Abdollahi; Fonseca, José Manuel
2017-08-18
Nowadays, sleep quality is one of the most important measures of healthy life, especially considering the huge number of sleep-related disorders. Identifying sleep stages using polysomnographic (PSG) signals is the traditional way of assessing sleep quality. However, the manual process of sleep stage classification is time-consuming, subjective and costly. Therefore, in order to improve the accuracy and efficiency of the sleep stage classification, researchers have been trying to develop automatic classification algorithms. Automatic sleep stage classification mainly consists of three steps: pre-processing, feature extraction and classification. Since classification accuracy is deeply affected by the extracted features, a poor feature vector will adversely affect the classifier and eventually lead to low classification accuracy. Therefore, special attention should be given to the feature extraction and selection process. In this paper the performance of seven feature selection methods, as well as two feature rank aggregation methods, was compared. Pz-Oz EEG, horizontal EOG and submental chin EMG recordings of 22 healthy males and females were used. A comprehensive feature set including 49 features was extracted from these recordings. The extracted features are among the most common and effective features used in sleep stage classification from temporal, spectral, entropy-based and nonlinear categories. The feature selection methods were evaluated and compared using three criteria: classification accuracy, stability, and similarity. Simulation results show that MRMR-MID achieves the highest classification performance while the Fisher method provides the most stable ranking. In our simulations, the performance of the aggregation methods was at an average level, although they are known to generate more stable results and better accuracy. The Borda and RRA rank aggregation methods could not significantly outperform the conventional feature ranking methods. Among
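The Borda aggregation named in the abstract above can be illustrated with a minimal sketch. This is a generic Borda count, not the authors' implementation; the feature names and the three input rankings are invented for illustration, and alphabetical tie-breaking is an assumption.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate several rankings (best item first) with a Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The top item of an n-item list earns n - 1 points, the last earns 0.
            scores[feature] += n - 1 - position
    # Highest total score first; ties broken alphabetically (an assumption).
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings produced by three feature selection methods.
rankings = [
    ["spectral_power", "entropy", "kurtosis", "zero_crossings"],
    ["entropy", "spectral_power", "zero_crossings", "kurtosis"],
    ["spectral_power", "kurtosis", "entropy", "zero_crossings"],
]
consensus = borda_aggregate(rankings)
print(consensus)
```

Each input list votes with equal weight here; weighting the lists by the reliability of the method that produced them would be a natural variation.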
Improve Biomedical Information Retrieval using Modified Learning to Rank Methods.
Xu, Bo; Lin, Hongfei; Lin, Yuan; Ma, Yunlong; Yang, Liang; Wang, Jian; Yang, Zhihao
2016-06-14
In recent years, the number of biomedical articles has increased exponentially, making it difficult for biologists to capture all the needed information manually. Information retrieval technologies, as the core of search engines, can deal with the problem automatically, providing users with the needed information. However, it is a great challenge to apply these technologies directly to biomedical retrieval because of the abundance of domain-specific terminologies. To enhance biomedical retrieval, we propose a novel framework based on learning to rank. Learning to rank is a series of state-of-the-art information retrieval techniques, and has been proved effective in many information retrieval tasks. In the proposed framework, we attempt to tackle the problem of the abundance of terminologies by constructing ranking models, which focus not only on retrieving the most relevant documents, but also on diversifying the search results to increase the completeness of the resulting list for a given query. In the model training, we propose two novel document labeling strategies, and combine several traditional retrieval models as learning features. We also investigate the usefulness of different learning to rank approaches in our framework. Experimental results on TREC Genomics datasets demonstrate the effectiveness of our framework for biomedical information retrieval.
Identification of significant features by the Global Mean Rank test.
Klammer, Martin; Dybowski, J Nikolaj; Hoffmann, Daniel; Schaab, Christoph
2014-01-01
With the introduction of omics-technologies such as transcriptomics and proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to successfully deal with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions of features. With the MeanRank test we aimed at developing a test that performs robustly under these conditions, while favorably scaling with the number of replicates. The test proposed here is a global one-sample location test, which is based on the mean ranks across replicates, and internally estimates and controls the false discovery rate. Furthermore, missing data is accounted for without the need of imputation. In extensive simulations comparing MeanRank to other frequently used methods, we found that it performs well with small and large numbers of replicates, feature dependent variance between replicates, and variable regulation across features on simulation data and a recent two-color microarray spike-in dataset. The tests were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies. MeanRank outperformed the other global rank-based methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed similar or better. Furthermore, MeanRank exhibits more consistent behavior regarding the degree of regulation and is robust against the choice of preprocessing methods. MeanRank does not require any imputation of missing values, is easy to understand, and yields results that are easy to interpret. The software implementing the algorithm is freely available for academic and commercial use.
Directory of Open Access Journals (Sweden)
Aihong Ren
2016-01-01
This paper is concerned with a class of fully fuzzy bilevel linear programming problems where all the coefficients and decision variables of both objective functions and the constraints are fuzzy numbers. A new approach based on deviation degree measures and a ranking function method is proposed to solve these problems. We first introduce concepts of the feasible region and the fuzzy optimal solution of a fully fuzzy bilevel linear programming problem. In order to obtain a fuzzy optimal solution of the problem, we apply deviation degree measures to deal with the fuzzy constraints and use a ranking function method of fuzzy numbers to rank the upper and lower level fuzzy objective functions. Then the fully fuzzy bilevel linear programming problem can be transformed into a deterministic bilevel programming problem. Considering the overall balance between improving objective function values and decreasing allowed deviation degrees, the computational procedure for finding a fuzzy optimal solution is proposed. Finally, a numerical example is provided to illustrate the proposed approach. The results indicate that the proposed approach gives a better optimal solution in comparison with the existing method.
Ranking production units according to marginal efficiency contribution
DEFF Research Database (Denmark)
Ghiyasi, Mojtaba; Hougaard, Jens Leth
League tables associated with various forms of service activities from schools to hospitals illustrate the public need for ranking institutions by their productive performance. We present a new method for ranking production units which is based on each unit's marginal contribution to the technical...
DEFF Research Database (Denmark)
Bohlin, J; Skjerve, E; Ussery, David
2008-01-01
with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-sized oligonucleotides with expected values, which can be determined by genomic nucleotide content, smaller oligonucleotide frequencies......, or be based on specific statistical distributions. Advantages with these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore...... the reliability and best suited applications for some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zero'th order Markov methods (ZOM) and 2.order Markov chain Method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection...
Convergence of Inner-Iteration GMRES Methods for Rank-Deficient Least Squares Problems
Czech Academy of Sciences Publication Activity Database
Morikuni, Keiichi; Hayami, K.
2015-01-01
Vol. 36, No. 1 (2015), pp. 225-250 ISSN 0895-4798 Institutional support: RVO:67985807 Keywords: least squares problem * iterative methods * preconditioner * inner-outer iteration * GMRES method * stationary iterative method * rank-deficient problem Subject RIV: BA - General Mathematics Impact factor: 1.883, year: 2015
Low rank approximation methods for MR fingerprinting with large scale dictionaries.
Yang, Mingrui; Ma, Dan; Jiang, Yun; Hamilton, Jesse; Seiberlich, Nicole; Griswold, Mark A; McGivney, Debra
2018-04-01
This work proposes new low rank approximation approaches with significant memory savings for large scale MR fingerprinting (MRF) problems. We introduce a compressed MRF with randomized singular value decomposition method to significantly reduce the memory requirement for calculating a low rank approximation of large sized MRF dictionaries. We further relax this requirement by exploiting the structures of MRF dictionaries in the randomized singular value decomposition space and fitting them to low-degree polynomials to generate high resolution MRF parameter maps. In vivo 1.5T and 3T brain scan data are used to validate the approaches. T1, T2, and off-resonance maps are in good agreement with those of the standard MRF approach. Moreover, the memory savings are up to 1000 times for the MRF-fast imaging with steady-state precession sequence and more than 15 times for the MRF-balanced, steady-state free precession sequence. The proposed compressed MRF with randomized singular value decomposition and dictionary fitting methods are memory efficient low rank approximation methods, which can benefit the usage of MRF in clinical settings. They also have great potential in large scale MRF problems, such as problems considering multi-component MRF parameters or high resolution in the parameter space. Magn Reson Med 79:2392-2400, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Statistical analysis to assess automated level of suspicion scoring methods in breast ultrasound
Galperin, Michael
2003-05-01
A well-defined rule-based system has been developed for scoring the Level of Suspicion (LOS) from 0 to 5 based on a qualitative lexicon describing the ultrasound appearance of breast lesions. The purpose of the research is to assess and select one of the automated quantitative LOS scoring methods developed during preliminary studies on reducing benign biopsies. The study used the Computer Aided Imaging System (CAIS) to improve the uniformity and accuracy of applying the LOS scheme by automatically detecting, analyzing and comparing breast masses. The overall goal is to reduce biopsies on masses with lower levels of suspicion, rather than to increase the accuracy of diagnosis of cancers (which will require biopsy anyway). On complex cyst and fibroadenoma cases, experienced radiologists were up to 50% less certain in true negatives than CAIS. Full correlation analysis was applied to determine which of the proposed LOS quantification methods serves CAIS accuracy best. This paper presents current results of applying statistical analysis to automated LOS scoring quantification for breast masses with known biopsy results. It was found that the First Order Ranking method yielded the most accurate results. The CAIS system (Image Companion, Data Companion software) is developed by Almen Laboratories and was used to achieve the results.
High-dimensional statistical inference: From vector to matrix
Zhang, Anru
estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications to other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. For the third part of the thesis, we consider another setting of low-rank matrix completion. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.
Result Diversification Based on Query-Specific Cluster Ranking
J. He (Jiyin); E. Meij; M. de Rijke (Maarten)
2011-01-01
Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking,
Akbudak, Kadir; Ltaief, Hatem; Mikhalev, Aleksandr; Keyes, David E.
2017-01-01
Covariance matrices are ubiquitous in computational science and engineering. In particular, large covariance matrices arise from multivariate spatial data sets, for instance, in climate/weather modeling applications to improve prediction using statistical methods and spatial data. One of the most time-consuming computational steps consists in calculating the Cholesky factorization of the symmetric, positive-definite covariance matrix problem. The structure of such covariance matrices is also often data-sparse, in other words, effectively of low rank, though formally dense. While not typically globally of low rank, covariance matrices in which correlation decays with distance are nearly always hierarchically of low rank. While symmetry and positive definiteness should be, and nearly always are, exploited for performance purposes, exploiting low rank character in this context is very recent, and will be a key to solving these challenging problems at large-scale dimensions. The authors design a new and flexible tile low-rank Cholesky factorization and propose a high performance implementation using the OpenMP task-based programming model on various leading-edge manycore architectures. Performance comparisons and memory footprint savings on covariance matrices of size up to 200K×200K show a gain of more than an order of magnitude for both metrics against state-of-the-art open-source and vendor optimized numerical libraries, while preserving the numerical accuracy of the original model. This research represents an important milestone in enabling large-scale simulations for covariance-based scientific applications.
Sun, Ying
2012-10-01
© 2012 John Wiley & Sons, Ltd. Band depth is an important nonparametric measure that generalizes order statistics and makes univariate methods based on order statistics possible for functional data. However, the computational burden of band depth limits its applicability when large functional or image datasets are considered. This paper proposes an exact fast method to speed up the band depth computation when bands are defined by two curves. Remarkable computational gains are demonstrated through simulation studies comparing our proposal with the original computation and one existing approximate method. For example, we report an experiment where our method can rank one million curves, evaluated at fifty time points each, in 12.4 seconds with Matlab.
Ranking Based Locality Sensitive Hashing Enabled Cancelable Biometrics: Index-of-Max Hashing
Jin, Zhe; Lai, Yen-Lung; Hwang, Jung-Yeon; Kim, Soohyung; Teoh, Andrew Beng Jin
2017-01-01
In this paper, we propose a ranking based locality sensitive hashing inspired two-factor cancelable biometrics, dubbed "Index-of-Max" (IoM) hashing for biometric template protection. With externally generated random parameters, IoM hashing transforms a real-valued biometric feature vector into discrete index (max ranked) hashed code. We demonstrate two realizations from IoM hashing notion, namely Gaussian Random Projection based and Uniformly Random Permutation based hashing schemes. The disc...
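The Gaussian Random Projection realization mentioned in the abstract above can be sketched as follows. This is an illustrative reading of the "index of the maximum" idea only. The function name, the parameters `num_hashes` and `num_projections`, and the use of a seed as the externally generated random parameter are assumptions, not the authors' code.

```python
import random


def iom_grp_hash(features, num_hashes, num_projections, seed):
    """Index-of-Max style hash: keep only the index of the largest projection.

    For each of the num_hashes output positions, the feature vector is projected
    onto num_projections Gaussian random directions and the index of the largest
    projection is recorded. The seed stands in for the external random parameters.
    """
    rng = random.Random(seed)
    dim = len(features)
    code = []
    for _ in range(num_hashes):
        best_index, best_value = 0, float("-inf")
        for j in range(num_projections):
            direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            value = sum(w * x for w, x in zip(direction, features))
            if value > best_value:
                best_index, best_value = j, value
        code.append(best_index)
    return code


# Hypothetical real-valued biometric feature vector.
template = [0.12, -0.53, 0.98, 0.07, -0.31, 0.44]
code = iom_grp_hash(template, num_hashes=8, num_projections=4, seed=2017)
print(code)
```

Because only the rank order of the projections is kept, multiplying the feature vector by a positive constant leaves the code unchanged, and a different seed yields an unrelated code for the same template, which is the property that makes such a template cancelable.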
An introduction to statistical computing a simulation-based approach
Voss, Jochen
2014-01-01
A comprehensive introduction to sampling-based methods in statistical computing. The use of computers in mathematics and statistics has opened up a wide range of techniques for studying otherwise intractable problems. Sampling-based simulation techniques are now an invaluable tool for exploring statistical models. This book gives a comprehensive introduction to the exciting area of sampling-based methods. An Introduction to Statistical Computing introduces the classical topics of random number generation and Monte Carlo methods. It also includes some advanced met
Complete hazard ranking to analyze right-censored data: An ALS survival study.
Huang, Zhengnan; Zhang, Hongjiu; Boss, Jonathan; Goutman, Stephen A; Mukherjee, Bhramar; Dinov, Ivo D; Guan, Yuanfang
2017-12-01
Survival analysis represents an important outcome measure in clinical research and clinical trials; further, survival ranking may offer additional advantages in clinical trials. In this study, we developed GuanRank, a non-parametric ranking-based technique to transform patients' survival data into a linear space of hazard ranks. The transformation enables the utilization of machine learning base-learners including Gaussian process regression, Lasso, and random forest on survival data. The method was submitted to the DREAM Amyotrophic Lateral Sclerosis (ALS) Stratification Challenge. Ranked first, the model gave more accurate ranking predictions on the PRO-ACT ALS dataset than the Cox proportional hazard model. By utilizing right-censored data in its training process, the method demonstrated its state-of-the-art predictive power in ALS survival ranking. Its feature selection identified multiple important factors, some of which conflict with previous studies.
Result diversification based on query-specific cluster ranking
He, J.; Meij, E.; de Rijke, M.
2011-01-01
Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification
Rank-defective millimeter-wave channel estimation based on subspace-compressive sensing
Directory of Open Access Journals (Sweden)
Majid Shakhsi Dastgahian
2016-11-01
Millimeter-wave communication (mmWC) is considered one of the leading candidates for 5G indoor and outdoor systems in the E-band. To overcome the channel propagation characteristics in this band, high-dimensional antenna arrays need to be deployed at both the base station (BS) and the mobile sets (MS). Unlike conventional MIMO systems, millimeter-wave (mmW) systems avoid employing power-hungry equipment such as an ADC or RF chain in each branch of the MIMO system because of hardware constraints. Such systems rely on a hybrid precoding (combining) architecture for downlink deployment. Because there is a large array at the transceiver, it is impossible to estimate the channel by conventional methods. This paper develops a new algorithm to estimate the mmW channel by exploiting the sparse nature of the channel. The main contribution is the representation of a sparse channel model and the exploitation of a modified approach based on the Multiple Measurement Vector (MMV) greedy sparse framework and the subspace method of Multiple Signal Classification (MUSIC), which work together to recover the indices of the non-zero elements of an unknown channel matrix when the channel matrix is rank-deficient. In practical rank-defective channels, MUSIC fails, and we propose new extended MUSIC approaches based on subspace enhancement to compensate for the limitation of MUSIC. Simulation results indicate that our proposed extended MUSIC algorithms achieve adequate performance and moderate computational speed, and that they are even able to work in channels with an unknown sparsity level.
The use of fuzzy real option valuation method to rank Giga ...
African Journals Online (AJOL)
The use of fuzzy real option valuation method to rank Giga Investment Projects on Iran's natural gas reserves. ... Journal of Fundamental and Applied Sciences ... methodology – discounted cash flow analysis – in valuation of Giga investments.
Application of pedagogy reflective in statistical methods course and practicum statistical methods
Julie, Hongki
2017-08-01
The courses Elementary Statistics, Statistical Methods, and Statistical Methods Practicum aim to equip Mathematics Education students with descriptive and inferential statistics. An understanding of descriptive and inferential statistics is important for students in the Mathematics Education Department, especially for those whose final project involves quantitative research. In quantitative research, students are required to present and describe quantitative data appropriately, to draw conclusions from their data, and to establish relationships between the independent and dependent variables defined in their research. In practice, students working on final projects involving quantitative research not infrequently made mistakes in drawing conclusions and in choosing the hypothesis testing procedure. As a result, they reached incorrect conclusions, which is a serious error in quantitative research. Several things were gained from implementing reflective pedagogy in the teaching and learning process of the Statistical Methods and Statistical Methods Practicum courses: 1. Twenty-two students passed the course and one student did not. 2. The grade A was achieved by 18 students. 3. According to all students, they could develop their critical stance and build care for each other through the learning process in this course. 4. All students agreed that through the learning process they underwent in the course, they could build care for each other.
New public management based on rankings: From planning to evaluation
Directory of Open Access Journals (Sweden)
Andrés Valdez Zepeda
2017-11-01
This article focuses on the emergence and development of a new trend of public affairs and global government management known as ranking-based management. This type of management process is the result of performance measurement usually conducted by an external agent or prestigious institution, which generally uses a methodology based on indicators and audits. It also evaluates the results, achievements and progress in governance, which it ranks on a list on which they are compared against other comparable governments. As a global trend, supported by management rankings this process is not seen as an option, but as a real requirement for public agencies and government, which not only helps them in the process of continuous improvement, but also creates important incentives such as prestige, social recognition, construction and better branding.
A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH
Directory of Open Access Journals (Sweden)
K. Latha
2010-07-01
Ranking search results is essential for information retrieval and Web search. Search engines need to not only return highly relevant results, but also be fast to satisfy users. As a result, not all available features can be used for ranking, and in fact only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within Generalization model for Document Ranking. We propose an approach for Relevance Feedback using Expectation Maximization method and evaluate the algorithm on the TREC Collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on standard TREC-9 part of the OHSUMED collections, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as Markov Random Field model, Correlation Co-efficient and Count Difference method.
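The trade-off described above, choosing features that maximize relevance while staying within a latency budget, has the shape of a 0/1 knapsack problem. The following is a textbook dynamic-programming sketch, not the procedure from the paper; the feature names, relevance gains, and integer latency costs are invented for illustration.

```python
def select_features(features, budget):
    """Pick the feature subset with maximal total gain and total cost <= budget.

    features: list of (name, gain, cost) tuples with non-negative integer cost.
    Returns (best_total_gain, chosen_names).
    """
    # best[c] holds the best (gain, names) achievable with total cost at most c.
    best = [(0.0, [])] * (budget + 1)
    for name, gain, cost in features:
        # Walk capacities downward so each feature is used at most once.
        for capacity in range(budget, cost - 1, -1):
            candidate = best[capacity - cost][0] + gain
            if candidate > best[capacity][0]:
                best[capacity] = (candidate, best[capacity - cost][1] + [name])
    return best[budget]


# Hypothetical features: (name, relevance gain, latency cost in milliseconds).
candidates = [
    ("bm25", 6.0, 4),
    ("pagerank", 5.0, 3),
    ("url_depth", 4.0, 2),
    ("click_rate", 3.0, 1),
]
total_gain, chosen = select_features(candidates, budget=6)
print(total_gain, chosen)
```

The table has one entry per unit of budget, so the running time grows with the budget expressed in whatever integer unit the costs use; coarser cost units make the table smaller at the price of precision.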
Omranian, Nooshin; Mueller-Roeber, Bernd; Nikoloski, Zoran
2012-04-01
The levels of cellular organization, from gene transcription to translation to protein-protein interaction and metabolism, operate via tightly regulated mutual interactions, facilitating organismal adaptability and various stress responses. Characterizing the mutual interactions between genes, transcription factors, and proteins involved in signaling, termed crosstalk, is therefore crucial for understanding and controlling cells' functionality. We aim at using high-throughput transcriptomics data to discover previously unknown links between signaling networks. We propose and analyze a novel method for crosstalk identification which relies on transcriptomics data and overcomes the lack of complete information for signaling pathways in Arabidopsis thaliana. Our method first employs a network-based transformation of the results from the statistical analysis of differential gene expression in given groups of experiments under different signal-inducing conditions. The stationary distribution of a random walk (similar to the PageRank algorithm) on the constructed network is then used to determine the putative transcripts interrelating different signaling pathways. With the help of the proposed method, we analyze a transcriptomics data set including experiments from four different stresses/signals: nitrate, sulfur, iron, and hormones. We identified promising gene candidates, downstream of the transcription factors (TFs), associated to signaling crosstalk, which were validated through literature mining. In addition, we conduct a comparative analysis with the only other available method in this field which used a biclustering-based approach. Surprisingly, the biclustering-based approach fails to robustly identify any candidate genes involved in the crosstalk of the analyzed signals. We demonstrate that our proposed method is more robust in identifying gene candidates involved downstream of the signaling crosstalk for species for which large transcriptomics data sets
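The stationary distribution of a PageRank-like random walk, which the abstract above uses to score transcripts, can be computed by power iteration. This is a generic sketch under stated assumptions: the damping (teleportation) factor of 0.85, the uniform handling of nodes without outgoing edges, and the four-node example graph are illustrative choices, and the paper's own network construction is not reproduced here.

```python
def stationary_distribution(adjacency, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for a random walk with uniform teleportation.

    adjacency[i][j] is the non-negative weight of the edge from node i to node j.
    """
    n = len(adjacency)
    rank = [1.0 / n] * n
    out_weight = [sum(row) for row in adjacency]
    for _ in range(max_iter):
        # Probability mass sitting on nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_weight[i] == 0)
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            if out_weight[i] == 0:
                continue
            for j in range(n):
                if adjacency[i][j]:
                    new_rank[j] += damping * rank[i] * adjacency[i][j] / out_weight[i]
        change = sum(abs(a - b) for a, b in zip(rank, new_rank))
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical directed network: edges 0->1, 0->2, 1->2, 2->0, 3->2.
graph = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
scores = stationary_distribution(graph)
print(scores)
```

Nodes that receive probability mass from many neighbours end up with high scores; in this toy graph node 2, which every other node points to, ranks highest, while node 3, which nothing points to, keeps only its teleportation share.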
Indirect two-sided relative ranking: a robust similarity measure for gene expression data
Directory of Open Access Journals (Sweden)
Licamele Louis
2010-03-01
Background: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights. Results: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up and down regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries. Conclusions: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method with a substantial improvement in recall at rank 10 of 97.03% and 49.44%, on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant.
A tilting approach to ranking influence
Genton, Marc G.; Hall, Peter
2014-01-01
We suggest a new approach, which is applicable for general statistics computed from random samples of univariate or vector-valued or functional data, to assessing the influence that individual data have on the value of a statistic, and to ranking
International Nuclear Information System (INIS)
Xu, Q; Liu, H; Xing, L; Yu, H; Wang, G
2016-01-01
Purpose: Spectral CT enabled by an energy-resolved photon-counting detector outperforms conventional CT in terms of material discrimination, contrast resolution, etc. One reconstruction method for spectral CT is to generate a color image from a reconstructed component in each energy channel. However, given the radiation dose, the number of photons in each channel is limited, which will result in strong noise in each channel and affect the final color reconstruction. Here we propose a novel dictionary learning method for spectral CT that combines dictionary-based sparse representation method and the patch based low-rank constraint to simultaneously improve the reconstruction in each channel and to address the inter-channel correlations to further improve the reconstruction. Methods: The proposed method has two important features: (1) guarantee of the patch based sparsity in each energy channel, which is the result of the dictionary based sparse representation constraint; (2) the explicit consideration of the correlations among different energy channels, which is realized by patch-by-patch nuclear norm-based low-rank constraint. For each channel, the dictionary consists of two sub-dictionaries. One is learned from the average of the images in all energy channels, and the other is learned from the average of the images in all energy channels except the current channel. With average operation to reduce noise, these two dictionaries can effectively preserve the structural details and get rid of artifacts caused by noise. Combining them together can express all structural information in current channel. Results: Dictionary learning based methods can obtain better results than FBP and the TV-based method. With low-rank constraint, the image quality can be further improved in the channel with more noise. The final color result by the proposed method has the best visual quality. Conclusion: The proposed method can effectively improve the image quality of low-dose spectral
Energy Technology Data Exchange (ETDEWEB)
Xu, Q [Xi’an Jiaotong University, Xi’an (China); Stanford University School of Medicine, Stanford, CA (United States)]; Liu, H; Xing, L [Stanford University School of Medicine, Stanford, CA (United States)]; Yu, H [University of Massachusetts Lowell, Lowell, MA (United States)]; Wang, G [Rensselaer Polytechnic Institute, Troy, NY (United States)]
2016-06-15
Purpose: Spectral CT enabled by an energy-resolved photon-counting detector outperforms conventional CT in terms of material discrimination, contrast resolution, etc. One reconstruction method for spectral CT is to generate a color image from a reconstructed component in each energy channel. However, given the radiation dose, the number of photons in each channel is limited, which will result in strong noise in each channel and affect the final color reconstruction. Here we propose a novel dictionary learning method for spectral CT that combines dictionary-based sparse representation method and the patch based low-rank constraint to simultaneously improve the reconstruction in each channel and to address the inter-channel correlations to further improve the reconstruction. Methods: The proposed method has two important features: (1) guarantee of the patch based sparsity in each energy channel, which is the result of the dictionary based sparse representation constraint; (2) the explicit consideration of the correlations among different energy channels, which is realized by patch-by-patch nuclear norm-based low-rank constraint. For each channel, the dictionary consists of two sub-dictionaries. One is learned from the average of the images in all energy channels, and the other is learned from the average of the images in all energy channels except the current channel. With average operation to reduce noise, these two dictionaries can effectively preserve the structural details and get rid of artifacts caused by noise. Combining them together can express all structural information in current channel. Results: Dictionary learning based methods can obtain better results than FBP and the TV-based method. With low-rank constraint, the image quality can be further improved in the channel with more noise. The final color result by the proposed method has the best visual quality. Conclusion: The proposed method can effectively improve the image quality of low-dose spectral
Research on the Fusion of Dependent Evidence Based on Rank Correlation Coefficient
Directory of Open Access Journals (Sweden)
Fengjian Shi
2017-10-01
In order to meet the higher accuracy and system reliability requirements, the information fusion for multi-sensor systems is an increasing concern. Dempster–Shafer evidence theory (D–S theory) has been investigated for many applications in multi-sensor information fusion due to its flexibility in uncertainty modeling. However, classical evidence theory assumes that the evidence is independent of each other, which is often unrealistic. Ignoring the relationship between the evidence may lead to unreasonable fusion results, and even lead to wrong decisions. This assumption severely prevents D–S evidence theory from practical application and further development. In this paper, an innovative evidence fusion model to deal with dependent evidence based on rank correlation coefficient is proposed. The model first uses rank correlation coefficient to measure the dependence degree between different evidence. Then, total discount coefficient is obtained based on the dependence degree, which also considers the impact of the reliability of evidence. Finally, the discount evidence fusion model is presented. An example is illustrated to show the use and effectiveness of the proposed method.
Research on the Fusion of Dependent Evidence Based on Rank Correlation Coefficient.
Shi, Fengjian; Su, Xiaoyan; Qian, Hong; Yang, Ning; Han, Wenhua
2017-10-16
In order to meet the higher accuracy and system reliability requirements, the information fusion for multi-sensor systems is an increasing concern. Dempster-Shafer evidence theory (D-S theory) has been investigated for many applications in multi-sensor information fusion due to its flexibility in uncertainty modeling. However, classical evidence theory assumes that the evidence is independent of each other, which is often unrealistic. Ignoring the relationship between the evidence may lead to unreasonable fusion results, and even lead to wrong decisions. This assumption severely prevents D-S evidence theory from practical application and further development. In this paper, an innovative evidence fusion model to deal with dependent evidence based on rank correlation coefficient is proposed. The model first uses rank correlation coefficient to measure the dependence degree between different evidence. Then, total discount coefficient is obtained based on the dependence degree, which also considers the impact of the reliability of evidence. Finally, the discount evidence fusion model is presented. An example is illustrated to show the use and effectiveness of the proposed method.
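As a rough illustration of the two building blocks named in this abstract, the sketch below computes Spearman's rank correlation between two bodies of evidence and applies classical Shafer discounting. It is a minimal sketch, not the authors' model: treating the mass values as the ranked quantities, the frame label "Theta", the example masses and the discount factor 0.5 are all assumptions, and the abstract does not state how the dependence degree is turned into the total discount coefficient.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


def discount(masses, alpha, frame="Theta"):
    """Shafer discounting: scale every mass by alpha, move the rest to the frame."""
    out = {k: alpha * v for k, v in masses.items() if k != frame}
    out[frame] = 1.0 - alpha + alpha * masses.get(frame, 0.0)
    return out


# Hypothetical masses from two sensors over the same focal elements.
sensor_1 = [0.6, 0.3, 0.1]
sensor_2 = [0.5, 0.4, 0.1]
rho = spearman_rho(sensor_1, sensor_2)
discounted = discount({"a": 0.6, "b": 0.3, "Theta": 0.1}, alpha=0.5)
```

Here the two sensors order the focal elements identically, so `rho` comes out at 1 (up to rounding), which would signal strongly dependent evidence; the discounted masses still sum to one.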
Whole vertebral bone segmentation method with a statistical intensity-shape model based approach
Hanaoka, Shouhei; Fritscher, Karl; Schuler, Benedikt; Masutani, Yoshitaka; Hayashi, Naoto; Ohtomo, Kuni; Schubert, Rainer
2011-03-01
An automatic segmentation algorithm for the vertebrae in human body CT images is presented. Especially we focused on constructing and utilizing 4 different statistical intensity-shape combined models for the cervical, upper / lower thoracic and lumbar vertebrae, respectively. For this purpose, two previously reported methods were combined: a deformable model-based initial segmentation method and a statistical shape-intensity model-based precise segmentation method. The former is used as a pre-processing to detect the position and orientation of each vertebra, which determines the initial condition for the latter precise segmentation method. The precise segmentation method needs prior knowledge on both the intensities and the shapes of the objects. After PCA analysis of such shape-intensity expressions obtained from training image sets, vertebrae were parametrically modeled as a linear combination of the principal component vectors. The segmentation of each target vertebra was performed as fitting of this parametric model to the target image by maximum a posteriori estimation, combined with the geodesic active contour method. In the experimental result by using 10 cases, the initial segmentation was successful in 6 cases and only partially failed in 4 cases (2 in the cervical area and 2 in the lumbo-sacral). In the precise segmentation, the mean error distances were 2.078, 1.416, 0.777, 0.939 mm for cervical, upper and lower thoracic, lumbar spines, respectively. In conclusion, our automatic segmentation algorithm for the vertebrae in human body CT images showed a fair performance for cervical, thoracic and lumbar vertebrae.
Directory of Open Access Journals (Sweden)
Yun Tian
2016-01-01
The segmentation of coronary arteries is a vital process that helps cardiovascular radiologists detect and quantify stenosis. In this paper, we propose a fully automated coronary artery segmentation from cardiac data volume. The method is built on a statistics region growing together with a heuristic decision. First, the heart region is extracted using a multi-atlas-based approach. Second, the vessel structures are enhanced via a 3D multiscale line filter. Next, seed points are detected automatically through a threshold preprocessing and a subsequent morphological operation. Based on the set of detected seed points, a statistics-based region growing is applied. Finally, results are obtained by setting conservative parameters. A heuristic decision method is then used to obtain the desired result automatically because parameters in region growing vary in different patients, and the segmentation requires full automation. The experiments are carried out on a dataset that includes eight-patient multivendor cardiac computed tomography angiography (CTA) volume data. The DICE similarity index, mean distance, and Hausdorff distance metrics are employed to compare the proposed algorithm with two state-of-the-art methods. Experimental results indicate that the proposed algorithm is capable of performing complete, robust, and accurate extraction of coronary arteries.
Quantified Risk Ranking Model for Condition-Based Risk and Reliability Centered Maintenance
Chattopadhyaya, Pradip Kumar; Basu, Sushil Kumar; Majumdar, Manik Chandra
2017-06-01
In the recent past, the risk and reliability centered maintenance (RRCM) framework was introduced with a shift in the methodological focus from reliability and probabilities (expected values) to reliability, uncertainty and risk. In this paper the authors explain a novel methodology for risk quantification and for ranking the critical items in order to prioritize maintenance actions on the basis of condition-based risk and reliability centered maintenance (CBRRCM). The critical items are identified through criticality analysis of the RPN values of the items of a system, and the maintenance significant precipitating factors (MSPF) of the items are evaluated. The criticality of risk is assessed using three risk coefficients. The likelihood risk coefficient treats the probability as a fuzzy number. The abstract risk coefficient deduces risk influenced by uncertainty and sensitivity, besides other factors. The third risk coefficient, called the hazardous risk coefficient, is due to anticipated hazards which may occur in the future; the risk is deduced from criteria of consequences on safety, environment, maintenance and economic risks, with the corresponding cost of consequences. The characteristic values of all three risk coefficients are obtained with a particular test. With a few more tests on the system, the values may change significantly within the controlling range of each coefficient; hence 'random number simulation' is used to obtain one distinctive value for each coefficient. The risk coefficients are statistically added to obtain the final risk coefficient of each critical item, and then the final rankings of critical items are estimated. The prioritization in ranking of critical items using the developed mathematical model for risk assessment shall be useful in optimization of financial losses and timing of maintenance actions.
Tucker tensor analysis of Matern functions in spatial statistics
Litvinenko, Alexander
2018-04-20
Low-rank Tucker tensor methods in spatial statistics 1. Motivation: improve statistical models 2. Motivation: disadvantages of matrices 3. Tools: Tucker tensor format 4. Tensor approximation of Matern covariance function via FFT 5. Typical statistical operations in Tucker tensor format 6. Numerical experiments
Complete hazard ranking to analyze right-censored data: An ALS survival study.
Directory of Open Access Journals (Sweden)
Zhengnan Huang
2017-12-01
Survival analysis represents an important outcome measure in clinical research and clinical trials; further, survival ranking may offer additional advantages in clinical trials. In this study, we developed GuanRank, a non-parametric ranking-based technique to transform patients' survival data into a linear space of hazard ranks. The transformation enables the utilization of machine learning base-learners including Gaussian process regression, Lasso, and random forest on survival data. The method was submitted to the DREAM Amyotrophic Lateral Sclerosis (ALS) Stratification Challenge. The model ranked first and gave more accurate ranking predictions on the PRO-ACT ALS dataset than the Cox proportional hazard model. By utilizing right-censored data in its training process, the method demonstrated its state-of-the-art predictive power in ALS survival ranking. Its feature selection identified multiple important factors, some of which conflict with previous studies.
The ranking of negative-cost emissions reduction measures
International Nuclear Information System (INIS)
Taylor, Simon
2012-01-01
A flaw has been identified in the calculation of the cost-effectiveness in marginal abatement cost curves (MACCs). The problem affects “negative-cost” emissions reduction measures—those that produce a return on investment. The resulting ranking sometimes favours measures that produce low emissions savings and is therefore unreliable. The issue is important because incorrect ranking means a potential failure to achieve the best-value outcome. A simple mathematical analysis shows that not only is the standard cost-effectiveness calculation inadequate for ranking negative-cost measures, but there is no possible replacement that satisfies reasonable requirements. Furthermore, the concept of negative cost-effectiveness is found to be unsound and its use should be avoided. Among other things, this means that MACCs are unsuitable for ranking negative-cost measures. As a result, MACCs produced by a range of organizations including UK government departments may need to be revised. An alternative partial ranking method has been devised by making use of Pareto optimization. The outcome can be presented as a stacked bar chart that indicates both the preferred ordering and the total emissions saving available for each measure without specifying a cost-effectiveness. - Highlights: ► Marginal abatement cost curves (MACCs) are used to rank emission reduction measures. ► There is a flaw in the standard ranking method for negative-cost measures. ► Negative values of cost-effectiveness (in £/tC or equivalent) are invalid. ► There may be errors in published MACCs. ► A method based on Pareto principles provides an alternative ranking method.
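The flaw described in this abstract can be seen with two made-up measures. The sketch below is an illustration under stated assumptions rather than the paper's procedure: the `Measure` type, the numbers and the simple pairwise dominance test are hypothetical, and the paper's Pareto-based partial ranking may differ in detail.

```python
from typing import NamedTuple


class Measure(NamedTuple):
    name: str
    cost: float    # net cost in pounds; negative means a net financial return
    saving: float  # emissions saved, in tonnes of carbon (tC)


def cost_effectiveness(m: Measure) -> float:
    """Standard MACC metric: cost per tonne saved (pounds per tC)."""
    return m.cost / m.saving


def dominates(a: Measure, b: Measure) -> bool:
    """Pareto dominance: a is no worse on both criteria and better on one."""
    no_worse = a.cost <= b.cost and a.saving >= b.saving
    better = a.cost < b.cost or a.saving > b.saving
    return no_worse and better


# Two hypothetical negative-cost measures with the same financial return.
small = Measure("small", cost=-100.0, saving=10.0)   # -10 pounds per tC
large = Measure("large", cost=-100.0, saving=100.0)  # -1 pound per tC

# Conventional MACC ordering: most negative cost-effectiveness first.
macc_order = [m.name for m in sorted([small, large], key=cost_effectiveness)]
```

Both measures return the same money, yet the conventional ordering puts `small` first even though `large` saves ten times the emissions; the dominance test instead reports that `large` dominates `small`.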
Dilger, Alexander; Müller, Harry
2013-01-01
Rankings of academics can be constructed in two different ways, either based on journal rankings or based on citations. Although citation-based rankings promise some fundamental advantages they are still not common in German-speaking business administration. However, the choice of the underlying database is crucial. This article argues that for…
Method for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques-- now updated and revised In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Khoromskaia, Venera; Khoromskij, Boris N.
2014-12-01
Our recent method for low-rank tensor representation of sums of the arbitrarily positioned electrostatic potentials discretized on a 3D Cartesian grid reduces the 3D tensor summation to operations involving only 1D vectors, while retaining the linear complexity scaling in the number of potentials. Here, we introduce and study a novel tensor approach for fast and accurate assembled summation of a large number of lattice-allocated potentials represented on a 3D N × N × N grid with the computational requirements only weakly dependent on the number of summed potentials. It is based on the assembled low-rank canonical tensor representations of the collected potentials using pointwise sums of shifted canonical vectors representing the single generating function, say the Newton kernel. For a sum of electrostatic potentials over an L × L × L lattice embedded in a box the required storage scales linearly in the 1D grid-size, O(N), while the numerical cost is estimated by O(NL). For periodic boundary conditions, the storage demand remains proportional to the 1D grid-size of a unit cell, n = N / L, while the numerical cost reduces to O(N), which outperforms the FFT-based Ewald-type summation algorithms of complexity O(N^3 log N). The complexity in the grid parameter N can be reduced even to the logarithmic scale O(log N) by using data-sparse representation of canonical N-vectors via the quantics tensor approximation. For justification, we prove an upper bound on the quantics ranks for the canonical vectors in the overall lattice sum. The presented approach is beneficial in applications which require further functional calculus with the lattice potential, say, scalar product with a function, integration or differentiation, which can be performed easily in tensor arithmetics on large 3D grids with 1D cost. Numerical tests illustrate the performance of the tensor summation method and confirm the estimated bounds on the tensor ranks.
Development and first application of an operating events ranking tool
International Nuclear Information System (INIS)
Šimić, Zdenko; Zerger, Benoit; Banov, Reni
2015-01-01
Highlights: • A method using the analytical hierarchy process for ranking operating events is developed and tested. • The method is applied to 5 years of U.S. NRC Licensee Event Reports (1453 events). • Uncertainty and sensitivity of the ranking results are evaluated. • Real events assessment shows potential of the method for operating experience feedback. - Abstract: The operating experience feedback is important for maintaining and improving safety and availability in nuclear power plants. Detailed investigation of all events is challenging since it requires excessive resources, especially in case of large event databases. This paper presents an event groups ranking method to complement the analysis of individual operating events. The basis for the method is the use of an internationally accepted events characterization scheme that allows different ways of events grouping and ranking. The ranking method itself consists of implementing the analytical hierarchy process (AHP) by means of a custom developed tool which allows events ranking based on ranking indexes pre-determined by expert judgment. Following the development phase, the tool was applied to analyze a complete set of 5 years of real nuclear power plants operating events (1453 events). The paper presents the potential of this ranking method to identify possible patterns throughout the event database and therefore to give additional insights into the events as well as to give quantitative input for the prioritization of further more detailed investigation of selected event groups.
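The AHP step mentioned in this abstract can be sketched in a few lines. This is an assumption-laden illustration, not the authors' tool: the three criteria, the pairwise judgments and the event-group scores are hypothetical, and the row geometric-mean method is used here as a common approximation to the principal-eigenvector weights.

```python
import math


def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise-comparison matrix.

    Uses the row geometric-mean approximation, then normalizes to sum to 1.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_groups(scores, weights):
    """Order event groups by the weighted sum of their criterion scores."""
    totals = {
        name: sum(w * s for w, s in zip(weights, values))
        for name, values in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical expert judgments for three criteria:
# entry [i][j] says how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized scores of three event groups on the same criteria.
scores = {
    "group_a": [0.9, 0.2, 0.1],
    "group_b": [0.3, 0.8, 0.9],
    "group_c": [0.1, 0.1, 0.2],
}
ranking = rank_groups(scores, weights)
```

Because this example matrix is perfectly consistent, the geometric-mean weights coincide with the eigenvector weights (4/7, 2/7, 1/7); for inconsistent judgments the two can differ slightly.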
Social norms and rank-based nudging: Changing willingness to pay for healthy food.
Aldrovandi, Silvio; Brown, Gordon D A; Wood, Alex M
2015-09-01
People's evaluations in the domain of healthy eating are at least partly determined by the choice context. We systematically test reference level and rank-based models of relative comparisons against each other and explore their application to social norms nudging, an intervention that aims at influencing consumers' behavior by addressing their inaccurate beliefs about their consumption relative to the consumption of others. Study 1 finds that the rank of a product or behavior among others in the immediate comparison context, rather than its objective attributes, influences its evaluation. Study 2 finds that when a comparator is presented in isolation the same rank-based process occurs based on information retrieved from memory. Study 3 finds that telling people how their consumption ranks within a normative comparison sample increases willingness to pay for a healthy food by over 30% relative to the normal social norms intervention that tells them how they compare to the average. We conclude that social norms interventions should present rank information (e.g., "you are in the most unhealthy 10% of eaters") rather than information relative to the average (e.g., "you consume 500 calories more than the average person"). (c) 2015 APA, all rights reserved.
Ranking Adverse Drug Reactions With Crowdsourcing
Gottlieb, Assaf
2015-03-23
Background: There is no publicly available resource that provides the relative severity of adverse drug reactions (ADRs). Such a resource would be useful for several applications, including assessment of the risks and benefits of drugs and improvement of patient-centered care. It could also be used to triage predictions of drug adverse events. Objective: The intent of the study was to rank ADRs according to severity. Methods: We used Internet-based crowdsourcing to rank ADRs according to severity. We assigned 126,512 pairwise comparisons of ADRs to 2589 Amazon Mechanical Turk workers and used these comparisons to rank order 2929 ADRs. Results: There is good correlation (rho=.53) between the mortality rates associated with ADRs and their rank. Our ranking highlights severe drug-ADR predictions, such as cardiovascular ADRs for raloxifene and celecoxib. It also triages genes associated with severe ADRs such as epidermal growth-factor receptor (EGFR), associated with glioblastoma multiforme, and SCN1A, associated with epilepsy. Conclusions: ADR ranking lays a first stepping stone in personalized drug risk assessment. Ranking of ADRs using crowdsourcing may have useful clinical and financial implications, and should be further investigated in the context of health care decision making.
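One simple way to turn pairwise severity judgments into a rank order is to sort items by the fraction of comparisons they win. The sketch below shows that baseline only; the abstract does not state which aggregation method the study used, and the function name and the toy comparisons are made up for illustration.

```python
from collections import defaultdict


def rank_by_win_fraction(comparisons):
    """Rank items by the share of pairwise comparisons in which they were
    judged more severe. Each comparison is a (more_severe, less_severe) pair.
    Ties in win fraction are broken alphabetically for determinism.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for more_severe, less_severe in comparisons:
        wins[more_severe] += 1
        total[more_severe] += 1
        total[less_severe] += 1
    return sorted(total, key=lambda item: (-wins[item] / total[item], item))


# Toy judgments (not from the study): each tuple is one worker's answer.
toy_comparisons = [
    ("cardiac arrest", "nausea"),
    ("cardiac arrest", "rash"),
    ("rash", "nausea"),
]
severity_order = rank_by_win_fraction(toy_comparisons)
```

With sparse comparisons, as when 126,512 judgments cover 2929 items, a model-based aggregator would normally be preferable to raw win fractions, since most pairs are never compared directly.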
Directory of Open Access Journals (Sweden)
Salomon Joshua A
2003-12-01
Background: In survey studies on health-state valuations, ordinal ranking exercises often are used as precursors to other elicitation methods such as the time trade-off (TTO) or standard gamble, but the ranking data have not been used in deriving cardinal valuations. This study reconsiders the role of ordinal ranks in valuing health and introduces a new approach to estimate interval-scaled valuations based on aggregate ranking data. Methods: Analyses were undertaken on data from a previously published general population survey study in the United Kingdom that included rankings and TTO values for hypothetical states described using the EQ-5D classification system. The EQ-5D includes five domains (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) with three possible levels on each. Rank data were analysed using a random utility model, operationalized through conditional logit regression. In the statistical model, probabilities of observed rankings were related to the latent utilities of different health states, modeled as a linear function of EQ-5D domain scores, as in previously reported EQ-5D valuation functions. Predicted valuations based on the conditional logit model were compared to observed TTO values for the 42 states in the study and to predictions based on a model estimated directly from the TTO values. Models were evaluated using the intraclass correlation coefficient (ICC) between predictions and mean observations, and the root mean squared error of predictions at the individual level. Results: Agreement between predicted valuations from the rank model and observed TTO values was very high, with an ICC of 0.97, only marginally lower than for predictions based on the model estimated directly from TTO values (ICC = 0.99). Individual-level errors were also comparable in the two models, with root mean squared errors of 0.503 and 0.496 for the rank-based and TTO-based predictions, respectively. Conclusions
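The link between latent utilities and ranking probabilities can be illustrated with the rank-ordered ("exploded") logit, in which a ranking is treated as a sequence of conditional-logit choices among the states not yet ranked. This is a minimal sketch assuming that standard formulation; the state names and utility values are hypothetical and are not estimates from the study.

```python
import math
from itertools import permutations


def ranking_probability(utilities, ranking):
    """Probability of observing `ranking` (best first) under a rank-ordered logit.

    At each step the next state is chosen from those remaining with
    probability exp(u) / sum(exp(u) over the remaining states).
    """
    remaining = list(ranking)
    prob = 1.0
    for state in ranking:
        denom = sum(math.exp(utilities[s]) for s in remaining)
        prob *= math.exp(utilities[state]) / denom
        remaining.remove(state)
    return prob


# Hypothetical latent utilities for three health states.
utilities = {"mild": 1.0, "moderate": 0.0, "severe": -1.0}

all_rankings = list(permutations(utilities))
probs = {r: ranking_probability(utilities, r) for r in all_rankings}
most_likely = max(probs, key=probs.get)
```

The probabilities over all possible rankings sum to one, and the single most likely ranking is the one that orders the states by decreasing utility; fitting such a model means choosing the utilities that maximize the probability of the rankings actually observed.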
INTEL: Intel based systems move up in supercomputing ranks
2002-01-01
"The TOP500 supercomputer rankings released today at the Supercomputing 2002 conference show a dramatic increase in the number of Intel-based systems being deployed in high-performance computing (HPC) or supercomputing areas" (1/2 page).
Glushak, P. A.; Markiv, B. B.; Tokarchuk, M. V.
2018-01-01
We present a generalization of Zubarev's nonequilibrium statistical operator method based on the principle of maximum Renyi entropy. In the framework of this approach, we obtain transport equations for the basic set of parameters of the reduced description of nonequilibrium processes in a classical system of interacting particles using Liouville equations with fractional derivatives. For a classical system of particles in a medium with a fractal structure, we obtain a non-Markovian diffusion equation with fractional spatial derivatives. For a concrete model of the frequency dependence of a memory function, we obtain a generalized Cattaneo-type diffusion equation with the spatial and temporal fractality taken into account. We present a generalization of nonequilibrium thermofield dynamics in Zubarev's nonequilibrium statistical operator method in the framework of Renyi statistics.
Hyper-local, directions-based ranking of places
DEFF Research Database (Denmark)
Venetis, Petros; Gonzalez, Hector; Jensen, Christian S.
2011-01-01
they are numerous and contain precise locations. Specifically, the paper proposes a framework that takes a user location and a collection of near-by places as arguments, producing a ranking of the places. The framework enables a range of aspects of directions queries to be exploited for the ranking of places......, including the frequency with which places have been referred to in directions queries. Next, the paper proposes an algorithm and accompanying data structures capable of ranking places in response to hyper-local web queries. Finally, an empirical study with very large directions query logs offers insight...... into the potential of directions queries for the ranking of places and suggests that the proposed algorithm is suitable for use in real web search engines....
Neophilia Ranking of Scientific Journals.
Packalen, Mikko; Bhattacharya, Jay
2017-01-01
The ranking of scientific journals is important because of the signal it sends to scientists about what is considered most vital for scientific progress. Existing ranking systems focus on measuring the influence of a scientific paper (citations)-these rankings do not reward journals for publishing innovative work that builds on new ideas. We propose an alternative ranking based on the proclivity of journals to publish papers that build on new ideas, and we implement this ranking via a text-based analysis of all published biomedical papers dating back to 1946. In addition, we compare our neophilia ranking to citation-based (impact factor) rankings; this comparison shows that the two ranking approaches are distinct. Prior theoretical work suggests an active role for our neophilia index in science policy. Absent an explicit incentive to pursue novel science, scientists underinvest in innovative work because of a coordination problem: for work on a new idea to flourish, many scientists must decide to adopt it in their work. Rankings that are based purely on influence thus do not provide sufficient incentives for publishing innovative work. By contrast, adoption of the neophilia index as part of journal-ranking procedures by funding agencies and university administrators would provide an explicit incentive for journals to publish innovative work and thus help solve the coordination problem by increasing scientists' incentives to pursue innovative work.
Ranking of Developing Countries Based on the Economic Freedom Index
Zirak, Masoumeh; Mehrara, Mohsen
2013-01-01
In this paper we’ve ranked developing countries based on the Economic Freedom index. Therefore we are trying to do the analysis how this ranking is done using numerical taxonomic methodology. To do this, by estimating the effects of the determinants of FDI in 123 developing countries from 1997 to 2010, results showed that with regard to the degree of economic freedom or Economic openness, attract foreign direct investment in each country is different. In this study china, Equator, Liberia, Az...
Hierarchical low-rank approximation for high dimensional approximation
Nouy, Anthony
2016-01-07
Tensor methods are among the most prominent tools for the numerical solution of high-dimensional problems where functions of multiple variables have to be approximated. Such high-dimensional approximation problems naturally arise in stochastic analysis and uncertainty quantification. In many practical situations, the approximation of high-dimensional functions is made computationally tractable by using rank-structured approximations. In this talk, we present algorithms for the approximation in hierarchical tensor format using statistical methods. Sparse representations in a given tensor format are obtained with adaptive or convex relaxation methods, with a selection of parameters using crossvalidation methods.
Hierarchical low-rank approximation for high dimensional approximation
Nouy, Anthony
2016-01-01
Tensor methods are among the most prominent tools for the numerical solution of high-dimensional problems where functions of multiple variables have to be approximated. Such high-dimensional approximation problems naturally arise in stochastic analysis and uncertainty quantification. In many practical situations, the approximation of high-dimensional functions is made computationally tractable by using rank-structured approximations. In this talk, we present algorithms for the approximation in hierarchical tensor format using statistical methods. Sparse representations in a given tensor format are obtained with adaptive or convex relaxation methods, with a selection of parameters using crossvalidation methods.
A Direct Elliptic Solver Based on Hierarchically Low-Rank Schur Complements
Chávez, Gustavo
2017-03-17
A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between Cyclic Reduction and Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$ arithmetic complexity and $O(N \log N)$ memory footprint. We provide a baseline for performance and applicability by comparing with well-known implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid within a shared-memory parallel environment that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and that it can tackle problems where algebraic multigrid fails to converge.
It's all relative: ranking the diversity of aquatic bacterial communities.
Shaw, Allison K; Halpern, Aaron L; Beeson, Karen; Tran, Bao; Venter, J Craig; Martiny, Jennifer B H
2008-09-01
The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
Statistical methods for evaluating the attainment of cleanup standards
Energy Technology Data Exchange (ETDEWEB)
Gilbert, R.O.; Simpson, J.C.
1992-12-01
This document is the third volume in a series of volumes sponsored by the US Environmental Protection Agency (EPA), Statistical Policy Branch, that provide statistical methods for evaluating the attainment of cleanup standards at Superfund sites. Volume 1 (USEPA 1989a) provides sampling designs and tests for evaluating attainment of risk-based standards for soils and solid media. Volume 2 (USEPA 1992) provides designs and tests for evaluating attainment of risk-based standards for groundwater. The purpose of this third volume is to provide statistical procedures for designing sampling programs and conducting statistical tests to determine whether pollution parameters in remediated soils and solid media at Superfund sites attain site-specific reference-based standards. This document is written for individuals who may not have extensive training or experience with statistical methods. The intended audience includes EPA regional remedial project managers, Superfund-site potentially responsible parties, state environmental protection agencies, and contractors for these groups.
THE GROWTH POINTS OF STATISTICAL METHODS
Orlov A. I.
2014-01-01
On the basis of a new paradigm of applied mathematical statistics, data analysis and economic-mathematical methods are identified; we have also discussed five topical areas in which modern applied statistics is developing as well as the other statistical methods, i.e. five "growth points" – nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, statistics of non-numeric data
International Nuclear Information System (INIS)
Schreibmann, Eduard; Xing Lei
2005-01-01
Purpose: Beam orientation optimization in intensity-modulated radiation therapy (IMRT) is computationally intensive, and various single beam ranking techniques have been proposed to reduce the search space. Up to this point, none of the existing ranking techniques considers the clinically important dose-volume effects of the involved structures, which may lead to clinically irrelevant angular ranking. The purpose of this work is to develop a clinically sensible angular ranking model with incorporation of dose-volume effects and to show its utility for IMRT beam placement. Methods and Materials: The general consideration in constructing this angular ranking function is that a beamlet/beam is preferable if it can deliver a higher dose to the target without exceeding the tolerance of the sensitive structures located on the path of the beamlet/beam. In the previously proposed dose-based approach, the beamlets are treated independently and, to compute the maximally deliverable dose to the target volume, the intensity of each beamlet is pushed to its maximum intensity without considering the values of other beamlets. When volumetric structures are involved, the complication arises from the fact that there are numerous dose distributions corresponding to the same dose-volume tolerance. In this situation, the beamlets are not independent and an optimization algorithm is required to find the intensity profile that delivers the maximum target dose while satisfying the volumetric constraints. In this study, the behavior of a volumetric organ was modeled by using the equivalent uniform dose (EUD). A constrained sequential quadratic programming algorithm (CFSQP) was used to find the beam profile that delivers the maximum dose to the target volume without violating the EUD constraint or constraints. To assess the utility of the proposed technique, we planned a head-and-neck and abdominal case with and without the guidance of the angular ranking information. The qualities of the
Feasibility study of component risk ranking for plant maintenance
International Nuclear Information System (INIS)
Ushijima, Koji; Yonebayashi, Kenji; Narumiya, Yoshiyuki; Sakata, Kaoru; Kumano, Tetsuji
1999-01-01
Nuclear power is the base load electricity source in Japan, and reducing operation and maintenance costs while maintaining or improving plant safety is one of the major issues. Recently, Risk Informed Management (RIM) has attracted attention as a solution. This paper outlines a feasibility study of component risk ranking for plant maintenance for a typical Japanese PWR plant. A feasibility study of component risk ranking for plant maintenance optimization is performed on check valves and motor-operated valves. Risk ranking is performed in two steps, using probabilistic analysis (quantitative method) for risk ranking of components and deterministic examination (qualitative method) for component review. In this study, plant components are ranked from the viewpoint of plant safety / reliability, and the applicability for maintenance is assessed. As a result, distribution of maintenance resources using risk ranking is considered effective. (author)
Microseismic Event Relocation and Focal Mechanism Estimation Based on PageRank Linkage
Aguiar, A. C.; Myers, S. C.
2017-12-01
Microseismicity associated with enhanced geothermal systems (EGS) is key in understanding how subsurface stimulation can modify stress, fracture rock, and increase permeability. Large numbers of microseismic events are commonly associated with hydroshearing an EGS, making data mining methods useful in their analysis. We focus on PageRank, originally developed for Google's search engine, and subsequently adapted for use in seismology to detect low-frequency earthquakes by linking events directly and indirectly through cross-correlation (Aguiar and Beroza, 2014). We expand on this application by using PageRank to define signal-correlation topology for micro-earthquakes from the Newberry Volcano EGS in Central Oregon, which has been stimulated two times using high-pressure fluid injection. We create PageRank signal families from both data sets and compare these to the spatial and temporal proximity of associated earthquakes. PageRank families are relocated using differential travel times measured by waveform cross-correlation (CC) and the Bayesloc approach (Myers et al., 2007). Prior to relocation, events are loosely clustered, with some events at a distance from the cluster. After relocation, event families are found to be tightly clustered. Indirect linkage of signals using PageRank is a reliable way to increase the number of events confidently determined to be similar, suggesting an efficient and effective grouping of earthquakes with similar physical characteristics (i.e., location, focal mechanism, stress drop). We further explore the possibility of using PageRank families to identify events with similar relative phase polarities and estimate focal mechanisms following the method of Shelly et al. (2016), where CC measurements are used to determine individual polarities within event clusters. Given a positive result, PageRank might be a useful tool in adaptive approaches to enhance production at well-instrumented geothermal sites. Prepared by LLNL under Contract DE-AC52-07NA27344
Hierarchical partial order ranking
International Nuclear Information System (INIS)
Carlsen, Lars
2008-01-01
Assessing the potential impact on environmental and human health from the production and use of chemicals or from polluted sites involves a multi-criteria evaluation scheme. A priori, several parameters are to be addressed, e.g., production tonnage, specific release scenarios, geographical and site-specific factors, in addition to various substance-dependent parameters. Further, socio-economic factors may be taken into consideration. The number of parameters to be included may well appear to be prohibitive for developing a sensible model. The study introduces hierarchical partial order ranking (HPOR), which remedies this problem. In HPOR the original parameters are initially grouped based on their mutual connection, and a set of meta-descriptors is derived representing the ranking corresponding to the single groups of descriptors, respectively. A second partial order ranking is carried out based on the meta-descriptors, the final ranking being disclosed through average ranks. An illustrative example on the prioritisation of polluted sites is given. - Hierarchical partial order ranking of polluted sites has been developed for prioritization based on a large number of parameters
Mao, Shasha; Xiong, Lin; Jiao, Licheng; Feng, Tian; Yeung, Sai-Kit
2017-05-01
Riemannian optimization has been widely used to deal with the fixed low-rank matrix completion problem, and the Riemannian metric is a crucial factor in obtaining the search direction in Riemannian optimization. This paper proposes a new Riemannian metric that simultaneously considers the Riemannian geometry structure and the scaling information, and which is smoothly varying and invariant along the equivalence class. The proposed metric can effectively make a tradeoff between the Riemannian geometry structure and the scaling information. Essentially, it can be viewed as a generalization of some existing metrics. Based on the proposed Riemannian metric, we also design a Riemannian nonlinear conjugate gradient algorithm, which can efficiently solve the fixed low-rank matrix completion problem. Experiments on fixed low-rank matrix completion, collaborative filtering, and image and video recovery illustrate that the proposed method is superior to the state-of-the-art methods in convergence efficiency and numerical performance.
Overview of statistical methods, models and analysis for predicting equipment end of life
Energy Technology Data Exchange (ETDEWEB)
NONE
2009-07-01
Utility equipment can be operated and maintained for many years following installation. However, as the equipment ages, utility operators must decide whether to extend the service life or replace the equipment. Condition assessment modelling is used by many utilities to determine the condition of equipment and to prioritize the maintenance or repair. Several factors are weighted and combined in assessment modelling, which gives a single index number to rate the equipment. There is speculation that this index alone may not be adequate for a business case to rework or replace an asset because it only ranks an asset into a particular category. For that reason, a new methodology was developed to determine the economic end of life of an asset. This paper described the different statistical methods available and their use in determining the remaining service life of electrical equipment. A newly developed Excel-based demonstration computer tool is also an integral part of the deliverables of this project.
Li, Robin; Lin, Xiao; Geng, Haijiang; Li, Zhihui; Li, Jiabing; Lu, Tao; Yan, Fangrong
2015-12-29
Personalized cancer treatments depend on the determination of a patient's genetic status according to known genetic profiles for which targeted treatments exist. Such genetic profiles must be scientifically validated before they are applied to the general patient population. Reproducibility of findings that support such genetic profiles is a fundamental challenge in validation studies. The percentage of overlapping genes (POG) criterion and derivative methods produce unstable and misleading results. Furthermore, in a complex disease, comparisons between different tumor subtypes can produce high POG scores that do not capture the consistencies in the functions. We focused on the quality rather than the quantity of the overlapping genes. We defined the rank value of each gene according to importance or quality by PageRank on the basis of a particular topological structure. Then, we used the p-value of the rank-sum of the overlapping genes (PRSOG) to evaluate the quality of reproducibility. Though the POG scores were low in different studies of the same disease, the PRSOG was statistically significant, which suggests that sets of differentially expressed genes might be highly reproducible. Evaluations of eight datasets from breast cancer, lung cancer and four other disorders indicate that the quality-based PRSOG method performs better than a quantity-based method. Our analysis of the components of the sets of overlapping genes supports the utility of the PRSOG method.
Ranking Theory and Conditional Reasoning.
Skovgaard-Olsen, Niels
2016-05-01
Ranking theory is a formal epistemology that has been developed in over 600 pages in Spohn's recent book The Laws of Belief, which aims to provide a normative account of the dynamics of beliefs that presents an alternative to current probabilistic approaches. It has long been received in the AI community, but it has not yet found application in experimental psychology. The purpose of this paper is to derive clear, quantitative predictions by exploiting a parallel between ranking theory and a statistical model called logistic regression. This approach is illustrated by the development of a model for the conditional inference task using Spohn's (2013) ranking theoretic approach to conditionals. Copyright © 2015 Cognitive Science Society, Inc.
Ranking beta sheet topologies of proteins
DEFF Research Database (Denmark)
Fonseca, Rasmus; Helles, Glennie; Winter, Pawel
2010-01-01
One of the challenges of protein structure prediction is to identify long-range interactions between amino acids. To reliably predict such interactions, we enumerate, score and rank all beta-topologies (partitions of beta-strands into sheets, orderings of strands within sheets and orientations of paired strands) of a given protein. We show that the beta-topology corresponding to the native structure is, with high probability, among the top-ranked. Since full enumeration is very time-consuming, we also suggest a method to deal with proteins with many beta-strands. The results reported in this paper are highly relevant for ab initio protein structure prediction methods based on decoy generation. The top-ranked beta-topologies can be used to find initial conformations from which conformational searches can be started. They can also be used to filter decoys by removing those with poorly...
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data
Directory of Open Access Journals (Sweden)
Maria Vinaixa
2012-10-01
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating the mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
Rank reduction of correlation matrices by majorization
R. Pietersz (Raoul); P.J.F. Groenen (Patrick)
2004-01-01
In this paper a novel method is developed for the problem of finding a low-rank correlation matrix nearest to a given correlation matrix. The method is based on majorization and therefore it is globally convergent. The method is computationally efficient, is straightforward to implement,
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-11-01
Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or computational power is too limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of this feature ranking method. We study whether the classifiers influence the SVC rankings or whether the discriminative power of the features themselves has a dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study whether heterogeneous classifier ensemble approaches provide more unbiased rankings and whether they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss, relative to the optimal choices, for using the same classifier in SVC feature ranking and final classification.
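The single-variable ranking idea described in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' procedure: the classifier is a simple threshold rule (a decision stump), the score is training accuracy rather than a cross-validated estimate, and the names `stump_accuracy` and `svc_ranking` and the toy data are hypothetical.

```python
def stump_accuracy(values, labels):
    """Best training accuracy of a one-feature threshold classifier.

    Labels are 0/1; either side of the threshold may be assigned class 1.
    """
    best = 0.0
    n = len(values)
    for t in sorted(set(values)):
        # Predict class 1 when the feature value is at or above the threshold.
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        # Also consider the flipped rule (class 1 below the threshold).
        best = max(best, correct / n, (n - correct) / n)
    return best


def svc_ranking(X, y):
    """Rank feature indices by the accuracy of their single-feature stump."""
    n_features = len(X[0])
    scores = [stump_accuracy([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
y = [0, 0, 1, 1]
order, scores = svc_ranking(X, y)
```

Training accuracy is optimistic for small samples, so a held-out or cross-validated score would normally replace it before the ranking is used for selection.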
Spatial analysis statistics, visualization, and computational methods
Oyana, Tonny J
2015-01-01
An introductory text for the next generation of geospatial analysts and data scientists, Spatial Analysis: Statistics, Visualization, and Computational Methods focuses on the fundamentals of spatial analysis using traditional, contemporary, and computational methods. Outlining both non-spatial and spatial statistical concepts, the authors present practical applications of geospatial data tools, techniques, and strategies in geographic studies. They offer a problem-based learning (PBL) approach to spatial analysis, containing hands-on problem sets that can be worked out in MS Excel or ArcGIS, as well as detailed illustrations and numerous case studies. The book enables readers to: identify types and characterize non-spatial and spatial data; demonstrate their competence to explore, visualize, summarize, analyze, optimize, and clearly present statistical data and results; construct testable hypotheses that require inferential statistical analysis; process spatial data, extract explanatory variables, conduct statisti...
RankProdIt: A web-interactive Rank Products analysis tool
Directory of Open Access Journals (Sweden)
Laing Emma
2010-08-01
Background: The first objective of a DNA microarray experiment is typically to generate a list of genes or probes that are found to be differentially expressed or represented (in the case of comparative genomic hybridizations and/or copy number variation) between two conditions or strains. Rank Products analysis comprises a robust algorithm for deriving such lists from microarray experiments that comprise small numbers of replicates, for example, fewer than the number required for the commonly used t-test. Currently, users wishing to apply Rank Products analysis to their own microarray data sets have been restricted to the use of command line-based software, which can limit its usage within the biological community. Findings: Here we have developed a web interface to existing Rank Products analysis tools allowing users to quickly process their data in an intuitive and step-wise manner to obtain the respective Rank Product or Rank Sum, probability of false prediction and p-values in a downloadable file. Conclusions: The online interactive Rank Products analysis tool RankProdIt, for analysis of any data set containing measurements for multiple replicated conditions, is available at: http://strep-microarray.sbs.surrey.ac.uk/RankProducts
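As a rough illustration of the Rank Product statistic this abstract refers to, the sketch below ranks genes within each replicate by fold change and takes the geometric mean of each gene's ranks. It is a minimal sketch and not the RankProdIt implementation: the function name `rank_products` and the toy data are hypothetical, ties are not handled, and the permutation-based estimate of the probability of false prediction is omitted.

```python
import math


def rank_products(fold_changes):
    """Geometric mean of within-replicate ranks for each gene.

    fold_changes maps gene -> list of log fold changes, one per replicate.
    Rank 1 is the most strongly up-regulated gene in a replicate.
    """
    genes = list(fold_changes)
    n_reps = len(fold_changes[genes[0]])
    log_rank_sum = {g: 0.0 for g in genes}
    for r in range(n_reps):
        # Order genes by decreasing fold change in replicate r.
        ordered = sorted(genes, key=lambda g: fold_changes[g][r], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return {g: math.exp(log_rank_sum[g] / n_reps) for g in genes}


# Toy data (hypothetical): three genes, three replicates.
toy = {
    "geneA": [2.1, 1.8, 2.5],
    "geneB": [0.1, 0.3, -0.2],
    "geneC": [1.0, 2.0, 0.9],
}
rp = rank_products(toy)
top_gene = min(rp, key=rp.get)  # smallest rank product = most consistently up
```

Working with sums of log ranks rather than a raw product avoids overflow when the number of genes and replicates is large.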
Statistical methods and materials characterisation
International Nuclear Information System (INIS)
Wallin, K.R.W.
2010-01-01
Statistics is a wide mathematical area, which covers a myriad of analysis and estimation options, some of which suit special cases better than others. A comprehensive coverage of the whole area of statistics would be an enormous effort and would also be outside the capabilities of this author. Therefore, this does not intend to be a textbook on statistical methods available for general data analysis and decision making. Instead it will highlight a certain special statistical case applicable to mechanical materials characterization. The methods presented here do not in any way rule out other statistical methods by which to analyze mechanical property material data. (orig.)
Statistical identification of effective input variables
International Nuclear Information System (INIS)
Vaurio, J.K.
1982-09-01
A statistical sensitivity analysis procedure has been developed for ranking the input data of large computer codes in the order of sensitivity-importance. The method is economical for large codes with many input variables, since it uses a relatively small number of computer runs. No prior judgemental elimination of input variables is needed. The screening method is based on stagewise correlation and extensive regression analysis of output values calculated with selected input value combinations. The regression process deals with multivariate nonlinear functions, and statistical tests are also available for identifying input variables that contribute to threshold effects, i.e., discontinuities in the output variables. A computer code SCREEN has been developed for implementing the screening techniques. The efficiency has been demonstrated by several examples and applied to a fast reactor safety analysis code (Venus-II). However, the methods and the coding are general and not limited to such applications.
Around power law for PageRank components in Buckley-Osthus model of web graph
Gasnikov, Alexander; Zhukovskii, Maxim; Kim, Sergey; Noskov, Fedor; Plaunov, Stepan; Smirnov, Daniil
2017-01-01
In this paper we investigate the power law for PageRank components in the Buckley-Osthus model of the web graph. We compare different numerical methods for PageRank calculation. With the best method we carry out a large number of numerical experiments. These experiments confirm the power-law hypothesis. Finally, we discuss a real model of web ranking based on the classical PageRank approach.
An Improved Approach to the PageRank Problems
Directory of Open Access Journals (Sweden)
Yue Xie
2013-01-01
We introduce a partition of the web pages particularly suited to the PageRank problems in which the web link graph has a nested block structure. Based on the partition of the web pages into dangling nodes, common nodes, and general nodes, the hyperlink matrix can be reordered to have a simpler block structure. Then, based on the parallel computation method, we propose an algorithm for the PageRank problems. In this algorithm, the dimension of the linear system becomes smaller, and the vector for general nodes in each block can be calculated separately in every iteration. Numerical experiments show that this approach speeds up the computation of PageRank.
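For orientation, the baseline that such block methods aim to accelerate is the classical power iteration, sketched below with explicit handling of dangling nodes (pages without out-links). This is the standard textbook iteration, not the reordered block algorithm of the abstract; the function name `pagerank`, the damping value 0.85, and the toy graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Classical PageRank by power iteration.

    links maps node -> list of nodes it links to. Nodes that appear only as
    link targets, or that have an empty list, are treated as dangling.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph (hypothetical): "e" is a dangling node, "d" has no in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": []}
scores = pagerank(graph)
best = max(scores, key=scores.get)
```

Each iteration touches every link once, so the cost per sweep is linear in the number of links; partitioning the nodes as the abstract describes reduces the size of the system that has to be iterated.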
Directory of Open Access Journals (Sweden)
Janackovic, Goran Lj.
2013-11-01
This paper presents the factors, performance, and indicators of occupational safety, as well as a method to select and rank occupational safety indicators based on the expert evaluation method and the fuzzy analytic hierarchy process (fuzzy AHP). A case study is done on road construction companies in Serbia. The key safety performance indicators for the road construction industry are identified and ranked according to the results of a survey that included experts who assessed occupational safety risks in these companies. The case study confirmed that organisational factors have a dominant effect on the quality of the occupational health and safety management system in Serbian road construction companies.
Assessment of statistical methods used in library-based approaches to microbial source tracking.
Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D
2003-12-01
Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
Permutation statistical methods an integrated approach
Berry, Kenneth J; Johnston, Janis E
2016-01-01
This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...
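To make concrete the idea that permutation methods "depend only on the data at hand", here is a minimal Monte Carlo permutation test for a difference in means between two samples. It is an illustrative sketch and is not taken from the monograph: the function name `permutation_test`, the number of resamples, the fixed seed, and the toy data are all assumptions.

```python
import random


def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Reassign the pooled observations to the two groups at random.
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        stat = abs(sum(px) / len(px) - sum(py) / len(py))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): identical samples versus clearly separated samples.
p_same = permutation_test([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
p_diff = permutation_test([1.0, 2.0, 3.0, 4.0, 5.0],
                          [11.0, 12.0, 13.0, 14.0, 15.0])
```

No normality or equal-variance assumption enters the calculation; the reference distribution is built entirely from rearrangements of the observed values.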
Directory of Open Access Journals (Sweden)
Vassal Aurélien
2008-01-01
Background: The huge amount of data generated by DNA chips is a powerful basis to classify various pathologies. However, constant evolution of microarray technology makes it difficult to mix data from different chip types for class prediction of limited sample populations. Affymetrix® technology provides both a quantitative fluorescence signal and a decision (detection call: absent or present) based on signed-rank algorithms applied to several hybridization repeats of each gene, with a per-chip normalization. We developed a new prediction method for class belonging based on the detection call only from recent Affymetrix chip type. Biological data were obtained by hybridization on U133A, U133B and U133Plus 2.0 microarrays of purified normal B cells and cells from three independent groups of multiple myeloma (MM) patients. Results: After a call-based data reduction step to filter out non class-discriminative probe sets, the gene list obtained was reduced to a predictor with correction for multiple testing by iterative deletion of probe sets that sequentially improve inter-class comparisons and their significance. The error rate of the method was determined using leave-one-out and 5-fold cross-validation. It was successfully applied to (i) determine a sex predictor with the normal donor group classifying gender with no error in all patient groups except for male MM samples with a Y chromosome deletion, (ii) predict the immunoglobulin light and heavy chains expressed by the malignant myeloma clones of the validation group and (iii) predict sex, light and heavy chain nature for every new patient. Finally, this method was shown to be powerful when compared to the popular classification method Prediction Analysis of Microarray (PAM). Conclusion: This normalization-free method is routinely used for quality control and correction of collection errors in patient reports to clinicians. It can be easily extended to multiple class prediction suitable with
Understanding advanced statistical methods
Westfall, Peter
2013-01-01
Introduction: Probability, Statistics, and Science; Reality, Nature, Science, and Models; Statistical Processes: Nature, Design and Measurement, and Data; Models; Deterministic Models; Variability; Parameters; Purely Probabilistic Statistical Models; Statistical Models with Both Deterministic and Probabilistic Components; Statistical Inference; Good and Bad Models; Uses of Probability Models. Random Variables and Their Probability Distributions; Introduction; Types of Random Variables: Nominal, Ordinal, and Continuous; Discrete Probability Distribution Functions; Continuous Probability Distribution Functions; Some Calculus-Derivatives and Least Squares; More Calculus-Integrals and Cumulative Distribution Functions. Probability Calculation and Simulation; Introduction; Analytic Calculations, Discrete and Continuous Cases; Simulation-Based Approximation; Generating Random Numbers. Identifying Distributions; Introduction; Identifying Distributions from Theory Alone; Using Data: Estimating Distributions via the Histogram; Quantiles: Theoretical and Data-Based Estimate...
Application of statistical method for FBR plant transient computation
International Nuclear Information System (INIS)
Kikuchi, Norihiro; Mochizuki, Hiroyasu
2014-01-01
Highlights: • A statistical method with a large trial number up to 10,000 is applied to the plant system analysis. • A turbine trip test conducted at the “Monju” reactor is selected as a plant transient. • A reduction method of trial numbers is discussed. • The result with reduced trial number can express the base regions of the computed distribution. -- Abstract: Design tolerances, operational errors, and statistical errors in empirical correlations all affect the transient behavior of a plant. The purpose of the present study is to apply the above-mentioned statistical errors to a plant system computation in order to evaluate the statistical distribution contained in the transient evolution. The selected computation case is the turbine trip test conducted at 40% electric power of the prototype fast reactor “Monju”. All of the heat transport systems of “Monju” are modeled with the NETFLOW++ system code, which has been validated using the plant transient tests of the experimental fast reactor Joyo and of “Monju”. The effects of parameters on upper plenum temperature are confirmed by sensitivity analyses, and dominant parameters are chosen. The statistical errors are applied to each computation deck by using a pseudorandom number and the Monte-Carlo method. The dSFMT (Double precision SIMD-oriented Fast Mersenne Twister), which is a developed version of the Mersenne Twister (MT), is adopted as the pseudorandom number generator. In the present study, uniform random numbers are generated by dSFMT, and these random numbers are transformed to the normal distribution by the Box–Muller method. Ten thousand different computations are performed at once. In every computation case, the steady calculation is performed for 12,000 s, and the transient calculation is performed for 4000 s. For the purpose of the present statistical computation, it is important that the base regions of distribution functions should be calculated precisely. A large number of
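The sampling step described in this abstract (uniform pseudorandom numbers transformed to normal deviates by the Box–Muller method, then used to perturb input parameters) can be sketched as follows. This is only an illustration under stated assumptions: Python's standard `random` module uses the original Mersenne Twister (MT19937), not dSFMT, and the nominal value, the relative standard deviation and the function `perturbed_parameter` are hypothetical and not taken from the study.

```python
import math
import random


def box_muller(u1, u2):
    """Map two independent uniforms (u1 in (0, 1], u2 in [0, 1)) to two
    independent standard normal deviates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def perturbed_parameter(nominal, rel_sigma, rng):
    """Hypothetical helper: draw one normally distributed input parameter
    around its nominal value (rel_sigma is a relative standard deviation)."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is finite
    u2 = rng.random()
    z, _ = box_muller(u1, u2)
    return nominal * (1.0 + rel_sigma * z)


# Illustrative Monte Carlo loop: 10,000 trials of one hypothetical parameter.
rng = random.Random(0)  # MT19937 stands in for dSFMT here (assumption)
samples = [perturbed_parameter(100.0, 0.05, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)
```

In a real study each sampled parameter set would feed one run of the plant system code; here the loop only collects the sampled values.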
Andries, Jan P M; Vander Heyden, Yvan; Buydens, Lutgarde M C
2013-01-14
The calibration performance of partial least squares regression for one response (PLS1) can be improved by eliminating uninformative variables. Many variable-reduction methods are based on so-called predictor-variable properties or predictive properties, which are functions of various PLS-model parameters, and which may change during the steps of the variable-reduction process. Recently, a new predictive-property-ranked variable reduction method with final complexity adapted models, denoted as PPRVR-FCAM or simply FCAM, was introduced. It is a backward variable elimination method applied on the predictive-property-ranked variables. The variable number is first reduced, with constant PLS1 model complexity A, until A variables remain, followed by a further decrease in PLS complexity, allowing the final selection of small numbers of variables. In this study for three data sets the utility and effectiveness of six individual and nine combined predictor-variable properties are investigated, when used in the FCAM method. The individual properties include the absolute value of the PLS1 regression coefficient (REG), the significance of the PLS1 regression coefficient (SIG), the norm of the loading weight (NLW) vector, the variable importance in the projection (VIP), the selectivity ratio (SR), and the squared correlation coefficient of a predictor variable with the response y (COR). The selective and predictive performances of the models resulting from the use of these properties are statistically compared using the one-tailed Wilcoxon signed rank test. The results indicate that the models, resulting from variable reduction with the FCAM method, using individual or combined properties, have similar or better predictive abilities than the full spectrum models. After mean-centring of the data, REG and SIG, provide low numbers of informative variables, with a meaning relevant to the response, and lower than the other individual properties, while the predictive abilities are
Integrated inventory ranking system for oilfield equipment industry
Directory of Open Access Journals (Sweden)
Jalel Ben Hmida
2014-01-01
Purpose: This case study is motivated by the subcontracting problem in an oilfield equipment and service company where the management needs to decide which parts to manufacture in-house when the capacity is not enough to make all required parts. Currently the company makes subcontracting decisions based on management’s experience. Design/methodology/approach: Working with the management, a decision support system (DSS) is developed to rank parts by integrating three inventory classification methods, considering both quantitative factors such as cost and demand, and qualitative factors such as functionality, efficiency, and quality. The proposed integrated inventory ranking procedure makes use of three classification methods: ABC, FSN, and VED. Findings: An integration mechanism using weights is developed to rank the parts based on the total priority scores. The ranked list generated by the system helps management to identify about 50 critical parts to manufacture in-house. Originality/value: The integration of all three inventory classification techniques into a single system is a unique feature of this research. This is important as it provides a more inclusive, big picture view of the DSS for management’s use in making business decisions.
Škrbić, Biljana; Héberger, Károly; Durišić-Mladenović, Nataša
2013-10-01
Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing the applicability of SRD contained the results reported during one of the proficiency tests (PTs) organized by the EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, SRD was also tested as a discriminant method alternative to the existing average performance scores used to compare multianalyte PT results. SRD should be used along with the z scores, the most commonly used PT performance statistics. SRD was further developed to handle identical rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, and revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using sevenfold cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH concentrations are row-scaled (i.e., z scores are analyzed as input for ranking), SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can
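A minimal sketch of the SRD idea described above: each laboratory's results and the reference series are converted to ranks over the same analytes, and the absolute rank differences are summed. Averaging the ranks of tied values is one common way to handle ties and is assumed here; it is not necessarily the tie treatment developed in the study. All data values are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the
    average of the rank positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def srd(results, reference):
    """Sum of absolute differences between the ranks of one laboratory's
    results and the ranks of the reference series (same analyte order)."""
    lab_ranks = average_ranks(results)
    ref_ranks = average_ranks(reference)
    return sum(abs(a - b) for a, b in zip(lab_ranks, ref_ranks))


# Hypothetical concentrations for four analytes.
assigned = [1.0, 2.0, 3.0, 4.0]
lab_a = [1.1, 2.0, 2.9, 4.2]   # same ordering as the reference
lab_b = [4.0, 3.0, 2.0, 1.0]   # fully reversed ordering
srd_a = srd(lab_a, assigned)   # 0.0
srd_b = srd(lab_b, assigned)   # 8.0
```

A smaller SRD means the laboratory orders the analytes more like the reference does; validation by randomization or cross-validation, as mentioned in the abstract, is not shown.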
Comparative Case Studies on Indonesian Higher Education Rankings
Kurniasih, Nuning; Hasyim, C.; Wulandari, A.; Setiawan, M. I.; Ahmar, A. S.
2018-01-01
The quality of higher education is the result of a continuous process. Many indicators can be used to assess the quality of a higher education institution, and the use of different indicators leads to different university rankings. This research aims to find variables that can connect the ranking indicators used by the Indonesian Ministry of Research, Technology, and Higher Education with the indicators used by international rankings, taking two ranking systems as cases, i.e. Webometrics and 4icu. The research uses a qualitative method with a comparative case studies approach. The results show that, to bridge the indicators used by the Indonesian Ministry of Research, Technology, and Higher Education with web-based ranking systems like Webometrics and 4icu, Indonesian higher education institutions need to open access to both scientific and non-scientific works for public use in a web-based environment. One strategy for improving the openness of and access to a university's scientific work is involvement in open science and collaboration.
GeneRank: Using search engine technology for the analysis of microarray experiments
Directory of Open Access Journals (Sweden)
Breitling Rainer
2005-09-01
Background: Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. Results: GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Conclusion: Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
GeneRank: using search engine technology for the analysis of microarray experiments.
Morrison, Julie L; Breitling, Rainer; Higham, Desmond J; Gilbert, David R
2005-09-21
Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method--based on the PageRank algorithm employed by the popular search engine Google--that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
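The abstract does not give the GeneRank formula, so the following is a hedged sketch of a PageRank-style iteration in which the uniform "teleport" term is replaced by normalized absolute expression changes. The update rule r ← (1 − d)·ex + d·W·D⁻¹·r, the damping value d = 0.5, and the toy network are assumptions made for illustration.

```python
def generank(adjacency, expression_change, d=0.5, iterations=200):
    """Iterate r <- (1 - d) * ex + d * W * D^-1 * r.

    adjacency: symmetric 0/1 matrix W (list of lists) linking related genes.
    expression_change: one (signed) expression change per gene.
    d: weight of the network term (0 reproduces the pure expression ranking).
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    total = sum(abs(x) for x in expression_change)
    ex = [abs(x) / total for x in expression_change]
    r = ex[:]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score divided by its degree.
            inflow = sum(
                adjacency[i][j] * r[j] / degree[j]
                for j in range(n)
                if degree[j] > 0
            )
            updated.append((1.0 - d) * ex[i] + d * inflow)
        r = updated
    return r


# Toy chain network gene0 - gene1 - gene2; gene1 shows no expression change
# itself but is connected to two changed genes.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
change = [1.0, 0.0, 1.0]
scores = generank(chain, change, d=0.5)
expression_only = generank(chain, change, d=0.0)
```

With d = 0 the ranking is driven by expression change alone and the middle gene scores zero; with d > 0 it inherits a share of its neighbours' scores, which is the kind of prior-knowledge effect the abstract describes.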
The application of low-rank and sparse decomposition method in the field of climatology
Gupta, Nitika; Bhaskaran, Prasad K.
2018-04-01
The present study reports a low-rank and sparse decomposition method that separates the mean and the variability of a climate data field. Until now, the application of this technique has been limited to areas such as image processing, web data ranking, and bioinformatics data analysis. In climate science, this method exactly separates the original data into low-rank and sparse components, wherein the low-rank component depicts the linearly correlated part of the dataset (expected or mean behavior), and the sparse component represents the variation or perturbation of the dataset from its mean behavior. The study attempts to verify the efficacy of the proposed technique in the field of climatology with two real-world examples. The first example applies the technique to maximum wind-speed (MWS) data for the Indian Ocean (IO) region. The study brings to light a decadal reversal pattern in the MWS for the North Indian Ocean (NIO) during the months of June, July, and August (JJA). The second example deals with sea surface temperature (SST) data for the Bay of Bengal region, which exhibit a distinct pattern in the sparse component. The study highlights the importance of the proposed technique for the interpretation and visualization of climate data.
Doerr, Timothy P.; Alves, Gelio; Yu, Yi-Kuo
2005-08-01
Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time using the transfer matrix technique or, equivalently, the dynamic programming approach. This suggests a way to efficiently find approximate solutions: find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data. For directed paths in random media, the scaling function depends on the particular realization of randomness; in the mass spectrometry case, the scaling function is spectrum-specific.
Application of nonparametric statistic method for DNBR limit calculation
International Nuclear Information System (INIS)
Dong Bo; Kuang Bo; Zhu Xuenong
2013-01-01
Background: The nonparametric statistical method is a statistical inference method that does not depend on a particular distribution; it calculates tolerance limits at a given probability level and confidence through sampling. The DNBR margin is an important parameter of NPP (nuclear power plant) design and reflects the safety level of the NPP. Purpose and Methods: This paper uses a nonparametric statistical method based on the Wilks formula and the VIPER-01 subchannel analysis code to calculate the DNBR design limits (DL) of a 300 MW NPP during a complete loss of flow accident, and compares them with the DNBR DL obtained by means of ITDP to determine the DNBR margin. Results: The results indicate that this method can gain 2.96% more DNBR margin than the ITDP methodology. Conclusions: Because conservatism in the analysis process is reduced, the nonparametric statistical method can provide a greater DNBR margin, and the increased DNBR margin benefits the upgrading of the core refueling scheme. (authors)
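As a worked illustration of the Wilks formula mentioned above, the sketch below computes the smallest number of code runs needed for a first-order tolerance limit. For a one-sided limit the condition is 1 − γⁿ ≥ β, and for a two-sided limit it is 1 − γⁿ − n(1 − γ)γⁿ⁻¹ ≥ β, where γ is the coverage (probability level) and β the confidence. Whether the study used the one-sided or two-sided form, and at which levels, is not stated in the abstract; the 95%/95% values are the customary example.

```python
def wilks_one_sided(coverage, confidence):
    """Smallest n with 1 - coverage**n >= confidence (first-order,
    one-sided tolerance limit taken from the extreme sample value)."""
    n = 1
    while 1.0 - coverage ** n < confidence:
        n += 1
    return n


def wilks_two_sided(coverage, confidence):
    """Smallest n with 1 - g**n - n*(1 - g)*g**(n - 1) >= confidence
    (first-order, two-sided limits from the sample minimum and maximum)."""
    g = coverage
    n = 2
    while 1.0 - g ** n - n * (1.0 - g) * g ** (n - 1) < confidence:
        n += 1
    return n


runs_95_95_one_sided = wilks_one_sided(0.95, 0.95)   # 59
runs_95_95_two_sided = wilks_two_sided(0.95, 0.95)   # 93
```

With 59 runs, the most limiting computed value bounds at least 95% of the population with 95% confidence, without assuming any distribution for the output.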
Evaluation of Term Ranking Algorithms for Pseudo-Relevance Feedback in MEDLINE Retrieval.
Yoo, Sooyoung; Choi, Jinwook
2011-06-01
The purpose of this study was to investigate the effects of query expansion algorithms for MEDLINE retrieval within a pseudo-relevance feedback framework. A number of query expansion algorithms were tested using various term ranking formulas, focusing on query expansion based on pseudo-relevance feedback. The OHSUMED test collection, which is a subset of the MEDLINE database, was used as a test corpus. Various ranking algorithms were tested in combination with different term re-weighting algorithms. Our comprehensive evaluation showed that the local context analysis ranking algorithm, when used in combination with one of the reweighting algorithms - Rocchio, the probabilistic model, and our variants - significantly outperformed other algorithm combinations by up to 12% (paired t-test; p algorithm pairs, at least in the context of the OHSUMED corpus. Comparative experiments on term ranking algorithms were performed in the context of a subset of MEDLINE documents. With medical documents, local context analysis, which uses co-occurrence with all query terms, significantly outperformed various term ranking methods based on both frequency and distribution analyses. Furthermore, the results of the experiments demonstrated that the term rank-based re-weighting method contributed to a remarkable improvement in mean average precision.
Expert opinion on landslide susceptibility elicted by probabilistic inversion from scenario rankings
Lee, Katy; Dashwood, Claire; Lark, Murray
2016-04-01
For many natural hazards the opinion of experts, with experience in assessing susceptibility under different circumstances, is a valuable source of information on which to base risk assessments. This is particularly important where incomplete process understanding, and limited data, limit the scope to predict susceptibility by mechanistic or statistical modelling. The expert has a tacit model of a system, based on their understanding of processes and their field experience. This model may vary in quality, depending on the experience of the expert. There is considerable interest in how one may elicit expert understanding by a process which is transparent and robust, to provide a basis for decision support. One approach is to provide experts with a set of scenarios, and then to ask them to rank small overlapping subsets of these with respect to susceptibility. Methods of probabilistic inversion have been used to compute susceptibility scores for each scenario, implicit in the expert ranking. It is also possible to model these scores as functions of measurable properties of the scenarios. This approach has been used to assess susceptibility of animal populations to invasive diseases, to assess risk to vulnerable marine environments and to assess the risk in hypothetical novel technologies for food production. We will present the results of a study in which a group of geologists with varying degrees of expertise in assessing landslide hazards were asked to rank sets of hypothetical simplified scenarios with respect to land slide susceptibility. We examine the consistency of their rankings and the importance of different properties of the scenarios in the tacit susceptibility model that their rankings implied. Our results suggest that this is a promising approach to the problem of how experts can communicate their tacit model of uncertain systems to those who want to make use of their expertise.
Strategic Entrepreneurship Based Model of Catch-up University in Global Rankings
Directory of Open Access Journals (Sweden)
Kozlov Mikhail
2016-01-01
The paper helps answer the question of why only a few universities have managed to advance significantly in global rankings while most of their competitors fail. For this purpose it introduces a new strategically entrepreneurial catch-up university framework, based on a combination of the resource-based view, dynamic capabilities, strategic entrepreneurship and latecomer organization concepts. The logic of the new framework explains the advantages, for ranking-oriented universities, of being ambidextrous and pursuing new, potentially more favorable opportunities for research development. It proposes that a substantial increase in universities' dynamic capabilities and in the accumulation of their resource base rests on a new combination of financial, human and social capital together with strategic management of these resources in the process of identifying and exploiting greater opportunities.
Energy Technology Data Exchange (ETDEWEB)
Safaei Mohamadabadi, H.; Tichkowsky, G.; Kumar, A. [Department of Mechanical Engineering, University of Alberta, Edmonton, Alberta (Canada)
2009-01-15
Several factors, including economical, environmental, and social factors, are involved in selection of the best fuel-based vehicles for road transportation. This leads to a multi-criteria selection problem for multi-alternatives. In this study, a multi-criteria assessment model was developed to rank different road transportation fuel-based vehicles (both renewable and non-renewable) using a method called Preference Ranking Organization Method for Enrichment and Evaluations (PROMETHEE). This method combines qualitative and quantitative criteria to rank various alternatives. In this study, vehicles based on gasoline, gasoline-electric (hybrid), E85 ethanol, diesel, B100 biodiesel, and compressed natural gas (CNG) were considered as alternatives. These alternatives were ranked based on five criteria: vehicle cost, fuel cost, distance between refueling stations, number of vehicle options available to the consumer, and greenhouse gas (GHG) emissions per unit distance traveled. In addition, sensitivity analyses were performed to study the impact of changes in various parameters on final ranking. Two base cases and several alternative scenarios were evaluated. In the base case scenario with higher weight on economical parameters, gasoline-based vehicle was ranked higher than other vehicles. In the base case scenario with higher weight on environmental parameters, hybrid vehicle was ranked first followed by biodiesel-based vehicle. (author)
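A minimal sketch of a PROMETHEE II style ranking by net outranking flow. It assumes the simplest ("usual") preference function, where an alternative is fully preferred on a criterion as soon as it is strictly better; the abstract does not say which preference functions or weights were used, and the alternatives, scores and weights below are hypothetical.

```python
def promethee_net_flows(scores, weights, maximize):
    """Return the net outranking flow of each alternative.

    scores: dict mapping alternative name -> list of criterion values.
    weights: criterion weights (assumed to sum to 1).
    maximize: per-criterion flags; False means lower values are better.
    """
    names = list(scores)
    n = len(names)

    def preference(a, b):
        # Weighted share of criteria on which a is strictly better than b.
        total = 0.0
        for j, weight in enumerate(weights):
            diff = scores[a][j] - scores[b][j]
            if not maximize[j]:
                diff = -diff
            if diff > 0:
                total += weight
        return total

    flows = {}
    for a in names:
        others = [b for b in names if b != a]
        leaving = sum(preference(a, b) for b in others) / (n - 1)
        entering = sum(preference(b, a) for b in others) / (n - 1)
        flows[a] = leaving - entering
    return flows


# Hypothetical example: two cost-type criteria (lower is better).
toy_scores = {"A": [1.0, 1.0], "B": [2.0, 2.0], "C": [3.0, 3.0]}
flows = promethee_net_flows(toy_scores, weights=[0.5, 0.5], maximize=[False, False])
ranking = sorted(flows, key=flows.get, reverse=True)  # ['A', 'B', 'C']
```

Changing the weights and re-running the function is a simple way to mimic the sensitivity analyses mentioned in the abstract, for example shifting weight from cost criteria to emissions.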
Discovering urban mobility patterns with PageRank based traffic modeling and prediction
Wang, Minjie; Yang, Su; Sun, Yi; Gao, Jun
2017-11-01
An urban transportation system can be viewed as a complex network with time-varying traffic flows as links that connect adjacent regions as network nodes. By computing urban traffic evolution on such a temporal complex network with PageRank, it is found that for most regions there exists a linear relation between the traffic congestion measure at the present time and the PageRank value at the previous time. Since the PageRank measure of a region results from the mutual interactions of the whole network, this implies that the traffic state of a local region does not evolve independently but is affected by the evolution of the whole network. As a result, the PageRank values can act as signatures for predicting upcoming traffic congestion. We observe the aforementioned laws experimentally based on the trajectory data of 12,000 taxis in Beijing over one month.
A LDA-based approach to promoting ranking diversity for genomics information retrieval.
Chen, Yan; Yin, Xiaoshi; Li, Zhoujun; Hu, Xiaohua; Huang, Jimmy Xiangji
2012-06-11
In the biomedical domain, there is an immense amount of data and a tremendous increase in genomics and biomedical publications. This wealth of information has led to increasing interest in, and need for, applying information retrieval (IR) techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the information a biologist's query seeks is a list of entities of a certain type covering different aspects related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important for a biomedical IR system to be able to provide relevant and diverse answers that fulfill biologists' information needs. However, traditional IR models are concerned only with the relevance between retrieved documents and the user query, and do not take redundancy between retrieved documents into account. This leads to high redundancy and low diversity in the retrieved ranked lists. In this paper, we propose an approach that employs a topic generative model called Latent Dirichlet Allocation (LDA) to promote ranking diversity for biomedical information retrieval. Unlike other approaches or models that consider aspects at the word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We use the LDA model to discover the topic distribution of retrieved passages and the word distribution of each topic dimension, and then re-rank retrieval results by topic distribution similarity between passages based on an N-size sliding window. We apply our approach to the TREC 2007 Genomics collection and two distinct IR baseline runs, and achieve an 8% improvement over the highest Aspect MAP reported in the TREC 2007 Genomics track. The proposed method is the first study to adopt a topic model for genomics information retrieval, and it demonstrates effectiveness in promoting ranking diversity as well as in improving the relevance of ranked lists of genomics search
Reconstructing Macroeconomics Based on Statistical Physics
Aoki, Masanao; Yoshikawa, Hiroshi
We believe that time has come to integrate the new approach based on statistical physics or econophysics into macroeconomics. Toward this goal, there must be more dialogues between physicists and economists. In this paper, we argue that there is no reason why the methods of statistical physics so successful in many fields of natural sciences cannot be usefully applied to macroeconomics that is meant to analyze the macroeconomy comprising a large number of economic agents. It is, in fact, weird to regard the macroeconomy as a homothetic enlargement of the representative micro agent. We trust the bright future of the new approach to macroeconomies based on statistical physics.
Directory of Open Access Journals (Sweden)
Zutao Zhang
2016-06-01
Environmental perception and information processing are two key steps of active safety for vehicle reversing. Single-sensor environmental perception cannot meet the need for vehicle reversing safety due to its low reliability. In this paper, we present a novel multi-sensor environmental perception method using low-rank representation and a particle filter for vehicle reversing safety. The proposed system consists of four main modules, namely multi-sensor environmental perception, information fusion, target recognition and tracking using low-rank representation and a particle filter, and vehicle reversing speed control. First, the multi-sensor environmental perception module, based on a binocular-camera system and ultrasonic range finders, obtains the distance data for obstacles behind the vehicle when the vehicle is reversing. Secondly, an information fusion algorithm using an adaptive Kalman filter processes the data obtained with the multi-sensor environmental perception module, which greatly improves the robustness of the sensors. Then the framework of a particle filter and low-rank representation is used to track the main obstacles. The low-rank representation is used to optimize an objective particle template that has the smallest L-1 norm. Finally, the electronic throttle opening and automatic braking are controlled by the proposed vehicle reversing control strategy prior to any potential collisions, making the reversing control safer and more reliable. The final system simulation and practical testing results demonstrate the validity of the proposed multi-sensor environmental perception method using low-rank representation and a particle filter for vehicle reversing safety.
Statistical-mechanical entropy by the thin-layer method
International Nuclear Information System (INIS)
Feng, He; Kim, Sung Won
2003-01-01
G. 't Hooft first studied the statistical-mechanical entropy of a scalar field in a Schwarzschild black hole background by the brick-wall method and hinted that the statistical-mechanical entropy is the statistical origin of the Bekenstein-Hawking entropy of the black hole. However, in our view, the statistical-mechanical entropy is only a quantum correction to the Bekenstein-Hawking entropy of the black hole. The brick-wall method, based on thermal equilibrium at a large scale, cannot be applied to cases out of equilibrium such as a nonstationary black hole. The statistical-mechanical entropy of a scalar field in a nonstationary black hole background is calculated by the thin-layer method. The condition of local equilibrium near the horizon of the black hole is used as a working postulate and is maintained for a black hole which evaporates slowly enough and whose mass is far greater than the Planck mass. The statistical-mechanical entropy is also proportional to the area of the black hole horizon. The difference from the stationary black hole is that the result relies on a time-dependent cutoff
APPLYING ROBUST RANKING METHOD IN TWO PHASE FUZZY OPTIMIZATION LINEAR PROGRAMMING PROBLEMS (FOLPP)
Directory of Open Access Journals (Sweden)
Monalisha Pattnaik
2014-12-01
Background: This paper explores solutions to fuzzy optimization linear programming problems (FOLPP) where some parameters are fuzzy numbers. In practice, there are many problems in which all decision parameters are fuzzy numbers, and such problems are usually solved by either probabilistic programming or multi-objective programming methods. Methods: In this paper, using the concept of comparison of fuzzy numbers, a very effective method is introduced for solving these problems. The paper extends linear programming based problems to a fuzzy environment. Under the problem assumptions, the optimal solution can still be obtained theoretically using the two-phase simplex based method in a fuzzy environment. The fuzzy decision variables can be initially generated and then solved and improved sequentially using the fuzzy decision approach by introducing the robust ranking technique. Results and conclusions: The model is illustrated with an application, and a post-optimal analysis approach is obtained. The proposed procedure was programmed in MATLAB (version R2009a) to plot the four-dimensional slice diagram for the application. Finally, a numerical example is presented to illustrate the effectiveness of the theoretical results and to gain additional managerial insights.
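The abstract does not define its robust ranking technique. A common definition, assumed here, is the index R(ã) = ∫₀¹ 0.5·(a_α^L + a_α^U) dα, where [a_α^L, a_α^U] is the α-cut of the fuzzy number. For a triangular fuzzy number (a, b, c) this integral equals (a + 2b + c)/4. The sketch below evaluates the index numerically and in closed form; the fuzzy numbers are hypothetical.

```python
def alpha_cut(triangular, alpha):
    """Lower and upper ends of the alpha-cut of a triangular fuzzy
    number (a, b, c) with a <= b <= c."""
    a, b, c = triangular
    return a + (b - a) * alpha, c - (c - b) * alpha


def robust_rank(triangular, steps=1000):
    """Midpoint-rule estimate of the integral over alpha in [0, 1] of the
    midpoint of the alpha-cut (the assumed robust ranking index)."""
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        lower, upper = alpha_cut(triangular, alpha)
        total += 0.5 * (lower + upper)
    return total / steps


def robust_rank_closed_form(triangular):
    """Closed form of the same integral for a triangular fuzzy number."""
    a, b, c = triangular
    return (a + 2 * b + c) / 4


# Hypothetical fuzzy cost coefficients to be compared by their crisp index.
cost_1 = (1.0, 2.0, 3.0)
cost_2 = (2.0, 4.0, 10.0)
index_1 = robust_rank(cost_1)
index_2 = robust_rank(cost_2)
```

Replacing each fuzzy parameter by such a crisp index is one way a fuzzy linear program can be reduced to an ordinary one that the two-phase simplex method can solve.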
Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs
Alias, Christophe; Darte, Alain; Feautrier, Paul; Gonnord, Laure
Proving the termination of a flowchart program can be done by exhibiting a ranking function, i.e., a function from the program states to a well-founded set, which strictly decreases at each program step. A standard method to automatically generate such a function is to compute invariants for each program point and to search for a ranking in a restricted class of functions that can be handled with linear programming techniques. Previous algorithms based on affine rankings either are applicable only to simple loops (i.e., single-node flowcharts) and rely on enumeration, or are not complete in the sense that they are not guaranteed to find a ranking in the class of functions they consider, if one exists. Our first contribution is to propose an efficient algorithm to compute ranking functions: It can handle flowcharts of arbitrary structure, the class of candidate rankings it explores is larger, and our method, although greedy, is provably complete. Our second contribution is to show how to use the ranking functions we generate to get upper bounds for the computational complexity (number of transitions) of the source program. This estimate is a polynomial, which means that we can handle programs with more than linear complexity. We applied the method on a collection of test cases from the literature. We also show the links and differences with previous techniques based on the insertion of counters.
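A toy illustration of a multi-dimensional ranking, not the authors' algorithm: the sketch below assumes a hypothetical two-counter program and checks, along one execution, that the tuple (i, j) decreases lexicographically and stays non-negative at every step. It also shows how such a ranking yields a polynomial step bound.

```python
def step(state, m):
    """One transition of a toy flowchart program (hypothetical example).

    Inner loop: decrement j while j > 0.
    Outer loop: when j == 0 and i > 0, decrement i and reset j to m.
    Returns None when no transition is enabled (the program halts).
    """
    i, j = state
    if j > 0:
        return (i, j - 1)
    if i > 0:
        return (i - 1, m)
    return None


def rank(state):
    """Candidate 2-dimensional ranking: the tuple (i, j), ordered lexicographically."""
    return state


def run_and_check(n, m):
    """Run from (n, m), checking strict lexicographic decrease at each step."""
    state = (n, m)
    steps = 0
    while True:
        nxt = step(state, m)
        if nxt is None:
            return steps
        assert rank(nxt) < rank(state)       # Python compares tuples lexicographically
        assert min(rank(nxt)) >= 0           # components stay in a well-founded set
        state = nxt
        steps += 1


steps_taken = run_and_check(3, 4)
```

For this toy program the number of transitions is (n + 1)(m + 1) - 1, a quadratic bound of the kind the abstract describes. Checking one run is only a sanity test; a proof requires the decrease to hold for every reachable state.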
Hoshikawa, K; Ono, S
2017-02-01
Multicriteria decision analysis (MCDA) has been generally considered a promising decision-making methodology for the assessment of drug benefit-risk profiles. There have been many discussions in both public and private sectors on its feasibility and applicability, but it has not been employed in official decision-making. To examine to what extent MCDA would reflect the first-hand, intuitive preference of evaluators in practical pharmaceutical assessments, we conducted a questionnaire survey of employees of pharmaceutical companies. Shown profiles of the efficacy and safety of four hypothetical drugs, each respondent was asked to rank them following the standard MCDA process and then to rank them intuitively (i.e. without applying any analytical framework). These two approaches resulted in substantially different ranking patterns from the same individuals, and the concordance rate was surprisingly low (17%). Although many respondents intuitively showed a preference for mild, balanced risk-benefit profiles over profiles with a conspicuous advantage in either risk or benefit, the ranking orders based on MCDA scores did not reflect the intuitive preference. Observed discrepancies between the rankings seemed to be primarily attributable to the structural characteristics of MCDA, which assumes that the evaluation of each benefit and risk component should have a monotonic impact on final scores. It would be difficult for MCDA to reflect commonly observed non-monotonic preferences for risk and benefit profiles. Possible drawbacks of MCDA should be further investigated prior to its real-world application to benefit-risk assessment. © 2016 John Wiley & Sons Ltd.
Analysis of Statistical Methods Currently used in Toxicology Journals.
Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min
2014-09-01
Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Sciences and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median being 6 and the modes being 3 and 6. The mean (105/113, 93%) was predominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justification for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being the most popular (52/93, 56%), yet few studies conducted either a normality or an equal-variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
Rank-Based miRNA Signatures for Early Cancer Detection
Directory of Open Access Journals (Sweden)
Mario Lauria
2014-01-01
We describe a new signature definition and analysis method to be used as a biomarker for early cancer detection. Our new approach is based on the construction of a reference map of transcriptional signatures of both healthy and cancer-affected individuals using circulating miRNA from a large number of subjects. Once such a map is available, the diagnosis for a new patient can be performed by observing the relative position on the map of his/her transcriptional signature. To demonstrate its efficacy for this specific application we report the results of the application of our method to published datasets of circulating miRNA, and we quantify its performance compared to current state-of-the-art methods. A number of additional features make this method an ideal candidate for large-scale use, for example, as a mass screening tool for early cancer detection or for at-home diagnostics. Specifically, our method is minimally invasive (because it works well with circulating miRNA), it is robust with respect to lab-to-lab protocol variability and batch effects (it requires that only the relative ranking of expression values of miRNA in a profile be accurate, not their absolute values), and it is scalable to a large number of subjects. Finally we discuss the need for HPC capability in a widespread application of our or similar methods.
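The robustness claim rests on using only the relative ranking of expression values. The sketch below is a generic illustration of that idea, not the authors' signature definition: it converts a profile to ranks and shows that a monotone rescaling (a stand-in for a batch effect) leaves the ranks unchanged. The values and function names are hypothetical.

```python
def to_ranks(profile):
    """Return 1-based ranks of the values (ties broken by position)."""
    order = sorted(range(len(profile)), key=lambda k: profile[k])
    ranks = [0] * len(profile)
    for r, k in enumerate(order, start=1):
        ranks[k] = r
    return ranks


def rank_distance(p, q):
    """Sum of absolute rank differences between two profiles (a footrule-style distance)."""
    return sum(abs(a - b) for a, b in zip(to_ranks(p), to_ranks(q)))


# Hypothetical expression values for four miRNAs in one sample.
profile = [5.0, 1.2, 3.3, 9.1]
# The same sample after a monotone distortion, e.g. a different lab protocol.
rescaled = [2.0 * x + 10.0 for x in profile]
```

Because the distance is computed on ranks, `rank_distance(profile, rescaled)` is zero even though every absolute value changed.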
Block models and personalized PageRank.
Kloumann, Isabel M; Ugander, Johan; Kleinberg, Jon
2017-01-03
Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the "seed set expansion problem": given a subset [Formula: see text] of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of "landing probabilities" of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective. In this work, we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model and find, surprisingly, that under reasonable assumptions the gradient is asymptotically equivalent to personalized PageRank for a specific choice of the PageRank parameter [Formula: see text] that depends on the block model parameters. This connection provides a formal motivation for the success of personalized PageRank in seed set expansion and node ranking generally. We use this connection to propose more advanced techniques incorporating higher moments of landing probabilities; our advanced methods exhibit greatly improved performance, despite being simple linear classification rules, and are even competitive with belief propagation.
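The "weighted sum of landing probabilities" view can be made concrete with a short sketch. It uses the standard series form of personalized PageRank, with weight (1 - alpha)·alpha^k on the k-step landing probabilities; the toy graph, the value of alpha, and the function names are assumptions for illustration, not the parameter choice derived in the paper.

```python
def walk_step(dist, adj):
    """Advance a probability distribution by one step of a simple random walk."""
    nxt = [0.0] * len(adj)
    for u, mass in enumerate(dist):
        for v in adj[u]:
            nxt[v] += mass / len(adj[u])
    return nxt


def ppr_from_landing(adj, seed, alpha=0.85, max_len=200):
    """Personalized PageRank as a truncated weighted sum of landing probabilities.

    score = sum over k of (1 - alpha) * alpha**k * r_k, where r_k is the
    distribution of a walk rooted at `seed` after k steps.
    """
    dist = [0.0] * len(adj)
    dist[seed] = 1.0
    scores = [0.0] * len(adj)
    weight = 1.0 - alpha
    for _ in range(max_len + 1):
        scores = [s + weight * d for s, d in zip(scores, dist)]
        dist = walk_step(dist, adj)
        weight *= alpha
    return scores


# Two triangles (nodes 0-2 and 3-5) joined by the edge 2-3; node 0 is the seed.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
scores = ppr_from_landing(adj, seed=0, alpha=0.5)
```

Other choices of the weights on r_k give other rankings in the same family, for example Poisson weights for a heat kernel method.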
Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin
2014-01-01
Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions of interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We accepted an addition or removal only when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874
METHOD FOR SOLVING FUZZY ASSIGNMENT PROBLEM USING MAGNITUDE RANKING TECHNIQUE
D. Selvi; R. Queen Mary; G. Velammal
2017-01-01
Assignment problems have wide applicability in industry, commerce, management science, etc. Classical assignment problems cannot be used successfully for real-life problems, so fuzzy assignment problems are more appropriate. In this paper, the fuzzy assignment problem is converted into a crisp assignment problem using the Magnitude Ranking technique, and the Hungarian method is applied to find an optimal solution. The N...
Ranking multiple docking solutions based on the conservation of inter-residue contacts
Oliva, Romina M.
2013-06-17
Molecular docking is the method of choice for investigating the molecular basis of recognition in a large number of functional protein complexes. However, correctly scoring the obtained docking solutions (decoys) to rank native-like (NL) conformations in the top positions is still an open problem. Herein we present CONSRANK, a simple and effective tool to rank multiple docking solutions, which relies on the conservation of inter-residue contacts in the analyzed decoys ensemble. First it calculates a conservation rate for each inter-residue contact, then it ranks decoys according to their ability to match the more frequently observed contacts. We applied CONSRANK to 102 targets from three different benchmarks, RosettaDock, DOCKGROUND, and Critical Assessment of PRedicted Interactions (CAPRI). The method performs consistently well, both in terms of NL solutions ranked in the top positions and of values of the area under the receiver operating characteristic curve. Its ideal application is to solutions coming from different docking programs and procedures, as in the case of CAPRI targets. For all the analyzed CAPRI targets where a comparison is feasible, CONSRANK outperforms the CAPRI scorers. The fraction of NL solutions in the top ten positions in the RosettaDock, DOCKGROUND, and CAPRI benchmarks is enriched on average by a factor of 3.0, 1.9, and 9.9, respectively. Interestingly, CONSRANK is also able to specifically single out the high/medium quality (HMQ) solutions from the docking decoys ensemble: it ranks 46.2 and 70.8% of the total HMQ solutions available for the RosettaDock and CAPRI targets, respectively, within the top 20 positions. © 2013 Wiley Periodicals, Inc.
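The two steps described above (a conservation rate per contact, then a score per decoy) can be sketched as follows. Averaging the rates of a decoy's contacts is an assumption made here for illustration; the abstract does not give the exact scoring formula, and the contact labels and function names are hypothetical.

```python
from collections import Counter


def conservation_rates(decoys):
    """Fraction of decoys in which each inter-residue contact occurs."""
    counts = Counter(c for contacts in decoys.values() for c in contacts)
    n = len(decoys)
    return {contact: k / n for contact, k in counts.items()}


def rank_decoys(decoys):
    """Rank decoys by the mean conservation rate of their contacts (assumed score)."""
    rates = conservation_rates(decoys)
    scores = {
        name: (sum(rates[c] for c in contacts) / len(contacts) if contacts else 0.0)
        for name, contacts in decoys.items()
    }
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


# Hypothetical decoys, each described by its set of (receptor, ligand) residue contacts.
decoys = {
    "d1": {("A45", "B12"), ("A46", "B12"), ("A50", "B30")},
    "d2": {("A45", "B12"), ("A46", "B12")},
    "d3": {("A45", "B12"), ("A90", "B77")},
    "d4": {("A10", "B99")},
}
order, scores = rank_decoys(decoys)
```

Decoys whose contacts recur across the ensemble rise to the top, while a decoy with only a unique contact falls to the bottom.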
Group social rank is associated with performance on a spatial learning task.
Langley, Ellis J G; van Horik, Jayden O; Whiteside, Mark A; Madden, Joah R
2018-02-01
Dominant individuals differ from subordinates in their performances on cognitive tasks across a suite of taxa. Previous studies often only consider dyadic relationships, rather than the more ecologically relevant social hierarchies or networks, hence failing to account for how dyadic relationships may be adjusted within larger social groups. We used a novel statistical method, randomized Elo-ratings, to infer the social hierarchy of 18 male pheasants, Phasianus colchicus, while in a captive, mixed-sex group with a linear hierarchy. We assayed individual learning performance of these males on a binary spatial discrimination task to investigate whether inter-individual variation in performance is associated with group social rank. Task performance improved with increasing trial number and was positively related to social rank, with higher ranking males showing greater levels of success. Motivation to participate in the task was not related to social rank or task performance, thus indicating that these rank-related differences are not a consequence of differences in motivation to complete the task. Our results provide important information about how variation in cognitive performance relates to an individual's social rank within a group. Whether the social environment causes differences in learning performance or instead, inherent differences in learning ability predetermine rank remains to be tested.
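A minimal sketch of how randomized Elo-ratings can be computed, assuming the standard Elo update and assuming that "randomized" means averaging final ratings over many random orderings of the observed interactions. The K factor, starting rating, interaction data, and function names are hypothetical choices, not values from the study.

```python
import random


def elo_update(ratings, winner, loser, k=100.0):
    """Standard Elo update after one dyadic interaction."""
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    delta = k * (1.0 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def randomized_elo(interactions, n_random=200, start=1000.0, seed=0):
    """Average final Elo ratings over random orderings of (winner, loser) pairs."""
    rng = random.Random(seed)
    ids = sorted({x for pair in interactions for x in pair})
    totals = dict.fromkeys(ids, 0.0)
    for _ in range(n_random):
        order = interactions[:]
        rng.shuffle(order)                   # assumed randomization: interaction order
        ratings = dict.fromkeys(ids, start)
        for winner, loser in order:
            elo_update(ratings, winner, loser)
        for i in ids:
            totals[i] += ratings[i]
    return {i: totals[i] / n_random for i in ids}


# Hypothetical interactions in a linear hierarchy a > b > c.
interactions = [("a", "b")] * 3 + [("b", "c")] * 3 + [("a", "c")] * 2
mean_ratings = randomized_elo(interactions)
hierarchy = sorted(mean_ratings, key=mean_ratings.get, reverse=True)
```

Averaging over orderings removes the dependence of ordinary Elo ratings on the sequence in which interactions happened to be recorded.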
A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.
Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho
2014-10-01
Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because miRNA expression data are high-dimensional, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they only consider problems where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs are ranked low by traditional feature selection methods and are removed most of the time. Given the limited number of human miRNAs, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifiers to construct the cancer classification. Then, we chose the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Our results demonstrate that, through the m:n feature subset, low-ranking miRNAs have a positive effect on cancer classification.
Multimodal biometric system using rank-level fusion approach.
Monwar, Md Maruf; Gavrilova, Marina L
2009-08-01
In many real-world applications, unimodal biometric systems often face significant limitations due to sensitivity to noise, intraclass variability, data quality, nonuniversality, and other factors. Attempting to improve the performance of individual matchers in such situations may not prove to be highly effective. Multibiometric systems seek to alleviate some of these problems by providing multiple pieces of evidence of the same identity. These systems help achieve an increase in performance that may not be possible using a single-biometric indicator. This paper presents an effective fusion scheme that combines information presented by multiple domain experts based on the rank-level fusion integration method. The developed multimodal biometric system possesses a number of unique qualities, starting from utilizing principal component analysis and Fisher's linear discriminant methods for individual matchers (face, ear, and signature) identity authentication and utilizing the novel rank-level fusion method in order to consolidate the results obtained from different biometric matchers. The ranks of individual matchers are combined using the highest rank, Borda count, and logistic regression approaches. The results indicate that fusion of individual modalities can improve the overall performance of the biometric system, even in the presence of low quality data. Insights on multibiometric design using rank-level fusion and its performance on a variety of biometric databases are discussed in the concluding section.
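Two of the three fusion rules named above are simple enough to sketch directly. The code below assumes each matcher outputs a rank per enrolled identity (1 = best match); the identities, ranks, and function names are hypothetical, and the logistic regression variant (a weighted sum with learned weights) is omitted.

```python
def highest_rank_fusion(rank_lists):
    """Fuse by taking each identity's best (smallest) rank across matchers."""
    ids = rank_lists[0].keys()
    return {i: min(r[i] for r in rank_lists) for i in ids}


def borda_count_fusion(rank_lists):
    """Fuse by summing each identity's ranks across matchers (smaller is better)."""
    ids = rank_lists[0].keys()
    return {i: sum(r[i] for r in rank_lists) for i in ids}


# Hypothetical ranks from three matchers for three enrolled identities.
face = {"alice": 1, "bob": 2, "carol": 3}
ear = {"alice": 2, "bob": 1, "carol": 3}
signature = {"alice": 1, "bob": 3, "carol": 2}

borda = borda_count_fusion([face, ear, signature])
highest = highest_rank_fusion([face, ear, signature])
consensus = sorted(borda, key=borda.get)     # best identity first
```

Highest-rank fusion tends to produce ties (here alice and bob both reach rank 1), which is one reason Borda count or a weighted variant is often preferred.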
Developing a Clustering-Based Empirical Bayes Analysis Method for Hotspot Identification
Directory of Open Access Journals (Sweden)
Yajie Zou
2017-01-01
Hotspot identification (HSID) is a critical part of network-wide safety evaluations. Typical methods for ranking sites are often rooted in using the Empirical Bayes (EB) method to estimate safety from both observed crash records and predicted crash frequency based on similar sites. The performance of the EB method is highly related to the selection of a reference group of sites (i.e., roadway segments or intersections) similar to the target site, from which the safety performance functions (SPFs) used to predict crash frequency will be developed. As crash data often contain underlying heterogeneity that, in essence, can make them appear to be generated from distinct subpopulations, methods are needed to select similar sites in a principled manner. To overcome this possible heterogeneity problem, EB-based HSID methods that use common clustering methodologies (e.g., mixture models, K-means, and hierarchical clustering) to select “similar” sites for building SPFs are developed. Performance of the clustering-based EB methods is then compared using real crash data. Here, HSID results, when computed on Texas undivided rural highway crash data, suggest that all three clustering-based EB analysis methods are preferred over the conventional statistical methods. Thus, properly classifying the road segments for heterogeneous crash data can further improve HSID accuracy.
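For readers unfamiliar with the EB step, the sketch below shows the usual weighted-average form under a negative binomial SPF: the estimate is w·predicted + (1 - w)·observed with w = 1 / (1 + k·predicted), where k is the overdispersion parameter. This is the textbook form and is assumed here; the site data, the value of k, and the function name are hypothetical and do not come from the paper.

```python
def eb_estimate(observed, predicted, overdispersion):
    """Empirical Bayes expected crash frequency for one site.

    Assumes a negative binomial SPF with variance mu + k * mu**2,
    where k is `overdispersion` and mu is `predicted`.
    """
    weight = 1.0 / (1.0 + overdispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical sites: (observed crashes, SPF-predicted crashes).
sites = {"A": (10, 4.0), "B": (3, 6.0), "C": (7, 7.0)}
k = 0.5  # assumed overdispersion of the SPF fitted to the reference group

eb = {name: eb_estimate(obs, pred, k) for name, (obs, pred) in sites.items()}
hotspot_order = sorted(eb, key=eb.get, reverse=True)
```

The clustering step in the paper changes which reference group, and therefore which SPF prediction and overdispersion value, feeds this formula for each site.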
Model of Decision Making through Consensus in Ranking Case
Tarigan, Gim; Darnius, Open
2018-01-01
The basic problem in determining a ranking consensus is to combine several rankings decided by two or more decision makers (DMs) into a consensus ranking. DMs are frequently asked to present their preferences over a group of objects in terms of ranks, for example to choose a new project, a new product, a candidate in an election, and so on. Ranking problems can be classified into two major categories, namely cardinal and ordinal rankings. The objective of the study is to obtain the ranking consensus by applying several algorithms and methods. The algorithms and methods used in this study were the partial algorithm, optimal ranking consensus, and the BAK (Borda-Kendall) model. A method proposed as an alternative for ranking consensus is the Weighted Distance Forward-Backward (WDFB) method, which gave a slightly different consensus ranking compared with the result of the example solved by Cook et al. (2005).
A rank based social norms model of how people judge their levels of drunkenness whilst intoxicated
Directory of Open Access Journals (Sweden)
Simon C. Moore
2016-09-01
Background: A rank based social norms model predicts that drinkers’ judgements about their drinking will be based on the rank of their breath alcohol level amongst that of others in the immediate environment, rather than their actual breath alcohol level, with lower relative rank associated with greater feelings of safety. This study tested this hypothesis and examined how people judge their levels of drunkenness and the health consequences of their drinking whilst they are intoxicated in social drinking environments. Methods: Breath alcohol testing of 1,862 people (mean age = 26.96 years; 61.86% male) in drinking environments. A subset (N = 400) also answered four questions asking about their perceptions of their drunkenness and the health consequences of their drinking (plus background measures). Results: Perceptions of drunkenness and the health consequences of drinking were regressed on: (a) breath alcohol level, (b) the rank of the breath alcohol level amongst that of others in the same environment, and (c) covariates. Only rank of breath alcohol level predicted perceptions: how drunk they felt (b = 3.78, 95% CI 1.69 to 5.87), how extreme they regarded their drinking that night (b = 3.7, 95% CI 1.3 to 6.20), how at risk their long-term health was due to their current level of drinking (b = 4.1, 95% CI 0.2 to 8.0) and how likely they felt they would experience liver cirrhosis (b = 4.8, 95% CI 0.7 to 8.8). People were more influenced by more sober others than by more drunk others. Conclusion: Whilst intoxicated and in drinking environments, people base judgements regarding their drinking on how their level of intoxication ranks relative to that of others of the same gender around them, not on their actual levels of intoxication. Thus, when in the company of others who are intoxicated, drinkers were found to be more likely to underestimate their own level of drinking, drunkenness and associated risks. The implications of these results, for example
Minkowski metrics in creating universal ranking algorithms
Directory of Open Access Journals (Sweden)
Andrzej Ameljańczyk
2014-06-01
The paper presents a general procedure for creating rankings of a set of objects in which the preference relation may be based on any ranking function. The analysis of possible ranking functions begins by showing the fundamental drawbacks of the commonly used weighted-sum functions. As a special case of the ranking procedure in the space of a relation, a procedure based on the notion of an ideal element and the generalized Minkowski distance from that element is proposed. This procedure, presented as a universal ranking algorithm, eliminates most of the disadvantages of ranking functions in the form of a weighted sum. Keywords: ranking functions, preference relation, ranking clusters, categories, ideal point, universal ranking algorithm
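The ideal-point idea can be illustrated with a short sketch. It assumes every criterion is to be maximized and already scaled to [0, 1], takes the ideal element as the component-wise maximum, and ranks objects by their Minkowski distance to it; the data, the choice of p, and the function names are hypothetical rather than taken from the paper.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two criterion vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_ideal(objects, p=2.0):
    """Rank objects by distance to the ideal point (component-wise best values)."""
    ideal = tuple(max(column) for column in zip(*objects.values()))
    dist = {name: minkowski(values, ideal, p) for name, values in objects.items()}
    return sorted(dist, key=dist.get), ideal


# Hypothetical objects scored on two benefit-type criteria in [0, 1].
objects = {"A": (0.9, 0.2), "B": (0.6, 0.6), "C": (0.1, 0.9)}
order, ideal = rank_by_ideal(objects, p=2.0)
```

With equal weights a weighted sum scores A at 1.1, B at 1.2 and C at 1.0, so it barely separates the balanced object B from the lopsided object A; the distance to the ideal point separates them more clearly (about 0.42 against 0.70 for p = 2).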
Ranking of biomass pellets by integration of economic, environmental and technical factors
International Nuclear Information System (INIS)
Sultana, Arifa; Kumar, Amit
2012-01-01
Interest in biomass as a renewable energy source has increased recently in response to a need to reduce greenhouse gas (GHG) emissions. The objective of this study is to develop a multi-criteria assessment model and rank different biomass feedstock-based pellets in terms of their suitability for use in large heat and power generation plants, and to show the importance of environmental, economic and technical factors in making decisions about different pellets. Five pellet alternatives, each produced from a different sustainable biomass feedstock (wood, straw, switchgrass, alfalfa and poultry litter), are ranked according to eleven criteria, using the Preference Ranking Organization Method for Enrichment and Evaluation (PROMETHEE). Both quantitative and qualitative criteria are considered, including environmental, technical and economic factors. Three scenarios, namely base case, environmental and economic, are developed by changing the weight assigned to different criteria. In the base case scenario, equal weights are assigned to each criterion. In the economic and environmental scenarios, more weight is given to the economic and environmental factors, respectively. Based on the PROMETHEE rankings, wood pellets are the best source of energy in all scenarios, followed by switchgrass, straw, poultry litter and alfalfa pellets, except in the economic scenario, where straw pellets ranked higher than switchgrass pellets. Sensitivity analysis on weights, threshold values, preference function and production cost indicates that the ranking was stable. The ranking in all scenarios remained the same when qualitative criteria were omitted from the model; this indicates the stronger influence of quantitative criteria. -- Highlights: ► This study ranks the pellets produced from different biomass feedstocks. ► The ranking of the pellets is based on technical, economic and environmental factors. ► This study uses PROMETHEE method for ranking pellets based on a range of
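To show what a PROMETHEE ranking computes, here is a minimal PROMETHEE II sketch. It assumes the "usual" preference function (full preference whenever one alternative is strictly better on a criterion) and that every criterion is maximized; the study also used threshold-based preference functions, which are not reproduced here. The alternatives, scores, and weights are hypothetical placeholders, not data from the paper.

```python
def promethee_net_flows(scores, weights):
    """Net outranking flows (PROMETHEE II) with the usual preference function."""
    names = list(scores)
    n = len(names)
    total_weight = sum(weights)

    def preference(a, b):
        # Weighted share of criteria on which a is strictly better than b.
        better = sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)
        return better / total_weight

    flows = {}
    for a in names:
        positive = sum(preference(a, b) for b in names if b != a) / (n - 1)
        negative = sum(preference(b, a) for b in names if b != a) / (n - 1)
        flows[a] = positive - negative
    return flows


# Hypothetical alternatives scored on three criteria (higher is better).
scores = {"P1": (3, 3, 2), "P2": (2, 1, 3), "P3": (1, 2, 1)}
base_case = promethee_net_flows(scores, weights=(1, 1, 1))
ranking = sorted(base_case, key=base_case.get, reverse=True)
```

Scenario analysis of the kind described above amounts to calling the function again with different weight vectors and comparing the resulting rankings.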
[Study on commercial specification of atractylodes based on Delphi method].
Wang, Hao; Chen, Li-Xiao; Huang, Lu-Qi; Zhang, Tian-Tian; Li, Ying; Zheng, Yu-Guang
2016-03-01
This research adopts the Delphi method to evaluate the traditional traits of atractylodes and their rank correlation. Using methods of mathematical statistics, the relationship between the traditional identification indicators and the commodity grade of atractylodes was analyzed. The main characteristics affecting atractylodes commodity specifications and grades were found to be the oil points of the cross-section, color of the cross-section, color of the surface, grain of the cross-section, texture of the cross-section and spoilage. The study points out that the original atractylodes standard in the "seventy-six kinds of medicinal materials commodity specification standards" is not in conformity with the actual market situation, and that corresponding specifications and grades for atractylodes medicinal products need to be formulated. Combining the results of the Delphi method with the actual market situation, this study proposes a new draft of atractylodes commodity specifications and grades, providing a reference and theoretical basis for new standards. Copyright© by the Chinese Pharmaceutical Association.
A Survey on PageRank Computing
Berkhin, Pavel
2005-01-01
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much mor...
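As a point of reference for the computation being surveyed, the sketch below is a plain power iteration for PageRank on a tiny graph. The damping factor 0.85, uniform teleportation, and the uniform handling of dangling pages are common conventions assumed here; the graph and function name are hypothetical, and real web-scale computation needs the specialized data structures the survey discusses.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank; links[u] lists the pages that u points to."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n      # uniform teleportation
        for u, outs in enumerate(links):
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page: spread its rank uniformly (assumed convention).
                share = damping * rank[u] / n
                new = [x + share for x in new]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Tiny hypothetical web: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2 (page 3 has no in-links).
links = [[1, 2], [2], [0], [2]]
ranks = pagerank(links)
```

A page with no in-links, such as page 3, ends up with exactly the teleportation share (1 - damping) / n.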
Statistical methods for nuclear material management
International Nuclear Information System (INIS)
Bowen, W.M.; Bennett, C.A.
1988-12-01
This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material management problems.
Leveraging Multiactions to Improve Medical Personalized Ranking for Collaborative Filtering.
Gao, Shan; Guo, Guibing; Li, Runzhi; Wang, Zongmin
2017-01-01
Nowadays, providing high-quality recommendation services to users is an essential component in web applications, including shopping, making friends, and healthcare. This can be regarded either as a problem of estimating users' preferences by exploiting explicit feedback (numerical ratings), or as a problem of collaborative ranking with implicit feedback (e.g., purchases, views, and clicks). Previous work on this problem includes pointwise regression methods and pairwise ranking methods. The emerging healthcare websites and online medical databases impose a new challenge for medical service recommendation. In this paper, we develop a model, MBPR (Medical Bayesian Personalized Ranking over multiple users' actions), based on the simple observation that users tend to assign higher ranks to healthcare services that are also preferred in their other actions. Experimental results on real-world datasets demonstrate that MBPR achieves more accurate recommendations than several state-of-the-art methods, and experiments on datasets from a mobile shopping app show its generality and scalability.
About the use of rank transformation in sensitivity analysis of model output
International Nuclear Information System (INIS)
Saltelli, Andrea; Sobol', Ilya M
1995-01-01
Rank transformations are frequently employed in numerical experiments involving a computational model, especially in the context of sensitivity and uncertainty analyses. Response surface replacement and parameter screening are tasks which may benefit from a rank transformation. Ranks can cope with nonlinear (albeit monotonic) input-output distributions, allowing the use of linear regression techniques. Rank transformed statistics are more robust, and provide a useful solution in the presence of long tailed input and output distributions. As is known to practitioners, care must be employed when interpreting the results of such analyses, as any conclusion drawn using ranks does not translate easily to the original model. In the present note a heuristic approach is taken, to explore, by way of practical examples, the effect of a rank transformation on the outcome of a sensitivity analysis. An attempt is made to identify trends, and to correlate these effects to a model taxonomy. Employing sensitivity indices, whereby the total variance of the model output is decomposed into a sum of terms of increasing dimensionality, we show that the main effect of the rank transformation is to increase the relative weight of the first order terms (the 'main effects'), at the expense of the 'interactions' and 'higher order interactions'. As a result the influence of those parameters which influence the output mostly by way of interactions may be overlooked in an analysis based on the ranks. This difficulty increases with the dimensionality of the problem, and may lead to the failure of a rank based sensitivity analysis. We suggest that the models can be ranked, with respect to the complexity of their input-output relationship, by means of an 'Association' index I_y. I_y may complement the usual model coefficient of determination R_y^2 as a measure of model complexity for the purpose of uncertainty and sensitivity analysis.
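The statement that ranks cope with nonlinear but monotonic relationships can be checked with a one-input toy model. The sketch below assumes y = exp(x) on a regular grid (a hypothetical model, not one from the note) and compares the coefficient of determination of a linear fit on raw values with that on ranks. It does not reproduce the interaction effects that the note warns about, which need at least two inputs.

```python
import math


def to_ranks(values):
    """Return 1-based ranks of the values (no ties expected in this example)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for r, k in enumerate(order, start=1):
        ranks[k] = r
    return ranks


def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical monotonic, strongly nonlinear model y = exp(x).
x = [0.1 * k for k in range(1, 51)]
y = [math.exp(v) for v in x]

r2_raw = r_squared(x, y)
r2_rank = r_squared(to_ranks(x), to_ranks(y))
```

On ranks the fit is perfect because the model is monotonic, while the raw-value fit leaves a sizeable share of the variance unexplained; this gain is what makes rank-based regression attractive, and what the note shows can mask interaction terms in higher dimensions.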
Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach.
Li, Jun; Zhao, Patrick X
2016-01-01
Identification of functional modules/sub-networks in large-scale biological networks is one of the important research challenges in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by successfully mining functional biological modules through integrating expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, the datasets and results of the presented two case studies are publicly and freely available at http://plantgrn.noble.org/MPageRank/.
University Rankings and Social Science
Marginson, S.
2014-01-01
University rankings widely affect the behaviours of prospective students and their families, university executive leaders, academic faculty, governments and investors in higher education. Yet the social science foundations of global rankings receive little scrutiny. Rankings that simply recycle reputation without any necessary connection to real outputs are of no common value. It is necessary that rankings be soundly based in scientific terms if a virtuous relationship between performance and...
Ensemble Manifold Rank Preserving for Acceleration-Based Human Activity Recognition.
Tao, Dapeng; Jin, Lianwen; Yuan, Yuan; Xue, Yang
2016-06-01
With the rapid development of mobile devices and pervasive computing technologies, acceleration-based human activity recognition, a difficult yet essential problem in mobile apps, has received intensive attention recently. Different acceleration signals representing different activities, or even the same activity, have different attributes, which makes the signals difficult to normalize. We thus cannot directly compare these signals with each other, because they are embedded in a nonmetric space. Therefore, we present a nonmetric scheme that retains discriminative and robust frequency domain information by developing a novel ensemble manifold rank preserving (EMRP) algorithm. EMRP simultaneously considers three aspects: 1) it encodes the local geometry using the ranking order information of intraclass samples distributed on local patches; 2) it keeps the discriminative information by maximizing the margin between samples of different classes; and 3) it finds the optimal linear combination of the alignment matrices to approximate the intrinsic manifold lying in the data. Experiments are conducted on the South China University of Technology naturalistic 3-D acceleration-based activity dataset and the naturalistic mobile-devices based human activity dataset to demonstrate the robustness and effectiveness of the new nonmetric scheme for acceleration-based human activity recognition.
Deriving consensus rankings via multicriteria decision making methodology
Amy Poh Ai Ling; Mohamad Nasir Saludin; Masao Mukaidono
2012-01-01
Purpose - This paper seeks to take a cautionary stance to the impact of the marketing mix on customer satisfaction, via a case study deriving consensus rankings for benchmarking on selected retail stores in Malaysia. Design/methodology/approach - The ELECTRE I model is used in deriving consensus rankings via multicriteria decision making method for benchmarking based on the marketing mix model 4P's. Descriptive analysis is used to analyze best practice among the four marketing tactics. Finding...
An Automated Approach for Ranking Journals to Help in Clinician Decision Support
Jonnalagadda, Siddhartha R.; Moosavinasab, Soheil; Nath, Chinmoy; Li, Dingcheng; Chute, Christopher G.; Liu, Hongfang
2014-01-01
Point of care access to knowledge from full text journal articles supports decision-making and decreases medical errors. However, it is an overwhelming task to search through full text journal articles and find quality information needed by clinicians. We developed a method to rate journals for a given clinical topic, Congestive Heart Failure (CHF). Our method enables filtering of journals and ranking of journal articles based on source journal in relation to CHF. We also obtained a journal priority score, which automatically rates any journal based on its importance to CHF. Comparing our ranking with data gathered by surveying 169 cardiologists, who publish on CHF, our best Multiple Linear Regression model showed a correlation of 0.880, based on five-fold cross validation. Our ranking system can be extended to other clinical topics. PMID:25954382
Statistics of Monte Carlo methods used in radiation transport calculation
International Nuclear Information System (INIS)
Datta, D.
2009-01-01
Radiation transport calculation can be carried out by using either deterministic or statistical methods. Radiation transport calculation based on statistical methods is the basic theme of the Monte Carlo methods. The aim of this lecture is to describe the fundamental statistics required to build the foundations of the Monte Carlo technique for radiation transport calculation. The lecture note is organized in the following way. Section (1) will describe the introduction of basic Monte Carlo and its classification towards the respective field. Section (2) will describe the random sampling methods, a key component of Monte Carlo radiation transport calculation. Section (3) will provide the statistical uncertainty of Monte Carlo estimates. Section (4) will describe in brief the importance of variance reduction techniques while sampling particles such as photons or neutrons in the process of radiation transport.
Directory of Open Access Journals (Sweden)
M. Mousavi
2014-06-01
Evaluating and prioritizing appropriate renewable energy sources is inevitably a complex decision process. Various information and conflicting attributes should be taken into account. For this purpose, multi-attribute decision making (MADM) methods can assist managers or decision makers in formulating renewable energy source priorities by considering important objectives and attributes. In this paper, a new extension of the compromise ranking method with interval numbers is presented for the prioritization of renewable energy sources that is based on the performance similarity of alternatives to ideal solutions. To demonstrate the applicability of the proposed decision method, an application example is provided and the computational results are analyzed. Results illustrate that the presented method is viable in solving the evaluation and prioritization problem of renewable energy sources.
Multivariate statistical methods and data mining in particle physics (4/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (2/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (1/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
LogDet Rank Minimization with Application to Subspace Clustering
Directory of Open Access Journals (Sweden)
Zhao Kang
2015-01-01
A low-rank matrix is desired in many machine learning and computer vision problems. Most of the recent studies use the nuclear norm as a convex surrogate of the rank operator. However, all singular values are simply added together by the nuclear norm, and thus the rank may not be well approximated in practical problems. In this paper, we propose using a log-determinant (LogDet) function as a smooth and closer, though nonconvex, approximation to rank for obtaining a low-rank representation in subspace clustering. An augmented Lagrange multipliers strategy is applied to iteratively optimize the LogDet-based nonconvex objective function on potentially large-scale data. By making use of the angular information of principal directions of the resultant low-rank representation, an affinity graph matrix is constructed for spectral clustering. Experimental results on motion segmentation and face clustering data demonstrate that the proposed method often outperforms state-of-the-art subspace clustering algorithms.
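The difference between the two rank surrogates can be sketched directly on singular values. The sketch assumes the LogDet surrogate takes the form logdet(I + XᵀX), which equals the sum of log(1 + σ²) over the singular values σ of X; the abstract does not state the exact form, and the two example spectra and function names are hypothetical.

```python
import math

def nuclear_norm(singular_values):
    """Nuclear norm: the plain sum of the singular values."""
    return sum(singular_values)

def logdet_surrogate(singular_values):
    """Assumed surrogate logdet(I + X^T X) = sum_i log(1 + sigma_i^2)."""
    return sum(math.log(1.0 + s * s) for s in singular_values)

def rank_from_spectrum(singular_values, tol=1e-12):
    """Rank = number of singular values above a small tolerance."""
    return sum(1 for s in singular_values if s > tol)

# Two hypothetical spectra with the same rank but very different magnitudes.
spectrum_a = [1.0, 1.0, 0.0]
spectrum_b = [100.0, 1.0, 0.0]

rank_a = rank_from_spectrum(spectrum_a)
rank_b = rank_from_spectrum(spectrum_b)
nuclear_ratio = nuclear_norm(spectrum_b) / nuclear_norm(spectrum_a)
logdet_ratio = logdet_surrogate(spectrum_b) / logdet_surrogate(spectrum_a)

print("ranks:", rank_a, rank_b)
print(f"nuclear norm ratio b/a: {nuclear_ratio:.2f}")
print(f"LogDet ratio b/a:       {logdet_ratio:.2f}")
```

Both spectra have rank 2, yet the nuclear norm differs by a factor of about 50 because the large singular value is added in full. The logarithm damps that contribution, so the surrogate stays closer to the behaviour of the rank.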
Ranking Operations Management conferences
Steenhuis, H.J.; de Bruijn, E.J.; Gupta, Sushil; Laptaned, U
2007-01-01
Several publications have appeared in the field of Operations Management which rank Operations Management related journals. Several ranking systems exist for journals based on, for example, perceived relevance and quality, citation, and author affiliation. Many academics also publish at conferences
Zhang, Kejiang; Achari, Gopal; Pei, Yuansheng
2010-10-01
Different types of uncertain information (linguistic, probabilistic, and possibilistic) exist in site characterization. Their representation and propagation significantly influence the management of contaminated sites. In the absence of a framework with which to properly represent and integrate these quantitative and qualitative inputs together, decision makers cannot fully take advantage of the available and necessary information to identify all the plausible alternatives. A systematic methodology was developed in the present work to incorporate linguistic, probabilistic, and possibilistic information into the Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE), a subgroup of Multi-Criteria Decision Analysis (MCDA) methods for ranking contaminated sites. The identification of criteria based on the paradigm of comparative risk assessment provides a rationale for risk-based prioritization. Uncertain linguistic, probabilistic, and possibilistic information identified in characterizing contaminated sites can be properly represented, according to its nature, as numerical values, intervals, probability distributions, fuzzy sets or possibility distributions, and linguistic variables. These different kinds of representation are first transformed into a 2-tuple linguistic representation domain. The propagation of hybrid uncertainties is then carried out in the same domain. This methodology can use the original site information directly as much as possible. The case study shows that this systematic methodology provides more reasonable results. © 2010 SETAC.
14 CFR 1214.1105 - Final ranking.
2010-01-01
... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Final ranking. 1214.1105 Section 1214.1105... Recruitment and Selection Program § 1214.1105 Final ranking. Final rankings will be based on a combination of... preference will be included in this final ranking in accordance with applicable regulations. ...
Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.
Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang
2015-01-01
RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes, however with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed DNMF outweighs others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
Reduced-Rank Adaptive Filtering Using Krylov Subspace
Directory of Open Access Journals (Sweden)
Sergueï Burykh
2003-01-01
A unified view of several recently introduced reduced-rank adaptive filters is presented. As all considered methods use Krylov subspace for rank reduction, the approach taken in this work is inspired by Krylov subspace methods for iterative solutions of linear systems. The alternative interpretation so obtained is used to study the properties of each considered technique and to relate one reduced-rank method to another as well as to algorithms used in computational linear algebra. Practical issues are discussed and low-complexity versions are also included in our study. It is believed that the insight developed in this paper can be further used to improve existing reduced-rank methods according to known results in the domain of Krylov subspace methods.
Selection of suitable e-learning approach using TOPSIS technique with best ranked criteria weights
Mohammed, Husam Jasim; Kasim, Maznah Mat; Shaharanee, Izwan Nizal Mohd
2017-11-01
This paper compares the performance of four rank-based weighting assessment techniques, Rank Sum (RS), Rank Reciprocal (RR), Rank Exponent (RE), and Rank Order Centroid (ROC), on five identified e-learning criteria to select the best weighting method. A total of 35 experts in a public university in Malaysia were asked to rank the criteria and to evaluate five e-learning approaches, which include blended learning, flipped classroom, ICT supported face to face learning, synchronous learning, and asynchronous learning. The best ranked criteria weights, defined as the weights that have the least total absolute difference from the geometric mean of all weights, were then used to select the most suitable e-learning approach by using the TOPSIS method. The results show that RR weights are the best, while the flipped classroom approach is the most suitable approach. This paper has developed a decision framework to aid decision makers (DMs) in choosing the most suitable weighting method for solving MCDM problems.
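The four rank-based weighting schemes named above can be sketched with their commonly used textbook formulas. The abstract does not give the formulas, so the definitions below, the exponent `p = 2` for Rank Exponent, and the function names are assumptions rather than the paper's own procedure. Rank 1 denotes the most important criterion.

```python
from fractions import Fraction

def rank_sum(n):
    """RS: w_r = 2 (n + 1 - r) / (n (n + 1))."""
    return [Fraction(2 * (n + 1 - r), n * (n + 1)) for r in range(1, n + 1)]

def rank_reciprocal(n):
    """RR: w_r = (1 / r) / sum_k (1 / k)."""
    total = sum(Fraction(1, k) for k in range(1, n + 1))
    return [Fraction(1, r) / total for r in range(1, n + 1)]

def rank_exponent(n, p=2):
    """RE: w_r = (n - r + 1)^p / sum_k (n - k + 1)^p, with integer p assumed."""
    total = sum((n - k + 1) ** p for k in range(1, n + 1))
    return [Fraction((n - r + 1) ** p, total) for r in range(1, n + 1)]

def rank_order_centroid(n):
    """ROC: w_r = (1 / n) * sum_{k=r}^{n} (1 / k)."""
    return [sum(Fraction(1, k) for k in range(r, n + 1)) / n
            for r in range(1, n + 1)]

# Five criteria, as in the study; weights are listed from rank 1 to rank 5.
n_criteria = 5
weights = {
    "RS": rank_sum(n_criteria),
    "RR": rank_reciprocal(n_criteria),
    "RE": rank_exponent(n_criteria),
    "ROC": rank_order_centroid(n_criteria),
}

for name, w in weights.items():
    print(name, [round(float(v), 4) for v in w])
```

Exact fractions are used so that each weight vector sums to exactly one. All four schemes need only the ordinal ranking of the criteria, which is why they suit surveys where experts rank rather than score.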
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating the tumor boundary from PET have yet to be developed, largely due to the relatively low quality of PET images, uncertain tumor boundary definition, and the variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We first define the uncertainty segmentation band on the basis of a segmentation probability map constructed from the Random Walks (RW) algorithm; then, based on the extracted features of the user inference, we use Principal Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using the Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
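The Dice similarity coefficient used for validation is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. A minimal sketch follows, with each segmentation represented as a set of pixel coordinates; the two toy masks and the function name are hypothetical and unrelated to the study's data.

```python
def dice_coefficient(segmentation_a, segmentation_b):
    """Dice similarity coefficient between two sets of pixel/voxel coordinates."""
    size_sum = len(segmentation_a) + len(segmentation_b)
    if size_sum == 0:
        # Convention assumed here: two empty segmentations agree perfectly.
        return 1.0
    overlap = len(segmentation_a & segmentation_b)
    return 2.0 * overlap / size_sum

# Hypothetical 2-D masks: an automatic result and a manual delineation.
automatic = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (1, 2)}

dsc = dice_coefficient(automatic, manual)
print(f"DSC = {dsc:.3f}")
```

Here the masks share two pixels out of four and three, so DSC = 4/7. A value of 1 means identical segmentations and 0 means no overlap.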
Statistical methods in nuclear theory
International Nuclear Information System (INIS)
Shubin, Yu.N.
1974-01-01
The paper outlines statistical methods which are widely used for describing properties of excited states of nuclei and nuclear reactions. It discusses physical assumptions lying at the basis of known distributions between levels (Wigner, Poisson distributions) and of widths of highly excited states (Porter-Thomas distribution), as well as assumptions used in the statistical theory of nuclear reactions and in the fluctuation analysis. The author considers the random matrix method, which consists in replacing the matrix elements of a residual interaction by random variables with a simple statistical distribution. Experimental data are compared with results of calculations using the statistical model. The superfluid nucleus model is considered with regard to superconducting-type pair correlations.
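For orientation, the distributions named in this abstract have the following standard forms. The notation is an assumption, since the abstract gives no formulas: $s$ is the nearest-neighbour level spacing in units of the mean spacing, and $x = \Gamma / \langle \Gamma \rangle$ is a width in units of its mean.

```latex
% Poisson spacing distribution (uncorrelated levels)
P_{\mathrm{Poisson}}(s) = e^{-s}

% Wigner surmise (level repulsion)
P_{\mathrm{Wigner}}(s) = \frac{\pi s}{2} \exp\!\left(-\frac{\pi s^{2}}{4}\right)

% Porter-Thomas distribution of reduced widths
P_{\mathrm{PT}}(x) = \frac{1}{\sqrt{2 \pi x}} \exp\!\left(-\frac{x}{2}\right)
```

The Wigner form vanishes at $s = 0$, reflecting level repulsion, whereas the Poisson form is largest there.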
Multicriterial ranking approach for evaluating bank branch performance
Aleskerov, F; Ersel, H; Yolalan, R
Fourteen ranking methods based on multiple criteria are suggested for evaluating the performance of bank branches. The methods are explained via an illustrative example, and some of them are applied to real-life data for 23 retail bank branches in a large-scale private Turkish commercial bank.
Tensor completion and low-n-rank tensor recovery via convex optimization
International Nuclear Information System (INIS)
Gandy, Silvia; Yamada, Isao; Recht, Benjamin
2011-01-01
In this paper we consider sparsity on a tensor level, as given by the n-rank of a tensor. In an important sparse-vector approximation problem (compressed sensing) and the low-rank matrix recovery problem, using a convex relaxation technique proved to be a valuable solution strategy. Here, we will adapt these techniques to the tensor setting. We use the n-rank of a tensor as a sparsity measure and consider the low-n-rank tensor recovery problem, i.e. the problem of finding the tensor of the lowest n-rank that fulfills some linear constraints. We introduce a tractable convex relaxation of the n-rank and propose efficient algorithms to solve the low-n-rank tensor recovery problem numerically. The algorithms are based on the Douglas–Rachford splitting technique and its dual variant, the alternating direction method of multipliers
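The n-rank of a tensor is the tuple of matrix ranks of its mode-n unfoldings. A minimal pure-Python sketch is given below; storing the tensor as a dictionary keyed by index tuples, the exact rational elimination, and the example tensor are assumptions made for illustration, and the sketch does not implement the paper's convex relaxation or its splitting algorithms.

```python
from fractions import Fraction
from itertools import product

def unfold(tensor, shape, mode):
    """Mode-`mode` unfolding: one row per index of that mode."""
    other_ranges = [range(s) for k, s in enumerate(shape) if k != mode]
    rows = []
    for i in range(shape[mode]):
        row = []
        for rest in product(*other_ranges):
            index = list(rest)
            index.insert(mode, i)  # put the unfolded mode back in place
            row.append(tensor.get(tuple(index), 0))
        rows.append(row)
    return rows

def matrix_rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

def n_rank(tensor, shape):
    """Tuple of the ranks of all mode-n unfoldings."""
    return tuple(matrix_rank(unfold(tensor, shape, mode))
                 for mode in range(len(shape)))

# Hypothetical 2x2x2 tensor: an identity matrix in the first frontal slice.
shape = (2, 2, 2)
tensor = {(0, 0, 0): 1, (1, 1, 0): 1}  # missing entries are zero
print(n_rank(tensor, shape))
```

For this tensor the first two unfoldings have rank 2 while the third has rank 1, which shows that the n-rank is a vector-valued measure and can differ from mode to mode.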
Ranking Institutional Settings Based on Publications in Community Psychology Journals
Jason, Leonard A.; Pokorny, Steven B.; Patka, Mazna; Adams, Monica; Morello, Taylor
2007-01-01
Two primary outlets for community psychology research, the "American Journal of Community Psychology" and the "Journal of Community Psychology", were assessed to rank institutions based on publication frequency and scientific influence of publications over a 32-year period. Three specific periods were assessed (1973-1983, 1984-1994, 1995-2004).…
Statistical Methods in Integrative Genomics
Richardson, Sylvia; Tseng, George C.; Sun, Wei
2016-01-01
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
Directory of Open Access Journals (Sweden)
Yi-hua Zhong
2013-01-01
Recently, various methods have been developed for solving linear programming problems with fuzzy numbers, such as the simplex method and the dual simplex method. But their computational complexities are exponential, which is not satisfactory for solving large-scale fuzzy linear programming problems, especially in the engineering field. A new method which can solve large-scale fuzzy number linear programming problems is presented in this paper, which is named a revised interior point method. Its idea is similar to that of the interior point method used for solving linear programming problems in a crisp environment, but its feasible direction and step size are chosen by using trapezoidal fuzzy numbers, a linear ranking function, fuzzy vectors, and their operations, and its end condition involves the linear ranking function. Their correctness and rationality are proved. Moreover, the choice of the initial interior point and some factors influencing the results of this method are also discussed and analyzed. The results of the algorithm analysis and example study show that a proper safety factor parameter, accuracy parameter, and initial interior point may reduce the number of iterations, and that they can be selected easily according to actual needs. Finally, the method proposed in this paper is an alternative method for solving fuzzy number linear programming problems.
Aihong Ren
2016-01-01
This paper is concerned with a class of fully fuzzy bilevel linear programming problems where all the coefficients and decision variables of both objective functions and the constraints are fuzzy numbers. A new approach based on deviation degree measures and a ranking function method is proposed to solve these problems. We first introduce concepts of the feasible region and the fuzzy optimal solution of a fully fuzzy bilevel linear programming problem. In order to obtain a fuzzy optimal solut...
Fan, Hong; Zhu, Anfeng; Zhang, Weixia
2015-12-01
To support the rapid positioning of 12315 complaints, and targeting the natural-language expression of telephone complaints, a semantic retrieval framework is proposed that is based on natural language parsing and geographical names ontology reasoning. Within it, a search result ranking and recommendation algorithm is proposed that considers both geo-name conceptual similarity and spatial geometric relation similarity. The experiments show that this method can help the operator quickly find the location of 12315 complaints, increasing industry and commerce customer satisfaction.
Methods of statistical physics
Akhiezer, Aleksandr I
1981-01-01
Methods of Statistical Physics is an exposition of the tools of statistical mechanics, which evaluates the kinetic equations of classical and quantized systems. The book also analyzes the equations of macroscopic physics, such as the equations of hydrodynamics for normal and superfluid liquids and macroscopic electrodynamics. The text gives particular attention to the study of quantum systems. This study begins with a discussion of problems of quantum statistics with a detailed description of the basics of quantum mechanics along with the theory of measurement. An analysis of the asymptotic be
Ing, Alex; Schwarzbauer, Christian
2014-01-01
Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods, the cluster size statistic (CSS) and cluster mass statistic (CMS), are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
Diagnosing and ranking retinopathy disease level using diabetic fundus image recuperation approach.
Somasundaram, K; Rajendran, P Alli
2015-01-01
Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time.
International Nuclear Information System (INIS)
Dordevic, N.; Wehrens, R.; Postma, G.J.; Buydens, L.M.C.; Camin, F.
2012-01-01
Highlights: ► The assessment of claims of origin is of enormous economic importance for DOC and DOCG wines. ► The official method is based on univariate statistical tests of H, C and O isotopic ratios. ► We consider 5220 Italian wine samples collected in the period 2000–2010. ► Multivariate statistical analysis leads to much better specificity and easier detection of false claims of origin. ► In the case of multi-modal data, mixture modelling provides additional improvements. - Abstract: Wine derives its economic value to a large extent from geographical origin, which has a significant impact on the quality of the wine. According to the food legislation, wines can be without geographical origin (table wine) and wines with origin. Wines with origin must have characteristics which are essential due to its region of production and must be produced, processed and prepared, exclusively within that region. The development of fast and reliable analytical methods for the assessment of claims of origin is very important. The current official method is based on the measurement of stable isotope ratios of water and alcohol in wine, which are influenced by climatic factors. The results in this paper are based on 5220 Italian wine samples collected in the period 2000–2010. We evaluate the univariate approach underlying the official method to assess claims of origin and propose several new methods to get better geographical discrimination between samples. It is shown that multivariate methods are superior to univariate approaches in that they show increased sensitivity and specificity. In cases where data are non-normally distributed, an approach based on mixture modelling provides additional improvements.
Directory of Open Access Journals (Sweden)
Hongyang Lu
2016-06-01
Because of the contradiction between the spatial and temporal resolution of remote sensing images (RSI) and quality loss in the process of acquisition, it is of great significance to reconstruct RSI in remote sensing applications. Recent studies have demonstrated that reference image-based reconstruction methods have great potential for higher reconstruction performance, while lacking accuracy and quality of reconstruction. For this application, a new compressed sensing objective function incorporating a reference image as prior information is developed. We resort to the reference prior information inherent in interior and exterior data simultaneously to build a new generalized nonconvex low-rank approximation framework for RSI reconstruction. Specifically, the innovation of this paper consists of the following three aspects: (1) we propose a nonconvex low-rank approximation for reconstructing RSI; (2) we inject reference prior information to overcome over-smoothed edges and texture detail losses; (3) on this basis, we combine conjugate gradient algorithms and a single-value threshold (SVT) simultaneously to solve the proposed algorithm. The performance of the algorithm is evaluated both qualitatively and quantitatively. Experimental results demonstrate that the proposed algorithm improves several dBs in terms of peak signal to noise ratio (PSNR) and preserves image details significantly compared to most of the current approaches without reference images as priors. In addition, the generalized nonconvex low-rank approximation of our approach is naturally robust to noise, and therefore, the proposed algorithm can handle low resolution with noisy inputs in a more unified framework.
A rank-based sequence aligner with applications in phylogenetic analysis.
Directory of Open Access Journals (Sweden)
Liviu P Dinu
Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. A set of experiments is conducted to determine the precision and the recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.
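As an illustration of the indexing idea mentioned in this abstract (storing substring positions in a hash table), the following minimal Python sketch builds such a table for one read. It assumes the elided formula denotes the substring length k; the function name `kmer_positions` and the example read are hypothetical, and this is not the authors' C++/Java implementation.

```python
from collections import defaultdict


def kmer_positions(sequence, k):
    """Map every length-k substring of `sequence` to its start positions.

    Hypothetical helper for illustration only; it shows the hash-table
    lookup idea, not the Local Rank Distance computation itself.
    """
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return dict(index)


# Example read (made up): the 3-mer "ACG" occurs at offsets 0 and 4.
table = kmer_positions("ACGTACG", 3)
print(table["ACG"])
```

With such a table, looking up where a given substring of the reference occurs in the read is a single dictionary access rather than a scan of the read.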
Statistical analysis tolerance using jacobian torsor model based on uncertainty propagation method
Directory of Open Access Journals (Sweden)
W Ghie
2016-04-01
One risk inherent in the use of assembly components is that the behaviour of these components is discovered only at the moment an assembly is being carried out. The objective of our work is to enable designers to use known component tolerances as parameters in models that can be used to predict properties at the assembly level. In this paper we present a statistical approach to assemblability evaluation, based on tolerance and clearance propagations. This new statistical analysis method for tolerance is based on the Jacobian-Torsor model and the uncertainty measurement approach. We show how this can be accomplished by modeling the distribution of manufactured dimensions through applying a probability density function. By presenting an example we show how statistical tolerance analysis should be used in the Jacobian-Torsor model. This work is supported by previous efforts aimed at developing a new generation of computational tools for tolerance analysis and synthesis, using the Jacobian-Torsor approach. This approach is illustrated on a simple three-part assembly, demonstrating the method’s capability in handling three-dimensional geometry.
Statistical method application to knowledge base building for reactor accident diagnostic system
International Nuclear Information System (INIS)
Yoshida, Kazuo; Yokobayashi, Masao; Matsumoto, Kiyoshi; Kohsaka, Atsuo
1989-01-01
In the development of a knowledge-based expert system, one of the key issues is how to build the knowledge base (KB) efficiently while keeping the KB objective. In order to solve this issue, an approach has been proposed to build a prototype KB systematically by a statistical method, factor analysis. For the verification of this approach, factor analysis was applied to build a prototype KB for the JAERI expert system DISKET. To this end, alarm and process information was generated by a PWR simulator and the factor analysis was applied to this information to define a taxonomy of accident hypotheses and to extract rules for each hypothesis. The prototype KB thus built was tested by running inference against several types of transients, including double failures. In each diagnosis, the transient type was well identified. Furthermore, newly introduced standards for rule extraction showed good effects on the enhancement of the performance of the prototype KB. (author)
The application of statistical methods to assess economic assets
Directory of Open Access Journals (Sweden)
D. V. Dianov
2017-01-01
The article is devoted to consideration and evaluation of machinery, equipment and special equipment, methodological aspects of the use of standards for assessment of buildings and structures in current prices, the valuation of residential, specialized houses, office premises, assessment and reassessment of existing and inactive military assets, the application of statistical methods to obtain the relevant cost estimates. The objective of the scientific article is to consider possible application of statistical tools in the valuation of the assets composing the core group of elements of national wealth – the fixed assets. Firstly, capital tangible assets constitute the material base for the creation of new value, products and non-financial services. The accumulated gain in tangible assets of a capital nature is a part of the gross domestic product, and from its volume and specific weight in the composition of GDP we can judge the scope of reproductive processes in the country. Based on the methodological materials of the state statistics bodies of the Russian Federation and on the principles of the theory of statistics, which describe methods of statistical analysis such as indices, average values and regression, the methodical approach is structured around the application of statistical tools to obtain value estimates of property, plant and equipment with significant accumulated depreciation. Until now, the use of statistical methodology in the practice of economic assessment of assets has been only fragmentary. This applies to both Federal Legislation (Federal law № 135 «On valuation activities in the Russian Federation» dated 16.07.1998 in edition 05.07.2016) and the methodological documents and regulations of the estimated activities, in particular, the valuation activities’ standards. A particular problem is the use of a digital database of Rosstat (Federal State Statistics Service), as to the specific fixed assets the comparison should be carried
Asymptotic efficiency of signed-rank symmetry tests under skew alternatives.
Alessandra Durio; Yakov Nikitin
2002-01-01
The efficiency of some known tests for symmetry such as the sign test, the Wilcoxon signed-rank test or more general linear signed rank tests was studied mainly under the classical alternatives of location. However it is interesting to compare the efficiencies of these tests under asymmetric alternatives like the so-called skew alternative proposed in Azzalini (1985). We find and compare local Bahadur efficiencies of linear signed-rank statistics for skew alternatives and discuss also the con...
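For readers unfamiliar with the statistics compared in this abstract, the sketch below computes the sign-test count and the Wilcoxon signed-rank sums for a sample tested for symmetry about a given centre. The function names and the sample values are illustrative assumptions; the efficiency calculations of the paper are not reproduced here.

```python
def sign_statistic(sample, center=0.0):
    """Number of observations strictly above `center` (sign test)."""
    return sum(1 for x in sample if x > center)


def signed_rank_statistic(sample, center=0.0):
    """Return (W_plus, W_minus), the Wilcoxon signed-rank sums.

    Observations equal to `center` are dropped; tied absolute
    deviations receive the average of the ranks they span.
    """
    diffs = [x - center for x in sample if x != center]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied absolute deviations.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Made-up sample: absolute deviations 0.5, 1.0, 1.5, 2.0, 3.0 get ranks 1..5.
print(signed_rank_statistic([1.5, -0.5, 2.0, -1.0, 3.0]))
```

Under symmetry about the centre, the two rank sums should be of similar size; a large imbalance is evidence against symmetry. The two sums always add up to n(n+1)/2 for n non-zero deviations.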
Multivariate statistical methods a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Statistical methods for physical science
Stanford, John L
1994-01-01
This volume of Methods of Experimental Physics provides an extensive introduction to probability and statistics in many areas of the physical sciences, with an emphasis on the emerging area of spatial statistics. The scope of topics covered is wide-ranging: the text discusses a variety of the most commonly used classical methods and addresses newer methods that are applicable or potentially important. The chapter authors motivate readers with their insightful discussions, augmenting their material with Key Features * Examines basic probability, including coverage of standard distributions, time s
Statistical lamb wave localization based on extreme value theory
Harley, Joel B.
2018-04-01
Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represents an estimated damage location. Yet, it is often unclear if this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant rate of false alarm statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increases. With seventeen sensors, both methods successfully detect damage with statistical significance.
2016-08-17
Specialized Finite Set Statistics (FISST)-based Estimation Methods to Enhance Space Situational Awareness in Medium Earth Orbit (MEO) and Geostationary... terms of specialized Geostationary Earth Orbit (GEO) elements to estimate the state of resident space objects in the geostationary regime. Justification... Report number: AFRL-RV-PS-TR-2016-0114.
Novel Opportunistic Network Routing Based on Social Rank for Device-to-Device Communication
Directory of Open Access Journals (Sweden)
Tong Wang
2017-01-01
In recent years, there has been dramatic proliferation of research concerned with fifth-generation (5G) mobile communication networks, among which device-to-device (D2D) communication is one of the key technologies. Due to the intermittent connection of nodes, the D2D network topology may be disconnected frequently, which will lead to failure in transmission of large data files. In opportunistic networks, nodes that have never met before may flood messages blindly, causing tremendous network overhead; to address this, a novel opportunistic network routing protocol based on social rank and intermeeting time (SRIT) is proposed in this paper. An improved utility approach applied in utility replication based on encounter durations and intermeeting time is put forward to enhance the routing efficiency. Meanwhile, in order to select better candidate nodes in the network, a social graph among people is established when they socially relate to each other in social rank replication. The results under the scenario show an advantage of the proposed opportunistic network routing based on social rank and intermeeting time (SRIT) over the compared algorithms in terms of delivery ratio, average delivery latency, and overhead ratio.
A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.
Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe
2012-04-01
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.
Energy Technology Data Exchange (ETDEWEB)
Lithner, Delilah, E-mail: delilah.lithner@gmail.com; Larsson, Ake; Dave, Goeran
2011-08-15
Plastics constitute a large material group with a global annual production that has doubled in 15 years (245 million tonnes in 2008). Plastics are present everywhere in society and the environment, especially the marine environment, where large amounts of plastic waste accumulate. The knowledge of human and environmental hazards and risks from chemicals associated with the diversity of plastic products is very limited. Most chemicals used for producing plastic polymers are derived from non-renewable crude oil, and several are hazardous. These may be released during the production, use and disposal of the plastic product. In this study the environmental and health hazards of chemicals used in 55 thermoplastic and thermosetting polymers were identified and compiled. A hazard ranking model was developed for the hazard classes and categories in the EU classification and labelling (CLP) regulation which is based on the UN Globally Harmonized System. The polymers were ranked based on monomer hazard classifications, and initial assessments were made. The polymers that ranked as most hazardous are made of monomers classified as mutagenic and/or carcinogenic (category 1A or 1B). These belong to the polymer families of polyurethanes, polyacrylonitriles, polyvinyl chloride, epoxy resins, and styrenic copolymers. All have a large global annual production (1-37 million tonnes). A considerable number of polymers (31 out of 55) are made of monomers that belong to the two worst of the ranking model's five hazard levels, i.e. levels IV-V. The polymers that are made of level IV monomers and have a large global annual production (1-5 million tonnes) are phenol formaldehyde resins, unsaturated polyesters, polycarbonate, polymethyl methacrylate, and urea-formaldehyde resins. This study has identified hazardous substances used in polymer production for which the risks should be evaluated for decisions on the need for risk reduction measures, substitution, or even phase out
Statistical analysis of brake squeal noise
Oberst, S.; Lai, J. C. S.
2011-06-01
Despite substantial research efforts applied to the prediction of brake squeal noise since the early 20th century, the mechanisms behind its generation are still not fully understood. Squealing brakes are of significant concern to the automobile industry, mainly because of the costs associated with warranty claims. In order to remedy the problems inherent in designing quieter brakes and, therefore, to understand the mechanisms, a design of experiments study, using a noise dynamometer, was performed by a brake system manufacturer to determine the influence of geometrical parameters (namely, the number and location of slots) of brake pads on brake squeal noise. The experimental results were evaluated with a noise index and ranked for warm and cold brake stops. These data are analysed here using statistical descriptors based on population distributions, and a correlation analysis, to gain greater insight into the functional dependency between the time-averaged friction coefficient as the input and the peak sound pressure level data as the output quantity. The correlation analysis between the time-averaged friction coefficient and peak sound pressure data is performed by applying a semblance analysis and a joint recurrence quantification analysis. Linear measures are compared with complexity measures (nonlinear) based on statistics from the underlying joint recurrence plots. Results show that linear measures cannot be used to rank the noise performance of the four test pad configurations. On the other hand, the ranking of the noise performance of the test pad configurations based on the noise index agrees with that based on nonlinear measures: the higher the nonlinearity between the time-averaged friction coefficient and peak sound pressure, the worse the squeal. These results highlight the nonlinear character of brake squeal and indicate the potential of using nonlinear statistical analysis tools to analyse disc brake squeal.
Chen, Jinying; Yu, Hong
2017-04-01
Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care. The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word co-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines. FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers (P<0.001). FIT also outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. FIT can automatically identify EHR terms important to patients. It may help develop future interventions
Directory of Open Access Journals (Sweden)
Xiaoying Li
2018-01-01
Aberrant expression of microRNAs (miRNAs) can be applied for the diagnosis, prognosis, and treatment of human diseases. Identifying the relationship between miRNA and human disease is important to further investigate the pathogenesis of human diseases. However, experimental identification of the associations between diseases and miRNAs is time-consuming and expensive. Computational methods are efficient approaches to determine the potential associations between diseases and miRNAs. This paper presents a new computational method based on the SimRank and density-based clustering recommender model for miRNA-disease associations prediction (SRMDAP). The AUC of 0.8838 based on leave-one-out cross-validation and case studies suggested the excellent performance of the SRMDAP in predicting miRNA-disease associations. SRMDAP could also predict diseases without any related miRNAs and miRNAs without any related diseases.
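The SimRank measure named in this abstract can be sketched in a few lines. The version below is the basic textbook recursion (two nodes are similar if their in-neighbours are similar) on a small hypothetical graph; the decay factor, iteration count, function name and node labels are assumptions, and the density-based clustering and recommender steps of SRMDAP are not shown.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Basic iterative SimRank.

    `in_neighbors` maps every node to the list of nodes pointing to it;
    every node must appear as a key. Returns a nested dict of scores.
    """
    nodes = list(in_neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0  # a node is maximally similar to itself
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    new[a][b] = 0.0  # no in-neighbours: no evidence of similarity
                    continue
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: "a" and "b" are both pointed to by "c" only.
scores = simrank({"a": ["c"], "b": ["c"], "c": []})
print(scores["a"]["b"])
```

In this toy graph the only in-neighbour pair of "a" and "b" is ("c", "c"), so their score equals the decay factor.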
Statistical Methods for Environmental Pollution Monitoring
Energy Technology Data Exchange (ETDEWEB)
Gilbert, Richard O. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
1987-01-01
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedures, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
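To give a flavour of the nonparametric trend tests mentioned in this abstract, the sketch below computes the Mann-Kendall S statistic, a widely used rank-based trend measure. Choosing Mann-Kendall here is an assumption made for illustration, the function name and data are made up, and this is not the computer code listed in the book's Appendix B.

```python
def mann_kendall_s(values):
    """Mann-Kendall S: concordant minus discordant pairs in time order.

    `values` are measurements listed chronologically. S > 0 suggests an
    upward trend, S < 0 a downward trend; ties contribute zero.
    """
    s = 0
    n = len(values)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    return s


# Made-up concentration series from one monitoring station.
print(mann_kendall_s([1.2, 1.9, 1.5, 2.4]))
```

Because only the signs of pairwise differences are used, the statistic is unaffected by the magnitude of extreme values, which is one reason rank-based methods suit pollution data.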
Robust statistical methods with R
Jureckova, Jana
2005-01-01
Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameter, large sample properties, and goodness-of-fit tests. It...
Directory of Open Access Journals (Sweden)
Yin Yanshu
2017-12-01
In this paper, a location-based multiple point statistics method is developed to model a non-stationary reservoir. The proposed method characterizes the relationship between the sedimentary pattern and the deposit location using the relative central position distance function, which alleviates the requirement that the training image and the simulated grids have the same dimension. The weights in every direction of the distance function can be changed to characterize the reservoir heterogeneity in various directions. The local integral replacements of data events, structured random path, distance tolerance and multi-grid strategy are applied to reproduce the sedimentary patterns and obtain a more realistic result. This method is compared with the traditional Snesim method using a synthesized 3-D training image of Poyang Lake and a reservoir model of Shengli Oilfield in China. The results indicate that the new method can reproduce the non-stationary characteristics better than the traditional method and is more suitable for simulation of delta-front deposits. These results show that the new method is a powerful tool for modelling a reservoir with non-stationary characteristics.
Energy Technology Data Exchange (ETDEWEB)
Dordevic, N.; Wehrens, R. [IASMA Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige (Italy); Postma, G.J.; Buydens, L.M.C. [Radboud University Nijmegen, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen (Netherlands); Camin, F., E-mail: federica.camin@fmach.it [IASMA Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige (Italy)
2012-12-13
Highlights: ► The assessment of claims of origin is of enormous economic importance for DOC and DOCG wines. ► The official method is based on univariate statistical tests of H, C and O isotopic ratios. ► We consider 5220 Italian wine samples collected in the period 2000-2010. ► Multivariate statistical analysis leads to much better specificity and easier detection of false claims of origin. ► In the case of multi-modal data, mixture modelling provides additional improvements. - Abstract: Wine derives its economic value to a large extent from geographical origin, which has a significant impact on the quality of the wine. According to the food legislation, wines can be without geographical origin (table wine) and wines with origin. Wines with origin must have characteristics which are essential due to its region of production and must be produced, processed and prepared, exclusively within that region. The development of fast and reliable analytical methods for the assessment of claims of origin is very important. The current official method is based on the measurement of stable isotope ratios of water and alcohol in wine, which are influenced by climatic factors. The results in this paper are based on 5220 Italian wine samples collected in the period 2000-2010. We evaluate the univariate approach underlying the official method to assess claims of origin and propose several new methods to get better geographical discrimination between samples. It is shown that multivariate methods are superior to univariate approaches in that they show increased sensitivity and specificity. In cases where data are non-normally distributed, an approach based on mixture modelling provides additional improvements.
Tensor Factorization for Low-Rank Tensor Completion.
Zhou, Pan; Lu, Canyi; Lin, Zhouchen; Zhang, Chao
2018-03-01
Recently, a tensor nuclear norm (TNN) based method was proposed to solve the tensor completion problem, which has achieved state-of-the-art performance on image and video inpainting tasks. However, it requires computing the tensor singular value decomposition (t-SVD), which is computationally expensive and thus cannot efficiently handle tensor data, which are naturally large in scale. Motivated by TNN, we propose a novel low-rank tensor factorization method for efficiently solving the 3-way tensor completion problem. Our method preserves the low-rank structure of a tensor by factorizing it into the product of two tensors of smaller sizes. In the optimization process, our method only needs to update two smaller tensors, which can be done more efficiently than computing the t-SVD. Furthermore, we prove that the proposed alternating minimization algorithm can converge to a Karush-Kuhn-Tucker point. Experimental results on synthetic data recovery, image and video inpainting tasks clearly demonstrate the superior performance and efficiency of our developed method over state-of-the-art methods including the TNN and matricization methods.
Low-Rank Sparse Coding for Image Classification
Zhang, Tianzhu; Ghanem, Bernard; Liu, Si; Xu, Changsheng; Ahuja, Narendra
2013-01-01
In this paper, we propose a low-rank sparse coding (LRSC) method that exploits local structure information among features in an image for the purpose of image-level classification. LRSC represents densely sampled SIFT descriptors, in a spatial neighborhood, collectively as low-rank, sparse linear combinations of code words. As such, it casts the feature coding problem as a low-rank matrix learning problem, which is different from previous methods that encode features independently. This LRSC has a number of attractive properties. (1) It encourages sparsity in feature codes, locality in codebook construction, and low-rankness for spatial consistency. (2) LRSC encodes local features jointly by considering their low-rank structure information, and is computationally attractive. We evaluate the LRSC by comparing its performance on a set of challenging benchmarks with that of 7 popular coding and other state-of-the-art methods. Our experiments show that by representing local features jointly, LRSC not only outperforms the state-of-the-art in classification accuracy but also improves the time complexity of methods that use a similar sparse linear representation model for feature coding.
Nonparametric statistics with applications to science and engineering
Kvam, Paul H
2007-01-01
A thorough and definitive book that fully addresses traditional and modern-day topics of nonparametric statistics. This book presents a practical approach to nonparametric statistical analysis and provides comprehensive coverage of both established and newly developed methods. With the use of MATLAB, the authors present information on theorems and rank tests in an applied fashion, with an emphasis on modern methods in regression and curve fitting, bootstrap confidence intervals, splines, wavelets, empirical likelihood, and goodness-of-fit testing. Nonparametric Statistics with Applications to Science and Engineering begins with succinct coverage of basic results for order statistics, methods of categorical data analysis, nonparametric regression, and curve fitting methods. The authors then focus on nonparametric procedures that are becoming more relevant to engineering researchers and practitioners. The important fundamental materials needed to effectively learn and apply the discussed methods are also provide...
Gershenson, Carlos
Studies of rank distributions have been popular for decades, especially since the work of Zipf. For example, if we rank words of a given language by use frequency (most used word in English is 'the', rank 1; second most common word is 'of', rank 2), the distribution can be approximated roughly with a power law. The same applies for cities (most populated city in a country ranks first), earthquakes, metabolism, the Internet, and dozens of other phenomena. We recently proposed "rank diversity" to measure how ranks change in time, using the Google Books Ngram dataset. Studying six languages between 1800 and 2009, we found that the rank diversity curves of languages are universal, adjusted with a sigmoid on log-normal scale. We are studying several other datasets (sports, economies, social systems, urban systems, earthquakes, artificial life). Rank diversity seems to be universal, independently of the shape of the rank distribution. I will present our work in progress towards a general description of the features of rank change in time, along with simple models which reproduce it
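A minimal sketch of how a rank diversity curve could be computed is given below. It assumes the measure is defined as the number of distinct items that occupy a given rank across the time slices, divided by the number of time slices; that definition, the function name and the toy word lists are assumptions for illustration and may differ in detail from the author's formulation.

```python
def rank_diversity(rankings):
    """Rank diversity per rank position.

    `rankings` is a list of ranked lists, one per time slice, each ordered
    from rank 1 downwards. For each rank k the result is
    (number of distinct items seen at rank k) / (number of time slices).
    """
    slices = len(rankings)
    depth = min(len(r) for r in rankings)  # only ranks present in every slice
    return [len({r[k] for r in rankings}) / slices for k in range(depth)]


# Toy yearly word rankings (made up): rank 1 never changes, rank 3 always does.
yearly = [
    ["the", "of", "and"],
    ["the", "of", "to"],
    ["the", "and", "of"],
    ["the", "of", "in"],
]
print(rank_diversity(yearly))
```

Low values at the top ranks and values approaching 1 at deeper ranks give the rising, sigmoid-like shape the abstract describes.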
Low rank alternating direction method of multipliers reconstruction for MR fingerprinting.
Assländer, Jakob; Cloos, Martijn A; Knoll, Florian; Sodickson, Daniel K; Hennig, Jürgen; Lattanzi, Riccardo
2018-01-01
The proposed reconstruction framework addresses the reconstruction accuracy, noise propagation and computation time for magnetic resonance fingerprinting. Based on a singular value decomposition of the signal evolution, magnetic resonance fingerprinting is formulated as a low rank (LR) inverse problem in which one image is reconstructed for each singular value under consideration. This LR approximation of the signal evolution reduces the computational burden by reducing the number of Fourier transformations. Also, the LR approximation improves the conditioning of the problem, which is further improved by extending the LR inverse problem to an augmented Lagrangian that is solved by the alternating direction method of multipliers. The root mean square error and the noise propagation are analyzed in simulations. For verification, in vivo examples are provided. The proposed LR alternating direction method of multipliers approach shows a reduced root mean square error compared to the original fingerprinting reconstruction, to a LR approximation alone and to an alternating direction method of multipliers approach without a LR approximation. Incorporating sensitivity encoding allows for further artifact reduction. The proposed reconstruction provides robust convergence, reduced computational burden and improved image quality compared to other magnetic resonance fingerprinting reconstruction approaches evaluated in this study. Magn Reson Med 79:83-96, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Stoenescu, Tudor M.; Woo, Simon S.
2009-01-01
In this work, we consider information dissemination and sharing in a distributed peer-to-peer (P2P), highly dynamic communication network. In particular, we explore a network coding technique for transmission and a rank-based peer selection method for network formation. The combined approach has been shown to improve information sharing and delivery to all users when considering the challenges imposed by space network environments.
On Rank Driven Dynamical Systems
Veerman, J. J. P.; Prieto, F. J.
2014-08-01
We investigate a class of models related to the Bak-Sneppen (BS) model, initially proposed to study evolution. The BS model is extremely simple and yet captures some forms of "complex behavior" such as self-organized criticality that is often observed in physical and biological systems. In this model, random fitnesses in [0, 1] are associated with agents located at the vertices of a graph. Their fitnesses are ranked from worst (0) to best (1). At every time-step the agent with the worst fitness and some others with a priori given rank probabilities are replaced by new agents with random fitnesses. We consider two cases: the exogenous case, where the new fitnesses are taken from an a priori fixed distribution, and the endogenous case, where the new fitnesses are taken from the current distribution as it evolves. We approximate the dynamics by making a simplifying independence assumption. We use Order Statistics and Dynamical Systems to define a rank-driven dynamical system that approximates the evolution of the distribution of the fitnesses in these rank-driven models, as well as in the BS model. For this simplified model we can find the limiting marginal distribution as a function of the initial conditions. Agreement with experimental results of the BS model is excellent.
Bennett, Iain; Paracha, Noman; Abrams, Keith; Ray, Joshua
2018-01-01
Rank Preserving Structural Failure Time models are one of the most commonly used statistical methods to adjust for treatment switching in oncology clinical trials. The method is often applied in a decision analytic model without appropriately accounting for additional uncertainty when determining the allocation of health care resources. The aim of the study is to describe novel approaches to adequately account for uncertainty when using a Rank Preserving Structural Failure Time model in a decision analytic model. Using two examples, we tested and compared the performance of the novel test-based method with the resampling bootstrap method and with the conventional approach of no adjustment. In the first example, we simulated life expectancy using a simple decision analytic model based on a hypothetical oncology trial with treatment switching. In the second example, we applied the adjustment method to published data when no individual patient data were available. Mean estimates of overall and incremental life expectancy were similar across methods. However, the bootstrapped and test-based estimates consistently produced greater estimates of uncertainty compared with the estimate without any adjustment applied. Similar results were observed when using the test-based approach on published data, showing that failing to adjust for uncertainty led to smaller confidence intervals. Both the bootstrapping and test-based approaches provide a solution to appropriately incorporate uncertainty, with the benefit that the latter can be implemented by researchers in the absence of individual patient data. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Learning of Rule Ensembles for Multiple Attribute Ranking Problems
Dembczyński, Krzysztof; Kotłowski, Wojciech; Słowiński, Roman; Szeląg, Marcin
In this paper, we consider the multiple attribute ranking problem from a Machine Learning perspective. We propose two approaches to statistical learning of an ensemble of decision rules from decision examples provided by the Decision Maker in terms of pairwise comparisons of some objects. The first approach consists in learning a preference function defining a binary preference relation for a pair of objects. The result of application of this function on all pairs of objects to be ranked is then exploited using the Net Flow Score procedure, giving a linear ranking of objects. The second approach consists in learning a utility function for single objects. The utility function also gives a linear ranking of objects. In both approaches, the learning is based on the boosting technique. The presented approaches to Preference Learning share good properties of the decision rule preference model and have good performance in the massive-data learning problems. As Preference Learning and Multiple Attribute Decision Aiding share many concepts and methodological issues, in the introduction, we review some aspects bridging these two fields. To illustrate the two approaches proposed in this paper, we solve with them a toy example concerning the ranking of a set of cars evaluated by multiple attributes. Then, we perform a large data experiment on real data sets. The first data set concerns credit rating. Since recent research in the field of Preference Learning is motivated by the increasing role of modeling preferences in recommender systems and information retrieval, we chose two other massive data sets from this area - one comes from movie recommender system MovieLens, and the other concerns ranking of text documents from 20 Newsgroups data set.
Content-based image retrieval with ontological ranking
Tsai, Shen-Fu; Tsai, Min-Hsuan; Huang, Thomas S.
2010-02-01
Images are a much more powerful medium of expression than text, as the adage says: "One picture is worth a thousand words." This is because, compared with text consisting of an array of words, an image has more degrees of freedom and therefore a more complicated structure. However, the less limited structure of images presents researchers in the computer vision community with the tough task of teaching machines to understand and organize images, especially when only a limited number of learning examples and limited background knowledge are given. The advance of internet and web technology in the past decade has changed the way humans gain knowledge. People can now exchange knowledge with others by discussing and contributing information on the web. As a result, the web pages on the internet have become a living and growing source of information. One is therefore tempted to wonder whether machines can learn from the web knowledge base as well. Indeed, it is possible to make computers learn from the internet and provide humans with more meaningful knowledge. In this work, we explore this novel possibility for image understanding applied to semantic image search. We exploit web resources to obtain links from images to keywords and a semantic ontology constituting humans' general knowledge. The former maps visual content to related text, in contrast to the traditional way of associating images with surrounding text; the latter provides relations between concepts for machines to understand to what extent and in what sense an image is close to the image search query. With the aid of these two tools, the resulting image search system is thus content-based and, moreover, organized. The returned images are ranked and organized such that semantically similar images are grouped together and given a rank based on the semantic closeness to the input query. The novelty of the system is twofold: first, images are retrieved not only based on text cues but their actual contents as well; second, the grouping
Ranking Decision Making Units with Stochastic Data by Using Coefficient of Variation
Lotfi, F.; Nematollahi, N.; Behzadi, M.H.; Mirbolouki, M.
2010-01-01
Data Envelopment Analysis (DEA) is a non-parametric technique based on mathematical programming for evaluating the efficiency of a set of Decision Making Units (DMUs). In applications, managers encounter stochastic data, and the need for a method that can evaluate efficiency and rank efficient units has been under consideration. In this paper, considering the concept of coefficient of variation among efficient DMUs, two ranking methods have been proposed. ...
Estimation of rank correlation for clustered data.
Rosner, Bernard; Glynn, Robert J
2017-06-30
It is well known that the sample correlation coefficient (R_xy) is the maximum likelihood estimator of the Pearson correlation (ρ_xy) for independent and identically distributed (i.i.d.) bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the maximum likelihood estimator of ρ_xy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (i) converting ranks of both X and Y to the probit scale, (ii) estimating the Pearson correlation between probit scores for X and Y, and (iii) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. Copyright © 2017 John Wiley & Sons, Ltd.
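Steps (i) to (iii) can be illustrated with a minimal sketch for the simpler case of independent observations. It does not reproduce the authors' mixed-effects estimator for clustered data. The function names, the rank/(n + 1) probit transform, and the use of the bivariate-normal identity ρ_s = (6/π) arcsin(ρ/2) as the Pearson-to-rank relationship are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def probit_scores(values):
    """Step (i): map ranks to the probit scale (rank/(n+1) is an assumed choice)."""
    n = len(values)
    normal = NormalDist()
    return [normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]


def pearson(x, y):
    """Step (ii): ordinary Pearson correlation, ignoring any clustering."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation_via_probit(x, y):
    """Step (iii): convert the probit-scale Pearson correlation to a rank correlation."""
    rho = pearson(probit_scores(x), probit_scores(y))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    return (6 / math.pi) * math.asin(rho / 2)
```

For perfectly monotone data the probit scores coincide, so the Pearson correlation is 1 and the conversion returns (6/π) arcsin(1/2) = 1, as expected of a rank correlation.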
Trachomatous Scar Ranking: A Novel Outcome for Trachoma Studies.
Baldwin, Angela; Ryner, Alexander M; Tadesse, Zerihun; Shiferaw, Ayalew; Callahan, Kelly; Fry, Dionna M; Zhou, Zhaoxia; Lietman, Thomas M; Keenan, Jeremy D
2017-06-01
We evaluated a new trachoma scarring ranking system with potential use in clinical research. The upper right tarsal conjunctivas of 427 individuals from Ethiopian villages with hyperendemic trachoma were photographed. An expert grader first assigned a scar grade to each photograph using the 1981 World Health Organization (WHO) grading system. Then, all photographs were ranked from least (rank = 1) to most scarring (rank = 427). Photographic grading found 79 (18.5%) conjunctivae without scarring (C0), 191 (44.7%) with minimal scarring (C1), 105 (24.6%) with moderate scarring (C2), and 52 (12.2%) with severe scarring (C3). The ranking method demonstrated good internal validity, exhibiting a monotonic increase in the median rank across the levels of the 1981 WHO grading system. Intrarater repeatability was better for the ranking method (intraclass correlation coefficient = 0.84, 95% CI = 0.74-0.94). Exhibiting better internal and external validity, this ranking method may be useful for evaluating the difference in scarring between groups of individuals.
Workshop on Analytical Methods in Statistics
Jurečková, Jana; Maciak, Matúš; Pešta, Michal
2017-01-01
This volume collects authoritative contributions on analytical methods and mathematical statistics. The methods presented include resampling techniques; the minimization of divergence; estimation theory and regression, eventually under shape or other constraints or long memory; and iterative approximations when the optimal solution is difficult to achieve. It also investigates probability distributions with respect to their stability, heavy-tailness, Fisher information and other aspects, both asymptotically and non-asymptotically. The book not only presents the latest mathematical and statistical methods and their extensions, but also offers solutions to real-world problems including option pricing. The selected, peer-reviewed contributions were originally presented at the workshop on Analytical Methods in Statistics, AMISTAT 2015, held in Prague, Czech Republic, November 10-13, 2015.
Energy consumption quota of public buildings based on statistical analysis
International Nuclear Information System (INIS)
Zhao Jing; Xin Yajuan; Tong Dingding
2012-01-01
The establishment of building energy consumption quota as a comprehensive indicator used to evaluate the actual energy consumption level is an important measure for promoting the development of building energy efficiency. This paper focused on the determination method of the quota, and firstly introduced the procedure of establishing energy consumption quota of public buildings including four important parts: collecting data, classifying and calculating EUIs, standardizing EUIs, determining the measure method of central tendency. The paper also illustrated the standardization process of EUI by actual calculation based on the samples of 10 commercial buildings and 19 hotel buildings. According to the analysis of the frequency distribution of standardized EUIs of sample buildings and combining the characteristics of each measure method of central tendency, comprehensive application of mode and percentage rank is selected to be the best method for determining the energy consumption quota of public buildings. Finally the paper gave some policy proposals on energy consumption quota to help achieve the goal of further energy conservation. - Highlights: ► We introduce the procedure of determining energy consumption quota (ECQ). ► We illustrate the standardization process of EUI by actual calculation of samples. ► Measures of central tendency are brought into determine the ECQ. ► Comprehensive application of mode and percentage rank is the best method for ECQ. ► Punitive or incentive measures for ECQ are proposed.
Low-rank Quasi-Newton updates for Robust Jacobian lagging in Newton methods
International Nuclear Information System (INIS)
Brown, J.; Brune, P.
2013-01-01
Newton-Krylov methods are standard tools for solving nonlinear problems. A common approach is to 'lag' the Jacobian when assembly or preconditioner setup is computationally expensive, in exchange for some degradation in the convergence rate and robustness. We show that this degradation may be partially mitigated by using the lagged Jacobian as an initial operator in a quasi-Newton method, which applies unassembled low-rank updates to the Jacobian until the next full reassembly. We demonstrate the effectiveness of this technique on problems in glaciology and elasticity. (authors)
Using Bibliographic Knowledge for Ranking in Scientific Publication Databases
Vesely, Martin; Le Meur, Jean-Yves
2008-01-01
Document ranking for scientific publications involves a variety of specialized resources (e.g. author or citation indexes) that are usually difficult to use within standard general purpose search engines that usually operate on large-scale heterogeneous document collections for which the required specialized resources are not always available for all the documents present in the collections. Integrating such resources into specialized information retrieval engines is therefore important to cope with community-specific user expectations that strongly influence the perception of relevance within the considered community. In this perspective, this paper extends the notion of ranking with various methods exploiting different types of bibliographic knowledge that represent a crucial resource for measuring the relevance of scientific publications. In our work, we experimentally evaluated the adequacy of two such ranking methods (one based on freshness, i.e. the publication date, and the other on a novel index, the ...
Ranking spreaders by decomposing complex networks
International Nuclear Information System (INIS)
Zeng, An; Zhang, Cheng-Jun
2013-01-01
Ranking the nodes' ability of spreading in networks is crucial for designing efficient strategies to hinder spreading in the case of diseases or accelerate spreading in the case of information dissemination. In the well-known k-shell method, nodes are ranked only according to the links between the remaining nodes (residual links) while the links connecting to the removed nodes (exhausted links) are entirely ignored. In this Letter, we propose a mixed degree decomposition (MDD) procedure in which both the residual degree and the exhausted degree are considered. By simulating the epidemic spreading process on real networks, we show that the MDD method can outperform the k-shell and degree methods in ranking spreaders.
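The peeling idea behind the k-shell and MDD methods can be sketched as follows. The abstract does not say how the residual and exhausted degrees are combined, so the linear form k_m = k_r + λ·k_e, the parameter name `lam`, and its default value are assumptions for illustration only. With `lam = 0` the sketch reduces to ordinary k-shell decomposition.

```python
def mixed_degree_decomposition(adj, lam=0.7):
    """Peel an undirected graph by mixed degree and return node -> shell value.

    adj: dict mapping each node to the set of its neighbours.
    lam: assumed weight of exhausted links (links to already removed nodes).
    """
    remaining = set(adj)
    shell = {}

    def mixed_degree(v):
        residual = sum(1 for u in adj[v] if u in remaining)
        exhausted = len(adj[v]) - residual
        return residual + lam * exhausted

    while remaining:
        # Current shell value: the smallest mixed degree among remaining nodes.
        m = min(mixed_degree(v) for v in remaining)
        while True:
            # Removing nodes lowers their neighbours' mixed degree, so repeat
            # until no remaining node is at or below the current shell value.
            peel = [v for v in remaining if mixed_degree(v) <= m + 1e-12]
            if not peel:
                break
            for v in peel:
                shell[v] = m
            remaining.difference_update(peel)
    return shell
```

On a triangle a-b-c with a pendant node d attached to a, `lam=0` assigns d to shell 1 and the triangle to shell 2, while `lam=1` makes every link count in full, so each node's shell equals its degree.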
Sparse subspace clustering for data with missing entries and high-rank matrix completion.
Fan, Jicong; Chow, Tommy W S
2017-09-01
Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Comparing classical and quantum PageRanks
Loke, T.; Tang, J. W.; Rodriguez, J.; Small, M.; Wang, J. B.
2017-01-01
Following recent developments in quantum PageRanking, we present a comparative analysis of discrete-time and continuous-time quantum-walk-based PageRank algorithms. Relative to classical PageRank and to different extents, the quantum measures better highlight secondary hubs and resolve ranking degeneracy among peripheral nodes for all networks we studied in this paper. For the discrete-time case, we investigated the periodic nature of the walker's probability distribution for a wide range of networks and found that the dominant period does not grow with the size of these networks. Based on this observation, we introduce a new quantum measure using the maximum probabilities of the associated walker during the first couple of periods. This is particularly important, since it leads to a quantum PageRanking scheme that is scalable with respect to network size.
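As a point of reference for the classical baseline mentioned above, here is a minimal power-iteration sketch of classical PageRank. It is not the quantum-walk measures studied in the paper. The damping factor 0.85, the uniform redistribution of rank from dangling nodes, and the function and parameter names are conventional choices assumed here.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Classical PageRank by power iteration.

    links: dict mapping a node to the list of nodes it links to.
    Returns a dict node -> score; scores sum to 1.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank
```

In a three-node example where a and b both link to c and c links back to a, the node c receives the highest score, a the second highest, and b only the teleportation share.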
Diagnosing and Ranking Retinopathy Disease Level Using Diabetic Fundus Image Recuperation Approach
Directory of Open Access Journals (Sweden)
K. Somasundaram
2015-01-01
Retinal fundus images are widely used in diagnosing different types of eye diseases. Existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include a more detailed feature selection technique for the detection of diabetic retinopathy. To categorize the exudates, a Diabetic Fundus Image Recuperation (DFIR) method based on a sliding window approach is developed in this work to select the features of the optic cup in digital retinal fundus images. The DFIR feature selection uses a collection of sliding windows of varying range to obtain the features based on the histogram value using a Group Sparsity Nonoverlapping Function. Using a support vector model in the second phase, the DFIR method based on a Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a promising result for developing a practical automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method examines factors such as sensitivity, ranking efficiency, and feature selection time.
Application of Turchin's method of statistical regularization
Zelenyi, Mikhail; Poliakova, Mariia; Nozik, Alexander; Khudyakov, Alexey
2018-04-01
During analysis of experimental data, one usually needs to restore a signal after it has been convolved with some kind of apparatus function. According to Hadamard's definition this problem is ill-posed and requires regularization to provide sensible results. In this article we describe an implementation of Turchin's method of statistical regularization, which is based on the Bayesian approach to the regularization strategy.
Method ranks competing projects by priorities, risk
International Nuclear Information System (INIS)
Moeckel, D.R.
1993-01-01
A practical, objective guide for ranking projects based on risk-based priorities has been developed by Sun Pipe Line Co. The deliberately simple system guides decisions on how to allocate scarce company resources because all managers employ the same criteria in weighing potential risks to the company versus benefits. Managers at all levels are continuously having to comply with an ever growing amount of legislative and regulatory requirements while at the same time trying to run their businesses effectively. The system primarily is designed for use as a compliance oversight and tracking process to document, categorize, and follow-up on work concerning various issues or projects. That is, the system consists of an electronic database which is updated periodically, and is used by various levels of management to monitor progress of health, safety, environmental and compliance-related projects. Criteria used in determining a risk factor and assigning a priority also have been adapted and found useful for evaluating other types of projects. The process enables management to better define potential risks and/or loss of benefits that are being accepted when a project is rejected from an immediate work plan or budget. In times of financial austerity, it is extremely important that the right decisions are made at the right time
Brisseau, Lionel; Bussières, Jean-François; Bois, Denis; Vallée, Marc; Racine, Marie-Claude; Bonnici, André
2013-02-01
To establish a consensual and coherent ranking of healthcare programmes that involve the presence of ward-based and clinic-based clinical pharmacists, based on health outcome, health costs and safe delivery of care. This descriptive study was derived from a structured dialogue (Delphi technique) among directors of pharmacy department. We established a quantitative profile of healthcare programmes at five sites that involved the provision of ward-based and clinic-based pharmaceutical care. A summary table of evidence established a unique quality rating per inpatient (clinic-based) or outpatient (ward-based) healthcare programme. Each director rated the perceived impact of pharmaceutical care per inpatient or outpatient healthcare programme on three fields: health outcome, health costs and safe delivery of care. They agreed by consensus on the final ranking of healthcare programmes. A ranking was assigned for each of the 18 healthcare programmes for outpatient care and the 17 healthcare programmes for inpatient care involving the presence of pharmacists, based on health outcome, health costs and safe delivery of care. There was a good correlation between ranking based on data from a 2007-2008 Canadian report on hospital pharmacy practice and the ranking proposed by directors of pharmacy department. Given the often limited human and financial resources, managers should consider the best evidence available on a profession's impact to plan healthcare services within an organization. Data are few on ranking healthcare programmes in order to prioritize which healthcare programme would mostly benefit from the delivery of pharmaceutical care by ward-based and clinic-based pharmacists. © 2012 The Authors. IJPP © 2012 Royal Pharmaceutical Society.
Statistical methods in personality assessment research.
Schinka, J A; LaLone, L; Broeckel, J A
1997-06-01
Emerging models of personality structure and advances in the measurement of personality and psychopathology suggest that research in personality and personality assessment has entered a stage of advanced development. In this article we examine whether researchers in these areas have taken advantage of new and evolving statistical procedures. We conducted a review of articles published in the Journal of Personality Assessment during the past 5 years. Of the 449 articles that included some form of data analysis, 12.7% used only descriptive statistics, most employed only univariate statistics, and fewer than 10% used multivariate methods of data analysis. We discuss the cost of using limited statistical methods, the possible reasons for the apparent reluctance to employ advanced statistical procedures, and potential solutions to this technical shortcoming.
Statistical Methods in Psychology Journals.
Wilkinson, Leland
1999-01-01
Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)
The statistical process control methods - SPC
Directory of Open Access Journals (Sweden)
Floreková Ľubica
1998-03-01
Methods of statistical evaluation of quality, SPC (item 20 of the quality-control documentation system of the ISO 9000 series of standards), for various processes, products and services belong among the basic qualitative methods that enable us to analyse and compare data pertaining to various quantitative parameters. Based on these data, they also make it possible to propose suitable interventions aimed at improving these processes, products and services. The contribution presents the theoretical basis and applicability of the principles of: cause-and-effect diagnostics; Pareto analysis and the Lorenz curve; frequency distributions and frequency curves of a random variable; and Shewhart control charts.
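Of the tools listed, the Shewhart control chart is the easiest to show in a few lines. The sketch below computes limits for an individuals chart, estimating the process standard deviation from the average moving range divided by the standard constant d2 = 1.128 for ranges of two observations. The function names and the baseline data in the usage note are hypothetical and do not come from the article.

```python
from statistics import mean


def individuals_chart_limits(baseline):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / 1.128  # d2 constant for n = 2
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat


def out_of_control(samples, baseline):
    """Indices of samples falling outside the 3-sigma limits of the baseline."""
    lower, _, upper = individuals_chart_limits(baseline)
    return [i for i, x in enumerate(samples) if x < lower or x > upper]
```

With a hypothetical baseline of `[10, 11, 10, 11, 10, 11]`, the centre line is 10.5 and the limits are roughly 7.84 and 13.16, so in the sample series `[10, 14, 9, 7]` the second and fourth observations are flagged.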
Color correction with blind image restoration based on multiple images using a low-rank model
Li, Dong; Xie, Xudong; Lam, Kin-Man
2014-03-01
We present a method that can handle the color correction of multiple photographs with blind image restoration simultaneously and automatically. We prove that the local colors of a set of images of the same scene exhibit the low-rank property locally both before and after a color-correction operation. This property allows us to correct all kinds of errors in an image under a low-rank matrix model without particular priors or assumptions. The possible errors may be caused by changes of viewpoint, large illumination variations, gross pixel corruptions, partial occlusions, etc. Furthermore, a new iterative soft-segmentation method is proposed for local color transfer using color influence maps. Due to the fact that the correct color information and the spatial information of images can be recovered using the low-rank model, more precise color correction and many other image-restoration tasks-including image denoising, image deblurring, and gray-scale image colorizing-can be performed simultaneously. Experiments have verified that our method can achieve consistent and promising results on uncontrolled real photographs acquired from the Internet and that it outperforms current state-of-the-art methods.
A scoring mechanism for the rank aggregation of network robustness
Yazdani, Alireza; Dueñas-Osorio, Leonardo; Li, Qilin
2013-10-01
To date, a number of metrics have been proposed to quantify inherent robustness of network topology against failures. However, each single metric usually only offers a limited view of network vulnerability to different types of random failures and targeted attacks. When applied to certain network configurations, different metrics rank network topology robustness in different orders which is rather inconsistent, and no single metric fully characterizes network robustness against different modes of failure. To overcome such inconsistency, this work proposes a multi-metric approach as the basis of evaluating aggregate ranking of network topology robustness. This is based on simultaneous utilization of a minimal set of distinct robustness metrics that are standardized so to give way to a direct comparison of vulnerability across networks with different sizes and configurations, hence leading to an initial scoring of inherent topology robustness. Subsequently, based on the inputs of initial scoring a rank aggregation method is employed to allocate an overall ranking of robustness to each network topology. A discussion is presented in support of the presented multi-metric approach and its applications to more realistically assess and rank network topology robustness.
Statistical methods for quality improvement
National Research Council Canada - National Science Library
Ryan, Thomas P
2011-01-01
...." -Technometrics. This new edition continues to provide the most current, proven statistical methods for quality control and quality improvement. The use of quantitative methods offers numerous benefits...
Sparse Contextual Activation for Efficient Visual Re-Ranking.
Bai, Song; Bai, Xiang
2016-03-01
In this paper, we propose an extremely efficient algorithm for visual re-ranking. By considering the original pairwise distance in the contextual space, we develop a feature vector called sparse contextual activation (SCA) that encodes the local distribution of an image. Hence, re-ranking task can be simply accomplished by vector comparison under the generalized Jaccard metric, which has its theoretical meaning in the fuzzy set theory. In order to improve the time efficiency of re-ranking procedure, inverted index is successfully introduced to speed up the computation of generalized Jaccard metric. As a result, the average time cost of re-ranking for a certain query can be controlled within 1 ms. Furthermore, inspired by query expansion, we also develop an additional method called local consistency enhancement on the proposed SCA to improve the retrieval performance in an unsupervised manner. On the other hand, the retrieval performance using a single feature may not be satisfactory enough, which inspires us to fuse multiple complementary features for accurate retrieval. Based on SCA, a robust feature fusion algorithm is exploited that also preserves the characteristic of high time efficiency. We assess our proposed method in various visual re-ranking tasks. Experimental results on Princeton shape benchmark (3D object), WM-SRHEC07 (3D competition), YAEL data set B (face), MPEG-7 data set (shape), and Ukbench data set (image) manifest the effectiveness and efficiency of SCA.
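The vector comparison at the core of this approach can be sketched as below: the generalized Jaccard similarity of two non-negative vectors is the sum of element-wise minima divided by the sum of element-wise maxima. Representing the sparse vectors as dictionaries and the function names are assumptions for illustration. This is not the authors' SCA feature construction or their inverted-index implementation.

```python
def generalized_jaccard_similarity(x, y):
    """Sum of element-wise minima over sum of element-wise maxima.

    x, y: sparse non-negative vectors stored as dicts (missing keys are 0).
    """
    keys = set(x) | set(y)
    numerator = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    denominator = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    # Two all-zero vectors are treated as identical.
    return numerator / denominator if denominator > 0 else 1.0


def generalized_jaccard_distance(x, y):
    """Distance form used for ranking: smaller means more similar."""
    return 1.0 - generalized_jaccard_similarity(x, y)
```

For non-negative entries the denominator equals the sum of both vectors minus the numerator, so only coordinates that are non-zero in both vectors need to be visited at query time. This is one reason an inverted index over non-zero coordinates suits the computation.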
Wang, Ling; Xia, Jie-lai; Yu, Li-li; Li, Chan-juan; Wang, Su-zhen
2008-06-01
To explore several numerical methods for scoring the ordinal variable in a one-way ordinal contingency table and their interrelationship, and to compare corresponding statistical analysis methods such as Ridit analysis and the rank sum test. Formula deduction was based on five simplified grading approaches including rank_r(i), ridit_r(i), ridit_r(ci), ridit_r(mi), and table scores. A practical data set from clinical practice (testing the effect of Shiwei solution in the treatment of chronic tracheitis) was verified with SAS 8.2. Because of the linear relationship rank_r(i) = N ridit_r(i) + 1/2 = N ridit_r(ci) = (N + 1) ridit_r(mi), the exact χ² values in Ridit analysis based on ridit_r(i), ridit_r(ci), and ridit_r(mi) were completely the same, and they were equivalent to the Kruskal-Wallis H test. Traditional Ridit analysis was based on ridit_r(i), and its corresponding χ² value calculated with an approximate variance (1/12) was conservative. The exact χ² test of Ridit analysis should be used when comparing multiple groups in the clinical researches because o